Here are some important details to bear in mind when optimising a converted hosts file. Please read on…
Consolidating a hosts file for Privoxy use can be carried out in one of the following two ways:
- Limiting all addresses to a fixed subdomain level. By convention, level 1 is the subdomain that sits directly next to the domain name. The lower the number, the stricter the blocking for the corresponding domain name and the smaller the converted hosts file. For example, with the subdomain level set to 1, the entries ztu.xyz.abc.com and wrz.ztu.xyz.abc.com are both turned into .xyz.abc.com. In this scenario, take care with CDN and static-based level 1 subdomains, and likewise with cloud-based addresses. Take these entries: rack1.adserver.cdn.goodsite.com, rack2.ad.static.goodsite.com and track.ad.goodsite.s3.amazonaws.com. Compressing them to .cdn.goodsite.com, .static.goodsite.com and .s3.amazonaws.com respectively is a bad idea, since legitimate resources are served from those addresses as well.
So, for those who want to limit addresses to subdomain level 1 (as I do for “classic” addresses), it is advisable to create exceptions for entries matching these patterns:
Code:
\.(cdn|static)[0-9]*\.[domain_name_pattern]\.[generic_extension_pattern]$
\.s3\.amazonaws\.com$
Those exceptions could be moved out to other files, for example one containing the CDN and static-based addresses and another containing the s3.amazonaws.com addresses. They could in turn be optimised by limiting addresses to subdomain level 2 for CDN and static-based entries and to level 3 for the Amazon cloud. The entries above would then become .adserver.cdn.goodsite.com, .ad.static.goodsite.com and .ad.goodsite.s3.amazonaws.com. Note that there are other cloud-based servers: edgesuite.net addresses could be limited to level 3, rackcdn.com addresses to level 3 or 4…
A developer can choose either to leave those entries in one or more files separate from the main “classic” one (which limits addresses to subdomain level 1), or to put them all back into the main file once the latter has been optimised, in order to keep a single converted hosts file. Personally, I would limit “classic” addresses to subdomain level 1 and keep the exceptions, optimised as well, in one or several separate files. Of course, all these files would need to be included in the Privoxy config file.
- Simply comparing entries that share the same domain name but have different subdomain levels. This is a lighter approach in which only the shortest form is kept. For example, the entries ztu.xyz.abc.com and wrz.ztu.xyz.abc.com are simply turned into .ztu.xyz.abc.com for Privoxy use. However, this choice is less restrictive, since a malicious subdomain that is not listed is not covered. Privoxy will certainly block wrz.ztu.xyz.abc.com (the new shortened entry .ztu.xyz.abc.com takes care of that too), but it won't block, for example, rst.xyz.abc.com, as it would under the first approach discussed above, which yields .xyz.abc.com for Privoxy use.
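To make the second, lighter method concrete, here is a minimal Python sketch of how it might look. The function name keep_shortest is hypothetical, and prefixing every kept entry with a leading dot is my own assumption about the desired output, not something a particular converter is known to do.

```python
def keep_shortest(hosts):
    """Drop every host that is a subdomain of another listed host.

    Hypothetical helper: returns the surviving entries with a leading dot.
    """
    kept = set()
    unique = {h.lower() for h in hosts}
    # Visit names with fewer labels first, so a parent is always seen
    # before any of its subdomains.
    for host in sorted(unique, key=lambda h: h.count(".")):
        labels = host.split(".")
        # Candidate parents: every proper suffix made of whole labels.
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(p in kept for p in parents):
            kept.add(host)
    return sorted("." + h for h in kept)


print(keep_shortest(["ztu.xyz.abc.com", "wrz.ztu.xyz.abc.com"]))
```

Comparing whole labels rather than raw string suffixes avoids treating xb.com as a subdomain of b.com.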
To conclude, it might be nice to give the user the choice between a fast and light compression using the second method, and the first method based on subdomain level limitation. In the latter case, the user could either keep the default settings (e.g. level 1 for classic URL patterns) or increase all default levels, including those for cloud-based addresses, by 1 or 2.
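For the subdomain level limitation with a user-adjustable increase, a rough Python sketch could look like the following. All names here (DEFAULT_LEVEL, EXCEPTION_LEVELS, limit_level, consolidate, bump) are hypothetical. The regular expressions are simplified stand-ins for the exception patterns quoted earlier, and the code assumes a two-label domain such as abc.com, so extensions like co.uk would need extra handling.

```python
import re

# Hypothetical default: number of subdomain labels kept next to the domain.
DEFAULT_LEVEL = 1

# Simplified stand-ins for the exception patterns, each with its own level.
EXCEPTION_LEVELS = [
    (re.compile(r"\.(cdn|static)[0-9]*\.[a-z0-9-]+\.[a-z]+$"), 2),
    (re.compile(r"\.s3\.amazonaws\.com$"), 3),
]


def limit_level(host, bump=0):
    """Truncate host to its allowed subdomain level, plus an optional bump."""
    host = host.lower()
    level = DEFAULT_LEVEL
    for pattern, exception_level in EXCEPTION_LEVELS:
        if pattern.search(host):
            level = exception_level
            break
    labels = host.split(".")
    # Assumption: the domain itself is two labels (name + extension).
    keep = level + bump + 2
    if len(labels) < keep:
        return host  # shorter than the limit: left unchanged
    return "." + ".".join(labels[-keep:])


def consolidate(hosts, bump=0):
    """Apply limit_level to every host and remove duplicates."""
    return sorted({limit_level(h, bump) for h in hosts})


print(consolidate(["ztu.xyz.abc.com", "wrz.ztu.xyz.abc.com"]))
print(consolidate(["ztu.xyz.abc.com", "wrz.ztu.xyz.abc.com"], bump=1))
```

Here bump=0 corresponds to keeping the default settings, while bump=1 or bump=2 raises every level, including the CDN, static and cloud exceptions.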