It's time to review the whole of section 8.4 of the Privoxy user manual, which covers filtering through the use of metacharacters. Offending domain names often contain recurring terms, so why not compress your list using metacharacters? Your blacklist probably contains a ton of entries with the word “analytic”, for example. What you could do is remove all entries containing the word “analytic”, then add only… to your blacklist.
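Before removing anything, it can help to see how many entries a single term would absorb. The following is only a minimal sketch, assuming one entry per line; `count_term` is a hypothetical helper name and the sample list is made up for illustration.

```shell
#!/bin/sh
# count_term TERM FILE: print how many lines of FILE contain TERM
# (fixed-string, case-insensitive match). Hypothetical helper.
count_term() {
    grep -ciF -- "$1" "$2"
}

# Made-up sample list, one domain per line.
sample=$(mktemp)
printf '%s\n' \
    'analytics.example.com' \
    'cdn.example.net' \
    'web-analytic.example.org' \
    'example-analytics.test' > "$sample"

count_term analytic "$sample"   # prints 3
rm -f "$sample"
```

On a real list you would call it as `count_term analytic your_blacklist`.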
Attached to this post is an archive to help you eliminate a lot of the redundant entries in your blacklist. Inside the archive, the files “meta_col.txt” and “meta_col.rgx” contain the same entries; the second one is simply an extended REGEX version of the first. You're going to use “meta_col.rgx” to compress your blacklist. Once you're done with this task, you simply add the content of “meta_col.txt” to your blacklist.
Before we start compressing your blacklist, I suggest you review the content of “meta_col.txt” and add or remove entries as needed. Then replicate your changes in “meta_col.rgx”.
If your blacklist is really big, I suggest you install the tool GNU Parallel to speed things up. Also, why not install pv, a progress monitor, as well?
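To check whether both tools are already installed, something along these lines should do. It is only a sketch; `missing_tools` is a hypothetical helper name, and how you install anything that is missing depends on your distribution.

```shell
#!/bin/sh
# missing_tools NAME...: print each command name that is not found in PATH.
# Hypothetical helper.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools parallel pv)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not installed:" $missing
else
    echo "parallel and pv are both available"
fi
```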
The following command compresses your blacklist using all of your CPU cores:
Code:
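# Keep a backup of the original list.
cp your_blacklist your_blacklist.bak
# Split the list across all CPU cores (-j +0), drop every line matching a
# pattern from meta_col.rgx, and show bytes processed and elapsed time (pv -bt).
cat your_blacklist | \
parallel -j +0 --pipe grep -vf meta_col.rgx | pv -bt > your_blacklist_compressed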
(Successfully tested with GNU grep version 2.27.)
If you prefer the “classic way” instead of the above variation…
Code:
grep -vf meta_col.rgx your_blacklist > your_blacklist_compressed
Then you can append the content of “meta_col.txt” to your new compressed blacklist.
Code:
cat meta_col.txt >> your_blacklist_compressed
Job done!
Note: if you have a huge blacklist (hundreds of thousands of entries), the “classic way” of compacting it can be very slow, as it does not use all of your CPU cores.
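Whichever way you choose, a quick line count shows how much the list shrank. This is a minimal sketch, assuming one entry per line; `report_savings` is a hypothetical helper name and the two sample files are made up.

```shell
#!/bin/sh
# report_savings ORIGINAL COMPRESSED: print the line count of both files
# and the number of lines saved. Hypothetical helper.
report_savings() {
    before=$(wc -l < "$1")
    after=$(wc -l < "$2")
    echo "before=$((before)) after=$((after)) saved=$((before - after))"
}

# Made-up sample files standing in for the real lists.
orig=$(mktemp)
comp=$(mktemp)
printf '%s\n' a.example b.example c.example d.example > "$orig"
printf '%s\n' a.example b.example > "$comp"

report_savings "$orig" "$comp"   # prints before=4 after=2 saved=2
rm -f "$orig" "$comp"
```

On the real files the call would be `report_savings your_blacklist.bak your_blacklist_compressed`. If you run it after appending “meta_col.txt”, the appended entries are included in the second count.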
-–—
Some other generic terms you could use as well in your blacklist:
Code:
.*casino*.
.*doctor*.
.*generic*.
.*luxury*.
.*pharmac*.
.*poker*.
.*replica*.
.*viagra*.
Of course, you should first eliminate the numerous entries containing those words.
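One possible way to do that elimination in a single pass is sketched below. It assumes one entry per line and treats each word as a plain, case-insensitive substring; `strip_terms` is a hypothetical helper name and the sample list is made up.

```shell
#!/bin/sh
# strip_terms FILE TERM...: print the lines of FILE that contain none of
# the given terms (fixed strings, case-insensitive). Hypothetical helper.
strip_terms() {
    file=$1
    shift
    terms=$(mktemp)
    printf '%s\n' "$@" > "$terms"        # one term per line for grep -f
    grep -viF -f "$terms" -- "$file"
    rm -f "$terms"
}

# Made-up sample list.
list=$(mktemp)
printf '%s\n' \
    'best-casino.example.com' \
    'news.example.org' \
    'cheap-pharmacy.example.net' > "$list"

strip_terms "$list" casino doctor generic luxury pharmac poker replica viagra
# prints news.example.org
rm -f "$list"
```

Run it before adding the generic patterns to the list, since the patterns themselves contain the same words and would be filtered out too.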