external filters
Jan. 29, 2017, 03:21 PM
Post: #1
external filters
hey guys,

since privoxy provides the option to use external filters: can someone please help me understand this? i have limited resources for running privoxy. when my files are bigger than a couple of MB, privoxy just does not start.

what i want to do is load the blacklisted domains from the hpHosts database and block those domains. since the list is about 18 MB, it is not possible to load it. can i use the external filter option for this?

best,

chris
Jan. 29, 2017, 11:41 PM (This post was last modified: Dec. 08, 2017 10:24 PM by Faxopita.)
Post: #2
RE: external filters
It's time to review the whole of section 8.4 of the Privoxy user manual, which covers pattern matching with metacharacters. Offending domain names often contain recurring terms, so why not compress your list using metacharacters? Your blacklist probably contains the word “analytic” a great many times, for example. You could remove all entries containing the word “analytic” and then add only…
Code:
.*analytic*.
to your blacklist.

Attached to this post, an archive to help you eliminate a lot of the redundant entries in your blacklist. Inside the archive, the files “meta_col.txt” and “meta_col.rgx” contain the same entries; the second one is only an extended REGEX version of the first one. You're going to use “meta_col.rgx” to compress your blacklist. Once you're done with this task, you simply add the content of “meta_col.txt” file to your blacklist.

Before we start compressing your blacklist, I suggest you review the content of “meta_col.txt” and add or remove entries as needed. Then replicate your changes in the “meta_col.rgx” file.

If your blacklist is really big, I suggest you install the tool Parallel to speed things up. You could also install pv, a progress monitor.

The following command will compress your blacklist at maxed out CPU usage:
Code:
cp your_blacklist your_blacklist.bak
# ">" overwrites the output, so re-running does not append duplicate entries
cat your_blacklist | \
parallel -j +0 --pipe grep -vf meta_col.rgx | pv -bt > your_blacklist_compressed
(successfully tested with GNU grep version 2.27)

If you prefer the “classic way” instead of the above variation…
Code:
grep -vf meta_col.rgx your_blacklist > your_blacklist_compressed

Then, you can append the content of “meta_col.txt” to your new compressed blacklist.
Code:
cat meta_col.txt >> your_blacklist_compressed

Job done!
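
To see how much the list actually shrank, the line counts before and after can be compared. This is only a small sketch; the function name `compression_report` is made up for illustration, and the file names are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: print line counts before and after compression.
compression_report() {
    # $1 = original blacklist, $2 = compressed blacklist
    before=$(wc -l < "$1")
    after=$(wc -l < "$2")
    # $((...)) also strips the padding some wc implementations add
    echo "before=$((before)) after=$((after)) removed=$((before - after))"
}
```

Usage would be `compression_report your_blacklist.bak your_blacklist_compressed`.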

Note: if you have a huge blacklist (hundreds of thousands of entries), the “classic way” of compacting it can be very slow, as it does not use all of your CPU cores.

---

Some other generic terms you could use as well in your blacklist:
Code:
.*casino*.
.*doctor*.
.*generic*.
.*luxury*.
.*pharmac*.
.*poker*.
.*replica*.
.*viagra*.

Of course, you would first eliminate the numerous entries containing those words.
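
One way to find further candidate terms is to count which words occur most often in the host names of the list. The following is a rough sketch, assuming one domain per line; the function name `top_terms` and the minimum word length of 5 are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the most frequent words in a blacklist.
top_terms() {
    # $1 = blacklist file, $2 = number of terms to show (default 20)
    tr 'A-Z' 'a-z' < "$1" |
        tr -cs 'a-z' '\n' |        # split into runs of letters
        awk 'length($0) >= 5' |    # skip short fragments such as "com" or "www"
        sort | uniq -c | sort -rn |
        head -n "${2:-20}"
}
```

Running `top_terms your_blacklist 30` prints a count next to each word; whether a word is safe to turn into a pattern still needs a human decision.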


Attached File(s)
.zip  meta.zip (Size: 5.05 KB / Downloads: 584)
Jan. 30, 2017, 05:13 AM
Post: #3
RE: external filters
(Jan. 29, 2017 03:21 PM)kik0s Wrote:  hey guys,

since privoxy provides the option to use external filters: can someone please help me understand this? i have limited resources for running privoxy. when my files are bigger than a couple of MB, privoxy just does not start.

what i want to do is load the blacklisted domains from the hpHosts database and block those domains. since the list is about 18 MB, it is not possible to load it. can i use the external filter option for this?

best,

chris

external-filter is not really what you want in this case. It is a way for Privoxy to hand content to an external application that can parse, edit, or save it, and so do many things that Privoxy itself cannot.
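
To make the distinction concrete: an external filter is a program that rewrites content, not a list of hosts to block. Here is a minimal sketch, assuming Privoxy pipes the response body to the program on standard input and uses whatever it writes to standard output. The function name `strip_html_comments` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical external filter body: reads content on stdin, writes the
# modified content to stdout. A real filter script would end by calling
# the function so that it processes the piped data.
strip_html_comments() {
    # Remove HTML comments that start and end on the same line;
    # multi-line comments are left untouched in this sketch.
    sed 's/<!--[^>]*-->//g'
}
```

Nothing here decides whether a request is allowed, which is why it does not help with loading a large blacklist.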

An 18 MB hosts file is not really practical. I think you should use EasyList instead; just convert it into Privoxy's format using adblock2privoxy: https://projects.zubr.me/wiki/adblock2privoxy
Jan. 30, 2017, 01:46 PM (This post was last modified: Jan. 30, 2017 06:55 PM by kik0s.)
Post: #4
RE: external filters
(Jan. 30, 2017 05:13 AM)cattleyavns Wrote:  
external-filter is not really what you want in this case. It is a way for Privoxy to hand content to an external application that can parse, edit, or save it, and so do many things that Privoxy itself cannot.

An 18 MB hosts file is not really practical. I think you should use EasyList instead; just convert it into Privoxy's format using adblock2privoxy: https://projects.zubr.me/wiki/adblock2privoxy

actually that's what i was using, but the problem is that the tool creates a lot more files, and adding all of the action and filter files kills my privoxy. there is not enough ram to load them, so no way.

@Faxopita thanks for that. will try it. maybe that's the key!

edit:
is it possible to block like buttons with a hosts file?
Jan. 31, 2017, 03:37 AM
Post: #5
RE: external filters
(Jan. 30, 2017 01:46 PM)kik0s Wrote:  edit:
is it possible to block like buttons with a hosts file?

I'm pretty sure it is not possible with the hosts file alone. Blocking like buttons means blocking a path such as "facebook.com/plugins/", and a hosts file can only block the whole domain "facebook.com".
Jan. 31, 2017, 08:31 AM (This post was last modified: Feb. 13, 2017 02:23 PM by Faxopita.)
Post: #6
RE: external filters
(Jan. 31, 2017 03:37 AM)cattleyavns Wrote:  
(Jan. 30, 2017 01:46 PM)kik0s Wrote:  edit:
is it possible to block like buttons with a hosts file?

I'm pretty sure it is not possible with the hosts file alone. Blocking like buttons means blocking a path such as "facebook.com/plugins/", and a hosts file can only block the whole domain "facebook.com".

Years ago, when I wanted to use Privoxy alongside the hosts file, the latter was ignored. Apparently, it's either the hosts file or Privoxy, not both at the same time.

I confirm that a hosts file only blocks domains. However, you can run Privoxy together with a local DNS resolver such as Unbound, which can be used to block domains too. That's your definitive solution if my first suggestion above gives you limited results. Note that, like the hosts file, Unbound does not accept regular expressions.

Under this scenario, you could use Unbound to block domains (just like a hosts file) and Privoxy to block requests based on the path part of the URL.

To block a path (Privoxy):
Code:
{ +block{plug-ins} }
.facebook.com/plugins/

To block a domain (Unbound):
Code:
local-zone: "touchbymediametrie.com" redirect
local-data: "touchbymediametrie.com A 127.0.0.1"

Now, you should have the best of both worlds and enjoy a renewed experience with your computer.

---

Converting a hosts File into Unbound local-data

To convert StevenBlack's hosts file, for example, into Unbound local-data, you could issue the following command:
Code:
wget -O - https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts | grep '^0\.0\.0\.0' | \
awk '{print "local-zone: \""$2"\" redirect\nlocal-data: \""$2" A 0.0.0.0\""}' > unbound_blacklist.conf

Taken from here. I tested it and it works. Of course, you repeat with other source lists and amend the script accordingly.

An alternative way to do the conversion is provided by the script `unbound-block-hosts`.

Beware! Unbound hates duplicates; be sure to remove any duplicates. If you need to whitelist an entry, simply remove it from `unbound_blacklist.conf` and restart Unbound.
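
Duplicates can also be dropped during the conversion itself, so the generated file never contains them. The following is a sketch under a few assumptions: the input is a hosts-format file that uses 0.0.0.0 as the sink address, and the function name `hosts_to_unbound` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: hosts file -> de-duplicated Unbound local-data.
hosts_to_unbound() {
    # $1 = hosts-format file
    grep '^0\.0\.0\.0[[:space:]]' "$1" |
        awk '{print $2}' |          # keep only the host name column
        grep -vx '0\.0\.0\.0' |     # skip a "0.0.0.0 0.0.0.0" entry if present
        sort -u |                   # remove duplicate host names
        awk '{print "local-zone: \"" $1 "\" redirect\nlocal-data: \"" $1 " A 0.0.0.0\""}'
}
```

For example, `hosts_to_unbound hosts > unbound_blacklist.conf` after downloading the hosts file. When several source lists are merged, concatenating them first and converting once keeps the result free of duplicates across lists as well.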

---

Another great way to block ad/tracking domains is to use Unbound's void zones, explained here. The main advantage is that there's no need to list every subdomain of the ad/tracking domain.
May. 07, 2017, 04:08 AM
Post: #7
RE: external filters
Good help and suggestions, as always, found on this forum.
Instead of the hpHosts file, which is bloated, or any other large hosts file, I use a smaller list that is basically a DNS block list.
Found here:

https://github.com/AdguardTeam/AdguardDN...filter.txt

It is easy to convert into a Privoxy action file (edit it in any text editor).
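
The conversion can also be scripted. This sketch assumes the list uses adblock-style domain rules of the form `||example.com^`; comment lines, exception rules (`@@||`), and anything more complex are simply skipped. The function name `dnsfilter_to_action` and the block reason text are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: adblock-style domain rules -> Privoxy action file.
dnsfilter_to_action() {
    # $1 = filter list with lines such as "||ads.example.com^"
    echo '{ +block{Listed in DNS filter} }'
    # Keep only plain "||domain^" rules and rewrite them as ".domain",
    # which in Privoxy matches the domain and its subdomains.
    sed -n 's/^||\([A-Za-z0-9.-]*\)\^$/.\1/p' "$1" | sort -u
}
```

For example, `dnsfilter_to_action filter.txt > dnsfilter.action`; the output file name is arbitrary, and the file still has to be listed in the Privoxy configuration.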

Cheers!
May. 07, 2017, 09:49 AM
Post: #8
RE: external filters
dns blocking nowadays is a bad idea. converting the list into an action file is a good choice, that's true. with dns there are a lot of problems because the ad services are switching to ssl, and then you will have trouble with loading times. privoxy, though, works fine and has no problems blocking those hosts, even when you are not using proxhttps.
May. 07, 2017, 04:08 PM
Post: #9
RE: external filters
Yes. Privoxy is fast and very smooth.

I'm not using proxhttps.
SSL or TLS is not an issue with regard to speed or capability.
But again, I'm not using proxhttps, and proxhttps could be a factor when Privoxy uses a large domain block list.

But would ".example.com" (as seen in a large domain list) have any influence on Privoxy's speed with SSL/TLS filtering by proxhttps? I do not know.
May. 07, 2017, 04:19 PM
Post: #10
RE: external filters
As to your original post: probably not.
The files are still too large.

More resources would probably resolve the issue.
May. 07, 2017, 04:28 PM
Post: #11
RE: external filters
if you use an action file with a blacklist of hosts, it should be loaded first, since privoxy works through the lists in sequence, one by one. proxhttps will only help when the host isn't blacklisted and the url contains something ad-related after the /. on its own, privoxy will only see the host, but proxhttps will decrypt the request, and then you will also be able to block some additional stuff.
Jul. 16, 2018, 10:40 PM
Post: #12
RE: external filters
(Jan. 29, 2017 11:41 PM)Faxopita Wrote:  
Code:
grep -vf meta_col.rgx your_blacklist > your_blacklist_compressed

I find your code very useful, with one exception: my rules are prepared by a script that uses adblock2privoxy, so they contain a bunch of element hiding ("##") and whitelist ("@@||") rules, which get removed by this regex. One has to temporarily move all those rules to another file and then append them to the one produced by your script.
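
That workaround could be scripted roughly as follows. This is a sketch only: the function name `compress_keeping_special` is made up, and it assumes that every line containing "##" or starting with "@@||" should survive untouched.

```shell
#!/bin/sh
# Hypothetical helper: compress a rule list but keep "##" and "@@||" rules.
compress_keeping_special() {
    # $1 = rule list, $2 = grep pattern file (e.g. meta_col.rgx), $3 = output
    special='##|^@@\|\|'
    # Set the special rules aside first
    grep -E "$special" "$1" > "$3.special"
    # Compress only the remaining rules
    grep -Ev "$special" "$1" | grep -vf "$2" > "$3"
    # Put the special rules back at the end
    cat "$3.special" >> "$3"
    rm -f "$3.special"
}
```

For example, `compress_keeping_special your_blacklist meta_col.rgx your_blacklist_compressed`, followed by appending “meta_col.txt” as before.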

My scripts for preparation and adblock2privoxy conversion are still a work in progress, but I would definitely welcome the addition of regex deduplication.
Sep. 12, 2018, 10:41 AM
Post: #13
RE: external filters
I spent some time investigating why the msn.com page layout broke for me, and it turned out to be caused by one of the filters supplied in meta_col.txt. Among them was .local, which made Privoxy catch hyperlinks containing locale=, which was not the intent. I suggest that such rules should end with / after the host part, as they are meant for hosts anyway. For me, changing it and other similar rules to .local/ fixed the issue.
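
The change described above could be applied to the whole file in one pass. This is a rough sketch, assuming one pattern per line; the function name `anchor_host_patterns` is made up, and how patterns that end in a dot behave once "/" is appended is worth checking against Privoxy's pattern documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: append "/" to host-only patterns.
anchor_host_patterns() {
    # $1 = pattern file (e.g. meta_col.txt)
    # Blank lines, "#" comments, and lines that already contain "/" are
    # passed through unchanged; every other line gets a trailing "/".
    sed -e '/^[[:space:]]*$/b' -e '/^#/b' -e '/\//b' -e 's/$/\//' "$1"
}
```

For example, `anchor_host_patterns meta_col.txt > meta_col_anchored.txt`, where the output name is arbitrary.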