The Un-Official Proxomitron Forum

Full Version: Another Filtering Proxy
I have been working on "Another Filtering Proxy" for several days, as I need one on a 24h-running Linux gateway and I don't like Privoxy's filtering syntax.

This might not be the right place to post, but it is based on ProxHTTPSProxy. It doesn't even have a formal name yet. Any suggestions?

I want it to filter URLs, HTTP headers and web pages.

So far only basic URL filtering is achieved:

- URL blocking, via Block.txt
- URL redirecting, via Redirect.txt
- filtering bypass, via Bypass.txt; for now it only sets a flag on the request

Those files are like Proxomitron blockfiles, but support only regex syntax for now.

The functions above are implemented in Python, and you can add your own by writing Python classes: for example, a class that parses ABP (Adblock Plus) blocking rules to block URLs.

Run the main script to start the program.

Happy filtering!
Here comes version 0.2.

+ Now it has a name: AFProxy
+ Privoxy-style URL patterns for block, bypass and filter URL matching
+ Basic header filtering:
+ Basic web page filtering:
+ Config auto reload

You need a little Python knowledge to write your own filters, but Python code is very readable.

Header filter:

class PrintReferer(HeaderFilter):
    "Print Referrer if it differs from the request domain"
    name = "Print Referrer"
    In = False
    Out = True

    def action(cls, req):
        domain = re.match(r"(?:.*?\.)?([^.]+\.[^.]+)$", req.headers['Host']).group(1)
        referer = req.headers['referer']
        if referer:
            referer_host = referer.split('//')[1].split('/', maxsplit=1)[0]
            if not referer_host.endswith(domain):
                # the original logging call was garbled in the post; print() stands in
                print('Referer: %s' % referer)

Web page filter:

class NoIframeJS(PageFilter):
    name = "Remove All JS and Iframe"
    active = True
    urls = ('',)
    regex_subs = ((br'(?si)<(script|iframe)\s+.*?</\1>',
                   br'<!-- \1 removed by FILTER: no-js-iframe -->'),)

regex_subs holds regex find-and-replace pairs; string_subs holds plain string find-and-replace pairs.
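For completeness, a string_subs filter in the same style might look like this. The PageFilter stub and the apply function below are mine, just to make the example runnable outside AFProxy:

```python
# Stub standing in for AFProxy's PageFilter base class (an assumption)
class PageFilter:
    string_subs = ()
    regex_subs = ()

class NoAutoplay(PageFilter):
    name = "Strip autoplay attribute"
    active = True
    urls = ('',)
    # plain find/replace pairs, no regex involved
    string_subs = ((b' autoplay', b''),)

def apply_string_subs(filt, body):
    # how a proxy would presumably apply the pairs to a page body
    for find, repl in filt.string_subs:
        body = body.replace(find, repl)
    return body

print(apply_string_subs(NoAutoplay(), b'<video autoplay src="a.mp4">'))
# b'<video src="a.mp4">'
```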

Python version attached. EXE version download link:
Thank you, I will try it and will report bugs if I find some.
Off-topic: I myself think Privoxy's syntax is really good; filters are fast to write, maintain and share.
Privoxy is pretty good. I just don't like that you have to put the filters in a filter file and then apply them in a separate action file. I like them to be together.
After a look, yeah, pretty good, but I'm unable to run the Python version, only the EXE version.
I personally like Python.

Do I need to install any other components for Python? I only have the components that ProxHTTPSProxy needs. Here is what I get when I try to run it:
Traceback (most recent call last):
  File "D:\gg\AFProxy 0.2\", line 350, in <module>
    config = LoadConfig(CONFIG)
  File "D:\gg\AFProxy 0.2\", line 41, in __init__
    self.PORT = int(self.config['GENERAL'].get('Port'))
TypeError: int() argument must be a string or a number, not 'NoneType'
Please use the provided config file.
Okay, I found my problem: it was the working directory. The .bat file I use to run Python + AFProxy is not in the same folder, so the working directory was the .bat file's folder and AFProxy missed the config.ini file. I just forced it to use the AFProxy folder as the working directory and it's okay now.
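The same problem could also be avoided inside the program by resolving config.ini relative to the script instead of the working directory. A sketch of the idea, not AFProxy's actual loading code:

```python
import os

# look for config.ini next to the script, regardless of
# the current working directory the launcher happens to use
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
CONFIG = os.path.join(SCRIPT_DIR, 'config.ini')
```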
Hi whenever, do you think we can block images based on width and height? Is there a way to know an image's width/height without fully downloading it, or do we have to download the whole image and then use a Windows API to detect and block it?
I know we can write a filter to match width/height in the HTML source code, but that approach is really limited; a lot of ad images don't have width and height attributes in the HTML, so it's not a universal solution.

Plus, we also have a list of image resolutions that are typically used for ads.
(Dec. 04, 2014 03:27 PM)GunGunGun Wrote: Is there a way to know an image's width/height without fully downloading it?

I don't think a way like that exists. That's why major ad-blocking software like ABP or Privoxy have to maintain blacklists for blocking.

BTW, Version 0.3

+ Now supports multiple list files for each filter
+ Parses Privoxy action files (default.action, user.action) for URL blocking
* List files moved to the <Lists> directory
(Dec. 05, 2014 08:36 AM)whenever Wrote: I don't think a way like that exists. That's why major ad-blocking software like ABP or Privoxy have to maintain blacklists for blocking.

After some tests I finally figured that an image cannot be blocked if it is not fully downloaded; a partial download shows the image as 0x0 resolution. I tried to download the image with my browser:
and pressed ESC to stop the download. "View Image Info" showed the image size as 0x0, so I think yes, there is no way.

Edit: Oh no, it seems my test above was wrong. I tried downloading that image with my browser, paused the download, and opened it with the Windows image viewer: it showed the image with the correct width and height even though it was not fully downloaded yet.

10109x4542, but I only downloaded 154KB of that image:

Maybe it is possible? If so, I think we could use an algorithm something like this: download part of the image first, then use an OS API to detect its width/height; if it matches, block it; if not, download the full image and send it to the browser.

Update: The GDI library can get the image size. I tested it and it worked, great! Here is my test, please try it. :D So much hype right now! This may open a new future for ad-blocking software, because we could block ads much more effectively!

I included the source code inside that archive. I think I will ask Privoxy's author to add this feature; I think it is possible. And Python can clearly get the image size too, because there are lots of ways to call OS APIs:

Update 2: Holy cow, it is so simple and POSSIBLE for SURE; we could have done this a long time ago:

Quote:As noted below, getimagesize will download the entire image before it checks for the requested information. This is extremely slow on large images that are accessed remotely. Since the width/height is in the first few bytes of the file, there is no need to download the entire file. I wrote a function to get the size of a JPEG by streaming bytes until the proper data is found to report the width and height:
And a demo using PHP cURL:
Quote:I managed to answer my own question and I've included the PHP code snippet.

The only downside (for me at least) is that this writes the partial image download to the file-system prior to reading in the dimensions with getImageSize.

For me 10240 bytes is the safe limit to check for jpg images that were 200 to 400K in size.
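The same trick can be sketched in Python: for PNG and GIF the dimensions sit at fixed offsets in the header, so a few dozen bytes of a partial download are enough. A sketch (JPEG needs a marker scan and is left out here):

```python
import struct

def image_size_from_header(data):
    """Return (width, height) from the first bytes of a PNG or GIF,
    or None if the format isn't recognized. Only the header is needed,
    so a partial download suffices."""
    if data[:8] == b'\x89PNG\r\n\x1a\n' and len(data) >= 24:
        # IHDR chunk: width/height are 4-byte big-endian ints at offsets 16 and 20
        return struct.unpack('>II', data[16:24])
    if data[:6] in (b'GIF87a', b'GIF89a') and len(data) >= 10:
        # logical screen width/height: 2-byte little-endian ints at offsets 6 and 8
        return struct.unpack('<HH', data[6:10])
    return None

# minimal synthetic headers for demonstration
png = b'\x89PNG\r\n\x1a\n' + struct.pack('>I', 13) + b'IHDR' + struct.pack('>II', 468, 60)
gif = b'GIF89a' + struct.pack('<HH', 1, 1)
print(image_size_from_header(png))  # (468, 60)
print(image_size_from_header(gif))  # (1, 1)
```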

Seems very exciting. What would we gain if we could block images by width/height?
- A reliable banner-blocking method.
- Killing all web bugs that are 0x0, 1x1 or 2x2; web developers use web bugs to track us, and there is almost no other way to block them all.

UPDATE 3: Works on PNG too. I will report on GIF later. Test:

Update 4: WORKS ON GIF TOO! :D Test:

What I did: Googled "big image", "big image png", "big image gif".

Fully downloading an image and then detecting its width/height is not efficient, because it wastes bandwidth, but I think it's acceptable: still better than displaying a GIF banner that eats browser resources. I think the OS API will let us detect the image width and height once it is fully downloaded.

Quote: + Parse Privoxy actions files (default.action, user.action) for URL blocking
Thank you very much, very nice!
I think this could be done in the form of a URL or header filter, but I'm sorry, for now I don't have time to work on it.

There are still some things on my to-do list to improve the framework of AFProxy, which I think is more important given my limited spare time.
I found a piece of software named WebCleaner, also written in Python. I hope you can find some useful features in it; hopefully this saves you some time:
I know it. In fact, I had looked at all the python filtering proxies I could find before reinventing my own wheel.

I know AFProxy is not good in many respects, and I'm pretty sure it is beyond my ability to make AFProxy a fully functional program that meets everybody's requirements. It's more like a personal toy. :-)
Version 0.4 (20141221)

* List files are not bundled and inited any more; they are now globally available to other filters
+ Unfiltered content is streamed to the client instead of being cached before sending
* Fixed config auto-reload
* Fixed Privoxy parsing (replaced '.*' in the host regex with '[^/]*' so it won't match into the path string)

Python version attached. EXE version download link:
Can you release a new version for urllib3 1.10.4? I got a bug when trying to load a page through a proxy:
My Python version: 3.4.2
My urllib3 version: 1.10.4
Problem: maybe this:, expect
Config: new line:

The problem can be solved by replacing headers with self.headers (that means switching from urllib3's HTTPDict to the BaseHTTPServer/http.server headers?):

Before:

r = self.pool.urlopen(self.command, self.url, body=self.postdata, headers=headers, retries=1, redirect=False, preload_content=False, decode_content=False)

After:

r = self.pool.urlopen(self.command, self.url, body=self.postdata, headers=self.headers, retries=1, redirect=False, preload_content=False, decode_content=False)

With that change I can load that page through the proxy, but then I cannot load any page that doesn't go through a proxy server, which is even worse.

[13:15:24] [P] "GET" 200 11765

But on every other page:
  File "D:\Downloads\Compressed\AFProxy_py 0.4\", line 128, in handl
  File "D:\Downloads\Compressed\AFProxy_py 0.4\", line 242, in do_METH
    retries=1, redirect=False, preload_content=False, decode_content=False)
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\", line 161, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "C:\Python34\lib\http\", line 386, in handle_one_request
  File "C:\Python34\lib\site-packages\urllib3-1.10.4-py3.4.egg\urllib3\connectio", line 523, in urlopen
    headers = headers.copy()
AttributeError: 'HTTPMessage' object has no attribute 'copy'

I cannot fix this problem properly. I already tried, but my fix is only partial, which is not good for long-term use. I skipped setting the proxy through config.ini and did it this way instead; I added these lines right above the r = ... line:

            if "ghacks" in
                self.pool = urllib3.proxy_from_url('')
                headers = self.headers

It works, but it's not perfect. You see, for the proxy I use the http.server headers, but otherwise I use urllib3's HTTPDict, which will cause more trouble in the future. So I hope you can fix it, because I'm not the author of the software and can't really understand the codebase; or just point me at what I should do. Thanks!
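A less invasive fix than hard-coding the proxy might be to turn the HTTPMessage into a plain dict before handing it to urllib3, since urllib3 only needs a mapping it can .copy(). A demonstration of the mismatch and the workaround (I haven't tested this inside AFProxy 0.4 itself):

```python
from http.client import HTTPMessage

# http.server hands the request handler an HTTPMessage as self.headers
msg = HTTPMessage()
msg['Host'] = ''
msg['Accept'] = '*/*'

# urllib3 1.10.x calls headers.copy(), which HTTPMessage lacks;
# that is exactly the AttributeError in the traceback above
print(hasattr(msg, 'copy'))       # False

# converting to a plain dict before urlopen() sidesteps the problem
headers = dict(msg)
print(headers.copy() == headers)  # True
```

In other words, something like headers = dict(self.headers) right before the self.pool.urlopen(...) call, though that exact placement is my assumption.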

And with AFProxy as the proxy server I cannot load this page:
This page is taking way too long to load.

Sorry about that. Please try refreshing and contact us if the problem persists.
Contact Support — GitHub Status — @githubstatus

Without AFProxy, I can.