Another Filtering Proxy
Nov. 22, 2014, 09:35 AM
Post: #1
Another Filtering Proxy
I have been working on "Another Filtering Proxy" for several days, because I need one for a Linux gateway that runs 24 hours a day and I don't like Privoxy's filtering syntax.
This might not be the right place to post, but it is based on ProxHTTPSProxy. It doesn't even have a formal name yet. Any suggestions?
I want it to filter URLs, HTTP headers and web pages. So far only basic URL filtering is done:
- URL blocking, via Block.txt
- URL redirecting, via Redirect.txt
- Filtering bypass, via Bypass.txt (for now it only sets a flag on a request)
Those files are like Proxomitron blockfiles, but only support regex syntax for now. The functions above are implemented in URLFilter.py, where you can add your own functions by writing Python classes. For example, you could write a class that parses ADB blocking rules to block URLs.
Run Proxy.py to start the program. Happy filtering!
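The three list files can be pictured with a small Python sketch. It assumes a hypothetical format of one regex per line with `#` comments, and for Redirect.txt a regex and a replacement separated by whitespace; the real Proxomitron-style syntax AFProxy reads may differ, and `URLList`, `RedirectList` and `read_patterns` are illustrative names, not AFProxy's classes.

```python
import re
from typing import Iterable, List, Optional


def read_patterns(lines: Iterable[str]) -> List[str]:
    """Return the non-empty, non-comment lines of a list file (assumed format)."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s and not s.startswith("#")]


class URLList:
    """Block or bypass list: a URL matches if any of its regexes is found in it."""

    def __init__(self, lines: Iterable[str]):
        self.regexes = [re.compile(p) for p in read_patterns(lines)]

    def match(self, url: str) -> bool:
        return any(r.search(url) for r in self.regexes)


class RedirectList:
    """Redirect list: each line is '<regex> <replacement>' (assumed format)."""

    def __init__(self, lines: Iterable[str]):
        self.rules = []
        for entry in read_patterns(lines):
            pattern, target = entry.split(None, 1)
            self.rules.append((re.compile(pattern), target))

    def redirect(self, url: str) -> Optional[str]:
        """Return the rewritten URL for the first matching rule, else None."""
        for regex, target in self.rules:
            if regex.search(url):
                # Replaces only the matched part; anchor the regex to rewrite the whole URL.
                return regex.sub(target, url, count=1)
        return None
```

A bypass list would just be another `URLList` whose `match()` result sets a flag on the request instead of blocking it, as described in the post.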
Nov. 29, 2014, 11:24 AM
(This post was last modified: Dec. 28, 2014 10:48 AM by whenever.)
Post: #2
RE: Another Filtering Proxy
Here comes version 0.2.
+ Now it has a name: AFProxy
+ Privoxy-style URL patterns for block, bypass and filter URL matching
+ Basic header filtering: HeaderFilter.py
+ Basic web page filtering: PageFilter.py
+ Config auto reload
You need a little Python knowledge to write your own filters, but Python code is very readable.
Header filter:
Code:
class PrintReferer(HeaderFilter):
Web page filter:
Code:
class NoIframeJS(PageFilter):
regex_subs is for regex find-and-replace; string_subs is for plain string find-and-replace.
Python version attached. EXE version download link: http://proxfilter.net/afproxy/AFProxy%200.2.zip
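To make the regex_subs / string_subs idea concrete, here is a small self-contained sketch. `SubstitutionFilter` and `StripIframesAndScripts` are hypothetical stand-ins, not AFProxy's PageFilter or its NoIframeJS filter; the only assumption is that a filter holds lists of (find, replace) pairs and applies them to the page text in order.

```python
import re
from typing import List, Tuple


class SubstitutionFilter:
    """Hypothetical page filter: regex and plain-string find-and-replace.

    Subclasses fill in regex_subs and string_subs with (find, replace) pairs.
    """

    regex_subs: List[Tuple[str, str]] = []
    string_subs: List[Tuple[str, str]] = []

    def __init__(self) -> None:
        # re.S lets '.' cross newlines, so multi-line tags are matched too.
        self._compiled = [(re.compile(p, re.I | re.S), r) for p, r in self.regex_subs]

    def filter(self, html: str) -> str:
        for regex, replacement in self._compiled:
            html = regex.sub(replacement, html)
        for old, new in self.string_subs:
            html = html.replace(old, new)
        return html


class StripIframesAndScripts(SubstitutionFilter):
    # Illustrative rules only: naive regexes, not a robust HTML parser.
    regex_subs = [
        (r"<iframe\b.*?</iframe>", ""),
        (r"<script\b.*?</script>", ""),
    ]
    # With scripts gone, unwrap <noscript> so its fallback content shows.
    string_subs = [("<noscript>", ""), ("</noscript>", "")]
```

Regex rules run first and plain-string rules second; string replacement is cheaper and avoids escaping when no pattern is needed.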
Nov. 29, 2014, 02:15 PM
Post: #3
RE: Another Filtering Proxy
Thank you, I will try it and report any bugs I find.
Off-topic: I personally think Privoxy's syntax is really good; filters are quick to write, maintain and share.
Nov. 30, 2014, 12:35 PM
Post: #4
RE: Another Filtering Proxy
Privoxy is pretty good. I just don't like that you have to put the filters in a filter file and then apply them in a separate action file. I prefer to keep them together.
Dec. 03, 2014, 09:07 PM
Post: #5
RE: Another Filtering Proxy
After a look: yeah, pretty good, but I'm unable to run the Python version, only the EXE version.
I personally like Python. Do I need to install other components for Python? I only have the components that ProxHTTPSProxy needs. Here is what I get when I try to run AFProxy.py:
Code:
Traceback (most recent call last):
Dec. 04, 2014, 01:09 AM
Post: #6
RE: Another Filtering Proxy
Please use the provided config file.
Dec. 04, 2014, 02:33 AM
Post: #7
RE: Another Filtering Proxy
Okay, I found the problem: it was the working directory. My .bat file that runs Python + AFProxy is not in the same folder, so the working directory was the .bat file's folder and AFProxy could not find config.ini. I forced it to use the AFProxy folder as the working directory and now it works.
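A script can sidestep this working-directory trap by resolving config.ini relative to its own location instead of the current directory. A minimal sketch, assuming the INI file sits next to the script; `load_config` is a hypothetical helper, not AFProxy's actual loading code.

```python
import configparser
import os


def load_config(filename="config.ini", base_dir=None):
    """Read an INI file located next to this script, not in the current directory."""
    if base_dir is None:
        # __file__ is this script's path, so this works whatever the working directory is.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(base_dir, filename)
    config = configparser.ConfigParser()
    # read() silently skips missing files and returns the list of files it parsed.
    if not config.read(path):
        raise FileNotFoundError(path)
    return config
```

Raising on a missing file turns the confusing downstream error into a clear one that names the path being looked up. Changing into the AFProxy folder from the batch file before starting Python, as done above, works too.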
Dec. 04, 2014, 03:27 PM
Post: #8
RE: Another Filtering Proxy
Hi whenever, do you think we could block images based on their width and height? Is there a way to know an image's width/height without fully downloading it, or would we have to download the whole image, use the Windows API to detect its size, and then block it?
I know we can write a filter that matches width and height in the HTML source, but that approach is really limited: many ad images don't have width and height attributes in the HTML, so it isn't a universal solution. Plus, there is a known list of image resolutions that are always used for ads.
Dec. 05, 2014, 08:36 AM
(This post was last modified: Dec. 05, 2014 08:42 AM by whenever.)
Post: #9
RE: Another Filtering Proxy
(Dec. 04, 2014 03:27 PM)GunGunGun Wrote: Is there a way to know image w/h without fully downloading it?
I don't think such a way exists. That's why major ad-blocking software like ABP or Privoxy has to maintain a blacklist for blocking.
BTW, version 0.3:
+ URLFilter.py now supports multiple list files for each filter
+ Parse Privoxy actions files (default.action, user.action) for URL blocking
* List files moved to the <Lists> directory
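Reading URL-blocking rules out of a Privoxy actions file can be sketched as follows. In an actions file, a line in braces such as `{ +block{Ads.} }` sets the actions for the URL patterns on the following lines, up to the next brace line. This is a simplified sketch under that assumption, not AFProxy's parser; `parse_block_patterns` is a hypothetical name.

```python
from typing import Iterable, List


def parse_block_patterns(lines: Iterable[str]) -> List[str]:
    """Collect URL patterns listed under action sections that enable +block.

    Simplified: aliases are not expanded, '-block' exceptions are not
    subtracted, and backslash line continuations are not handled.
    """
    patterns = []
    in_block_section = False
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("{"):
            # A new action header; '{{alias}}' and '{{settings}}' never hold URL patterns.
            in_block_section = not line.startswith("{{") and "+block" in line
            continue
        if in_block_section:
            patterns.append(line)
    return patterns
```

A fuller parser would also track `-block` sections, since user.action commonly unblocks URLs that default.action blocks.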
Dec. 05, 2014, 09:05 AM
(This post was last modified: Dec. 05, 2014 02:56 PM by GunGunGun.)
Post: #10
RE: Another Filtering Proxy
(Dec. 05, 2014 08:36 AM)whenever Wrote: I don't think such a way exists. That's why major ad-blocking software like ABP or Privoxy has to maintain a blacklist for blocking.
I also tried pressing ESC to stop the download and then viewing the image info: the image size was 0x0, so I thought you were right, there is no way.
Edit: Oh no, it seems my test above was wrong. I started downloading that image in my browser, paused the download, and then opened the partial file with the Windows image viewer. It showed the image with the correct width and height even though it was not fully downloaded yet. http://i.imgur.com/0TONiBe.png is 10109x4542, but I had only downloaded 154 KB of it: http://i.imgur.com/aPGfz41.png
Maybe it is possible? If so, I think we could use an algorithm like this: download the first part of an image, use an OS API to detect its width/height, block it if the size matches, and otherwise download the full image and send it to the browser.
Update: The GDI library can get the image size. I tested it and it worked, great! Here is my test, please try it. So much hype right now! This may open a new future for ad-blocking software because we could block ads much more effectively! https://app.box.com/s/gtat69ntsdqksrcoy9sg I included the source code in that archive. I think I will ask Privoxy's author to add this feature; I think it is possible. And Python can clearly get the image size too, because there are lots of ways to call OS APIs: http://www.google.com/search?q=get+image...e=0&nord=1
Update2: Holy cow, it is so simple and definitely POSSIBLE, and we could have done it a long time ago: http://php.net/manual/en/function.getima....php#88793
Quote:As noted below, getimagesize will download the entire image before it checks for the requested information. This is extremely slow on large images that are accessed remotely. Since the width/height is in the first few bytes of the file, there is no need to download the entire file.
I wrote a function to get the size of a JPEG by streaming bytes until the proper data is found to report the width and height:
And a demo using cURL in PHP: http://stackoverflow.com/a/7476094/3763937
Quote:I managed to answer my own question and I've included the PHP code snippet.
It seems very exciting. What would we achieve if we could block images based on width and height?
- A reliable banner-blocking method.
- Killing all web bugs that are 0x0, 1x1 or 2x2. Web developers use web bugs to track us, and there is almost no way to block them all.
UPDATE3: Works on PNG too; I will report on GIF later. Test: http://groups.csail.mit.edu/graphics/cla...MapBig.png
Update4: WORKS ON GIF TOO! Test: http://www.physics.usyd.edu.au/~gekko/wr...8_2620.gif
What I did: googled "big image", "big image "png"" and "big image "gif"".
Quote: + Parse Privoxy actions files (default.action, user.action) for URL blocking
Thank you very much, very nice!
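The header-only approach can be sketched in Python without any OS API: the dimensions live in the PNG IHDR chunk, the GIF logical screen descriptor, and the first JPEG Start Of Frame (SOFn) marker. `image_size` is a hypothetical helper, not something AFProxy or Privoxy provides; it returns None when the bytes are too short or the format is not recognized.

```python
import struct
from typing import Optional, Tuple


def image_size(head: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the first bytes of a PNG, GIF or JPEG, else None."""
    # PNG: 8-byte signature, IHDR length and type, then big-endian width/height at offset 16.
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        if len(head) >= 24 and head[12:16] == b"IHDR":
            return struct.unpack(">II", head[16:24])
        return None
    # GIF: logical screen width/height, little-endian, at offset 6.
    if head[:6] in (b"GIF87a", b"GIF89a"):
        if len(head) >= 10:
            return struct.unpack("<HH", head[6:10])
        return None
    # JPEG: walk the marker segments until a Start Of Frame (SOFn) marker.
    if head.startswith(b"\xff\xd8"):
        i = 2
        while i + 4 <= len(head):
            if head[i] != 0xFF:
                return None  # not at a marker: corrupt or reached entropy-coded data
            marker = head[i + 1]
            if marker == 0xFF:  # fill byte before a marker, skip it
                i += 1
                continue
            if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # standalone markers have no length
                i += 2
                continue
            (seg_len,) = struct.unpack(">H", head[i + 2:i + 4])
            # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                if i + 9 > len(head):
                    return None
                # Segment layout: length(2), precision(1), height(2), width(2).
                height, width = struct.unpack(">HH", head[i + 5:i + 9])
                return width, height
            i += 2 + seg_len  # seg_len counts its own two bytes
        return None
    return None
```

A proxy could read the first few KB of an image response, call `image_size()`, and block when the result is tiny (1x1 web bugs) or in a list of known ad sizes, otherwise streaming the rest through. A JPEG's SOF marker can sit after large EXIF or ICC segments, so a fixed byte budget will sometimes yield None, and the proxy would then have to read more or pass the image through.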
Dec. 08, 2014, 03:30 AM
Post: #11
RE: Another Filtering Proxy
I think this could be done as a URL or header filter, but I'm sorry, I don't have time to work on it right now.
There are still some things on my to-do list for improving the AFProxy framework, which I think are more important given my limited spare time.
Dec. 08, 2014, 09:09 AM
Post: #12
RE: Another Filtering Proxy
I found a program named WebCleaner, also written in Python. I hope you can get some useful features from it and that it saves you time: http://webcleaner.sourceforge.net/
Dec. 08, 2014, 12:11 PM
Post: #13
RE: Another Filtering Proxy
I know it. In fact, I had looked at all the Python filtering proxies I could find before reinventing my own wheel.
I know AFProxy falls short in many respects, and I'm pretty sure it is beyond my ability to make it a fully functional program that meets everybody's requirements. It's more like a personal toy. :-)
Dec. 28, 2014, 10:50 AM
Post: #14
RE: Another Filtering Proxy
Version 0.4 (20141221)
--------------
* List files are no longer bundled with and initialized in URLFilter.py; they are now globally available to other filters
+ Unfiltered content is streamed to the client instead of being cached before sending
* Fix config auto reload
* Fix Privoxy parsing (replace '.*' in the host regex with '[^/]*' so it won't match the path string)
Python version attached. EXE version download link: http://proxfilter.net/afproxy/AFProxy%200.4.zip
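The '[^/]*' fix in this changelog can be illustrated with a short sketch. It assumes a simplified host wildcard where only `*` and `?` are special (real Privoxy host patterns have more rules, such as leading-dot domain matching and character classes) and that the pattern is matched against the URL with the scheme stripped; `host_glob_to_regex` and `matches` are hypothetical names, not AFProxy functions.

```python
import re


def host_glob_to_regex(host_glob: str) -> str:
    """Translate a simplified host wildcard ('*' and '?' only) into a regex.

    '*' becomes '[^/]*' rather than '.*', so the host part can never run past
    the first '/' into the path.
    """
    parts = []
    for ch in host_glob:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(host_glob: str, url_without_scheme: str) -> bool:
    # Anchor at the start and require the host to end at '/' or at the end of the string.
    regex = "^" + host_glob_to_regex(host_glob) + "(?:/|$)"
    return re.match(regex, url_without_scheme) is not None
```

Here `matches("*.doubleclick.net", "ad.doubleclick.net/banner.gif")` is True while `matches("*.doubleclick.net", "example.com/x.doubleclick.net")` is False; had `*` been translated to `.*`, the second URL would match too, because `.*` would consume `example.com/x` and then find `.doubleclick.net` in the path.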
May. 26, 2015, 06:22 AM
(This post was last modified: May. 26, 2015 12:46 PM by cattleyavns.)
Post: #15
RE: Another Filtering Proxy
Hi!
Can you release a new version for urllib3 1.10.4? I get an error when trying to load a page through an upstream proxy:
My Python version: 3.4.2
My urllib3 version: 1.10.4
Log: http://pastebin.com/g2mK1CW4
Site: http://www.ghacks.net
Problem: maybe this: https://github.com/shazow/urllib3/pull/544, in particular https://github.com/ml31415/urllib3/commi...2b19edad4e
Config: new line:
Code:
[PROXY http://127.0.0.1:7777]
The problem can be solved by replacing headers with self.headers (does that mean replacing urllib3's HTTPDict with the BaseHTTPServer/http.server headers?). Change
Code:
r = self.pool.urlopen(self.command, self.url, body=self.postdata, headers=headers, retries=1, redirect=False, preload_content=False, decode_content=False)
to
Code:
r = self.pool.urlopen(self.command, self.url, body=self.postdata, headers=self.headers, retries=1, redirect=False, preload_content=False, decode_content=False)
With that change I can load that page through the proxy, but I can no longer load any page that doesn't go through an upstream proxy, which is even worse. The proxied page works:
Code:
[13:15:24] [P] "GET http://www.ghacks.net/" 200 11765
But all other pages fail:
Code:
File "D:\Downloads\Compressed\AFProxy_py 0.4\ProxyTool.py", line 128, in handl
I couldn't fix this problem properly. I tried, but I only have a partial fix that isn't good for long-term use: instead of setting the proxy in config.ini, I added these lines right above r = ...:
Code:
if "ghacks" in self.host:
It works, but not perfectly: for the proxied site I use the http.server headers, while for the others I use urllib3's HTTPDict, which will cause more trouble in the future. So I hope you can fix it, because I'm not the author of the software and don't really understand the codebase; or if you can just point me to what I should do, that's okay too. Thanks!
Also, with AFProxy as the proxy server I cannot load this page: https://github.com/shazow/urllib3/pull/544
Error:
Code:
This page is taking way too long to load.
Without AFProxy, I can.
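The host check in the workaround amounts to choosing an upstream proxy per host. A minimal sketch of doing that from a rule table instead of a hard-coded `if`, assuming hypothetical (host regex, proxy URL) pairs and a bare host name without a port; `UpstreamRules` is illustrative only and not how AFProxy or ProxHTTPSProxy actually parse `[PROXY ...]` sections in config.ini.

```python
import re
from typing import Iterable, Optional, Tuple


class UpstreamRules:
    """Pick an upstream proxy per host; None means connect directly.

    Hypothetical helper: not part of AFProxy, ProxHTTPSProxy or urllib3.
    """

    def __init__(self, rules: Iterable[Tuple[str, str]]):
        # Each rule is (host regex, proxy URL); the first full match wins.
        self._rules = [(re.compile(pattern, re.I), proxy) for pattern, proxy in rules]

    def proxy_for(self, host: str) -> Optional[str]:
        for regex, proxy in self._rules:
            if regex.fullmatch(host):
                return proxy
        return None
```

A request handler could then send the request through a urllib3 `ProxyManager` when `proxy_for()` returns a URL and through a plain `PoolManager` otherwise, passing the same header mapping in both cases so the proxied and direct paths don't diverge.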