Filtering ajax content
Feb. 17, 2009, 11:29 AM
Post: #1
Filtering ajax content
Hello Sidki, would you consider filtering Ajax content (that is, XML)?
HTML like <img src=adsimage.gif> embedded in XML usually becomes
&lt;img src=adsimage.gif&gt;
The current config set does not filter this kind of ad.
Feb. 17, 2009, 12:28 PM
Post: #2
RE: Filtering ajax content
You can enable filtering by type via MIME-List.ptxt...
Feb. 17, 2009, 09:17 PM (This post was last modified: Feb. 17, 2009 09:23 PM by whileloop.)
Post: #3
RE: Filtering ajax content
Many feeds do not have a file extension (e.g. gawker.com, feedburner.com).

If they are treated as HTML, nodes (tags) will be added or removed, which may break XML parsing. We should avoid adding or removing any node (tag) in XML, but we can modify the text content enclosed between tag pairs.
So XML should be treated separately.
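
As a rough illustration of "modify the text, leave the nodes alone", here is a minimal Python sketch. It is not a Proxomitron filter, and the `AD_IMG` pattern and the `strip_ads_from_text` name are hypothetical. It parses the XML, edits only the text inside elements, and serializes it again, so the set of tags stays unchanged.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ad pattern: any <img ...> whose attributes mention "ads".
# Once the XML is parsed, the escaped island is plain "<img ...>" text.
AD_IMG = re.compile(r"<img\b[^>]*ads[^>]*>", re.IGNORECASE)

def strip_ads_from_text(xml_source: str) -> str:
    """Remove ad images from text content without adding or removing nodes."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        if element.text:
            element.text = AD_IMG.sub("", element.text)
        if element.tail:
            element.tail = AD_IMG.sub("", element.tail)
    # Any remaining markup in the text is re-escaped on serialization.
    return ET.tostring(root, encoding="unicode")

feed = "<item><description>Hello &lt;img src=adsimage.gif&gt; world</description></item>"
cleaned = strip_ads_from_text(feed)
```

The element structure of `cleaned` is the same as that of `feed`; only the escaped image inside `description` is gone.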

Moreover, servers return them as "text/plain", "text/xml", "application/xml" or something else, which makes them harder to identify. Maybe we need a set of rules specifically for filtering XML content.
Maybe I should change the title to "Improve filtering of XML".
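
One way to picture the identification problem is the Python sketch below. It is only an illustration under assumptions: the `looks_like_xml` name and the sniffing fallback are mine, not something the config does. It trusts the Content-Type header when it names an XML type, and otherwise peeks at the start of the body for an XML declaration.

```python
# Media types that name XML directly; "+xml" suffixes (e.g. application/rss+xml)
# are handled separately below.
XML_TYPES = {"text/xml", "application/xml"}

def looks_like_xml(content_type: str, body_start: str) -> bool:
    """Guess whether a response is XML from its header and first bytes."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return True
    # Fallback for feeds served as text/plain or similar:
    # skip a byte order mark and whitespace, then look for an XML declaration.
    return body_start.lstrip("\ufeff \t\r\n").startswith("<?xml")
```

For example, `looks_like_xml("text/plain; charset=utf-8", "<?xml version='1.0'?><rss/>")` is true, while a plain-text body without the declaration is not matched. Feeds that omit the declaration would slip through this fallback.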
Feb. 17, 2009, 11:22 PM (This post was last modified: Feb. 18, 2009 12:02 AM by whileloop.)
Post: #4
RE: Filtering ajax content
Maybe "URL: Block Ad URLs filter" is enough.

But the URL blocker needs improvement too. It uses "\k" to kill the connection, which returns killed.gif or killed.html depending on the file extension of the URL in the outgoing header. If an image src URL has no file extension, or an extension other than an image one (e.g. cgi), it returns killed.html, and the browser treats the image as missing (a red cross in IE6).
I will try to create an ad URL filter rule that also looks at the incoming Content-Type header
and uses $RDIR instead of \k for images. I hope I can do that.

A content filter for XML would be cleaner and better, as it can change object/embed into toggles and won't leave empty areas for blocked images. But it is more difficult.
I will see whether the URL filter is enough.
Sorry for the disturbance.
Feb. 18, 2009, 01:19 AM
Post: #5
RE: Filtering ajax content
Keep us posted, I'm curious about the results...
Feb. 18, 2009, 02:49 AM
Post: #6
RE: Filtering ajax content
Welcome Whileloop!

Some of my filters in the Base config could help you, specifically 'Enable filtering by Content-Type'.

Also, this previously unpublished filter:
Code:
[HTTP headers]
In = FALSE
Out = FALSE
Key = "URL :I_3.4.1 Block images/flash/javascripts from AD-SRC Host {ln}090207 WIP"
URL = "(^local.ptron)$TST(ContentType=image*|application/(x-|)(shockwave-flash|javascript))"
Match = "(*(.|/)($LST(AD-SRC))\9*)\1"
Replace = "$LOG(!R$DTM(c),I_3.4.1 ***BLOCK $GET(ContentType) from AD-SRC Host: \9 *** \u) $JUMP(http://local.ptron/red.gif\?\1)"

The value of the variable ContentType is very similar to $IHDR(Content-Type: *). Feel free to modify them ;)


ADVICE: DON'T USE "URL:", USE "URL :" INSTEAD

Edit 090218: If you use "url:" this will cause a constant disordering of some header filters. So use the notation followed in the sidki config, or use the space at your own risk.
Feb. 18, 2009, 10:54 AM
Post: #7
RE: Filtering ajax content
Regarding the original question:
Ajax is just the name for a method to change page content in real-time. It may come as JS, JSON, HTML, or XML. The config does filter HTML and JS content, as long as the content-type is one of those filtered by default, or text/plain. It does filter JSON, if it looks filter-worthy.

It does not filter HTML islands in XML. These islands consist of HTML code, escaped in one of various ways, not just "&lt;img src=adsimage.gif&gt;" (e.g. Google uses "\x3cimg src=adsimage.gif\x3e").
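
To make the two escape styles mentioned above concrete, here is a small Python sketch (not part of the config; `unescape_island` is a hypothetical name) that turns both forms back into plain HTML. It only covers entity escapes and two-digit `\x` escapes, and other encodings exist.

```python
import html
import re

# Matches JavaScript-style hex escapes such as \x3c and \x3e.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")

def unescape_island(text: str) -> str:
    """Decode \\xNN escapes and HTML entities in an escaped HTML island."""
    text = HEX_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)
    return html.unescape(text)

entity_form = unescape_island("&lt;img src=adsimage.gif&gt;")
hex_form = unescape_island(r"\x3cimg src=adsimage.gif\x3e")
```

Both `entity_form` and `hex_form` come out as `<img src=adsimage.gif>`, which is the form the existing HTML filters would recognize.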
Feb. 18, 2009, 10:55 AM
Post: #8
RE: Filtering ajax content
(Feb. 18, 2009 02:49 AM)lnminente Wrote:  ADVICE: DON'T USE "URL:", USE "URL :" INSTEAD

Your notation *might* confuse people in the long run. ;)
Feb. 18, 2009, 01:57 PM
Post: #9
RE: Filtering ajax content
It could, Sidki, but the only thing they would need to remember is that space, which works the same way as in all the other header filters. They shouldn't have to learn to put all the code in the URL bar, or workarounds for commands that don't work or work differently there, or remember that \k won't work and that they have to use RDIR or JUMP instead. Anyway, this is a new notation I keep testing ;)