firefox, xhtml & gzip
Dec. 11, 2004, 11:22 PM
Post: #1
z12
Hi all

I run firefox 1.0 and use the following header filter to enable web filtering of xhtml with proxo:

Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: 4. Filter XHTML (in)"
Match = "(application/xhtml\+xml*)\0"
Replace = "\0$FILTER(true)"

Normally this seems to work well. For example, http://www.w3.org/MarkUp/ filters without any problems.

However, I ran across a couple of sites where proxo/firefox had trouble.

http://sidesh0w.com/

http://jessey.net/

http://golem.ph.utexas.edu/~distler/blog/

Going to these sites, firefox only displayed the following message:

Code:
This XML file does not appear to have any style information associated with it. The document tree is shown below.

and the displayed document tree was empty.

After debugging a while, I noticed that these sites served the pages with gzip content encoding. I found that if I enabled this filter:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-encoding: 2. None (out)"
Match = "*"

then filtering worked ok.

So it would seem that proxo is having an issue using zlib with xhtml & gzip. Watching the log window, filtering just seems to "stop" after the first 2 or 3 matches. I'm using the latest version of zlib but I also tried the zlib that came with proxo and it did the same thing.
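
To illustrate the general principle (plain Python here as a stand-in - this is not how proxo works internally, only why an uninflated body can't be filtered): a text pattern finds nothing in the gzipped bytes until they're inflated:

Code:
# A minimal sketch (not proxo internals): a text pattern can't match
# inside a gzip-compressed body until the body is inflated.
import gzip, re

body = gzip.compress(b'<p>hello world</p>')

# Against the raw wire bytes, the pattern finds nothing:
print(re.search(rb'<p>hello world</p>', body))                    # None

# Inflate first, and the same pattern matches:
print(re.search(rb'<p>hello world</p>', gzip.decompress(body)))   # match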

In the meantime, after reading this:

http://www.mozilla.org/docs/web-develope...tml#accept

I think I'll modify my xhtml header filter like so as a temporary solution:

Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: 4. Filter XHTML (in)"
Match = "(application/xhtml\+xml*)\0"
Replace = "text/html"

If somebody can confirm this problem or has a better solution, I'd like to hear about it.

Mike
Dec. 12, 2004, 02:27 AM
Post: #2
sidki
Hi Mike,

Right, that's a problem. Prox doesn't inflate documents with content-types that are "filter-forced".

As for application/xhtml\+xml you have two choices:
Either change the Content-Type to text/html (what I do).
Or re-request the document if gzipped and block the Accept-Encoding header on the second request, by writing the URL to a temporary list (I do that for filter-forced text/plain).

The second choice would go like this (change text/plain, of course):

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-Encoding: 1 Block URLs in Mem-Encode     4.05.14 [mona] (d.0) (Out)"
URL = "$LST(Mem-Encode)$SET(volat=$GET(volat)encoded:1.)"

In = FALSE
Out = TRUE
Key = "Accept-Encoding: 2 gzip     4.11.22 [srl] (d.1) (Out)"
URL = "^$TST(volat=*.encoded:1.*)|$TST(keyword=*.a_web.*)"
Replace = "gzip, x-gzip, deflate"

In = TRUE
Out = FALSE
Key = "Content-Encoding: 2 text/plain to Mem-Encode     4.10.30 [mona] (d.0) (In)"
URL = "^$RESP(3)"
Match = "(?*)\1&$IHDR(Content-Type: text/plain)($TST(volat=*.encoded:1.*)$SET(1=killed: 2nd request!\k)|$ADDLST(Mem-Encode,$WESC(\h\p\q)(^?))$JUMP(\u))"
Replace = "\1"

sidki
Dec. 12, 2004, 07:02 AM
Post: #3
Siamesecat
Quote:However, I ran across a couple of sites where proxo/firefox had trouble.

http://sidesh0w.com/

http://jessey.net/

http://golem.ph.utexas.edu/~distler/blog/

Going to these sites, firefox only displayed the following message:

Code:
This XML file does not appear to have any style information associated with it. The document tree is shown below.

and the displayed document tree was empty.

After debugging a while, I noticed that these sites served the pages with gzip content encoding. I found that if I enabled this filter:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-encoding: 2. None (out)"
Match = "*"

then filtering worked ok.

I think I'll modify my xhtml header filter like so as a temporary solution:

If somebody can confirm this problem or has a better solution, I'd like to hear about it.
I use Firefox with Prox, and I had no problem with those websites you mentioned. My filter for XHTML is a bit different from yours, however: I was more concerned with filtering files with certain extensions than with the content-type. Here is the filter I have:
Code:
In = TRUE
Out = FALSE
Key = "Content-Type: Filter .js/.vbs/.xml/.xsl/.xhtml (in)"
URL = "*.(php|js|jse|vbs|vbe|x(htm|m|s)l)"
Match = "\0"
Replace = "\0$FILTER(True)"
Incidentally, I use the "Accept-encoding: Allow webpage encoding (out)" filter.
Dec. 12, 2004, 09:11 PM
Post: #4
z12
Hi sidki

Well, that explains it. I haven't run into it before, since that's the only filter I had that forced filtering, and that content-type isn't very common.

In light of this, I've decided to add some filtering to handle it. The filters you posted were very interesting. It took me a while to figure out how mona's content-encoding filter worked. Smile! Also, using an in-memory list is something that never occurred to me before, hmm....

After a little bit of hacking, here's what I ended up with:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-Encoding: 1. Block URLs in Mem-Encode 4.05.14 [mona] (d.0) (Out)"
URL = "$LST(Mem-Encode)$SET(volat=$GET(volat)encoded:1.)"

[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-Encoding: 2. gzip 4.11.22 [srl] (d.1) r1 (Out)"
URL = "^$TST(volat=*encoded:1.(^?))"
Replace = "gzip, x-gzip, deflate"

[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: 4. Filter XHTML (in)"
Match = "(application/xhtml\+xml*)\0"
Replace = "\0$FILTER(true)$SET(bForced=1)"

[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: 8. Check if filtering forced [mona] r1 (In)"
URL = "^$RESP(3)"
Match = "(?*)\1&($IHDR(Content-Encoding: *)$TST(bForced=1))($TST(volat=*.encoded:1.*)$SET(1=killed: 2nd request!\k)|$ADDLST(Mem-Encode,$WESC(\h\p\q)(^?))$JUMP(\u))"
Replace = "\1"

Filter changes:
1. Modified the URL match on Accept-Encoding 2 so it matches the first time volat is set, and removed the keyword check.
2. Added a $SET to Content-Type 4 to indicate that page filtering is being forced.
3. Changed mona's Content-Encoding filter to use the Content-Type header and the bForced variable. This way, I can check whether filtering was forced after the other Content-Type header filters have had a chance to run.

Things seem to be working well with the links I posted. Now I need to find a site that won't disable the content-encoding on the 2nd request, to see if the URL is killed. If you know of any, let me know so I can test it.

I still might change the XHTML to text/plain and just force filtering when MathML is being used (by URL). I haven't decided on this yet.

By the way, is there a filter naming convention to follow when you've modified someone else's filter?

Thanks for the insight

Mike
Dec. 12, 2004, 09:18 PM
Post: #5
z12
Hi Siamesecat

I suspect the reason you had no trouble with those sites is that filtering wasn't being forced.

I suppose you could verify that by checking the log window and seeing whether any web filters matched.

Also, if your filters changed the Content-Type to "text/html", there wouldn't be a problem, since that type doesn't have to be force-filtered.

Thanks for checking

Mike
Dec. 13, 2004, 03:09 AM
Post: #6
sidki
z12 Wrote:Also, using an in-memory list is something that never occurred to me before, hmm....
If I may suggest something, join prox-list. Some of the veterans are still alive. Also, the solution to the above problem was developed there.

Quote:Things seem to be working well with the links I posted. Now I need to find a site that won't disable the content-encoding on the 2nd request, to see if the URL is killed. If you know of any, let me know so I can test it.
I got into a redirect loop once because of that - before adding this check - but it was for gzipped text/plain and I don't remember where. No Expression

Quote:By the way, is there a filter naming convention to follow when you've modified someone else's filter?
I don't think so. There was something like that when altosax and Jor were still posting filters. We placed the original author first and added others if they did major modifications, like "[srl z12]". I still handle it that way.

Quote:Thanks for the insight
My pleasure. I'm actually quite happy that there is someone who runs into the same trouble as I do. In the case of the caching header filters, you were there first and I learned. Smile!

sidki
Dec. 13, 2004, 06:15 AM
Post: #7
Siamesecat
Quote:checking the log window and seeing whether any web filters matched.

Also, if your filters changed the Content-Type to "text/html", there wouldn't be a problem, since that type doesn't have to be force-filtered.
I looked at the log window. Lots of filters are being used at those sites. If you look again at my filter, you will see that it does not do anything to the content-type except force filtering. The only other filter that would affect the content-type at those sites is one that compensates for a missing Content-Type field by creating one with type = text/html. I doubt that all three sites' headers would be missing the Content-Type field. Where is the XHTML exactly? Except for stylesheets and javascript files, I saw type text/html on the root pages.
Dec. 13, 2004, 02:35 PM
Post: #8
z12
Hi Siamesecat

Well, we'll have to get to the bottom of this! Smile!

Siamesecat Wrote:Where is the XHTML exactly?

OK, here are some log-window snippets from each site:

Code:
+++GET 114+++
GET / HTTP/1.1
Host: sidesh0w.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip, x-gzip, deflate
Connection: close
JumpTo: http://sidesh0w.com/

+++RESP 114+++
HTTP/1.1 200 OK
Date: Mon, 13 Dec 2004 13:07:33 GMT
Server: Apache
X-Powered-By: PHP/4.3.8, The blood, sweat and tears of the fine, fine TextDrive staff
X-Pingback: http://sidesh0w.com/xmlrpc.php
Content-Encoding: gzip
Served-By: TextDrive
Connection: close
Transfer-Encoding: chunked
Content-Type: application/xhtml+xml; charset=utf-8
+++CLOSE 114+++

Code:
+++GET 136+++
GET / HTTP/1.1
Host: jessey.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip, x-gzip, deflate
Connection: close
JumpTo: http://jessey.net/

+++RESP 136+++
HTTP/1.1 200 OK
Date: Mon, 13 Dec 2004 13:14:57 GMT
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c
X-Powered-By: PHP/4.3.5
Connection: close
Content-Type: application/xhtml+xml;charset=utf-8
Content-Encoding: gzip
Content-Length: 8364
+++CLOSE 136+++

Code:
+++GET 149+++
GET /~distler/blog/ HTTP/1.1
Host: golem.ph.utexas.edu
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip, x-gzip, deflate
Connection: close
JumpTo: http://golem.ph.utexas.edu/~distler/blog/

+++RESP 149+++
HTTP/1.1 200 OK
Date: Mon, 13 Dec 2004 13:16:29 GMT
Server: Apache/2.0.52 (Unix) DAV/2 mod_ssl/2.0.52 OpenSSL/0.9.7d PHP/5.0.0
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Length: 48574
Connection: close
Content-Type: application/xhtml+xml
+++CLOSE 149+++

Siamesecat Wrote:If you look again at my filter, you will see that it does not do anything to the content-type except force filtering.

I meant one of your other content-type filters. Smile! I assumed you had more than one. Personally, I now have nine of them (though four are disabled).

After looking at these log snippets, it occurred to me that there are a couple of reasons why you're not getting "Content-Type: application/xhtml+xml".

1. The "User-Agent" header. Some sites will serve different content based on this.
2. The "Accept" header. Again, some sites will serve different content based on this.

Perhaps you could include a log window snippet showing your request & response.

Once the content-type issue is sorted out, the next thing is to make sure you're sending an "Accept-Encoding" header. By default, firefox sends "gzip, deflate", which will cause these sites to send the page gzip-encoded.

Now by default, proxo will not filter "application/xhtml+xml", so it is at this point that filtering needs to be forced.
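
If you want to see the negotiation outside the browser, a quick sketch like this (standard-library Python; what these servers actually return depends entirely on them, of course) shows how the Accept header alone can change the Content-Type you get back:

Code:
# Send the same request with two different Accept headers and compare
# the Content-Type in the response. Results depend on the server.
import urllib.request

def content_type(url, accept):
    req = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Type")

url = "http://sidesh0w.com/"
print(content_type(url, "text/html"))                        # likely text/html
print(content_type(url, "application/xhtml+xml,text/html"))  # likely application/xhtml+xml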

Mike
Dec. 13, 2004, 02:37 PM
Post: #9
z12
Hi sidki

For reasons I've long ago forgotten, I have typically avoided Yahoo (except news). Smile!

I have seen you recommend prox-list before. Since you recommend it and I can't think of any reason not to, I'll join. However, as a precaution, I think I'll first set up another sneakemail account, just in case. Smile!

Also, with the upcoming server change here, I suppose that's another reason to check it out.

See you there

Mike
Dec. 13, 2004, 06:50 PM
Post: #10
Oddysey
sidki;
Quote:If I may suggest something, join [Yahoo's] prox-list.
Actually, the real question is: why aren't those people over here?????

[Disclaimer]
Most of you know that I try to make music, at least for fun, certainly not for profit. I bought a gizmo for my guitar, so I decided to join a Yahoo list to see if I could learn anything beyond what I could figure out for myself. That list membership qualifies me to make the following statements.
[/Disclaimer]

It takes a certain mindset to disentangle the various topics from the chaotic mish-mash that Yahoo imposes on list contributors. This very website (and every other BBS/forum site) serves the same purpose as Yahoo's list, in that contributors are empowered to communicate with great zeal. The difference - and for me, the killing point - is that Yahoo does not allow topics to be categorized.

Some folks just don't want to be bothered by deeply technical esoterica, others can't find what they want quickly, still others (like myself) don't want to spend on-line time sifting out all the BS chatter of a list. For us, the alternative is much more palatable - the BBS. Plus, BBS's are run by real down-to-earth people, not faceless profit centers. If I have a problem or a question, I know right where to go for an answer. Try getting a response like that out of Yahoo. Sad

However, that shouldn't stop somebody (hint, hint) from going over there and evangelizing the UOPF. Eventually, they'll get the message and bring the expertise home where it belongs. Big Teeth And with Drow's new server due to take over in early January, reliability issues should pretty much drop down to the 'net's average, if not disappear altogether. I know for a fact that Yahoo doesn't have a 100% uptime record either. Not only do I have that list I mentioned above, but my regular email provider is Yahoo's web-based service (as seen by Proxo, of course!).

Them's my thoughts for the morning. How's your day? Wink Smile!


Oddysey

I'm no longer in the rat race - the rats won't have me!
Dec. 13, 2004, 08:56 PM
Post: #11
z12
Hi Oddysey

Oddysey Wrote:Actually, the real question is: why aren't those people over here?????

That's a good question.

sidki Wrote:I'm actually quite happy that there is someone who runs into the same trouble as I do.

I feel the same way sidki does, and that's why I enjoy the forums. However, I noticed that after Scott declared he was no longer going to support proxo, there seemed to be a large drop in activity in the forums I lurked in. I suppose Scott was the point around which everything proxo-related was centered.

Having said that, after using proxo, I can't imagine not using it, whether it has "official" (Scott) support or not. I know I'm not the only one who feels this way; this forum is proof of that. However, I think a lot of people have been "holding back", thinking that they might be wasting their time on something that was just going to fade away.

This is why I'm very enthusiastic about the Proximodo project. I'm thinking that once the project shows it will work as well as proxo, things will pick up again in the forums. I think Kye-U made a good decision when he decided to have a forum for Proximodo. Indeed, most of the recent activity here has been in that forum, from what I can tell. Hopefully, with Proximodo being open source, it won't be as dependent on just one person in the long run as Proxomitron was.

Meanwhile, I'm still working on my filters & learning. Smile!

Oddysey Wrote:How's your day?

sigh... I have 4 cars and they're all a pain in the a$$.

Mike
Dec. 14, 2004, 12:42 AM
Post: #12
Oddysey
Mike Wrote:
Oddysey Wrote:How's your day?

sigh... I have 4 POS cars and they're all a pain in the wallet.

f1x0r3d! Whistling

I'm no longer in the rat race - the rats won't have me!
Dec. 14, 2004, 06:19 AM
Post: #13
Siamesecat
z12,

Here are some headers from my setup. I never thought about the sites giving a different page depending on the User-Agent. I just gave them my usual chopped (fake) Mozilla UA. As for the Accept field, I never realized that a site might give it a lot of attention. So what did I miss by their giving me HTML instead?

Code:
GET http://sidesh0w.com/ HTTP/1.1
Host: sidesh0w.com
User-Agent: Mozilla/5.0 (Windows; U; Win32)
Accept: text/xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
Accept-Language: en
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Referer: http://sidesh0w.com/
Connection: keep-alive

+++RESP 590+++
HTTP/1.1 200 OK
Date: Tue, 14 Dec 2004 00:06:18 GMT
Server: Apache
X-Powered-By: PHP/4.3.8, The blood, sweat and tears of the fine, fine TextDrive staff
X-Pingback: http://sidesh0w.com/xmlrpc.php
Content-Encoding: gzip
Served-By: TextDrive
Content-Type: text/html; charset=utf-8
X-Cache: MISS from localhost
Transfer-Encoding: chunked


GET / HTTP/1.1
Host: jessey.net
User-Agent: Mozilla/5.0 (Windows; U; Win32)
Accept: text/xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
Accept-Language: en
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Referer: http://jessey.net/
Connection: keep-alive

+++RESP 600+++
HTTP/1.1 200 OK
Date: Tue, 14 Dec 2004 06:06:59 GMT
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c
X-Powered-By: PHP/4.3.5
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html;charset=utf-8
Content-Encoding: gzip
Content-Length: 8489


GET /~distler/blog/ HTTP/1.1
Host: golem.ph.utexas.edu
User-Agent: Mozilla/5.0 (Windows; U; Win32)
Accept: text/xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
Accept-Language: en
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Referer: http://golem.ph.utexas.edu/
Connection: keep-alive

+++RESP 604+++
HTTP/1.1 200 OK
Date: Tue, 14 Dec 2004 06:08:01 GMT
Server: Apache/2.0.52 (Unix) DAV/2 mod_ssl/2.0.52 OpenSSL/0.9.7d PHP/5.0.0
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Dec. 14, 2004, 10:23 AM
Post: #14
z12
Hi Siamesecat

In your case, I think you're getting text/html because the Accept header is missing application/xhtml+xml.

Siamesecat Wrote:So what did I miss by their giving me HTML instead?

Basically nothing except for some grief. You can read more about it here:

http://www.mozilla.org/docs/web-develope...tml#accept

In essence, unless the page is using MathML, you're better off with text/html.

http://golem.ph.utexas.edu/~distler/blog/ does use MathML, but it's the only one I've found that does. I don't have all the fonts required for MathML, but I could see that the formulas he was showing didn't display correctly with text/html.

In fact, it took me quite a while to find any sites that send a content-type of application/xhtml+xml.

What was news to me was this:

sidki Wrote:Right, that's a problem. Prox doesn't inflate documents with content-types that are "filter-forced".

I just happened to stumble upon this problem because, with application/xhtml+xml, the effect was quite obvious. But any time you force filtering with $FILTER(true), you have the potential of running into this problem.

Mike
Dec. 15, 2004, 04:43 PM
Post: #15
sidki
Oddysey:
Oddysey Wrote:Actually, the real question is: why aren't those people over here?????
I think there will always be people - especially those who grew up with newsgroups - who prefer mailing lists.
Then those who wouldn't want to do without the comfort of bulletin boards.
Then those who don't mind using both - think multifaceted. Wink

Another thing to keep in mind: Prox-list - hosted by EGroups back then - is where the Proxomitron community started.
Forums may rise, move (leaving all old posts behind!), fall, reopen -- we've had all of that...


Mike:
That MathML page is interesting. Changing the Content-Type to text/html indeed sort of breaks it.

The thing is that I have a bunch of filters that target both HTML code and scripts.
I resorted to omitting quotes in replacements, like "<a id=foo>", because I couldn't find a way to reliably tell whether ", ', \", or \' would be the appropriate quote. But quotes are required for application/xhtml+xml. No Expression

So, regarding filter-forcing only MathML (and changing standard xhtml to text/html in my case):
z12 Wrote:I still might change the XHTML to text/plain and just force filtering when MathML is being used (by URL). I haven't decided on this yet.
Do you mean an include list or something else? Because I thought of re-requesting the document if the doctype points to MathML. Something similar works pretty well for the HTML sniffer I use. Basically, it looks for HTML code within the first few bytes of filter-forced $TYPE(oth) docs and - if found - prepends:
Code:
<script type="text/javascript">document.location="\u?prx-sniff:html&prx-ref:\1";</script>\r\n\k
Two header filters then insert the proper Content-Type and Referer.
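
For the curious, the sniff itself amounts to something like this (a loose Python sketch; the byte patterns are my guess at what counts as "HTML code", not the filter's actual matching expression):

Code:
# Guess whether a filter-forced $TYPE(oth) document is really HTML by
# peeking at its first bytes. Patterns here are illustrative only.
def looks_like_html(first_bytes: bytes) -> bool:
    head = first_bytes[:256].lstrip().lower()
    return head.startswith((b"<!doctype", b"<html", b"<head", b"<body"))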

Although, in the long run I would probably feel more comfortable finding a solution to that quote problem and then filter-forcing application/xhtml+xml instead of changing it to text/html. I do see a massive increase in XHTML 1.1 doctypes since firefox became popular, so we may see xhtml/xml content-types more often in the near future.


sidki