Problem with "\h"
Jul. 29, 2004, 12:58 AM
Post: #1
 
Hi all,

Unless I'm doing something wrong, \h seems to have an issue with domain names that have a two-letter country code for the TLD. That is, it can't tell the difference between "theage.com.au" and "news.com.au".

In particular, one of my header filters fails to detect that the host has changed.

If I go to:

http://www.cnn.com/

and click on one of the article links for Time magazine at the bottom, either filter works.

Also at the same site, clicking on the navigation link to Sports redirects you, but it's to a .cnn.com domain, so the filter allows it (which is what I want).

Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Location: 2. Show HTML Redirects (in)"
Match = "\1$IHDR(Content-Type: text/(html|plain))$RESP(30*)&\w://(^\h)"
Replace = "http://local.ptron/redir.html?dest=\1"

[HTTP headers]
In = FALSE
Out = FALSE
Key = "Location: 3. Show HTML Redirects (in)"
Match = "$RESP(30*)$IHDR(Content-Type: text/(html|plain))(https+://)\1(^\h) \2"
Replace = "http://local.ptron/redir.html?dest=\1\2"

However, if I go here:

http://www.theage.com.au/

and click on any article there, the filter doesn't catch the redirection. Clicking on an article there redirects you to a different host to login. I have tried various versions of these filters and they all have a problem with that site.

So my conclusion is that \h doesn't work right. I hope I'm wrong and somebody can show me what I'm doing wrong.

I will attempt to post the HTML code that the filter uses in the replacement section, if you'd like to check it out. You'll have to rename it to redir.html and put it in Proxo's html directory to make it work.

Mike

P.S. As a side note, when I went to "theage.com.au" I got a JavaScript cookie with the domain of ".com.au".
Firefox will happily send that cookie from "theage" to "news.com.au".
Doh
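
The cookie behaviour in the P.S. can be illustrated with a minimal Python sketch of suffix-based cookie domain matching. The function name `cookie_domain_matches` is hypothetical and the check is deliberately simplified: it assumes no extra rule rejects overly broad domains such as ".com.au".

```python
def cookie_domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified suffix check: does a cookie set for cookie_domain apply to host?

    Assumption: plain suffix matching only, with no check against a list
    of registry-controlled suffixes such as "com.au".
    """
    host = host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # The host matches if it equals the domain or is a subdomain of it.
    return host == domain or host.endswith("." + domain)


# A cookie scoped to ".com.au" applies to both unrelated sites.
print(cookie_domain_matches("www.theage.com.au", ".com.au"))
print(cookie_domain_matches("news.com.au", ".com.au"))
# A cookie scoped to the site itself does not leak to the other one.
print(cookie_domain_matches("news.com.au", ".theage.com.au"))
```

Under this simplified rule, any cookie scoped to a bare country-code second level is shared by every site registered beneath it.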
Jul. 29, 2004, 02:33 AM
Post: #2
 
You are correct, \h doesn't always do the trick...
Here is the code that I use to catch the host:
Code:
($AV($SET(host=\h)&$AV((^http://$TST(host))*)))
Perhaps a modification of that will get you to where you are going...


edit: That is the line I use to block "off-site" images...
Could you please keep us posted if a similar method works for your filters - I'd like to incorporate them into my config, but header filters are not my specialty...


edit2: I actually have that first filter of yours in my config from some time ago...
Works quite well from the best I can tell...
What's that second filter "add to the mix"?
Jul. 29, 2004, 03:32 AM
Post: #3
 
Hi ProxRocks

That 2nd filter is just a variation of the 1st. It was just something I was trying out as it specifically catches the first domain name.

I'll give your tip a try & post back,

Thanks
Mike
Jul. 29, 2004, 07:31 AM
Post: #4
 
Code:
($AV($SET(host=\h)&$AV((^http://$TST(host))*)))
If the test for \h itself is faulty with domains with country codes, how will that test be an improvement? I tried it anyway with a referer filter and it didn't work any better than my original filter. Maybe a better way would be to test the content before the first slash after the "http://" and compare it with the Host: field?
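
The comparison suggested here (take everything between "http://" and the first slash and compare it with the Host: field) can be sketched in Python. This is only an illustration of the idea, not a Proxomitron filter; the function name `referer_is_same_host` is hypothetical.

```python
from urllib.parse import urlsplit


def referer_is_same_host(referer: str, host_header: str) -> bool:
    """Return True if the referer URL points at exactly the host in the Host: field."""
    # hostname is the part after "://" and before the first "/", lowercased,
    # with any port or userinfo removed.
    ref_host = urlsplit(referer).hostname
    # Prefix "//" so urlsplit parses the bare header value as a network location.
    cur_host = urlsplit("//" + host_header.strip()).hostname
    return ref_host is not None and ref_host == cur_host


print(referer_is_same_host("http://www.theage.com.au/a/b", "www.theage.com.au"))
print(referer_is_same_host("http://news.com.au/x", "www.theage.com.au"))
```

An exact comparison like this sidesteps the country-code problem, at the cost of treating "www.cnn.com" and "money.cnn.com" as different hosts.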
Jul. 29, 2004, 11:27 AM
Post: #5
 
That test was from Arne's forum - perhaps from sidki or JD, can't say as I remember...

Point being - it DOES work...
I shall wait for a header filter expert to pass this way to explain the details - for I did not create the thing...

I just use it because it DOES work...
Jul. 29, 2004, 01:54 PM
Post: #6
 
Hi ProxRocks

Ok, I tried this:

Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Location: 6. Show HTML Redirects (in)"
Match = "$RESP(30*)$IHDR(Content-Type: text/(html|plain))$SET(host=\h)(https+://)\1(^$TST(host)) \2"
Replace = "http://local.ptron/redir.html?dest=\1\2$GET(host)"

and it does indeed work there.

However, it's not quite what I was trying to get. Basically, if the host name is not an exact match, the filter will match. For some filters, this is exactly what I wanted (thanks for the tip). I was hoping that it wouldn't match, if say, I was going from "www.cnn.com" to "money.cnn.com".

From the way "\h" seems to be working, if the top 2 domains match, then "\h" will match. This is ok unless a country code domain is used. I don't think there is any way around this behavior for "\h".
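
To make the suspected behaviour concrete, here is a minimal Python sketch. It assumes, as guessed above and not confirmed by any Proxomitron documentation, that the \h domain test effectively compares only the last two labels of each hostname. The helper names are hypothetical.

```python
def last_two_labels(host: str) -> str:
    """Return the last two dot-separated labels of a hostname."""
    return ".".join(host.lower().split(".")[-2:])


def naive_same_site(host_a: str, host_b: str) -> bool:
    """Assumed model of the \\h test: hosts are 'the same' if their last two labels agree."""
    return last_two_labels(host_a) == last_two_labels(host_b)


# Works as wanted for ordinary .com hosts: both reduce to "cnn.com".
print(naive_same_site("www.cnn.com", "money.cnn.com"))
# Fails for country-code hosts: both reduce to "com.au".
print(naive_same_site("www.theage.com.au", "www.news.com.au"))
```

If that assumption holds, every host under ".com.au" looks like the same site, which is consistent with the redirect on theage.com.au going unnoticed.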

I think I might run with this modified filter for a while and see how it goes. If there aren't too many sites that I regularly visit that redirect me, I may just add them to the URL match to exclude them.

Meanwhile, I still hope there's a way to make it work as intended.

Mike
Jul. 29, 2004, 02:32 PM
Post: #7
 
Hi z12,

Here is a URL splicer that handles 2-level TLDs. You just need what \2 is grabbing (or use \h instead) and the \2 test.
The domain is then stored in "dom", which you can test with, for example, "://([^/]++.|)$TST(dom)".
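
As a rough illustration of what that test expresses, here is a Python sketch that checks whether a URL's host is "dom" itself or a subdomain of it. The function name `url_in_domain` is hypothetical, the regular expression is a Python approximation rather than Proxomitron syntax, and the trailing boundary check is an addition that is not part of the pattern above.

```python
import re


def url_in_domain(url: str, dom: str) -> bool:
    """Return True if the URL's host equals dom or ends with '.' + dom."""
    pattern = (
        r"://(?:[^/]*\.)?"   # optional subdomain prefix ending in a dot
        + re.escape(dom)      # the stored domain, matched literally
        + r"(?=[:/]|$)"       # added boundary: port, path, or end of string
    )
    return re.search(pattern, url, re.IGNORECASE) is not None


print(url_in_domain("http://www.theage.com.au/news", "theage.com.au"))
print(url_in_domain("http://news.com.au/", "theage.com.au"))
```

The optional prefix must end in a dot, so a host like "nottheage.com.au" is not mistaken for a subdomain of "theage.com.au".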

sidki
Jul. 29, 2004, 03:59 PM
Post: #8
 
Pretty slick...
I don't see that referenced in the 4/11 config...
Is that going to be included in the next release?
Jul. 29, 2004, 08:18 PM
Post: #9
 
I tried writing a referer filter to deal with the problem of \h not working on hostnames ending with country codes. It doesn't work. What am I doing wrong?
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: Hide previous host v2 (out)"
Match = "((*|)(http(s|)://)\0\1/*)($OHDR(Host:\s\2))(^$TST(\1=\2))"
Replace = "\0\2"

This does not work either. Why??
Code:
In = FALSE
Out = TRUE
Key = "Referer: Hide previous host v2 (out)"
Match = "?&(($OHDR(Host:\s\1))&((*|)(http(s|)://)\0$TST(^\1*)))"
Replace = "\0\1"

This resulted in a referer whose label is there, but nothing is visible after it.
Code:
In = FALSE
Out = TRUE
Key = "Referer: Hide previous host v2 (out)"
Match = "?&(^($OHDR(Host:\s\1))&((*|)(http(s|)://)\0$TST(\1*)))"
Replace = "\0\1"

This removes the referer whether the host names are the same or not.
Code:
In = FALSE
Out = TRUE
Key = "Referer: Hide previous host v2 (out)"
Match = "?&($URL((http(s|)://)\0\1/*))&(^($OHDR(Host:\s$TST(\1))))"
How do you make a match work only if a test fails? Obviously, I haven't understood how to do that.
Jul. 30, 2004, 02:26 AM
Post: #10
 
Hi

Well, I played around with the URL-Splicer (very cool) but I didn't have any luck getting my filter to work right. <_<

It seems that no matter how you cut it, "\h" can't be used reliably with country codes.

When I used the URL-Splicer, I couldn't really do anything with the dom (domain) as you have nothing reliable to compare it to, since "\h" won't do. Also $LST(URL-Splicer) has a tendency to do funny things to variables that follow it, so I had trouble capturing the destination.

Hopefully, I'm just doing something wrong when using URL-Splicer. I'm probably overlooking the obvious, but my brain needs a rest.

Mike
Jul. 30, 2004, 08:23 AM
Post: #11
 
I may have discovered a stopgap solution to the problem of matching referers of URLs with a country code. It is working so far, anyway.

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: Fake if country code (out)"
Match = "(*|)(http(s|)://*.??(/|(^?)))"
Replace = "www.dkdfufj.com/blah/"

In = FALSE
Out = TRUE
Key = "Referer: Hide previous host (out)"
Match = "?&(^(*|)(http(s|)|ftp)://\h)"
Replace = "\h/"
The first filter forces a nonsense referer if the suffix on the hostname has only two characters. The second one sees a difference between this referer hostname and the current host and changes the referer to the current hostname. The only potential problem is that some servers do not take kindly to altered referers when a client is navigating within their domain.
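
The combined effect of the two filters can be sketched in Python. This is a model of the logic described above, not of Proxomitron itself: the function names are hypothetical, and the second step assumes the \h test compares only the last two hostname labels, as suspected earlier in the thread.

```python
from urllib.parse import urlsplit


def naive_same_site(host_a: str, host_b: str) -> bool:
    """Assumed model of the \\h test: compare only the last two labels."""
    tail = lambda h: ".".join(h.lower().split(".")[-2:])
    return tail(host_a) == tail(host_b)


def hide_previous_host(referer: str, current_host: str) -> str:
    """Model of the two referer filters applied in order."""
    ref_host = urlsplit(referer).hostname or ""
    suffix = ref_host.rpartition(".")[2]
    # Filter 1: a two-character suffix gets a nonsense referer, which
    # filter 2 then always sees as foreign and replaces.
    if len(suffix) == 2:
        return current_host + "/"
    # Filter 2: keep the referer only if it looks like the current site.
    if naive_same_site(ref_host, current_host):
        return referer
    return current_host + "/"


print(hide_previous_host("http://news.com.au/x", "www.theage.com.au"))
print(hide_previous_host("http://www.cnn.com/a", "money.cnn.com"))
```

Note that under this model a referer from the same country-code site is rewritten too, which is the situation where a server might object to the altered referer.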

Perhaps a similar matching technique could be used for other header filters as well. Getting something like this to work does seem to require more than one filter.
Jul. 30, 2004, 10:57 AM
Post: #12
 
ProxRocks Wrote: Is that going to be included in the next release?
Yep!

z12 Wrote: It seems that no matter how you cut it, "\h" can't be used reliably with country codes.
\h has two functions: to store the hostname and to serve as a domain test.
The first function always works, eg: Match = "*" Replace = "\h" .
The second function fails on 2-level TLDs, eg: Match = "://(^\h)" .

If you store the hostname like "$SET(host=\h)", you are using the first function.
If you are storing the real domain in "dom", the intention is to replace the second \h function with eg:
Match = "://(^([^/]++.|)$TST(dom))"

So here is a test filter, reference site is http://bbc.co.uk/. Result is displayed upper left:
Code:
[Patterns]
Name = "Domain Test"
Active = TRUE
Multi = TRUE
Limit = 1
Match = "(?)\1$STOP()"
"$SET(dref=http://bbc.co.uk/)"
"$SET(dhost=\h)"
""
"$TST(dhost="
"([0-9.]+)\9(^?)"
"|"
"(|*.)"
"("
"[^.]+."
"((com|net|edu|gov|org|mil|info|??).??|[^.]+)"
")\9(^?)"
")"
"$SET(dom=\9)"
""
"($TST(dref=*://([^/]++.|)$TST(dom)*)$SET(2=onsite)|$SET(2=offsite))"
Replace = "<div>\2: \9</div>\r\n\1$SET(dref=)$SET(dhost=)"
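
For readers who want to experiment outside Proxomitron, the domain-splitting heuristic in the filter above can be approximated in Python. This is a sketch under assumptions: the names `site_domain` and `is_onsite` are hypothetical, and the logic mirrors the filter's rule (keep three labels when the host ends in a listed or two-character second level plus a two-character country code, otherwise keep two) rather than reproducing the matcher exactly.

```python
import re

# Second-level labels the filter lists explicitly before a country code.
GENERIC_SLDS = {"com", "net", "edu", "gov", "org", "mil", "info"}


def site_domain(host: str) -> str:
    """Approximate the 'dom' value the filter stores for a hostname."""
    host = host.lower().rstrip(".")
    # Numeric addresses are kept whole, like the ([0-9.]+) branch.
    if re.fullmatch(r"[0-9.]+", host):
        return host
    labels = host.split(".")
    two_level_tld = (
        len(labels) >= 3
        and len(labels[-1]) == 2
        and (labels[-2] in GENERIC_SLDS or len(labels[-2]) == 2)
    )
    return ".".join(labels[-3:] if two_level_tld else labels[-2:])


def is_onsite(ref_host: str, current_host: str) -> bool:
    """Onsite if the reference host is the stored domain or a subdomain of it."""
    dom = site_domain(current_host)
    ref_host = ref_host.lower()
    return ref_host == dom or ref_host.endswith("." + dom)


print(site_domain("www.bbc.co.uk"))
print(site_domain("www.cnn.com"))
print(is_onsite("news.com.au", "www.theage.com.au"))
```

Like the filter, this is a heuristic: a host such as "www.xy.de" is kept whole because "xy" looks like a two-character second level.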


Siamesecat: For most sites outside the US the country code is the top level domain.


sidki
Jul. 30, 2004, 11:16 AM
Post: #13
 
Hi Siamesecat

Here's a thing I've noticed with header filters. If you have more than one filter enabled for a particular header, which one gets called first can be unpredictable. I have seen cases where the displayed filters are not in the same order as they are in the config file. Proxo seems to sort by the filter's name, so changing a name can change the calling order.

What seems to work for me is to put a number after the name of the header, like so:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: 1. Match This (out)"
Match = "this"
Replace = "foo"

In = FALSE
Out = TRUE
Key = "Referer: 2. Match This (out)"
Match = "that"
Replace = "bar"
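
The numbering trick can be pictured with a small Python sketch. It assumes, based only on the observation above and not on Proxomitron documentation, that filters for one header are called in name-sorted order; the function name `call_order` is hypothetical.

```python
def call_order(filter_keys: list[str]) -> list[str]:
    """Assumed model: filters for a header run in case-insensitive name order."""
    return sorted(filter_keys, key=str.lower)


keys = [
    "Referer: 2. Match This (out)",
    "Referer: 1. Match This (out)",
]
# The numeric prefix after the header name pins the order.
for key in call_order(keys):
    print(key)
```

With a number right after the header name, the order no longer depends on the descriptive part of the key.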

Thought you might be interested,

Mike

P.S. It seems to me you should reverse your referer filter order.
Jul. 30, 2004, 11:39 AM
Post: #14
 
Hi sidki

Thanks, I think my problem was in my $TST expression. As usual, your filter works nice. Now if I can only fix mine. [rolleyes]

I'm gonna suck down another cup of coffee or two & give it a go.

Thanks again, it's nice to have help,
Mike
Jul. 30, 2004, 01:36 PM
Post: #15
 
Hi all,

Ok, I have been playing around with using the URL-Splicer with the referer, since referers are easier to get than redirects.

The problem I was having was due to (multiple) problems in my $TST expression. Reviewing sidki's 2nd filter, I realized what I was doing wrong. Now, all seems well while using the URL-Splicer on the referer.

Since Siamesecat pointed out that the referer was also not working right, I'm going to redo some of my header filters so that I only need to make 1 call to URL-Splicer for use in the Location & referer filters ( & elsewhere if needed).

I'll post back later.

Mike