The Un-Official Proxomitron Forum

Full Version: Stop a site from catching fake referers
Here's a challenge for you: haaretz.co.il usually shows only previews of its articles, but it lets you read the full article if you arrive from an external site (e.g. Google).
Alas, it's smart enough to catch you if you merely fake coming from an external site:
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: Fake haaretz (Out)"
Match = "?"
Replace = "https://www.google.com"

If you try this, the article's source contains code that seems to load http://hrz.haaretz.co.il/bots.js, which pops up a message and redirects you away from the article. Note that it is not the same bots.js that is referenced in the page source. And it loads even if you block it.

You can test it on this article:
http://www.haaretz.co.il/opinions/.premium-1.2001807
Which works when you enter it via Google:
http://www.google.com/url?sa=t&rct=j&q=&...vEdabu1M2w
Or via Twitter:
http://t.co/dVLOjy5znp
Or via Facebook (you must be logged in):
http://www.facebook.com/l.php?u=http%3A%...anp2Pe&s=1

So how does it tell real referers from fake ones? How can bots.js be loaded when you block it?
(Apr. 24, 2013 10:53 PM)bugmenot Wrote: [ -> ]So how does it tell real referers from fake ones?
for example with JavaScript. but that's not the case on that site. your filter leads to the full article.

(Apr. 24, 2013 10:53 PM)bugmenot Wrote: [ -> ]How can bots.js be loaded when you block it?
if a file is loaded it is not blocked.

your Proxomitron configuration seems to have massive problems.
I managed to find the behavior.

A real referer header looks like

Referer: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDEQqQIwAA&url=http%3A%2F%2Fwww.haaretz.co.il%2Fopinions%2F.premium-1.2001807&ei=pVJ4UcLlEciu4AT1-oGIAQ&usg=AFQjCNEYmboF_ZsMMtXZWteevEdabu1M2w

Your fake referer looks like

Referer: https://www.google.com

I suspect there is a script looking for something like "url=http%3A%2F%2Fwww.haaretz.co.il%2Fopinions%2F.premium" in the referer.
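A minimal sketch of what such a check could look like, written in Python purely for illustration. It is a guess at the logic, not the site's actual script; the function name `referer_points_at` is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

def referer_points_at(referer: str, page_url: str) -> bool:
    """Return True if the referer's `url` query parameter names page_url."""
    query = urlsplit(referer).query
    # parse_qs percent-decodes values, so %3A%2F%2F becomes ://
    targets = parse_qs(query).get("url", [])
    return page_url in targets

PAGE = "http://www.haaretz.co.il/opinions/.premium-1.2001807"
REAL = ("http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1"
        "&cad=rja&ved=0CDEQqQIwAA"
        "&url=http%3A%2F%2Fwww.haaretz.co.il%2Fopinions%2F.premium-1.2001807"
        "&ei=pVJ4UcLlEciu4AT1-oGIAQ&usg=AFQjCNEYmboF_ZsMMtXZWteevEdabu1M2w")
FAKE = "https://www.google.com"

print(referer_points_at(REAL, PAGE))
print(referer_points_at(FAKE, PAGE))
```

Under this assumption the long Google referer would pass and the bare https://www.google.com would not.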

However, I can see the full article after faking a simple referer. My guess is my set disables or breaks the script that evaluates the referer string.

I had no problems blocking bots.js with either a header filter or a web page filter. Perhaps your browser had cached the file before you tried to block it.

HTH
Using the long referer made no difference.
Keeping bots.js from loading or blanking it made no difference.
Turning off Javascript completely still showed just the preview (though it disabled the forwarding).

So if turning Javascript off made no difference, how did you get the full article?
(Apr. 25, 2013 07:24 AM)bugmenot Wrote: [ -> ]how did you get the full article?

by just using your filter the site http://www.haaretz.co.il/opinions/.premium-1.2001807 contains the full article.

some possibilities:
1. your configuration is defective. fix it or use another one.
2. you didn't clear the browser cache. alternatively press CTRL + F5 for ~3 seconds on that site to force a hard reload.
3. the file .htaccess on the server evaluates the client IP and the response depends on it. so try using a proxy located in a different country.
the file evaluates the referer header. otherwise your filter would not work (for us).
Trust me, it's not about cache. I completely clear it (and cookies) each time I try this.

My IP is fine as long as I use a real referer. That is, if I actually come through Google, etc., then I see the full article.

I realize another filterset may fix it, but I want to know exactly which filter can fix this.
I think your filters will have to provide acceptable referer headers.

After I create a new cfg with only these two filters

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: Fake haaretz (Out)"
URL = "www.haaretz.co.il/opinions/.premium-1.2001807"
Match = "?"
Replace = "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDEQqQIwAA&url=http%3A%2F%2Fwww.haaretz.co.il%2Fopinions%2F.premium-1.2001807&ei=pVJ4UcLlEciu4AT1-oGIAQ&usg=AFQjCNEYmboF_ZsMMtXZWteevEdabu1M2w"

In = FALSE
Out = TRUE
Key = "Referer: Fake haaretz (Out)"
URL = "www.haaretz.co.il/opinions/.premium-1.2001807"
Replace = "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDEQqQIwAA&url=http%3A%2F%2Fwww.haaretz.co.il%2Fopinions%2F.premium-1.2001807&ei=pVJ4UcLlEciu4AT1-oGIAQ&usg=AFQjCNEYmboF_ZsMMtXZWteevEdabu1M2w"

and load www.haaretz.co.il/opinions/.premium-1.2001807, I see the subscribe overlay over the full article. The bot alert is not sent, and I can dismiss the overlay and read the full article. So the site appears to check whether the referer is plausible for each request. A Google referer is correct for www.haaretz.co.il/opinions/.premium-1.2001807 itself, but not for the pages and files linked from it. Your filter had no URL match, so it always sent a Google referer.
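To make the difference concrete, here is a rough Python sketch of what the URL-restricted filters do. It is only an illustration of the idea, not Proxomitron's internals; `outgoing_referer` is a hypothetical helper name.

```python
ARTICLE = "www.haaretz.co.il/opinions/.premium-1.2001807"
FAKE_REFERER = "https://www.google.com"

def outgoing_referer(request_url: str, real_referer: str) -> str:
    """Fake the referer for the article itself, leave every other request alone."""
    # The filter's URL match is written without a scheme, so strip it first.
    bare = request_url.split("://", 1)[-1]
    if bare.startswith(ARTICLE):
        return FAKE_REFERER
    return real_referer

# The article request gets the Google referer...
print(outgoing_referer("http://" + ARTICLE, ""))
# ...while a script requested by the article keeps the article as its referer.
print(outgoing_referer("http://hrz.haaretz.co.il/bots.js", "http://" + ARTICLE))
```

With no URL restriction, the second request would also claim to come from Google, which is the kind of mismatch the site seems to be catching.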

I haven't tested beyond the above header filters and it was a quick test.

HTH
It works! Even if I just use Replace = "https://www.google.com" without any parameters. I've also generalized the URL match and combined your filters into one:
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: Fake haaretz (Out)"
URL = "www.haaretz.co.il/\w/.premium-"
Match = "(?|)"
Replace = "https://www.google.com"

Do you think newer filtersets somehow make fake referers only work in the specific pages they're on? Or maybe they somehow only affect HTML pages (even though $TYPE is not supported for header filters)?
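For anyone who wants to check which URLs the generalized match covers, here is an approximate Python translation. It assumes that Proxomitron's \w stands for a run of non-space characters (excluding ">"), that "." is a literal dot, and that the URL match is anchored at the start of the host; the regex is my own rendering, not something Proxomitron produces.

```python
import re

# Approximate regex for: URL = "www.haaretz.co.il/\w/.premium-"
PREMIUM = re.compile(r"www\.haaretz\.co\.il/[^\s>]*/\.premium-")

def is_premium_article(url_without_scheme: str) -> bool:
    """True if the scheme-less URL would be covered by the combined filter."""
    return PREMIUM.match(url_without_scheme) is not None

print(is_premium_article("www.haaretz.co.il/opinions/.premium-1.2001807"))
print(is_premium_article("hrz.haaretz.co.il/bots.js"))
```

Under these assumptions the test article matches and bots.js does not, so bots.js would be requested with its real referer.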
(Apr. 25, 2013 07:14 PM)bugmenot Wrote: [ -> ]Do you think newer filtersets somehow make fake referers only work in the specific pages they're on? Or maybe they somehow only affect HTML pages (even though $TYPE is not supported for header filters)?

Initially, I added
$SET(0=f_refer.)$SET(sReferF=http://news.google.com/)
to a list. This says: fake the referer everywhere with http://news.google.com/. With this entry the haaretz page opened without overlay or bot alert. Same behavior for
www.haaretz.co.il/ $SET(0=f_refer.)$SET(sReferF=http://news.google.com/)
but I have not clicked a lot of links.

The set blocks or modifies a number of scripts and overrides some javascript elements. Cookies are also blocked and modified. I'm not sure which filters are providing the page but the site does appear to use headers, scripts, and cookies to restrict access.

To see what you reported, I loaded Proxomitron's original default filter set, disabled some filters, and added your filter. I did not add a URL match, since I hadn't needed one before (sorry, it was late). I incorrectly guessed that the referer needed more details, reported my findings, and went to bed.

Although $TYPE is not available in header filters, the file's extension (say .htm) and content-type header (via $IHDR) may be available.
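As a rough illustration of that heuristic, here is a Python sketch that guesses whether a request is for an HTML page from the Content-Type value when one is known, and from the file extension otherwise. The function name and the extension list are assumptions made for the example; this is not how Proxomitron itself decides.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

HTML_EXTENSIONS = {".htm", ".html"}

def looks_like_html(url: str, content_type: Optional[str] = None) -> bool:
    """Guess whether a URL is an HTML page."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        return content_type.split(";")[0].strip().lower() == "text/html"
    extension = posixpath.splitext(urlsplit(url).path)[1].lower()
    return extension in HTML_EXTENSIONS

print(looks_like_html("http://example.com/index.htm"))
print(looks_like_html("http://hrz.haaretz.co.il/bots.js"))
print(looks_like_html("http://www.haaretz.co.il/opinions/.premium-1.2001807"))
```

Note that the haaretz article URL has no .htm extension, so the extension test alone would miss it; only the Content-Type check would identify it as HTML.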

HTH