Cnet blocked by Sidki's filters
Nov. 08, 2008, 01:32 AM
Post: #1
Cnet blocked by Sidki's filters
Why do Sidki's filters block Cnet? I get the comments page for an article but cannot load the article at all. I have to either bypass Proxo, disable web filters temporarily, or switch to the old default Proxo filter set by Scott. Which filter is the likely culprit?
Nov. 08, 2008, 02:33 AM
Post: #2
RE: Cnet blocked by Sidki's filters
how 'bout a link, please?
Nov. 08, 2008, 06:42 AM
Post: #3
RE: Cnet blocked by Sidki's filters
Sorry. It's been this way for a while. I figured anyone using Sidki's set had the problem.

Anyhow, http://www.webmonkey.com/blog/Windows_7_...Every_Move
In this blog, click on "saw a demo of the new geo features", which is a link to the CNet article. The article does not load if Sidki's configs are running. I tried Sidki's configs all the way back to the first one I have, from June 2005, and had the problem with all of them. I haven't been able to go to CNet at all for quite some time unless I load Scott's last config.
Nov. 08, 2008, 12:39 PM
Post: #4
RE: Cnet blocked by Sidki's filters
the article is loading for me, no prob's at all, none, naughta, zip...

sidki 1/2/08 beta plus Kye-U's updates...
Nov. 08, 2008, 01:00 PM
Post: #5
RE: Cnet blocked by Sidki's filters
ProxRocks Wrote:the article is loading for me, no prob's at all, none, naughta, zip...

Not for me, it's totally broken using unmodified sidki 1/2/08 beta.

http://news.cnet.com/8301-13860_3-100843...7-1_3-0-20

Here's the problem:
Code:
BlockList 1188: in AdComments, line 283
<Match: <!> Remove: Comment-Block Ads I     7.01.05 [sd] (d.2) >

Note that the actual list matching expression is on line 306:

Code:
|MAC ad - *- MAC \[ ?++{0,130}-- >$SET(2=CNET MAC)

My experience with cnet is that matching on "MAC ad" is not reliable.
Cnet sometimes fails to put in the closing comment tag, or puts it in at the wrong place.
When this happens, huge chunks of code can match, badly breaking the page.

Personally, I would edit the list to break the "MAC ad" match to prevent this.
"xMAC ad" should do the trick.

Another alternative would be to exclude the filter from matching on cnet.
Keyword a_adcomm1 will prevent the filter from matching.

z12
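To illustrate the failure mode described above, here is a minimal sketch in Python. It is only an analogy: the regular expression is a stand-in for the list entry, not Proxomitron's matching syntax, and the sample markup (including the "MAC [end]" marker) is invented for the demonstration rather than copied from a real cnet page.

```python
import re

# Hypothetical stand-in for the list entry: match from a "MAC ad" comment
# up to the next comment that starts with "MAC [".
# This is a Python regex analogy, not Proxomitron syntax.
AD_PAIR = re.compile(r"<!--\s*MAC ad\b.*?<!--\s*MAC \[.*?-->", re.DOTALL)


def strip_ad_pairs(html: str) -> str:
    """Remove everything from a 'MAC ad' comment to the next 'MAC [' comment."""
    return AD_PAIR.sub("", html)


# Well-formed page: the end marker directly follows the ad.
good_page = (
    "<p>intro</p>"
    "<!-- MAC ad --><div>ad</div><!-- MAC [end] -->"
    "<p>article</p>"
)

# Broken page: the end marker of the first ad is missing, so the match
# runs on to the end marker of a later ad and swallows the article.
broken_page = (
    "<p>intro</p>"
    "<!-- MAC ad --><div>ad</div>"
    "<p>article</p>"
    "<!-- MAC ad --><div>ad 2</div><!-- MAC [end] -->"
    "<p>footer</p>"
)

print(strip_ad_pairs(good_page))    # the article survives
print(strip_ad_pairs(broken_page))  # the article is gone
```

A byte limit only caps how much can be swallowed; it does not stop the match from starting at the wrong place.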
Nov. 08, 2008, 02:24 PM (This post was last modified: Nov. 08, 2008 02:35 PM by ProxRocks.)
Post: #6
RE: Cnet blocked by Sidki's filters
ah, my bad...
you are correct, all i was getting was the comment/blog at the bottom...
not being a "cnet'r", i guess i thought that was all i was SUPPOSED to be getting, lol...
(Nov. 08, 2008 01:00 PM)z12 Wrote:  Cnet sometimes fails to put in the closing comment tag, or puts it in at the wrong place.
When this happens, huge chunks of code can match, badly breaking the page.

i'm wondering if there's a way to fix "failed or improperly-placed closing tags" as opposed to making this a "site-specific" fix...

which puts me at a total loss, i didn't think "comment tags" had any "closing"...


any thoughts ???
Nov. 08, 2008, 07:53 PM
Post: #7
RE: Cnet blocked by Sidki's filters
ProxRocks Wrote:i didn't think "comment tags" had any "closing"...

Well, maybe "closing" tag isn't the right term.
But a lot of ads are surrounded by some sort of "ad begin/end" html comment.
The cnet ad comments are a variation of this.

ProxRocks Wrote:i'm wondering if there's a way to fix "failed or improperly-placed closing tags"

Personally, I gave up on generic matching based on ad comment pairs (start|end).
I just had too many problems related to the end comment, even with a relatively small byte limit.

In general, I'm leery of filtering based on comment content.
However, I do like the approach taken by the "Remove: Comment-Block Ads II" filter.
An ad comment followed by a container of some sort.

I wonder how that filter would work if modified to work with the list.

z12
Nov. 09, 2008, 09:20 AM
Post: #8
RE: Cnet blocked by Sidki's filters
So, I don't see where to change the code to fix it for cnet. I looked in Edit File / AdComments.

(Sorry for being so dumb, but until Sidki left I didn't need to know how to fix things.)

I don't go to cnet a lot, but I really do like cnet reviews when I decide to buy something like a new monitor. In this case I was following a link and wanted to read the article. I recall that over the years I have had Sidki's filters (2005 on), this would happen periodically and then mysteriously go away, which fits with z12's remark about cnet sometimes forgetting to put in a closing comment tag. I can't recall if I posted about it in Sidki's forum when it happened in the past.
Nov. 09, 2008, 01:36 PM
Post: #9
RE: Cnet blocked by Sidki's filters
Thanks for the detective work z12!

(Nov. 09, 2008 09:20 AM)Mele20 Wrote:  So, I don't see where to change the code to fix it for cnet. I looked in edit file/ad comments.

Here's how I resolved this issue...

Code:
Name = "<!> Remove: Comment-Block Ads I     7.01.05 [sd] (d.2)"
Active = TRUE
URL = "$TYPE(htm)(^news.cnet.com/)(^$TST(keyword=*.(a_ads|a_adcomm1).*))"
Limit = 16000
Match = "<!-(^- PROX[:-])$TST(comment=1)[^>\r\n]++{0,32}$LST(AdComments)"
        "(^$TST(volat=*.headok:1.*))( |)"
        "&"
        "$SET(eAdComm=$GET(eAdComm)"
        "%3Cspan class=%22ProxFly-Span%22>$GET(mHead) I :%3C/span>"
        "   $ESC(\2\1)%3Cbr class=%22ProxFly-Br%22 />"
        ")"
        "&"
        "$SET(3=$TST(keyword=(^$TST(tFrameset=*))*.i_level:5.*)"
        "<span class="Prox ProxComment" style="display:$GET(displayD)">"
        "&nbsp;Comment&nbsp;I: \2\1</span>"
        ")"
        "&($TST(volat=*.log:2*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB Comment I\t\2\1 \t\u)|)"
Replace = "$SET(comment=)\3\0"

Close Proxomitron & browser.

Make a backup copy of 'default.cfg' (it's just a text file).

Open 'default.cfg' as a text file. You can temporarily rename it 'default.cfg.txt' if you need to.

Locate the filter:
Name = "<!> Remove: Comment-Block Ads I     7.01.05 [sd] (d.2)"

Manually edit the following line from:
URL = "$TYPE(htm)(^$TST(keyword=*.(a_ads|a_adcomm1).*))"
- to this -
URL = "$TYPE(htm)(^news.cnet.com/)(^$TST(keyword=*.(a_ads|a_adcomm1).*))"

Save file as 'default.cfg'.

Run Proxo.

By adding (^news.cnet.com/) as shown, this filter will no longer run on pages under "news.cnet.com/".

The page should now load correctly.
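For anyone who would rather script the manual edit above, here is a rough Python sketch. It assumes the URL line in your 'default.cfg' is exactly the "from" line quoted above and that the file can be read as Latin-1; the function names and the '.bak' backup name are my own choices, not anything Proxomitron provides.

```python
import shutil
from pathlib import Path

# Trailing space so "Comment-Block Ads II" is not picked up by mistake.
FILTER_NAME = "Remove: Comment-Block Ads I "
OLD_URL = 'URL = "$TYPE(htm)(^$TST(keyword=*.(a_ads|a_adcomm1).*))"'
NEW_URL = 'URL = "$TYPE(htm)(^news.cnet.com/)(^$TST(keyword=*.(a_ads|a_adcomm1).*))"'


def add_cnet_bypass(cfg_text: str) -> str:
    """Swap the URL line of the Comment-Block Ads I filter, leaving other filters alone."""
    lines = cfg_text.splitlines(keepends=True)
    in_target = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("Name ="):
            # Each filter starts with its Name line; track whether it is ours.
            in_target = FILTER_NAME in stripped
        elif in_target and stripped == OLD_URL:
            lines[i] = line.replace(OLD_URL, NEW_URL)
            in_target = False
    return "".join(lines)


def patch_config(cfg_path: Path) -> None:
    """Back up the config next to the original, then rewrite it in place."""
    shutil.copy2(cfg_path, cfg_path.with_name(cfg_path.name + ".bak"))
    # Latin-1 and newline="" are assumptions meant to keep bytes and line endings intact.
    with open(cfg_path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(cfg_path, "w", encoding="latin-1", newline="") as f:
        f.write(add_cnet_bypass(text))
```

With Proxomitron and the browser closed, calling patch_config(Path("default.cfg")) from the Proxomitron directory would apply the change. If the URL line in your config differs from the one quoted, nothing is modified.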
Nov. 09, 2008, 02:24 PM
Post: #10
RE: Cnet blocked by Sidki's filters
Editing the filter like this only prevents issues on the news domain of cnet:
Code:
(^news.cnet.com/)

All of cnet has this problem from time to time.
Zdnet uses "MAC AD" also as the sites are related.
Probably other related sites also.

Personally, I think you're better off editing the list, which can be found here:
Code:
List.AdComments = "..\Lists\sidki_l_2008-01-02\AdComments.ptxt"
The path is relative to proxo's directory.

Usually, I just do quick edits to lists via "Edit Blockfile".
That didn't work for me with sidki's cfg however as the files didn't open.
Probably because I haven't set a file association for .ptxt files as mentioned in sidki's ReadMe.txt.
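If the list won't open from "Edit Blockfile", the same edit can be made in any text editor or, as a sketch, with a few lines of Python. The function name below is made up for illustration, and it assumes the entry still begins with "|MAC ad" (possibly indented) as quoted earlier in this thread.

```python
TARGET = "|MAC ad"


def break_mac_ad_entry(list_text: str) -> str:
    """Change '|MAC ad' entries to '|xMAC ad' so they can no longer match."""
    out = []
    for line in list_text.splitlines(keepends=True):
        # Only touch list lines that begin with the entry, ignoring indentation.
        if line.lstrip().startswith(TARGET):
            line = line.replace("MAC ad", "xMAC ad", 1)
        out.append(line)
    return "".join(out)


# The entry as quoted earlier in the thread.
entry = r"|MAC ad - *- MAC \[ ?++{0,130}-- >$SET(2=CNET MAC)" + "\r\n"
print(break_mac_ad_entry(entry), end="")
```

To apply it for real, read AdComments.ptxt, pass the text through the function, and write it back, after making a backup copy and with Proxomitron closed. Running it twice is harmless, since an already edited line no longer starts with "|MAC ad".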

z12
Nov. 10, 2008, 02:03 AM
Post: #11
RE: Cnet blocked by Sidki's filters
(Nov. 09, 2008 02:24 PM)z12 Wrote:  Personally, I think you're better off editing the list, which can be found here:
Code:
List.AdComments = "..\Lists\sidki_l_2008-01-02\AdComments.ptxt"
The path is relative to proxo's directory.
I've reverted my previous change and substituted your effective solution. Thanks again.
Nov. 10, 2008, 03:49 AM
Post: #12
RE: Cnet blocked by Sidki's filters
This was discussed before here.
Nov. 10, 2008, 04:12 PM
Post: #13
RE: Cnet blocked by Sidki's filters
I now realize that this issue was affecting a number of news and information sites that I regularly visit. After the "xMAC ad" fix it is quite obvious.
Nov. 10, 2008, 06:04 PM
Post: #14
RE: Cnet blocked by Sidki's filters
Gang;
(Nov. 08, 2008 02:24 PM)ProxRocks Wrote:  i didn't think "comment tags" had any "closing"...
To be sure, there are two kinds of "comment tags". Properly speaking, to avoid confusion (such as I first suffered when reading this), the Official HTML comment tag is the one using the less-than and greater-than signs, like this: <!-- and -->. Browsers are supposed to ignore (not render) anything between those two tags.

Of course, this topic is referring to a pair of code comments that are meant to identify a container of code (usually for modification purposes). Just a tiny bit of difference there, but one that should be noted.
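A small Python sketch may make the distinction concrete. The markup is invented for the example ("ad begin" / "ad end" are placeholder markers), and the standard library's html.parser is used only to show what a parser sees: each comment is complete on its own, and the "container" between the two markers is ordinary page content.

```python
from html.parser import HTMLParser


class CommentLister(HTMLParser):
    """Collect HTML comments and visible text separately."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.text = []

    def handle_comment(self, data):
        # Called once per <!-- ... --> comment, with the text between the dashes.
        self.comments.append(data.strip())

    def handle_data(self, data):
        # Called for text that sits outside tags and comments.
        self.text.append(data)


page = "<p>intro</p><!-- ad begin --><div>ad</div><!-- ad end --><p>article</p>"

parser = CommentLister()
parser.feed(page)
parser.close()

print(parser.comments)  # two separate comments
print(parser.text)      # the ad text is still regular content
```

Nothing in HTML ties the second comment to the first, which is why a filter that relies on the pair has no protection when the second one goes missing.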

(Nov. 08, 2008 02:24 PM)ProxRocks Wrote:  i'm wondering if there's a way to fix "failed or improperly-placed closing tags" as opposed to making this a "site-specific" fix...
That'd be a job and a half! To my simple mind, that'd require a self-learning filter, involving some heuristics and Bayesian techniques, I'm sure. Not for the faint of heart! On the other hand, logic says that if you found the proper text whereupon to insert the closing code comment, then you've already found the offending code in the first place, and can deal with it right then and there.


Further thoughts.......

It's a widely held sentiment that Proxo should be made to filter as generically as possible. I agree with that, but only until it becomes burdensome to "make it so". As z12 notes above, there comes a time when it's easier to ignore the problematic 'generic fix', and start doing things individually, on a site-by-site basis. As it happens, I've been more or less forced into this mode almost from the beginning, for one reason - I don't visit the world's largest collection of diverse websites, so for me, the generic approach doesn't work. Since I'm not in the "Filter Factory" business, I just make it work for me, and that's good enough.

But that's not to say that generic is bad, simply that it's not always the best solution. Rather, it's a best-case scenario, and that has to be decided by each user for him/herself. And don't get me wrong, I do learn from all the examples posted here by the large number of contributors who each think slightly differently about how to solve a problem. (Which is all a filter is really doing, solving a problem in how one sees/views a web page.) Looked at in that light, it's difficult to imagine a "one-size-fits-all" solution to every problem.

So ends my morning diatribe, thanks for putting up with me!


Oddysey

I'm no longer in the rat race - the rats won't have me!
Nov. 10, 2008, 06:33 PM
Post: #15
RE: Cnet blocked by Sidki's filters
Mele;

To put paid to the account (to answer your original question), the reason a given filter might not work is simply that when it was written, it did work, but since that time, the page code has changed. For whatever reason (and believe me, some page authors do know about The Proxomitron, and do write code that attempts to foil it), the code monkeys are constantly making unneeded and unwanted modifications to perfectly good working code. It's called "job assurance", or something like that. But it's really just acting like the first three letters of that word.

Way back when, in the days of 40Meg hard drives and Extended/Expanded memory above 640KBytes, we called such idiocy "feeping creaturism". Later it was called 'code bloat', but now it's just accepted as common practice, and no one sweats the cost of memory (virtual or real), so they don't carp about it. Bah! Why, when I was a young whippersnapper, I used to assemble my own code, right on my very own Altair 8800... uphill.... both ways!

Kids these days!



Oddysey

I'm no longer in the rat race - the rats won't have me!