The Un-Official Proxomitron Forum
Need help with Sidki filter causing problem - Printable Version




Need help with Sidki filter causing problem - Mele20 - Dec. 18, 2008 11:31 AM

This filter (Sidki 2008-01-02) is causing problems at dslreports forums:

[Patterns]
Name = "
... Remove/Hide: Ad Containers - Headers 7.10.30 [sd] (d.3 l.3)"
Active = TRUE
URL = "$TYPE(htm)(^$TST(keyword=*.(a_ads|a_adcont|a_adcont_h|i_level:[12]).*))"
Limit = 4500
Match = "]+("
"(> ]+)+>"
"( (^]+(>)\4)++{0,1} (^"
"|( )\#"
"))+*,$TST(\0=td$SET(#=)$SET(3=)|*)"
"|"
"(^$TST(\7=*)&$TST(\4=>)|$TST(\6=span)|$TST(\7=*.i_adtag:[12].*))"
"$SET(1=-hide)$SET(3=
"• \0-head\1: \4\2\3"

Here is a link to the thread started by another Proxo user:

http://www.dslreports.com/forum/r21598345-Threads-name-is-gone#21600467

When I disable the above filter, the odd-looking post by lilhurricane in that thread displays normally. The main Security forum page at dslr was also displaying very weirdly, and when I disabled web filters in Proxo it displayed properly. That was 12 hours ago. Since then it has sporadically displayed correctly, even though I re-enabled web filters without knowing which filter was the culprit. I have now found the culprit for the thread, but I don't know whether it was also causing the weird display on the Security forum main page. I am certain, though, that the weird display in the bugs forum thread is caused by this filter.

I believe this is the same filter that, a year or more ago, was causing the same problem at another site. I posted about it in Sidki's forum, and he referred me to another thread and said it was fixed in the latest filters, which I didn't have. So I wonder if this is a somewhat frequent problem where this filter has to be tweaked?


RE: Need help with filter causing problem - z12 - Dec. 22, 2008 11:02 AM

Bump...


RE: Need help with Sidki filter causing problem - sidki3003 - Dec. 25, 2008 02:52 PM

I've added a character limit (23, can be adjusted down to ~16).
That should fix your issue and similar ones.
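
A rough illustration of what such a limit does, written as a Python sketch rather than Proxomitron syntax. The keyword list is a simplified stand-in for the filter's core expression, and `looks_like_ad_label` is a hypothetical helper, not part of the config set. With the cap in place, a short label still matches, while a longer sentence that merely starts with a keyword does not.

```python
import re

def looks_like_ad_label(text, limit=23):
    """Return True if text starts with an ad keyword followed by a short tail.

    limit caps how many letters, digits or spaces may follow the keyword;
    pass None to drop the cap (the behaviour that invites false positives).
    """
    tail = r"[a-z0-9 ]*" if limit is None else r"[a-z0-9 ]{0,%d}" % limit
    # Simplified keyword list (assumption), then the bounded tail, then a
    # character that is not a letter, digit or space (or the end of the text).
    pattern = r"\s*(?:ads by |advertisem|sponsor)" + tail + r"(?![a-z0-9 ])"
    return re.match(pattern, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(looks_like_ad_label("Sponsored Links"))
    print(looks_like_ad_label("Sponsor a child in need today and change a life"))
```

Lowering `limit` makes the check stricter, which is presumably why the value can be tuned down when ordinary text is still being caught.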


RE: Need help with Sidki filter causing problem - z12 - Dec. 26, 2008 10:20 AM

I have to admit I looked at that filter for a long time.
I just couldn't wrap my head around the matching expression.

My only consolation was the html for this type of ad can vary a lot.
I've tried to write a similar filter for quite some time without success.
Perhaps one day I'll figure out how your filter works.

z12


RE: Need help with Sidki filter causing problem - sidki3003 - Dec. 26, 2008 12:30 PM

I think it's not that the code itself is difficult to understand. It's the subroutines that make the core difficult to spot. When I don't understand my own code, I decompose the respective filter.

Code:
// Comment: Only check these three containers
<(div|td|center)\0

// Comment: Don't match within scripts, comments, noscript blocks
(^$TST(script=*)|$TST(comment=1)|$TST(tNoscript=1))

// Comment: Subroutine 1 start -- Where to apply the Core RegExp
[^>]+(

// Comment: Skip certain tags
(> <(font|br+|img|h[1-6]|p|s(mall|pan|trong)|!--[^\n]++--)\6[^>]+)

+>

// Comment: Now test the code after the first tag
// Comment: And the second tag, unless it's a comment or our container is closing
( (^<(!-|/+(div|td|center)))[^>]+(>)\4)++{0,1}

// Comment: Fail on opening tags -- Skip HTML entities and non-characters
(^<)(\&[a-z]+; |[^a-z])+

)\8
// Comment: Subroutine 1 end

// Comment: Core RegExp
(
(a(d(vert(isers|s|)|s|)(^-) |n(nunci|zeigen+ ))|marketplace )
(^[a-z0-9ä_+])
|
(
ad(s\sby\s|vert(enti|isem))
|pubb+lici(dad|t[? eé&])
|(\w |)sponsor(^ed[a-z])
|(from|visit) our (advertiser|partner|sponsor)
)
[a-z0-9 ]+{0,23} (^[a-z0-9])
)\2

// Comment: Subroutine 2 start -- Either write back all open tags, or just hide the matched tag
(
// Comment: Hide-or-remove switch
$TST(keyword=(^*.i_adtag:[#*:0].)\7)

// Comment: Remove
$INEST(<$TST(\0),(*(

// Comment: Push unclosed tags into stack
<(t(able|body|foot|d|r|h)|div)\5$INEST(<$TST(\5),</$TST(\5))</$TST(\5) >
|(<(/|)(t(able|body|foot|d|r|h)|div)*> )\#

))+*,</$TST(\0))</$TST(\0) >$TST(\0=td$SET(#=<td style="height:0;padding:0">)$SET(3=</td>)|*)
|

// Comment: Hide
(^$TST(\7=*)&$TST(\4=>)|$TST(\6=span)|$TST(\7=*.i_adtag:[12].*))
$SET(1=-hide)$SET(3=<\0 style="display:none!important"\8\2)
)
// Comment: Subroutine 2 end

// Comment: Log line
($TST(volat=*.log:2*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB Ad-Head\1 \0 \t\6 \4\2 \t\u)|)
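
The first subroutine can be pictured with a small Python sketch. This is an approximation under stated assumptions, not a translation of the Proxomitron syntax: it steps over the inline tags and single-line comments named in the skip list, then over entities and non-letters, and returns the text where the core expression would be tried. `first_text` is a hypothetical name, and the filter's optional "one more tag" step is left out.

```python
import re

# Tags the sketch steps over before looking for text (from the skip list above).
INLINE_TAG = re.compile(
    r"\s*<(?:font|br|img|h[1-6]|p|small|span|strong)\b[^>]*>", re.IGNORECASE
)
# Single-line HTML comments only, mirroring the [^\n] restriction in the filter.
COMMENT = re.compile(r"\s*<!--.*?-->")
# HTML entities, plus anything that is neither a letter nor the start of a tag.
NOISE = re.compile(r"(?:&[a-zA-Z]+;|[^a-zA-Z<])*")

def first_text(inner):
    """Return the container's leading text, or None if another tag comes first."""
    pos = 0
    while True:
        m = INLINE_TAG.match(inner, pos) or COMMENT.match(inner, pos)
        if m is None:
            break
        pos = m.end()
    pos = NOISE.match(inner, pos).end()
    # "Fail on opening tags": give up if a tag (or nothing at all) follows.
    if pos >= len(inner) or inner[pos] == "<":
        return None
    return inner[pos:]


if __name__ == "__main__":
    print(first_text(" <font size=1><br>&nbsp;- Advertisement</font>"))
```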



RE: Need help with Sidki filter causing problem - z12 - Dec. 26, 2008 05:09 PM

Thanks for the explanation.

Indeed, I wasn't sure what the intent of the $INEST subroutine was.
Mentally, my biggest problem was determining the scope of the match.
The variable nature of it confused me, and frankly, still does.
I desperately wanted to see an & or &&. It's just the way my brain works.

Armed with your explanation, I plan on doing some testing till I get it.
Thanks again.

z12


RE: Need help with Sidki filter causing problem - sidki3003 - Dec. 27, 2008 11:11 AM

The "write back unmatched tags" subroutine works quite well and is part of several filters.
It's JD's idea (mentally pushing the "thanks JD" button).
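
As a hedged illustration of that idea in Python (not the filter's own code): when a stretch of HTML is removed, structural tags that have no partner inside the removed stretch are kept and written back, so the surrounding table or div layout stays balanced. The function names are hypothetical, and the tag list is copied from the "push unclosed tags" line of the decomposed filter.

```python
import re

STRUCTURAL = re.compile(
    r"<(/?)(table|tbody|tfoot|td|tr|th|div)\b[^>]*>", re.IGNORECASE
)

def unbalanced_tags(fragment):
    """Return the structural tags in fragment that have no partner inside it."""
    kept = []        # tags in document order; balanced openers are blanked out
    open_stack = []  # (tag name, index into kept) for openers awaiting a closer
    for m in STRUCTURAL.finditer(fragment):
        is_close, name = m.group(1) == "/", m.group(2).lower()
        if not is_close:
            open_stack.append((name, len(kept)))
            kept.append(m.group(0))
        elif open_stack and open_stack[-1][0] == name:
            kept[open_stack.pop()[1]] = None  # opener and closer cancel out
        else:
            kept.append(m.group(0))           # closer whose opener is outside
    return [tag for tag in kept if tag is not None]

def remove_keeping_layout(html, start, end):
    """Drop html[start:end] but write its unbalanced structural tags back."""
    return html[:start] + "".join(unbalanced_tags(html[start:end])) + html[end:]


if __name__ == "__main__":
    page = "<tr><td>Advertisement<div>banner</div></td><td>content</td></tr>"
    print(remove_keeping_layout(page, page.index("Advertisement"), page.index("content")))
```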

(Dec. 26, 2008 05:09 PM)z12 Wrote:  Mentally, my biggest problem was determining the scope of the match.
The variable nature of it confused me, and frankly, still does.

Ahh! The filter has two scopes.
Code:
<mytag> TEXT_NODE ( /* block scope */ $INEST(<mytag>,</mytag)</mytag > | /* tag scope */ )

You can also use it as fall-back if the entire block exceeds the filter's byte limit.
IIRC hpguru came up with it (mentally pushing the "thanks hpguru" button).
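
The two scopes and the fall-back can be sketched in Python along these lines. This is only an illustration under assumptions: `find_block_end` and `filter_container` are hypothetical names, the nesting counter stands in for `$INEST`, and the default of 4500 mirrors the `Limit` value quoted in the first post.

```python
import re

def find_block_end(html, tag, start, limit):
    """Return the index just past the closer matching the opener at start,
    or None if it is not found within limit characters."""
    token = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0
    for m in token.finditer(html, start, min(len(html), start + limit)):
        depth += -1 if m.group(1) else 1
        if depth == 0:
            return m.end()
    return None

def filter_container(html, tag, start, limit=4500):
    """Remove the whole block if it fits the limit, else hide the opening tag."""
    end = find_block_end(html, tag, start, limit)
    if end is not None:
        # Block scope: the entire container is dropped.
        return html[:start] + html[end:], "block"
    # Tag scope: only the opening tag is rewritten so the container is hidden.
    opener_end = html.index(">", start) + 1
    hidden = html[start:opener_end - 1] + ' style="display:none!important">'
    return html[:start] + hidden + html[opener_end:], "tag"


if __name__ == "__main__":
    page = '<p>x</p><div class="a">Ads by X<div>inner</div></div><p>y</p>'
    print(filter_container(page, "div", page.index("<div")))
    print(filter_container(page, "div", page.index("<div"), limit=20))
```

In the sketch the decision between the two scopes is made only after the opener has been located, which matches the point that the scope is chosen once the text has matched.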

Quote:Thanks again.

You're welcome!

OT: So, it seems this forum is now rating its members by their social compatibility, aka thanks given/received stats. I guess it will take me a while to get used to it.

Anyway, let me use this opportunity to thank you for your inventive JavaScript ideas. Quite a few of them are implemented in "proxjs-full.js".


RE: Need help with Sidki filter causing problem - Kye-U - Dec. 27, 2008 08:01 PM

(Dec. 27, 2008 11:11 AM)sidki3003 Wrote:  OT: So, it seems this forum is now rating its members by their social compatibility, aka thanks given/received stats. I guess it will take me a while to get used to it.

OT: Don't worry; I get the feeling you'll rack up "Thanks Received" in no time. It's not really a rating; it's more a way for people to give thanks for help without having to post. Perhaps we can think about hiding the stats.


RE: Need help with Sidki filter causing problem - z12 - Dec. 28, 2008 01:05 PM

sidki3003 Wrote:Ahh! The filter has two scopes.

That's what confused me, thanks for the clarification.
I think I can visualize the matching expression now.
As I understand it, the scope of this filter is determined after the "text match".

It's a method I've never really considered before.
It seems a rather clever way to avoid the "byte limit" issue with the outer matching tag.
I need to grok this.

z12


RE: Need help with Sidki filter causing problem - Mele20 - Jan. 08, 2009 03:11 AM

(Dec. 25, 2008 02:52 PM)sidki3003 Wrote:  I've added a character limit (23, can be adjusted down to ~16).
That should fix your issue and similar ones.

Thank you so much! Sorry for the tardy reply. After a week went by with only a bump and Christmas came along, I sort of forgot about it. I'll download and install the fix in a few minutes and I'll let the poster at dslr who first brought this up know. They might already know because I gave the link to this thread when I posted it. Maybe they have kept up with it.