Nail down this bug?
Oct. 13, 2010, 05:57 PM
Post: #1
Nail down this bug?
In case someone enjoys (!) tracking this down further:

The value of a positional variable (\9) from a non-matching filter (<a>: URL Untangler)
apparently gets evaluated in a matching filter (Block/Modify: Sel. JS Methods):
http://n.yam.com/j/mb_publishbutton.js

Minimal test filter:
Code:
[Patterns]
Name = "TEST -- Matches: n.yam.com/j/mb_publishbutton.js - 5"
Active = TRUE
URL = "n.yam.com/j/mb_publishbutton.js"
Limit = 16
Match = "location.href$TST(\9=*)"
Replace = "location.foo1"
Oct. 13, 2010, 06:30 PM
Post: #2
RE: Nail down this bug?
When I view the source of that JS, line 12 contains:
Code:
    u=location.href;t=document.title;
Is that what you're asking?
Oct. 13, 2010, 06:37 PM
Post: #3
RE: Nail down this bug?
Well, no.

A positional variable in Proxomitron's language acts like a local variable in other languages, except that its scope is the respective filter. For that reason, the posted test filter, which tests a "local" variable without having set it first, should never match.

However, it does match.
Oct. 14, 2010, 12:34 AM (This post was last modified: Oct. 14, 2010 12:36 AM by JJoe.)
Post: #4
RE: Nail down this bug?
I can confirm that "<a>: URL Untangler" is the troublemaker.

The match for "<a>: URL Untangler" contains
Code:
(^\\$TST(\3))\9
.
Changing that to
Code:
(^\\$TST(\3)) \9
seems to fix this problem, for me.

I also changed
Code:
(^\\$TST(\8))\5
to
Code:
(^\\$TST(\8)) \5
in the same way. The full modified filter:

Code:
[Patterns]
Name = "<a>: URL Untangler     09.01.07 (multi) [gz sd] (d.2 l.3) test"
Active = TRUE
Multi = TRUE
URL = "($TST(hCT=*html)|$TYPE(js))(^$TST(keyword=*.(a_code|a_rdlink|i_level:[12]).*))"
Bounds = "<a\s[^>]++href=$AV(*)(^$TST(comment=1)|$TST(tNoscript=1))$INEST(<a\s,>)>"
Limit = 3072
Match = "<a*\shref=("
        "$TST(script=[1s]*)(\\(")\3 (^\\$TST(\3)) \9 \\$TST(\3)|\"(*\' \+&(([^"']+\' \+*\')+{1,*}*)\9\"))"
        "|$AV( ([^&\\]*)\9)"
        ")"
        ""
        "&$TST(\9="
        "(^(javascript|mailto|ed2k):|https+://"|"+ [+);])((??????????((*[^a-z0-9._("])\8"
        "&&(^$TST(\8=;\s*|*(\((^*\))*|config\=|host\=|do(main[a-z]+|ne)\=|r(ef(er)+|ss)\=)))*"
        "))"
        "("
        "((http|ftp)(s|)://)\4(^")"
        "|(desturl|url)\=\=+\[+(^(^[a-z0-9][a-z0-9-]+.[a-z0-9]))$SET(4=http://)"
        ")"
        ")+{1,*}"
        ""
        "("
        "(*\?&&(^*\&)*)((^$TST(script=[1s]*))[^*\[\]]+|[^*]+)"
        "|"
        "(^$TST(script=[1s]*))[^*&\[\]]+"
        "|[^*&]+"
        ")\3"
        ""
        "(\6("(^?|$TST(\6=*"*)))\7|*)"
        ")"
        ""
        "&<a("
        " class=\\+"+Pr0XFl"
        "("
        "Hex(\\+"|)(*\shref=)\2"
        "("
        "(\'$SET(6=&#x22;)|$SET(6=&#x27;)(\\+")+{0,1})\0"
        "*prxlink=(\\(")\8 (^\\$TST(\8)) \5 \\$TST(\8)|$AV((^\\") \5))"
        "(\# onmouse(down|out|over)=($TST(script=[1s]*)\"*\\\'\' \+*\\\'*\"|$AV(*)))+\#"
        "&$SET(8="
        " onmouseover=\0prxO.oFly.flShow"
        "(prxO.oFly.flLink(\6Base16-Tracking&#160;Link\6,\6$ESC(\5)\6),event,1);\0"
        " onmouseout=\0prxO.oFly.flDHide();\0"
        ")"
        ")"
        "|[^ ]+$SET(1=-Tracking)\8(\&#160;Link*\shref=)\2"
        "(\\+")+{0,1} $TST(\9)(\'$SET(6=&#x22;)|$SET(6=&#x27;)(\\+")+{0,1})\0"
        "(\# onmouse(down|out|over)=($TST(script=[1s]*)\"*\\\'\' \+*\\\'*\"|$AV(*)))+\#"
        ")"
        "|"
        "(\#("
        " onmouse(down|out|over)=($TST(script=[1s]*)\"*\\\'\' \+*\\\'*\"|$AV(*))"
        "|(\shref=)\2"
        "(\\+")+{0,1} $TST(\9)(\'$SET(6=&#x22;)|$SET(6=&#x27;)(\\+")+{0,1})\0"
        "))+$SET(8="
        " onmouseover=\0prxO.oFly.flShow"
        "(prxO.oFly.flLink(\6Tracking&#160;Link\6,\6$ESC(\9)\6),event,1);\0"
        " onmouseout=\0prxO.oFly.flDHide();\0"
        ")\#"
        ")"
Replace = "<a class=\0Pr0XFlPref\0\8\1\2\0\4\3\7\0\@"

Another choice would be to initialize the variables in the filter.


I think...?
Oct. 14, 2010, 05:30 PM (This post was last modified: Oct. 14, 2010 05:35 PM by sidki3003.)
Post: #5
RE: Nail down this bug?
I wasn't intending to work around that issue in the URL Untangler, but you're right, it's the better idea. Such overlapping incidents (there are a few more) are likely to cause trouble elsewhere, where you wouldn't expect it. Thanks for your investigations.


Out of habit I prefer the "(\X)" notation, because matching initial white-space can be important (in other cases).

There are numerous filters containing (^...)\X expressions (it's supposed to be a legal notation). I haven't noticed side-effects there, so I think I'll leave them as is.

Initializing each positional variable would be rather painful for filters that make use of lists, which, as we know, are in scope from the filter's point of view.


The filter version below probably contains further changes, because there is also a 10.09.25 version that I haven't posted yet. It should be merged with the current beta config only.

Code:
[Patterns]
Name = "<a>: URL Untangler     10.10.14 (multi) [gz sd] (d.2 l.3)"
Active = TRUE
Multi = TRUE
URL = "($TST(hCT=*html)|$TYPE(js))(^$TST(keyword=*.(a_code|a_rdlink|i_level:[12]).*))"
Bounds = "<a\s[^>]++href=$AV(*)(^$TST(comment=1)|$TST(tNoscript=1))$INEST(<a\s,>)>"
Limit = 3072
Match = "<a*\shref=($TST(script=[1s]*)("
        "\\(")\3 (^\\$TST(\3))(\9) \\$TST(\3)"
        "|\"(*\' \+&(([^"']+\' \+*\')+{1,*}*)\9\")|\'(*\" \+&(([^"']+\" \+*\")+{1,*}*)\9\')"
        ")|$AV( ([^&\\]*)\9))"
        ""
        "&$TST(\9="
        "(^(javascript|mailto|ed2k):|https+://"|"+ [+);])((??????????((*[^a-z0-9._("])\8"
        "&&(^$TST(\8=;\s*|*(\((^*\))*|config\=|host\=|do(main[a-z]+|ne)\=|r(ef(er)+|ss)\=)))*"
        "))"
        "("
        "((http|ftp)(s|)://)\4(^")"
        "|(desturl|url)\=\=+\[+(^(^[a-z0-9][a-z0-9-]+.[a-z0-9]))$SET(4=http://)"
        ")"
        ")+{1,*}"
        ""
        "("
        "(*\?&&(^*\&)*)((^$TST(script=[1s]*))[^*\[\]]+|[^*]+)"
        "|"
        "(^$TST(script=[1s]*))[^*&\[\]]+"
        "|[^*&]+"
        ")\3"
        ""
        "(\6("(^?|$TST(\6=*"*)))\7|*)"
        ")"
        ""
        "&<a("
        " class=\\+"+Pr0XFl"
        "("
        "Hex(\\+"|)(*\shref=)\2"
        "("
        "(\'$SET(6=&#x22;)|$SET(6=&#x27;)(\\+")+{0,1})\0"
        "*prxlink=(\\(")\8 (^\\$TST(\8))(\5) \\$TST(\8)|$AV((^\\") \5))"
        "(\# onmouse(down|out|over)=($TST(script=[1s]*)\"*\\\'\' \+*\\\'*\"|$AV(*)))+\#"
        "&$SET(8="
        " onmouseover=\0prxO.oFly.flShow"
        "(prxO.oFly.flLink(\6Base16-Tracking&#160;Link\6,\6$ESC(\5)\6),event,1);\0"
        " onmouseout=\0prxO.oFly.flDHide();\0"
        ")"
        ")"
        "|[^ ]+$SET(1=-Tracking)\8(\&#160;Link*\shref=)\2"
        "(\\+")+{0,1} $TST(\9)(\'$SET(6=&#x22;)|$SET(6=&#x27;)(\\+")+{0,1})\0"
        "(\# onmouse(down|out|over)=($TST(script=[1s]*)\"*\\\'\' \+*\\\'*\"|$AV(*)))+\#"
        ")"
        "|"
        "(\#("
        " onmouse(down|out|over)=($TST(script=[1s]*)\"*\\\'\' \+*\\\'*\"|$AV(*))"
        "|(\shref=)\2"
        "(\\+")+{0,1} $TST(\9)(\'$SET(6=&#x22;)|$SET(6=&#x27;)(\\+")+{0,1})\0"
        "))+$SET(8="
        " onmouseover=\0prxO.oFly.flShow"
        "(prxO.oFly.flLink(\6Tracking&#160;Link\6,\6$ESC(\9)\6),event,1);\0"
        " onmouseout=\0prxO.oFly.flDHide();\0"
        ")\#"
        ")"
Replace = "<a class=\0Pr0XFlPref\0\8\1\2\0\4\3\7\0\@"
Oct. 14, 2010, 06:34 PM
Post: #6
RE: Nail down this bug?
I haven't completed my investigations yet. The problem also exists inside filters.

I also prefer "(\X)" notation and have been testing with it since my last post.
Any list call after a "negate string positional variable mess" seems to initialize all positional variables.

When I get a chance I'll put something together and post it so that all can see what we are talking about.
I plan on recommending the use of (\X) when "\X" follows a negate string.
I think the only choice is to standardize a workaround.

Thanks
Oct. 15, 2010, 05:53 PM
Post: #7
RE: Nail down this bug?
If it turns out (soon) that it's better to replace all occurrences of the mentioned notation, I'll do it, of course.
If so, we'd also need a Perl-style RegEx that finds all such expressions (for Notepad++ or UltraEdit, must match nested parens).
Oct. 16, 2010, 04:48 AM
Post: #8
RE: Nail down this bug?
(Oct. 14, 2010 06:34 PM)JJoe Wrote:  When I get a chance I'll put something together and post it so that all can see what we are talking about.

These filters may explain the problem that I see.
I would not mind being wrong...
I plan to put a bug report in Proxomitron Program later.

Add the filters to the top of your Web Page Filters.

Code:
[Patterns]
Name = "A -- correct: $TST(\0=*) does not match"
Active = FALSE
Limit = 256
Match = "$TST(\0=*)"
Replace = "\k"

Name = "B -- incorrect: $TST(\0=*) does match"
Active = FALSE
Limit = 256
Match = "(^z)\0nevermatch(^)|$TST(\0=*)"
Replace = "\k"

Name = "C -- correct: $TST(\0=*) does not match with workaround"
Active = FALSE
Limit = 256
Match = "(^z)(\0)nevermatch(^)|$TST(\0=*)"
Replace = "\k"

Name = "D -- Create problem for A and C "
Active = FALSE
Limit = 256
Match = "(^z)\0nevermatch(^)"
Replace = "\k"

Test filter "A". Test should fail because \0 should have no value.

Test filter "B". This filter should not match but does!
The left side of the match, "(^z)\0nevermatch(^)", always fails. The right side is the same as filter "A"'s match.
Filter "B" and filter "A" should test the same.

Test filter "C". Test should fail because \0 should have no value and does fail. Putting the variable in parentheses 'fixes' something.

Enable filter "A" and filter "D" and try to load Google. Google should load but will not!
Filter "D" never matches and should not change the value of the positional variable \0.
However, after filter "D" fails, filter "A" finds "$TST(\0=*)" to be true and matches.
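The difference between the expected scoping and the behaviour reported for filters "A" and "D" can be sketched with a toy model. This is only an illustration in Python and says nothing about how Proxomitron is actually implemented: `ToyEngine`, `filter_a`, `filter_d` and the placeholder value `<leftover>` are all made up, and the model simply assumes what the test above reports, namely that the failed attempt of "D" leaves \0 set.

```python
class ToyEngine:
    """Toy stand-in for a filter engine; NOT Proxomitron's real code."""

    def __init__(self, reset_between_filters):
        self.reset_between_filters = reset_between_filters
        self.positional = {}  # maps "0".."9" to captured text

    def run(self, filters):
        """Try each filter in order and return the names that matched."""
        matched = []
        for name, attempt in filters:
            if self.reset_between_filters:
                # Expected scoping: every filter starts with nothing set.
                self.positional = {}
            if attempt(self.positional):
                matched.append(name)
        return matched


def filter_d(positional):
    # Models "(^z)\0nevermatch(^)": the attempt always fails, but (as
    # reported above) \0 is left set. The stored value is a placeholder.
    positional["0"] = "<leftover>"
    return False


def filter_a(positional):
    # Models "$TST(\0=*)": true only if \0 currently holds a value.
    return "0" in positional


# "A" is tried after "D" has already failed once.
filters = [("D", filter_d), ("A", filter_a)]

expected = ToyEngine(reset_between_filters=True).run(filters)
reported = ToyEngine(reset_between_filters=False).run(filters)
print("filter-local variables:", expected)
print("variables that leak:   ", reported)
```

With filter-local variables nothing matches; once the variable is allowed to survive the failed attempt, "A" matches, which is the symptom seen when loading Google.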

(Oct. 15, 2010 05:53 PM)sidki3003 Wrote:  If it turns out (soon) that it's better to replace all occurrences of mentioned notation, i'll do it, of course.
If so, we'd also need a Perl-style RegEx that finds all such expressions (for Notepad++ or UltraEdit, must match nested parens).

I tend to agree with

sidki3003 Wrote: I haven't noticed side-effects there, so I think I'll leave them as is.

I don't think the perfect expression is possible. Errors could be introduced.
Oct. 16, 2010, 09:21 AM
Post: #9
RE: Nail down this bug?
(Oct. 16, 2010 04:48 AM)JJoe Wrote:  However, after filter "D" fails, filter "A" finds "$TST(\0=*)" to be true and matches.

Well, finally a serious bug found: a positional variable survives across different filters.

Sidki mentioned another quirk about positional variables before, which seems related: http://prxbx.com/forums/showthread.php?t...2#pid11062

(Oct. 15, 2010 05:53 PM)sidki3003 Wrote:  If so, we'd also need a Perl-style RegEx that finds all such expressions (for Notepad++ or UltraEdit, must match nested parens).

Do you have a pattern for all the expressions that should be matched, and what should they be replaced with?
Oct. 16, 2010, 09:53 AM
Post: #10
RE: Nail down this bug?
(Oct. 16, 2010 04:48 AM)JJoe Wrote:  These filters may explain the problem that I see.

Very well done! I'll see what I can do.
Oct. 16, 2010, 09:55 AM
Post: #11
RE: Nail down this bug?
(Oct. 16, 2010 09:21 AM)whenever Wrote:  Do you have a pattern for all those expressions that should be matched and what should they be replaced to?

Expressions like:
Code:
(^*.i_layout:)\1
(^*.(a_headers.|i_spoof:))\1
(^\\$TST(\4))\0
(^(^;|,|}|\s))\2
(^([^/]++.|)$TST(uDom)(^.))\4

They may also span multiple lines. Even a proper RegEx won't necessarily work in Notepad++ or UltraEdit. (SciTE seems to work worst.)

It shouldn't be a replacement expression. I just need a list of all such strings. Correction must be done manually.

I'm currently counting 79 such strings, but my RegEx is rusty.
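Since a plain RegEx struggles with arbitrarily nested parens, one alternative to an editor search is a small script that only lists the candidates. The following is a rough Python sketch, not something posted in this thread. The names `join_continuations` and `find_negated_captures` are made up, and it assumes the multi-line layout seen in the filters above (continuation lines start with white-space and a quote). It does not understand character classes such as [+);], so a paren inside brackets can throw the count off, and every hit still needs a manual check.

```python
import re


def join_continuations(text):
    """Glue '"<newline><indent>"' back together so a multi-line Match
    becomes one string (assumed layout, see the note above)."""
    return re.sub(r'"[ \t]*\r?\n[ \t]+"', "", text)


def find_negated_captures(text):
    """List every '(^...)' group that is directly followed by \\0..\\9.

    Parens are balanced with a stack, so any nesting depth is handled.
    A backslash escapes the next character, so '\\(' and '\\)' are skipped.
    """
    hits = []
    stack = []  # positions of the currently open, unescaped "("
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character as well
            continue
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            tail = text[i + 1:i + 3]
            if (text.startswith("(^", start) and len(tail) == 2
                    and tail[0] == "\\" and tail[1].isdigit()):
                hits.append(text[start:i + 3])
        i += 1
    return hits


# Made-up sample: one hit on a single line, one "(\X)" form that must be
# ignored, and one hit that is split across continuation lines.
sample = r'''
Match = "(^\\$TST(\3))\9 (^z)(\0)"
        "(^(^;|,|}|\s))"
        "\2"
'''

for hit in find_negated_captures(join_continuations(sample)):
    print(hit)
```

To scan a real config, the same two calls could be applied to the text of the .ptron file read from disk; the output is just a list of strings, with no replacement attempted.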
Oct. 16, 2010, 01:33 PM (This post was last modified: Oct. 16, 2010 01:37 PM by whenever.)
Post: #12
RE: Nail down this bug?
I'm not doing any better. I got 72 matches in sidki_2010-09-19.ptron with the regex pattern below.

Code:
\(\^(?:\((?>[^)]+)\)|(?>[^()]+))+\)\\\d
Oct. 16, 2010, 02:40 PM
Post: #13
RE: Nail down this bug?
My 79 count also contained false positives.

Which editor were you testing with?
I get zero matches in Notepad++ with your expression, even after updating to v. 5.8.2.
Oct. 16, 2010, 03:43 PM (This post was last modified: Oct. 16, 2010 03:45 PM by JJoe.)
Post: #14
RE: Nail down this bug?
I have PSPad. I haven't used it much.
\(\^([^()]*\([^()]*\))*[^()]*\)\\\d gets me 67. I don't see any false positives but I'm rushing.
It doesn't appear to match 'multi line'.
It does create a list.
PSPad also has a "text differences" generator.
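The expression above allows exactly one level of nested parens inside the "(^...)" group. A quick way to see what that covers is to run it over the sample expressions from post #11. This is a minimal Python check, assuming Python's `re` module treats the expression the same way PSPad does; the last sample with two levels of nesting is made up to show the limit.

```python
import re

# The expression from this post, unchanged.
ONE_LEVEL = re.compile(r"\(\^([^()]*\([^()]*\))*[^()]*\)\\\d")

samples = [
    r"(^*.i_layout:)\1",
    r"(^*.(a_headers.|i_spoof:))\1",
    r"(^\\$TST(\4))\0",
    r"(^(^;|,|}|\s))\2",
    r"(^([^/]++.|)$TST(uDom)(^.))\4",
    r"(^a(b(c)))\1",  # made-up case: two levels of nesting
]

# True means the expression finds the sample, False means it is missed.
results = {s: bool(ONE_LEVEL.search(s)) for s in samples}
for s, found in results.items():
    print(found, s)
```

The five expressions from post #11 are found; the deeper made-up one is not, so any filter that nests parens two levels deep inside a negation would be missed by this pattern.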

I suggest that we all start with the same file, make our changes, and compare files with the "text differences" generator.

Got to go.
Oct. 16, 2010, 05:54 PM (This post was last modified: Oct. 16, 2010 05:55 PM by sidki3003.)
Post: #15
RE: Nail down this bug?
OK. For now I've used an online tool ( http://gskinner.com/RegExr/ ), which does do multi-line, with whenever's RegEx (ignoring 3 false positives).

Apparently only the .ptron file is affected; the lists don't seem to be.
Because the respective filter versions need to be updated too, I've gone ahead and created two MergeMe files: one for the header filters, one for the web filters (also containing "<a>: URL Untangler", for completeness). They should be merged with the current beta config only.

Not everyone needs to merge them, but I do need a couple of people to check with me for possible regressions.

Don't forget to do the "Save -> (Configure -> OK -> Save)+{3}" procedure after updating the header filters!


Attached File(s)
.zip  bug_10-16.zip (Size: 11.83 KB / Downloads: 606)