Correct link targets (site-specific, list-based)
Nov. 03, 2012, 04:35 PM (This post was last modified: Nov. 03, 2012 06:31 PM by duebel13.)
Post: #1
Hello,

for some hours I've been fighting with this new thingy:

Code:
Name = "Links: Correct Target (Site-specifc, HTML) [dbl]"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)$LST(CorrectLinkTargets-S)"
Bounds = "$NEST(<a\s,>)"
Limit = 512
Match = "\2( target=$AV(\4)|)(\3)>"
        "&"
        "*href=$AV(($TST(href)*)\5)"
        "&"
        "((^$TST(target=*))($TST(\5=*.(7z|zip|rar|exe|msi|pdf|tgz|gz|rpm|iso)(\?*|))"
        "$SET(1=_self)|$SET(1=_blank))|$SET(1=$GET(target)))"
Replace = "\2\3 target="\1">"

Given initial list:
Code:
www.ebay.(de|com|co.uk)/[^\?]++/i.html         $SET(href=/itm/)
*/forumdisplay.php                             $SET(href=showthread.php)

This will open the matched links in new tabs in eBay item lists and in this forum. I could also add $SET(target=_self) to any line to force links to open in the same tab. The filter also contains some code that keeps download links from opening in a new tab.
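
For readers without Proxomitron at hand, the target decision described above can be modelled in a few lines of Python. This is only a rough sketch of the logic, not the filter itself: the function name `choose_target` and the parameter `list_target` (standing in for the global variable "target" that a list entry may set) are made up for illustration, and the extension list is copied from the Match expression.

```python
import re

# Download extensions copied from the filter's Match expression.
# An optional query string after the extension is allowed, as in the filter.
DOWNLOAD_RE = re.compile(
    r".*\.(7z|zip|rar|exe|msi|pdf|tgz|gz|rpm|iso)(\?.*)?", re.IGNORECASE
)


def choose_target(href, list_target=None):
    """Return the value the filter would write into the target attribute.

    list_target models the global variable "target"; None or "" means the
    list entry did not set it (hypothetical parameter, for illustration).
    """
    if list_target:
        # The list entry forces a target, e.g. via $SET(target=_self).
        return list_target
    if DOWNLOAD_RE.fullmatch(href):
        # Download links stay in the current tab.
        return "_self"
    # Every other matched link opens in a new tab.
    return "_blank"
```

With this model, `choose_target("/files/setup.exe")` gives `_self`, while an ordinary item link gives `_blank` unless the list entry sets a target.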

Questions that puzzle me / things I couldn't get to work:

1.) The initial idea was to use positional variables instead of global variables but I've actually given up on this. Basically, what I originally intended was using $SET(0= instead of $SET(href=, and $SET(1= instead of $SET(target= in the list. But positional variables always expanded to nothing, whether I was using \0 or $TST(\0). Any idea what I might be missing?

2.) I'd like to use matching in the href part of the lists. For example:
Code:
*/forumdisplay.php                             $SET(href=*/(show|view)thread.php)

But as soon as I do this, the filter will no longer match.

I'm really stuck and would appreciate any help. Thanks.
Nov. 03, 2012, 09:04 PM
Post: #2
RE: Correct link targets (site-specific, list-based)
duebel13 Wrote:1.) The initial idea was to use positional variables instead of global variables but I've actually given up on this. Basically, what I originally intended was using $SET(0= instead of $SET(href=, and $SET(1= instead of $SET(target= in the list. But positional variables always expanded to nothing, whether I was using \0 or $TST(\0). Any idea what I might be missing?

I don't think positional variables created in the URL Match survive to be used later in the filter. Also have a look at
http://sidki.proxfilter.net/prox/sidki-e...niques.txt

duebel13 Wrote:2.) I'd like to use matching in the href part of the lists.

Changes.txt Wrote:or also just...
$TST(variable)
can be used in a match to see if the variable's contents match
the current text. For example..
src="http://$TST(myhost)/"
note that this must be a literal match (except for case) - the variable's
value isn't treated as a matching expression with wildcards and such.
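
The difference between a literal test and a wildcard test can be shown with a small Python model. This is a sketch under assumptions, not Proxomitron's code: `tst_literal` imitates $TST(variable) by comparing the stored value literally (ignoring case) against the text at the current position, and `wildcard_match` is what the list entry in question 2 would have needed. Both function names are made up.

```python
from fnmatch import fnmatchcase


def tst_literal(stored_value, text):
    """Model of $TST(variable): the stored value must appear literally
    (case aside) at the start of the text; '*' has no special meaning."""
    return text.lower().startswith(stored_value.lower())


def wildcard_match(stored_value, text):
    """Hypothetical alternative: treat the stored value as a pattern
    in which '*' matches any run of characters."""
    return fnmatchcase(text.lower(), stored_value.lower())


# A literal value works, a value containing '*' does not:
print(tst_literal("showthread.php", "ShowThread.php?tid=1"))
print(tst_literal("*/showthread.php", "forum/showthread.php"))
print(wildcard_match("*/showthread.php", "forum/showthread.php"))
```

The second call fails because the asterisk is compared as an ordinary character, which is why the filter stops matching once the list value contains a pattern.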

duebel13 Wrote:I'm really stuck and would appreciate any help. Thanks.

Using your current work, I suspect that where you have "$TST(href)" you will want to call a list or reference an array.

For example, CorrectLinkTargets-S might look like

Code:
www.ebay.de/[^\?]++/i.html            $SET(href=www.ebay.de/)
www.ebay.com/[^\?]++/i.html           $SET(href=www.ebay.com/)
www.ebay.co.uk/[^\?]++/i.html         $SET(href=www.ebay.co.uk/)
/itm/& $TST(href=www.ebay.(de|com|co.uk)/)


*/forumdisplay.php                            $SET(href=*/forumdisplay.php)
*/(show|view)thread.php& $TST(href=*/forumdisplay.php)

and your filter might look like

Code:
Name = "Links: Correct Target (Site-specifc, HTML) [dbl]"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)$LST(CorrectLinkTargets-S)"
Bounds = "$NEST(<a\s,>)"
Limit = 512
Match = "\2( target=$AV(\4)|)(\3)>"
        "&"
        "*href=$AV(($LST(CorrectLinkTargets-S)*)\5)"
        "&"
        "((^$TST(target=*))($TST(\5=*.(7z|zip|rar|exe|msi|pdf|tgz|gz|rpm|iso)(\?*|))"
        "$SET(1=_self)|$SET(1=_blank))|$SET(1=$GET(target)))"
Replace = "\2\3 target="\1">"

An array (using @ to separate possible values) could look something like

Code:
www.ebay.de/[^\?]++/i.html            $SET(href=@/itm/@)
www.ebay.com/[^\?]++/i.html           $SET(href=@/itm/@)
www.ebay.co.uk/[^\?]++/i.html         $SET(href=@/itm/@)

*/forumdisplay.php                            $SET(href=@/showthread.php@/viewthread.php@)

I will have to post the expression that uses the array (if it is possible) later.
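
Until then, the idea behind such an array can be modelled outside Proxomitron. The Python sketch below is only an illustration of why every value is wrapped in separators on both sides; it is not the filter expression, and the function name `in_array` is made up.

```python
def in_array(array_value, candidate, sep="@"):
    """Return True if candidate is one complete entry of the array.

    Wrapping the candidate in separators before searching means a partial
    entry (for example "/thread.php") cannot match by accident.
    """
    if not candidate:
        return False
    return f"{sep}{candidate}{sep}".lower() in array_value.lower()


forum_links = "@/showthread.php@/viewthread.php@"
print(in_array(forum_links, "/viewthread.php"))  # complete entry
print(in_array(forum_links, "/thread.php"))      # only part of an entry
```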

I'd probably add something to the filter to clear the "href" variable when work is done. Also, I have not tested the filter and list...



HTH

Changes.txt in the Proxomitron's Docs folder or http://www.proxomitron.info/45/docs/Changes.txt
Nov. 03, 2012, 11:11 PM (This post was last modified: Nov. 03, 2012 11:35 PM by duebel13.)
Post: #3
RE: Correct link targets (site-specific, list-based)
Thanks JJoe. Using a comma instead of @ as the separator, this is the latest version of the filter:

Code:
Name = "Links: Correct Target (Site-specifc, HTML) [dbl]"
Active = TRUE
Multi = TRUE
URL = "($TYPE(htm)|$TST(CType=xhtml))$LST(CorrectLinkTargets-S)"
Bounds = "$NEST(<a\s,>)"
Limit = 512
Match = "\2( target=$AV(\4)|)(\3)>"
        "&"
        "*href=$AV(([^\?]+)\5(\?*|))&($TST(href=*,$TST(\5),*)|"
        "$TST(href=([^,]+{1,*})\6,*)$TST(\5=$TST(\6)*)|$TST(href=(*,([^,]+{1,*})\6)$TST(\5=*$TST(\6)))
        "&"
        "((^$TST(target=*))($TST(\5=*.(7z|zip|rar|exe|msi|pdf|tgz|gz|rpm|iso))"
        "$SET(1=_self)|$SET(1=_blank))|$SET(1=$GET(target)))"
        "&"
        "(^*\starget="\1")
Replace = "\2\3 target="\1">"

These are working list entries:

Code:
(www|encrypted).google.[^/:]+(:443|)/search\?    $SET(href=http,)
www.heise.de/newsticker/                        $SET(href=/newsticker/meldung,)
(www.ebay.(de|com|co.uk))\0/[^\?]++/(i|m).html    $SET(href=http://\0/itm/,)
*/(viewforum|[Ff]orum(s|)/index).php        $SET(href=,./viewtopic.php,./viewforum.php,./memberlist.php,)

I guess this filter isn't one of the fastest, and it still lacks regular expressions in the href match, but this version offers some flexibility:

- only characters up to (but not including) a question mark are used for the match so that session ids and other arguments are discarded before matching
- everything between two commas must fully and exactly match
- if the first literal after the equal sign has no leading comma, it will be treated as if it had an asterisk wildcard at the end, i.e. "http," stands for http* (any URL starting with http)
- if the last literal has no trailing comma, it will be treated as if it had an asterisk wildcard at the beginning, i.e. ",.php" stands for any URL ending with .php (or .php?var=value&...)
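
The four rules above can be written down as a small Python model for anyone who wants to check a list entry before putting it into the list. It is a sketch of the rules as stated, not a translation of the Match expression; the function name `href_matches` is made up, and lowercasing both sides is an assumption about case handling.

```python
def href_matches(href, spec):
    """Check an href against a comma-separated value from the list.

    spec is the text after "href=" in a list entry, e.g. "http," or
    ",./viewtopic.php,./viewforum.php,".
    """
    # Rule 1: only the part before the first question mark is compared.
    path = href.split("?", 1)[0].lower()
    parts = spec.lower().split(",")
    if len(parts) < 2:
        return False  # a value without any comma is not covered by the rules
    first, middle, last = parts[0], parts[1:-1], parts[-1]

    # Rule 2: a literal enclosed by two commas must match fully and exactly.
    if path and path in middle:
        return True
    # Rule 3: no leading comma means the first literal acts as a prefix.
    if first and path.startswith(first):
        return True
    # Rule 4: no trailing comma means the last literal acts as a suffix.
    if last and path.endswith(last):
        return True
    return False


print(href_matches("http://example.com/a?b=1", "http,"))
print(href_matches("./viewtopic.php?t=5", ",./viewtopic.php,./viewforum.php,"))
print(href_matches("a/b.php?x=1", ",.php"))
```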

The $TST(CType=xhtml) used here is specific to M. Buerschgen's filter set, but I noticed that sidki's has a similar notation.

Thanks again.

Regards and <Gn(8)> (it's 8 min. past midnight here...)
Nov. 04, 2012, 04:27 PM
Post: #4
RE: Correct link targets (site-specific, list-based)
duebel13 Wrote:I guess this filter isn't one of the fastest

The posted filter is missing some quotes and maybe a parenthesis. It doesn't import.

See "Blocklist Indexing (hashes)" at http://www.proxomitron.info/45/help/Bloc...ation.html , "Of course, it's better to be hashable.".

Also, I see "[Ff]". The Proxomitron's match is case insensitive.

duebel13 Wrote:and it still lacks regular expressions in the href match,

$TST(variable) is literal.

For "Remove: Specific Functions on sel. Sites", sidki chose to have the user add an entry to create an array.

Code:
## block specific functions                     $SET(sUserFn=§MATCH1§MATCH2§)
##
## "MATCH" targets function names and IF conditions, as well as the initial
## code within script tags.  Each string must be surrounded by section signs
## (§).  You just need to match the beginning of the target string, unless you
## append "$SET(sUserFnR==)".
##
## Example:
##   [^.]+.techtarget.com/      $SET(sUserFn=§HBX_§forMembersOnly§)
##
## Above entry would replace these code blocks on TechTarget sites:
##   function HBX_STRING() {/*...*/}
##   function HBX_ERROR() {/*...*/}
##   if (forMembersOnly && something) {/*...*/}
##   <script>forMembersOnly=true;/*...*/</script>
##
## If you append "$SET(sUserFnR=!)", replaced - hence empty - "if" blocks get
## executed instead of possible "else" blocks, which is the default.
##
## For above logic, plus exact string match only, append "$SET(sUserFnR=!=)".
## ----------------------------------------------------------------------------

Code:
Name = "Remove: Specific Functions on sel. Sites     09.07.04 [sd] (d.0)"
Active = TRUE
URL = "($TST(hCT=*html)|$TYPE(js)|$TYPE(vbs))$TST(sUserFn=*)"
Limit = 32766
Match = "function\s$TST(script=[1s]*)([^()"';]+{1,*})\4"
        "$TST(sUserFn=§(\5§$TST(\4=(($TST(\5)($TST(sUserFnR=*=*)|*))\6|*)))+)$TST(\6=*)"
        " $NEST(\(,\)) $NEST({,})"
        "$SET(1=function \4() { return String())$SET(2=S-Spec Fn: $GET(sUserFnR))"
        "|"
        "if \( $TST(script=[1s]*)($INEST(\( ,\)))\4"
        "$TST(sUserFn=§(\5§$TST(\4=(($TST(\5)($TST(sUserFnR=*=*)|*))\6|*)))+)$TST(\6=*)"
        "\)( //[^\r\n]+| /\**\*/)+ ($NEST({,})|(*;)+{1})"
        "$SET(7=$TST(sUserFnR=*!*)!)$SET(1=if (\70) {)$SET(2=S-Spec If: $GET(sUserFnR))$SET(3= - (\4))"
        "|"
        "<script((*>)+{1})\7( <!--[^\r\n]+)+ ([^<;\r\n]+{1,30})\4"
        "$TST(sUserFn=§(\5§$TST(\4=(($TST(\5)($TST(sUserFnR=*=*)|*))\6|*)))+)$TST(\6=*)"
        "$INEST(<script,</script)$SET(1=<script\7{)$SET(2=S-Spec Sc: $GET(sUserFnR))"
        ""
        "&($TST(volat=*.log:2*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB SiteSpec_JS \2 \t\4 \t\u)|)"
        "$SET(eAdJS=$TST(hCT=*html)$GET(eAdJS)"
        "%3Cspan class=%22Pr0xFly-Span%22%3E$GET(mHead) \2%3C/span%3E"
        "$ESC(\4)%3Cbr class=%22Pr0xFly-Br%22 /%3E"
        ")"
Replace = "\1 /* PROX: \2 Removed\3 */ }"

duebel13 Wrote:Regards and <Gn(8)> (it's 8 min. past midnight here...)

Not a problem.
Have fun.
Nov. 04, 2012, 05:55 PM (This post was last modified: Nov. 04, 2012 06:03 PM by duebel13.)
Post: #5
RE: Correct link targets (site-specific, list-based)
(Nov. 04, 2012 04:27 PM)JJoe Wrote:  The posted filter is missing some quotes and maybe a parenthesis. It doesn't import.

Hmm, try this one (active in my config so should work):

Code:
Name = "Links: Correct Target (site-specific, (X)HTML) [dbl]"
Active = TRUE
Multi = TRUE
URL = "($TYPE(htm)|$TST(CType=xhtml))$LST(CorrectLinkTargets-S)"
Bounds = "$NEST(<a\s,>)"
Limit = 512
Match = "\2( target=$AV(\4)|)(\3)>"
        "&"
        "*href=$AV(([^\?]+)\5(\?*|))&($TST(href=*,$TST(\5),*)|"
        "$TST(href=([^,]+{1,*})\6,*)$TST(\5=$TST(\6)*)|"
        "$TST(href=*,([^,]+{1,*})\6)$TST(\5=*$TST(\6))|"
        "$TST(\5=*$TST(href)*))"
        "&"
        "((^$TST(target=*))($TST(\5=*.(7z|zip|rar|exe|msi|pdf|tgz|gz|rpm|iso))"
        "$SET(1=_self)|$SET(1=_blank))|$SET(1=$GET(target)))"
        "&"
        "(^*\starget="\1")"
Replace = "\2\3 target="\1">"

If this still doesn't work for you then there's something wrong with this forum's code...

Quote:See "Blocklist Indexing (hashes)" at http://www.proxomitron.info/45/help/Bloc...ation.html , "Of course, it's better to be hashable.".

IMO this is overrated. But I have no figures and thus this is basically just belief.

Quote:Also, I see "[Ff]". The Proxomitron's match is case insensitive.

We're talking about URLs here, not HTML code. The host and domain part is case insensitive, but at least on Unix/Linux based servers, the path for sure is case sensitive...

Quote:$TST(variable) is literal.

ACK - learned this the hard way over the last couple of days...

Quote:For "Remove: Specific Functions on sel. Sites", sidki chose to have the user add an entry to create an array.

Actually I wrote my own functions to add and remove expressions, attributes, and values in scripts, styles, and text. Even some kind of Stylish4Prox...

But, yes, I noticed that sidki's filter set (which you are still maintaining, if I haven't missed anything) is a great source of high-quality filters. It's a shame that both sidki and mb turned their back on Proxomitron filter development in 2010. Interest in Proxomitron dramatically dropped over the last couple of years. In the German forums, there is one (!) active user besides me and I wonder why I am still posting filters and list updates there...

Anyway, as long as IPv4 is there, Proxomitron will be there, too - at least for me. There are three tools that I really need on every new PC that I install Windows on: Total Commander, Proxomitron, and Firefox v3. So 2 out of 3 are already dead - do I have to mention that I "updated" to XP from Win2k early this year, lol?

Regards
Nov. 04, 2012, 10:46 PM
Post: #6
RE: Correct link targets (site-specific, list-based)
duebel13 Wrote:Hmm, try this one (active in my config so should work):

It does. The first version needed closing quotes on lines 10 and 15. It also had an unmatched parenthesis on line 9.

duebel13 Wrote:IMO this is overrated. But I have no figures and thus this is basically just belief.

We tested. The advantage increases with the size of the list.
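
Why the gap grows with list size can be seen in a toy Python model. Note that this is not how Proxomitron actually builds its index (see the help page linked above for that); it simply assumes an index keyed on a literal host name, with wildcard entries kept in a separate group that always has to be scanned. The names `build_index` and `candidates` are made up.

```python
from collections import defaultdict

WILDCARD_CHARS = "*[(|?+"


def build_index(entries):
    """Split list entries into a host-keyed index and an unhashable rest."""
    index, unhashable = defaultdict(list), []
    for entry in entries:
        host = entry.split("/", 1)[0]
        if any(ch in host for ch in WILDCARD_CHARS):
            unhashable.append(entry)   # must be tested against every URL
        else:
            index[host].append(entry)  # reachable with one dictionary lookup
    return index, unhashable


def candidates(url, index, unhashable):
    """Entries that still need a full pattern test for this URL."""
    host = url.split("/", 1)[0]
    return index.get(host, []) + unhashable


entries = ["www.heise.de/newsticker/", "www.ebay.de/itm/", "*/forumdisplay.php"]
index, rest = build_index(entries)
print(candidates("www.heise.de/newsticker/x", index, rest))
```

In this model a hashable entry costs nothing for URLs on other hosts, while every entry that starts with a wildcard adds one pattern test to every request.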

duebel13 Wrote:We're talking about URLs here, not HTML code. The host and domain part is case insensitive, but at least on Unix/Linux based servers, the path for sure is case sensitive...

True but my Proxomitron's matching routine doesn't care. f will match f or F. Case is retained for the replacement, however.

Testing

Code:
[Patterns]
Name = "test case match"
Active = FALSE
Limit = 256
Match = "(f+)\1"
Replace = "**\1**"

against fF
yields **fF**
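
The same test can be reproduced without Proxomitron. The Python snippet below is only an analogue using the standard `re` module with case-insensitive matching switched on; the function name `star_wrap` is made up.

```python
import re


def star_wrap(text):
    """Wrap every run of f/F in asterisks, keeping the original case."""
    return re.sub(r"(f+)", r"**\1**", text, flags=re.IGNORECASE)


print(star_wrap("fF"))  # prints **fF**
```

As in the Proxomitron test, the lowercase pattern matches both letters and the captured text keeps its original case.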

duebel13 Wrote:It's a shame that both sidki and mb turned their back on Proxomitron filter development in 2010. Interest in Proxomitron dramatically dropped over the last couple of years. In the German forums, there is one (!) active user besides me and I wonder why I am still posting filters and list updates there...

This all took too much for sidki to continue. At some point you must consider your needs. Considering his contributions, 'turned his back on' would sound harsh to me. ;)

I did not know about Michael. Did he make a farewell post? Seeing http://wayback.archive.org/web/*/http://...rox/Forum/ and remembering CastleCops, you might want to archive some posts.

I appreciate Scott R. Lemmon's gift. I try to help others feel the same.
Nov. 05, 2012, 05:24 PM
Post: #7
RE: Correct link targets (site-specific, list-based)
(Nov. 04, 2012 10:46 PM)JJoe Wrote:  
duebel13 Wrote:We're talking about URLs here, not HTML code. The host and domain part is case insensitive, but at least on Unix/Linux based servers, the path for sure is case sensitive...

True but my Proxomitron's matching routine doesn't care. f will match f or F. Case is retained for the replacement, however.

OK, thanks. Good to know.

Quote:I did not know about Michael. Did he make a farewell post?

Kind of. Can you read German?
http://www.buerschgens.de/Prox/Forum/vie...336#p24336

My translation of the first paragraph: "I'm still alive but I don't think that I will ever release a new version. Due to my job, I'm on the road quite often and don't even manage to read this forum, let alone post."

The last sentence translates to: "I don't have any time for community work at the moment."

That's not a real farewell, I guess, but some kind of longer lasting "temporarily unavailable".

Quote: Seeing http://wayback.archive.org/web/*/http://...rox/Forum/ and remembering CastleCops, you might want to archive some posts.

What do you mean - the pages could vanish forever?

Quote:I appreciate Scott R. Lemmon's gift.

But I never understood why his family doesn't release the source code to the public - this would immortalize him.

Quote:I try to help others feel the same.

I appreciate that - thanks, JJoe.
Nov. 05, 2012, 06:41 PM (This post was last modified: Nov. 05, 2012 07:02 PM by JJoe.)
Post: #8
RE: Correct link targets (site-specific, list-based)
(Nov. 05, 2012 05:24 PM)duebel13 Wrote:  Kind of. Can you read German?

Many years ago "kind of". These days I struggle.

(Nov. 05, 2012 05:24 PM)duebel13 Wrote:  
Quote: Seeing http://wayback.archive.org/web/*/http://...rox/Forum/ and remembering CastleCops, you might want to archive some posts.

What do you mean - the pages could vanish forever?

Yes. http://www.dslreports.com/forum/r2163033...ops-Closed

(Nov. 05, 2012 05:24 PM)duebel13 Wrote:  
JJoe Wrote:I try to help others feel the same.

I suspect that this is also part of why you continue to post filters and list updates at Deutsches Proxomitron-Supportforum.

Thank you