The Un-Official Proxomitron Forum
sidki's config set: 2005-06-09 - Printable Version



- sidki3003 - May. 15, 2005 01:14 PM

sidki3003 Wrote:
Quote: All the requests [552-556] were identical and generated by clicking on a link to the article one time.
By clicking on what link exactly? I don't see them here.

Ahh, i know what you mean! I was suspicious b/c of "exit_polls", but that was just the name of that article. :)

Those requests are for those little HTML docs that appear if you hover over an article link for a few seconds. HTML -> 1s cache -- hover -> new request. Since the server isn't sending any caching headers (nor accepting any, for that matter), they get re-fetched each time.
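
A minimal Python sketch of the header check implied above: with none of the usual caching headers in a response, a cache has nothing to base freshness or revalidation on. The function name and the sample header dictionaries are made up for illustration; the actual Yahoo response headers are not shown in this thread.

```python
from typing import Mapping

# Response headers that give a cache something to work with:
# explicit freshness (Cache-Control, Expires) or validators (ETag, Last-Modified).
CACHING_HEADERS = {"cache-control", "expires", "etag", "last-modified"}

def has_caching_headers(headers: Mapping[str, str]) -> bool:
    """Return True if the response carries at least one caching-related header."""
    names = {name.lower() for name in headers}  # header names are case-insensitive
    return bool(names & CACHING_HEADERS)

# Hypothetical responses, for illustration only.
bare_response = {"Content-Type": "text/html"}
cacheable_response = {"Content-Type": "text/html", "Cache-Control": "max-age=86400"}
```

Under these assumptions, `has_caching_headers(bare_response)` is false, which is the situation described for the hover previews.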

sidki


- z12 - May. 15, 2005 01:19 PM

I see what's going on with the yahoo news link. There appears to be some javascript running when you hover over the link. I guess that's part of their redesign, but it doesn't seem very efficient.

Here's the link:

Democrats Consider Revamping Primaries
http://news.yahoo.com/s/ap/20050515/ap_on_...rimary_scramble

But I see all of their article links are doing this.

Mike

edit: lol, you beat me to it :)


- sidki3003 - May. 15, 2005 01:46 PM

I was wondering why those little docs aren't cached, even if i set them to "cache 1 day".
But it's the way the script works:
http://us.i1.yimg.com/news.yahoo.com/v10/u...js?v=1116017632

They are doing an XMLHttpRequest, and Firefox 1.0 has a bug that makes it always re-fetch documents for such requests. :(
It's fixed in the 1.1 nightlies, from what i've heard.

sidki


- z12 - May. 15, 2005 02:22 PM

ah..I see.

I've been looking at a nightly to run for a while, but I'm holding back till the trunk gets a bit more stable.

Mike


- z12 - May. 16, 2005 04:23 PM

At news.yahoo.com I'm getting a lot of matches like the following:

Code:
<Match: <a><body>: Block sel. JS Properties     4.01.22 (multi) [sd] (d.1) >
<a onMouseOut="cancelPreview()" onMouseOver="showPreview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">
</Match>
<a onMouseOut="cancelPreview()" onMouseOver="showPreview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">

These are matching even though there's no matching property in the list to block.

Looks like there's a couple of ways to "fix" it, but I'm not even sure it's broken. :)

Mike


- sidki3003 - May. 16, 2005 07:07 PM

That's because the initial quick test "\son[a-z]+=" matches there. What follows is a zero-to-infinite loop that always matches. With zero iterations the buffer is returned unchanged. If you append a "{1,*}" to that loop or let the second test fail some other way, you'll notice a ~50% slow-down. (Was quoting from an older email, don't laugh JJoe.)

I used to append "always" to that type of filter but was running out of space in the name field. *lol*
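
The difference between the always-matching loop and the "{1,*}" variant can be sketched with Python's `re` module. This is an analogy in Python regex syntax, not Proxomitron syntax; the two property names stand in for the JSProperties list, and the tags are shortened versions of the ones in this thread with a made-up href.

```python
import re

# Quick test followed by a loop over blocked properties.
# "*" = zero or more iterations, so the pattern matches even with no property present.
ALWAYS = re.compile(r'\son[a-z]+="[a-z]*(?:\.(?:referrer|cookie))*', re.I)

# "+" = at least one iteration (like appending "{1,*}"), so the pattern
# has to fail late when no blocked property follows the quick test.
STRICT = re.compile(r'\son[a-z]+="[a-z]*(?:\.(?:referrer|cookie))+', re.I)

# Shortened sample tags; the href value is hypothetical.
PLAIN = '<a onMouseOut="cancelPreview()" onMouseOver="showPreview(event)" href="/s/ap/example">'
TAINTED = '<a onMouseOut="cancelPreview()" onMouseOver="show.referrer.Preview(event)" href="/s/ap/example">'

def is_match(pattern: re.Pattern, tag: str) -> bool:
    """Return True if the pattern matches anywhere in the tag."""
    return pattern.search(tag) is not None
```

Here `is_match(ALWAYS, PLAIN)` is true although nothing in the tag needs blocking, while `is_match(STRICT, PLAIN)` is false. The sketch only shows the matching behaviour; it says nothing about the speed difference inside Proxomitron.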

sidki


- z12 - May. 17, 2005 09:23 AM

It still has a problem with catching the right attribute name in \2.

Test Code
Code:
<a onMouseOut="cancelPreview()" onMouseOver="show.referrer.Preview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">

Replacement code
Code:
\@ \2

Mike


- sidki3003 - May. 17, 2005 01:37 PM

It doesn't try to. It just catches the first one (which is the right one in most cases). The content of \2 is only used for informational purposes and doesn't affect the replacement string.

To make my point regarding speed a bit clearer:
Whether a filter needs 0.02ms or 0.03ms to parse a link doesn't matter at all... from the single-link point of view.
Now look at pages with hundreds of links.
And now consider that this config has hundreds of filters, so that these micro-micro delays *will* become perceptible.
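
As a rough back-of-the-envelope sketch in Python: the 0.02ms and 0.03ms figures are from the post, while the counts of 300 links and 300 filters are made-up stand-ins for "hundreds". It also assumes every filter touches every link, which overstates the real cost, so treat the numbers as an upper bound for illustration.

```python
def page_cost_ms(per_link_us: int, links: int, filters: int) -> float:
    """Total parse time in milliseconds, given a per-link cost in microseconds.

    Assumes (pessimistically) that every filter is tried on every link.
    """
    return per_link_us * links * filters / 1000

# 0.02 ms = 20 microseconds, 0.03 ms = 30 microseconds (figures from the post).
# 300 links and 300 filters are hypothetical values for "hundreds".
fast_total = page_cost_ms(20, 300, 300)   # 1800.0 ms
slow_total = page_cost_ms(30, 300, 300)   # 2700.0 ms
extra_delay = slow_total - fast_total     # 900.0 ms
```

Under these assumptions, a 0.01ms difference per link grows to nearly a second per page.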

Here is that filter in its accurate, very slow incarnation:
Code:
[Patterns]
Name = "<a><body>: Block sel. JS Properties     4.01.22 (multi) [sd] (d.1) TEST"
Active = FALSE
Multi = TRUE
URL = "$TYPE(htm)(^$TST(keyword=*.a_code.*))"
Bounds = "<(a|body)\s*>"
Limit = 512
Match = "("
"(*\s)\#"
""
"("
"((on[a-z]+)\2=$AV(*)|href=$AV( (javascript)\2:*))"
"&&"
"("
"\#(.$LST(JSProperties))\3([^a-z.]|.[a-z])\#"
"($TST(volat=*.log:2.*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB JS_Prop_\2 \t\3 \t\u)|)"
")+{1,*}\#"
")"
""
")+{1,*}\#"
Replace = "\@"

I didn't play with it a lot, maybe you can get it faster. It would be okay if it were slower than the old one for true matches, but not for failing (or buffer dumping).


I've added "dmp" to those filters with always-match routines. And added this to Abbreviations.txt:
Quote:dmp: This filter may match even though it doesn't change anything,
either to prevent slow-downs caused by late failing, or to
protect certain code from being matched by other filters.

Affected filters:
<*>: Tag Manager
Protect Textareas II - Apply
JS CSS Protect: Comments II - Apply
JS CSS Protect: Comments III - Other Types
<a><body>: Block sel. JS Properties

There were two or three others like the last one, but i don't remember which. If you come across them, please drop a note.

sidki


- z12 - May. 17, 2005 02:15 PM

I played around with it some, but couldn't get one that failed as fast. I understand your speed concern, as for most links, this filter "fails" (but quickly :) ).

Just trying to help,
Mike


- sidki3003 - May. 17, 2005 02:25 PM

I know. :) I've edited the test filter above and at least got rid of the global var. Maybe a good quick test before the longish one would do.

sidki


- sidki3003 - May. 17, 2005 05:28 PM

Got it! :D (At least i hope so.)

It's a rather wild construct, but hey, it works: on a true hit, look back and grab the attribute. Note that \2 and \3 only return the right value at the spot where the log line is.
Code:
[Patterns]
Name = "<a><body>: Block sel. JS Properties     5.05.17 (dmp multi) [sd] (d.1) WIP7"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)(^$TST(keyword=*.a_code.*))"
Bounds = "<(a|body)\s*>"
Limit = 512
Match = "(*\s((on[a-z]+)\2=|href="+ (javascript)\2:))\#"
"("
"\#(.$LST(JSProperties))\3([^a-z.]|.[a-z])\#"
"&&"
"(*\s((on[a-z]+)\2=|href="+ (javascript)\2:))+"
"($TST(volat=*.log:2.*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB JS_Prop_\2 \t\3 \t\u)|)*"
")+\#"
"&(^$TST(script=*)|$TST(comment=1))"
Replace = "\@"
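
The "look back on a true hit" idea can be sketched in Python. This is an illustration of the approach, not a translation of the filter: the two property names stand in for the JSProperties list, the function name is made up, and the nearest-preceding-attribute lookup is a simplification that ignores quoting.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for the JSProperties list.
BLOCKED = re.compile(r"\.(referrer|cookie)(?![a-z])", re.I)

# Attribute starts worth logging: event handlers and javascript: hrefs.
ATTR = re.compile(r'\s(on[a-z]+)=|\shref="\s*(javascript):', re.I)

def log_hits(tag: str) -> List[Tuple[str, str]]:
    """Return (attribute, property) pairs, resolving the attribute only on a hit."""
    hits = []
    for prop in BLOCKED.finditer(tag):
        owner = None
        # Look back: the last attribute start before the hit is taken as its owner.
        for attr in ATTR.finditer(tag, 0, prop.start()):
            owner = attr.group(1) or attr.group(2)
        if owner is not None:
            hits.append((owner, prop.group(1)))
    return hits

# Shortened test tag in the style of the ones in this thread; the href is hypothetical.
SAMPLE = ('<a onMouseOut="cancel.referrer.Preview()" '
          'onMouseOver="show.referrer.Preview(event)" href="/s/ap/example">')
```

The cheap property test runs first; the attribute lookup is only paid for when a property was actually found, so tags without blocked properties return an empty list quickly.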

Gotta run,
sidki


- z12 - May. 18, 2005 10:15 AM

One feature your original filter had that I overlooked is that it can fix multiple attributes on a single match.

Code:
<a onMouseOut="cancel.referrer.Preview()" onMouseOver="show.referrer.Preview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">

Given this, capturing the attribute name is not important as you pointed out. Also, capturing the original property name doesn't seem like it matters either, since it is actually part of the replacement text. Perhaps all you need to log is the fact that the filter matched.

It seems that I sent you on a wild goose chase, as there is nothing "wrong" with the original filter. I just didn't fully understand it. Sorry about that.

Mike


- sidki3003 - May. 18, 2005 11:18 AM

You didn't - all fine. :)

I did test the above filter with both filter-worthy properties in multiple attributes and multiple filter-worthy properties in one attribute. Your test string works for me; are you sure you've picked the right version (WIP7)?

You can't test for the right \2 and \3 in the replacement match, because they are reassigned on the fly while the loop continues. The log line, which grabs them right after they got their new values, should work correctly, no?
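
Python's `re` module shows a comparable effect, offered here only as an analogy to Proxomitron's variables: a capture group inside a repeated group keeps just the value from the last iteration, so each value has to be read while the loop is running. The pattern and function names are made up for illustration.

```python
import re
from typing import List, Optional

# One attribute per iteration; the whole group repeats.
LOOP = re.compile(r'(?:\s(on[a-z]+)="[^"]*")+', re.I)

# The same single iteration on its own, to read each value as it is captured.
STEP = re.compile(r'\s(on[a-z]+)="[^"]*"', re.I)

def last_capture(tag: str) -> Optional[str]:
    """After the loop has finished, only the last iteration's capture is left."""
    m = LOOP.search(tag)
    return m.group(1) if m else None

def per_iteration(tag: str) -> List[str]:
    """Reading the capture at each step yields every value."""
    return [m.group(1) for m in STEP.finditer(tag)]

# Shortened sample tag; the href value is hypothetical.
TAG = '<a onMouseOut="cancelPreview()" onMouseOver="showPreview(event)" href="/s/ap/example">'
```

`last_capture(TAG)` returns only `onMouseOver`, while `per_iteration(TAG)` returns both attribute names, which mirrors why the log line sits inside the loop.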

sidki


- z12 - May. 18, 2005 12:36 PM

I just tried the new filter again, no problems, it works well.

I guess I need more coffee. :)

Mike


- sidki3003 - May. 18, 2005 12:50 PM

Phew - glad to hear that! Was quite an effort yesterday to get that darn goose into the pot. But it's always fun as well. *lol*

sidki