The Un-Official Proxomitron Forum
sidki's config set: 2005-06-09 - Printable Version

+- The Un-Official Proxomitron Forum (https://www.prxbx.com/forums)
+-- Forum: Proxomitron Config Sets (/forumdisplay.php?fid=43)
+--- Forum: Sidki (/forumdisplay.php?fid=44)
+--- Thread: sidki's config set: 2005-06-09 (/showthread.php?tid=358)



- z12 - May. 04, 2005 09:22 PM

Hi sidki

Well, after you mentioned your concern about matching speed with $LST(Count), I decided to look into it a bit more. After playing around for a while, I came up with a modified version that I submit for your approval. Smile!

Currently, it's two lists, but I suppose it could be made into one, if it's worthy.

On my machine, an 850MHz laptop, the test filter you posted for the modified $LST(Count) takes about 2.5ms. With the new lists, it takes about 1.8ms. Not a huge improvement, but I thought you might be interested.

Here is your test filter modified for the new lists:

Code:
[Patterns]
Name = "<div>: Check Tag Nesting - Mike's New Lists"
Active = FALSE
Multi = TRUE
Limit = 8
Match = "<("
"div"
"($TST(($GET(n)+)=$LST(CTU))|)$SET(n=$GET(i))"
"|"
"/div"
"($TST(($GET(n)-)=$LST(CTD))|)$SET(n=$GET(i))"
"|eof$SET(2=-$GET(n))"
")\1>"
Replace = "<\1\2>"

I'll attach the lists, since you mentioned that the forum display was causing grief.

As before, the list extension is txt for the upload.

Mike


- z12 - May. 04, 2005 09:25 PM

Ok, I guess one attachment per post.

Here's the other list.


- sidki3003 - May. 05, 2005 12:30 AM

Wow - Impressive! This makes quite a difference.

Yes, a single list seems to be better.
Combining the non-hashable entries helps ( \0(0\+...|1\+...|...) ).
$TST embracing parens don't work in replacement matches, so "$TST(($GET(n)+)=$LST(CTU)|*)$SET(n=$GET(i))".
Single +/- may need to come back b/c of unassigned globals.
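In other words, the second point applied to your test filter would look something like this - just an untested sketch of the $TST change, still using the two-list names (CTU/CTD) from your version:

Code:
[Patterns]
Name = "<div>: Check Tag Nesting - inline alternative in $TST"
Active = FALSE
Multi = TRUE
Limit = 8
Match = "<("
"div"
"$TST(($GET(n)+)=$LST(CTU)|*)$SET(n=$GET(i))"
"|"
"/div"
"$TST(($GET(n)-)=$LST(CTD)|*)$SET(n=$GET(i))"
"|eof$SET(2=-$GET(n))"
")\1>"
Replace = "<\1\2>"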

I'll be back with that once i've figured out your recursive list call replacement. Big Teeth


edit: btw, i just got Jesse's JS Shell 1.3 ready for Prox - the previous version was broken for Firefox 1.03, and autocomplete is broken now for IE (fix under way).

The new invocation (ProxFunctions.js, sorry about the horizontal scroll) is:
Code:
anc + " href=\"javascript:with(window.open('" + prxCfgDir + "shell.html','prx_bookmarklet','width=663,left='+screen.width*.01+',top='+screen.width*.15+',height='+screen.height*.6+',resizable'));[].v\">JavaScript Shell &#8756;</a></div>" +

sidki


- ProxRocks - May. 05, 2005 02:06 AM

horizontal scroll? what horizontal scroll?

must be the 1280 x 1024 display resolution 'cause there's no scroll here... lol... (sorry, couldn't resist...)


- sidki3003 - May. 05, 2005 08:27 PM

Never mind the "Single +/- may need to come back" comment, i didn't see the bottom entry.

I've fixed zero-swallowing (e.g. when counting down from 110), which takes away some of the speed gain, but it's still a noticeable improvement.

I'm not sure about the zero-padding - strip it or not. Mona thinks that someone may need it, and one can strip it from within the filter if so desired. So i left it out of the list for now; i'll bring it back in if this proves to be better. Smile!

sidki


- z12 - May. 08, 2005 12:46 AM

Hi sidki

Very nice!

I had been working on a single list, but this is faster than any version that I had come up with.

I still think the two-list approach would be fastest, but I'm sure there are other considerations to take into account. Even so, this list is .5 ms faster on my machine.

In the end, I'm happy that it's not filling the log window, and the speed gain is a nice bonus.

Now that this list is straightened out, I've noticed there are a couple of other lists that are also a bit chatty. Smile!

A few that come to mind are the url-parser, url-expander & ASCII table. Looking over the lists briefly, I'm not sure that there is an advantage to seeing them match in the log window... the ASCII table maybe. Time permitting, I'm going to look into these lists a bit closer.

Anyway, thanks for listening to my "gripe" and coming up with a nice solution.

Mike


- z12 - May. 08, 2005 10:05 AM

Hi sidki

Url-Parser.ptxt (list)

Since this list always matches, there's no need to log it.

I made the following changes:

added the following at the bottom of the list:

Code:
~?

modified the Match expression in the following filter:

Code:
Name = "Set Emergency Flags if Out Headers are bypassed     5.01.14 [sd] (d.r)"
Active = TRUE
URL = "^$TST(uHost=*)|$TST(keyword=*)"
Limit = 1
Match = "$STOP()"
"((^$TST(keyword=*))$SET(keyword=.)|)"
"((^$TST(flag=*))$SET(flag=.)|)"
"((^$TST(volat=*))$SET(volat=.)|)"
"((^$TST(uHost=*))($LST(URL-Parser)|)$SET(volat=$GET(volat)headers:1.)|)"
"PrxFail$TST()"

Things seem to be alright since the referer filter is working ok.

Mike


- z12 - May. 08, 2005 10:12 AM

Hi sidki

SpecialUAs.ptxt (list)

On every request I've been getting a match on line 62 of this list. Since the User-Agent header filter is checked, there's no need to log it.

added the following at the bottom of the list:

Code:
~?

modified the Match expression in the following filter:

Code:
In = FALSE
Out = TRUE
Key = "User-Agent: Handle specified UAs - Set Var      4.09.01 [sd] (d.r) (Out)"
Match = "\0&$SET(hOrigUA=\0)&((^$TST(keyword=*.(a_headers|a_ua).*))($LST(SpecialUAs)|)|$SET(1=\0))"
Replace = "\1"


Things seem to be alright since the User-Agent Header filter is working ok.

Mike


- z12 - May. 08, 2005 10:15 AM

Hi sidki

The changes to the Count, URL-Parser & SpecialUAs lists, have made a huge difference in the log window. I can now see more relevant matches that were previously "lost" due to scrolling of the log window. Smile!

Mike


- z12 - May. 08, 2005 11:52 AM

Hi sidki

"JS CSS Protect: Comments" filters (I,II,III)
"<script><style> Remove: Comments"

I'm thinking of unchecking these filters, but since there's an (.r) in the Name field, I'm not sure what the consequences are, if any.

Basically, I'm rather neutral about comments. If the comment is in code that is being replaced, I can always see what was there via debug. If the comment is in code that is not being replaced, I would prefer to leave the comment in. From my point of view, it's just one less thing to spend time on trying to match.

Any comments Smile! you have on this would be appreciated.
Mike


- z12 - May. 08, 2005 01:16 PM

Hi sidki

AdComments.ptxt (list)

The page layout is messed up at news.com.com. The culprit is a div tag that is not closed within the match.

I made the following brute force hack to list AdComments:

modified this:
Code:
# CNET Networks
- MAC ad - *- (MAC)\1 \[ ?++{0,130}-- >        $SET(2=CNET )

to this:

Code:
# CNET Networks
(- MAC ad - *-*&&(^*<div id='leftcol'>)*)<!--+ (MAC)\1 T [^<>]++{0,40}-->    $SET(2=CNET )

Probably not the best fix, but it works.

On a related note, I've had problems removing large blocks of code due to tags that are not properly closed. Sometimes the starting tag is before where the code matches; other times, the closing tag is after the match. I'm wondering if you have found a solution to this problem.

Mike


- sidki3003 - May. 08, 2005 03:01 PM

Hi Mike, Wow!

Going over it one by one, didn't actually test any of your changes yet, will do so later.

URL Parser: Agreed, no output necessary. So the "! : URL Parser" header filter still works without modifications? Well, i guess i'll find out *lol*.

SpecialUAs: Yes, agreed, but i've made major changes to this list since the release (just one non-hashable entry left). Gonna see if that makes a difference. Although i don't think so.

Quote:The changes to the Count, URL-Parser & SpecialUAs lists, have made a huge difference in the log window. I can now see more relevant matches that were previously "lost" due to scrolling of the log window. Smile!
Cool! If it's getting too quiet there, try turning on the "Use Debug Mode" header switch. Wink

JS/CSS comment removers/protectors: I really wouldn't deactivate those. All JS-enabled filters would scan commented blocks, and you'd see funky side effects sometimes. Protecting is very fast, so you'd get a net speed loss. Also, deactivating the protector would break the JS sniffer. There were other reasons, which i don't recall right now. I didn't find a quick and simple way to protect comments in inline scripts, hence i remove them instead.

AdComments: I rewrote this list from scratch (next release), just one non-hashable entry left, down from 5.5ms to 4.3ms (for an average 30k page on a PIII 600). Big Teeth I don't know if it's fixed already, can you give me an example URL of a broken page? Couldn't find one.

Quote:On a related note, I've had problems removing large blocks of code due to tags that are not properly closed. Sometimes the starting tag is before the code matches, other times, the closing tag is after the match. I'm wondering if you have found a solution to this problem.
That depends on the actual task. I've a work-around for a few such cases. One is: fall back to hiding the tag if the block is dangerously large. It's used in "[iframe]... Remove: Ad Containers I - Ad IDs".
Do you have an example link, along with a note about what tag block you'd like to remove?


Thanks for the input!
sidki


- sidki3003 - May. 08, 2005 04:32 PM

Ad Comments at news.com.com: D'oh! Now i see it, right on the front page.
Since it's a very specific fix anyway, how about...
Code:
- MAC ad -(^- > <div id="leftcol")*- MAC \[ ?++{0,130}-- >    $SET(2=CNET MAC)
...which would preserve hashing of this item, or does it miss some other news.com.com page?

---------
edit: Looks like both, URL-Parser and SpecialUAs work fine with that "return false" bit. Smile!

sidki


- z12 - May. 08, 2005 07:56 PM

Hi sidki

Your fix for cnet looks ok to me; I've used a fix similar to what I posted earlier and had no ill effects. I think it's an oversight on their part that the div is in there to begin with.

The remark I made about removing large chunks of code was just in general; I have nothing particular in mind. What made me wonder about it was the cnet div problem and the Count list. I haven't had the time yet to investigate in detail how you're using the Count list, but I wondered if you weren't using it for checking tag closure. I based this on the code you posted for testing the Count list. It seems like a very interesting & useful method.

Quote:Cool! If it's getting too quiet there, try turning on the "Use Debug Mode" header switch.

lol, no way am I ready for that yet. Actually, I was hoping to disable the js css comment filters to eliminate even more matches from the log window. I really do run with my log window open all the time, just to keep an eye on things. The first time I used your config, I was a bit overwhelmed, as I am used to my own config, which tends to be quiet. I wish Proxo had the ability to select which filters show in the log window, but oh well.

As far as protecting comments in JS/CSS goes, those filters caught my eye since I was getting a lot of matches for a filter that, at first glance, looks like it doesn't do anything. I understand it's there to prevent other filters from acting when you don't want them to, but I couldn't help wondering if the same technique used for marking HTML tags couldn't be used here as well. The main advantage would be a quiet log, since the latter tag-matching method is designed to fail.

On an unrelated topic, I've added a few extra caching filters to the config. Basically, they are filters I've been running for some time, modified for your config. They seem to be working well, but I'm not 100% sure that I've got the right switches in the URL match for enabling/disabling the filters based on the cache control method that is selected early in the config. Anyway, I was thinking of posting them to have you look them over and see what you think.

Your config seems to be working very well for me. I must admit that I think I spend more time looking at the filters than I do surfing.

Mike


- sidki3003 - May. 08, 2005 09:13 PM

z12 Wrote:The remark I made about removing large chunks of code was just in general, I have nothing particular in mind. What made me wonder about it was the cnet div problem and the Count list. I haven't had the time yet to investigate how your using the Count list in detail, but I wondered if you weren't using it for checking tag closure. I based this on the code you posted for testing the Count List. It seems like a very interesting & useful method.
Yep, "Mark: Various: HTML" is watching the nesting level of table/font/div.
"Bottom Mark: Start - Close open Tags" is closing open ones, right before the Prox bottom insertions.
But note that block-targeting filters encompass the entire block - way before the marker filter can see whether there is a matching closing tag or not.

Also, what makes the block-targeting $NEST/$INEST filters break things at times are structures like:
Code:
<td id=bad_ad>foobar</div id=this_one is_also_closing_the_td_block_in_graceful_browsers>
lots of code here</td>
If you just hide the tag (which has other disadvantages), you let the browser decide when it considers a tag block as closed.
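Just to illustrate the alternative, a bare-bones "hide the tag" filter could look like the hypothetical sketch below (made-up names, not an actual filter from the set). Only the opening tag gets renamed, so the browser treats it as an unknown tag and applies its own rules for where the block ends:

Code:
[Patterns]
Name = "Example: Hide iframe tag instead of removing the block"
Active = FALSE
Match = "<iframe([^>]+)\1>"
Replace = "<no-iframe\1>"

The real fallback filter does considerably more than this, of course.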

Quote:As far as protecting remarks in js css, those filters caught my eye since I was getting a lot of matches for a filter that, at first glance, looks like it doesn't do anything. I understand it's to prevent other filters from acting when you don't want them to, but I couldn't help but wonder if the same technique used for marking html tags couldn't be used here as well. The main advantage would be a quiet log since the latter tag matching method is designed to fail.
I think that would be possible. If you come up with a filter, i'll gladly test it. Smile!

This won't work for inline scripts tho. More importantly, i think we are hitting a general point here: simplicity of code versus speed and redundancy. Often it's the opposite of what intuition would say - simple filters being slower, and complex ones being faster. Personally, i'm still favoring speed over non-redundancy and quiescence, although i'd love to come back to the latter, if justifiable.

Quote:On a unrelated topic, I've added a few extra caching filters to the config. Basically they are filters I've been running for some time, modified for your config. They seem to be working well, but I'm not 100% sure that I've got the right switches in the url match for enabling/disabling the filter based on the cache control method that is selected early in the config. Anyway, I was thinking of posting them to have you look them over to see what you thought.
Sure! But i'd highly appreciate it if you'd add some comments to them. I saw the Prox ETag you inserted and then removed at CC, and was completely clueless about its function!

Quote:I must admit that I think I spend more time looking at the filters than I do surfing.
*lol* Tell me about it!

sidki