How to halt the match after the first instance
Aug. 09, 2019, 02:59 AM
Post: #1
...without using the byte limit, that is.

If the filter is something like:

Code:
<li class="user">*(jason|julia|jeff)*
</div>
</li>

and it encounters code like this:

Code:
<li class="user">
<div title="erika">
<img src="pic.jpg">
</div>
</li>

<li class="user">
<div title="jason">
<img src="pic.jpg">
</div>
</li>

The match will include both <li> elements instead of just the second one. Is there a way to make it stop searching after the first "</div></li>", so that it leaves the first element alone but matches/replaces the second, then continues doing the same for the rest of the page? Assume that other, longer <li> elements exist, so the byte limit has to be bigger than these examples need.
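
Why the match grows can be reproduced with a rough analogue in Python's `re` module. This is not Proxomitron's engine: it assumes a lazy `.*?` behaves like Proxomitron's `*` and `\s*` like the space between `</div>` and `</li>`, and the helper `first_match` is made up for illustration. The search starts at the first `<li class="user">`, and nothing stops the wildcard from running past erika's `</li>` until it reaches "jason".

```python
import re

# Sample markup from the question, with the missing ">" after each title restored.
HTML = """<li class="user">
<div title="erika">
<img src="pic.jpg">
</div>
</li>

<li class="user">
<div title="jason">
<img src="pic.jpg">
</div>
</li>"""

# Rough analogue of: <li class="user">*(jason|julia|jeff)* </div> </li>
# Assumption: ".*?" stands in for Proxomitron's "*", "\s*" for the space.
UNBOUNDED = re.compile(r'<li class="user">.*?(jason|julia|jeff).*?</div>\s*</li>', re.S)


def first_match(pattern, text):
    """Return the text of the leftmost match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    hit = first_match(UNBOUNDED, HTML)
    # The match begins at erika's <li>, so both elements are covered.
    print("erika" in hit, "jason" in hit)  # True True
```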
Aug. 09, 2019, 06:31 PM
Post: #2
Code:
[Patterns]
Name = "Use Bounds"
Active = FALSE
Bounds = "<li class="user">*</div>  </li>"
Limit = 256
Match = "*(jason|julia|jeff)*"

[Patterns]
Name = "Use &&"
Active = FALSE
Limit = 256
Match = "<li class="user">*</div>  </li>"
        "&&"
        "*(jason|julia|jeff)*"

[Patterns]
Name = "Use Different Wild Card"
Active = FALSE
Limit = 256
Match = "<li class="user">[^/]++(jason|julia|jeff)*"
        "</div> "
        "</li>"

[Patterns]
Name = "Target Title Attribute"
Active = FALSE
Limit = 256
Match = "<li class="user"> "
        "<div title=$AV(jason|julia|jeff)"
        "*"
        "</div> "
        "</li>"
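
The "Use Bounds" idea can be sketched with the same kind of Python analogue (again an assumption, not how Proxomitron is implemented): first cut out each element from `<li class="user">` to the nearest `</div> </li>`, then test for a name only inside that piece. The helper `remove_matching_items` and the empty replacement are hypothetical, and Proxomitron's `Limit` is not modeled.

```python
import re

HTML = """<li class="user">
<div title="erika">
<img src="pic.jpg">
</div>
</li>

<li class="user">
<div title="jason">
<img src="pic.jpg">
</div>
</li>"""

# Step 1, like Bounds: lazy ".*?" stops at the FIRST </div> </li> after each <li>.
BOUNDS = re.compile(r'<li class="user">.*?</div>\s*</li>', re.S)
# Step 2, like Match: applied only to the text of one bounded element.
NAMES = re.compile(r"jason|julia|jeff")


def remove_matching_items(html, bounds=BOUNDS, names=NAMES):
    """Drop every bounded <li> whose contents mention a listed name; keep the rest."""
    def replace(m):
        chunk = m.group(0)
        return "" if names.search(chunk) else chunk
    # re.sub resumes scanning after each element, so the whole page is processed.
    return bounds.sub(replace, html)


if __name__ == "__main__":
    print(remove_matching_items(HTML))
```

Because the name test never sees more than one element, erika's item is left alone while jason's is removed, and `re.sub` carries on through the rest of the page. Like `*(jason|julia|jeff)*`, the substring test would also catch a name such as "jason lee".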
Aug. 10, 2019, 02:10 AM
Post: #3
I wondered, with a bit of pessimism, "Is there really a method?" And well... there are four. Fantastic.
Bounds should have been obvious, but I had never really used it.
Now that I understand it, I'm thinking that with many names it might be better to match with a list. Is that possible with Bounds?
So far I'm not having much luck with the instructions on the Blocklist Creation page. This...

Code:
Name = "block with list"
Active = TRUE
Bounds = "<li class="user">*</div> </li> "
Limit = 2100
Match = "<li class*($LST(C-blocklist))* </div> </li>"

combined with a list of "jason|julia|jeff|etc" isn't matching anything. The Match seems redundant with the Bounds, but just putting "$LST(C-blocklist)" in the Match box didn't work either.
Aug. 10, 2019, 04:52 AM (This post was last modified: Aug. 10, 2019 04:54 AM by JJoe.)
Post: #4
(Aug. 10, 2019 02:10 AM) zoltan wrote: This... isn't matching anything.

Works for me. Perhaps an error registering the list?

Regardless, I see two problems:
1. Your filter will remove 'jason' AND 'jason lee'.
2. Using a wildcard before calling a list slows things down.

Start with something like:
(Add C-blocklist.txt to Proxomitron's 'Lists' folder, then merge the Blocklists and Patterns code from the clipboard.)
Code:
[Blocklists]
List.C-blocklist = "..\Lists\C-blocklist.txt"

[Patterns]
Name = "block with list"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "<li class="user">*</div> </li> "
Limit = 2100
Match = "*title=$AV($LST(C-blocklist))*"

So, the filter only calls the list after it finds 'title='.
Then, a list entry must consume the title attribute for the filter to match.

C-blocklist.txt contains:

Code:
jason
julia
jeff

This list format should be quicker than (jason|julia|jeff|etc).
But if the list also contains 'j', the Proxomitron may find a list match with 'j', return to the expression, fail to consume the rest of the attribute (if it is jason, julia, or jeff), and stop looking.
So, you will need to make the entries unambiguous or force their order.

Code:
j(^?)
jason
julia
jeff

or

Code:
jason|julia|jeff|j
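
The 'j' problem can be modeled in a few lines of Python. This is a hypothetical model of the behavior described above, not Proxomitron's code: list entries are tried in order, the first one that matches at the start of the title value wins, there is no retry with a later entry, and the filter succeeds only if that entry consumed the whole value. The suffix `(^?)` is treated here simply as "nothing may follow", and `first_hit` and `filter_matches` are made-up names.

```python
# Hypothetical model of list lookup inside $AV($LST(...)); not Proxomitron's code.
END_MARK = "(^?)"  # modeled as: the entry must be followed by nothing


def first_hit(entries, value):
    """Return how many characters the first matching entry consumes, or None."""
    for entry in entries:
        # Assumption: alternatives on one line are tried left to right.
        for alt in entry.split("|"):
            if alt.endswith(END_MARK):
                word = alt[: -len(END_MARK)]
                if value == word:
                    return len(word)
            elif value.startswith(alt):
                return len(alt)
    return None


def filter_matches(entries, value):
    """The filter matches only if the first hit consumes the entire value (no retry)."""
    return first_hit(entries, value) == len(value)


if __name__ == "__main__":
    plain = ["j", "jason", "julia", "jeff"]
    anchored = ["j(^?)", "jason", "julia", "jeff"]
    ordered = ["jason|julia|jeff|j"]
    for entries in (plain, anchored, ordered):
        print(entries, filter_matches(entries, "jason"), filter_matches(entries, "j"))
```

With the plain list, "jason" fails because "j" wins first and leaves "ason" unconsumed; both the `(^?)` version and the reordered one-line version let "jason" and "j" match.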


Tip: Save the cfg to another name so you can 'go back' easily.
Aug. 10, 2019, 07:24 PM (This post was last modified: Aug. 10, 2019 07:26 PM by zoltan.)
Post: #5
Thanks for the examples and refinements.
It took a while, but everything seems to be working. You were right about a list error. No code combination would match, so I finally just recreated the list and named it, this time in Notepad instead of Metapad, and suddenly it matched.

I did find one oddity. For testing, I pasted in a big list of hundreds of words. When I separated them with "|", the list matched all code within the bounds, even if that code contained nothing from the list. This started happening once the list grew beyond approximately 4,400 characters. With the line-based list, so far there's no limit.

For the "j" problem, I'm assuming you meant an entry that is just a single "j", not the "j" that's part of a name. That probably wouldn't be an issue, but there's also the case where "mary" or "anne" would match "maryanne." The "j" solution should cover that too, right? But I also remember from Exceptions-U that order matters, so if "maryanne" is first, then "anne" and "mary" could come afterward and be treated separately.
Aug. 11, 2019, 05:30 AM (This post was last modified: Aug. 11, 2019 05:32 AM by JJoe.)
Post: #6
(Aug. 10, 2019 07:24 PM) zoltan wrote: but there's also the case where "mary" or "anne" would match "maryanne." The "j" solution should cover that too, right?

"anne" should be ok. "mary" may need (^?).

(Aug. 10, 2019 07:24 PM) zoltan wrote: But I also remember from Exceptions-U that order matters, so if "maryanne" is first, then "anne" and "mary" could come afterward and be treated separately.

By default, to speed up matching, the Proxomitron reorders the lists.
http://local.ptron/.pinfo/lists/Exceptions-U tells me the entry on line 247 of Exceptions-U.ptxt will be scanned after the entry on line 1171.
So, if "maryanne" precedes "mary" in 'C-blocklist.txt' and matches more frequently, it may also precede "mary" in the Proxomitron's hashed list. Then "mary" should not be a problem.

You can disable the reordering of a list by adding the "NoHash" keyword to the list. This slows matching but has its uses.

SRL wrote: On the inside...
To get an "inside" look at what's going on with your blocklists - including how they're being hashed, how often each item is checked, and how often each item matches, Proxomitron now includes a special information URL...
http://local.ptron/.pinfo/lists/
Here you'll find a table of all loaded lists, their filenames, number of items they contain, and the number of items that have been prefix or URL hashed. Clicking on a list's name will bring up a detailed breakdown of each entry it contains. This can come in very useful when trying to make the most efficient use of your lists.
Aug. 12, 2019, 02:24 AM
Post: #7
Didn't know or forgot about hashing, but it explains a few instances where results seemed to defy the order. I'll probably add "NoHash" just so I can count on it.
Appreciate the reminder about pinfo/lists/. I have a vague memory of the main page, but didn't realize it linked to stats for each list where the hits/matches of each item were counted.