
***sidki3003*** · Jul. 19, 2009, 10:48 AM

Here is another draft. Chapter 10 for Techniques.txt.

Quote:
10a Loops -- Limiting expression scopes.
You can use "+" loops to isolate subexpressions, removing their
ability to look ahead.

Example:
Say we want to match <foo ... >, but only if the following tag isn't </foo >

<foo*>(^*</foo >)
... wouldn't work, because "*>" doesn't stop at the first ">" but keeps
looking ahead.

<foo[^>]+>(^[^<]+</foo >)
... would work, but [^...] forces inspection of each character.

<foo(*>)+{1}(^(*<)+{1}/foo >)
... does what we want, quickly. "*>" and "*<" no longer look ahead.
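
For readers more used to conventional regex engines, the three variants above can be sketched roughly in Python's re module. This is only an analogy, not Proxomitron syntax: Python has no "+{1}" loop, so the isolation is imitated with the lookahead-plus-backreference idiom, and all names (NOT_CLOSED, lazy, negated, isolated, first_match) are made up for illustration. It shows the matching behaviour only and says nothing about speed.

```python
import re

# Negative lookahead used by all three patterns:
# fail if the next tag after the current position is </foo >.
NOT_CLOSED = r"(?![^<]*</foo\s*>)"

# 1. Lazy wildcard: free to backtrack past the first ">" until the
#    lookahead is satisfied, so it can swallow the closing tag.
lazy = re.compile(r"<foo.*?>" + NOT_CLOSED)

# 2. Negated class: cannot cross ">", but checks every character.
negated = re.compile(r"<foo[^>]*>" + NOT_CLOSED)

# 3. Isolated wildcard: the lookahead captures ".*?>" once and the
#    backreference consumes it, so the engine cannot retry a longer match.
isolated = re.compile(r"<foo(?=(.*?>))\1" + NOT_CLOSED)

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

followed_by_close = "<foo a></foo><b>x"   # should NOT match
followed_by_text = "<foo a>text<b>"       # should match "<foo a>"

for name, pat in (("lazy", lazy), ("negated", negated), ("isolated", isolated)):
    print(name, first_match(pat, followed_by_close), first_match(pat, followed_by_text))
```

In this sketch only the lazy variant produces a match on the first input, because it extends past the first ">" and takes "</foo>" with it; the other two reject it, and all three accept the second input.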

10b Avoiding superfluous tests in OR conditions.

Example:
Say we want to match "prefix-possible_suffix ... some_string" and capture
"-possible_suffix" if present.

prefix(-possible_suffix|)\1*some_string
... would cause the filter to attempt the match twice on:
"prefix-possible_suffix ... no_match"

prefix((-possible_suffix)+)\1*some_string
... does what we want.

10c However, +/++ loops remove the uniqueness of the string under test, even if
followed by {1,*}.
If possible, and unless you are only testing the very beginning of a
document or bounds match, try to start your test string with at least one
unique character (more is better).

Example:
To test for 100 asterisk symbols anywhere in a document:
\*\*\*+{98}
... two literal asterisks, then a loop matching the remaining 98.

Suggestions welcome.