The Un-Official Proxomitron Forum

Full Version: Perl-style regular expressions

Guest

Hello kuruden,

I've heard that you are developing a program similar to Proxomitron. Do you think it is possible to make Proximodo compatible with Perl-style regular expressions instead of the non-standard regular expressions that Proxomitron uses? I have made filters for prox in the past, but the prox syntax is difficult to remember when you are used to the regular expression syntax that most programming languages use.
kuruden;
Guest1216 Wrote: Could you please consider having Proxi's regexp syntax follow the PERL conventions, instead of Proxo's proprietary syntax?

Hear, hear! What that man said!! Hail Hail Hail

I had forgotten all about this one, obviously I've been leading a sheltered life for too long. [angry]

As an addendum to this suggestion......

I realize that a lot of code has already been developed that would have to be highly modified in order to recognize two versions of syntax. This would also cause code bloat, make debugging much more difficult, and possibly slow down the overall execution speed a tiny bit. So, why not put in separate regexp modules for each of the two versions? That way, the user can choose his own poison at compile time, and everybody is happy.

Guest1216, for this neat suggestion, you have just won a free membership here at The UOPF!! :P Sign up today for your membership privileges, and remember, it's free! [lol]


Oddysey
That's acceptable, if proposed as an option at pattern-level (using a convention such as ~/perlregexp/ ). It would be dangerous at app-level (all default filters and public filter packs would stop working), and at filter-level it would be confusing (users would not know which fields are affected). Placing checkboxes near every field would clutter the screen.

But Perl regexps are MUCH less powerful: no "&", no commands ("$" is used for EOL anyway), only () can store text, etc. You are losing all the magic of Proxomitron. Of course, one could extend the new (optional) perl syntax (e.g. by adding "&", "\9" and a new notation for commands), but it would not be standard perl regexp anymore. Blending perl standard and proxo standard (that's what it is, now!) seems a bad idea: people would complain about having yet another fancy syntax on the loose...

Guest

Perl-compatible regular expressions with a few necessary additions is imho preferable to the Proxomitron syntax, but thank you for your reply. Good luck with your project.
Screw Perl-style... To put it bluntly...
ProxRocks Wrote:Screw Perl-style... To put it bluntly...
ProxRocks, I cannot agree more.
Guest1216 Wrote:Hello kuruden,

I've heard that you are developing a program similar to Proxomitron. Do you think it is possible to make Proximodo compatible with Perl-style regular expressions instead of the non-standard regular expressions that Proxomitron uses? I have made filters for prox in the past, but the prox syntax is difficult to remember when you are used to the regular expression syntax that most programming languages use.
I have to admit that this was the first thing I wondered when I encountered Proxo filters for the first time! As a long-time Perl and unix user, regexp was what I was used to using for string matching problems. Obviously some of the Proxo commands such as $NEST and $INEST allow for more concise and understandable expressions (vs having to construct the equivalent in regexp) because they incorporate knowledge of an XML-like language.

As kuruden says, you'd have to extend regexp to incorporate most of the proxo commands, so you couldn't simply drop in an off-the-shelf regexp implementation. Unless there was a way to separate out the proxo commands from the pattern matching.
Guest1216, let me guess why Proxo syntax was made so different from Perl. I don't think it was out of laziness or ignorance or stupidity, on the contrary!

I think Scott did not use Perl syntax in the first place because it is not well suited to matching URLs and HTML. Which looks better, Proxo syntax:
http://www.yahoo.com/*.jpg or Perl syntax: www\.yahoo\.com/.*\.jpg ?

So Scott kept the dot as a normal character. Needing something for "any char", he chose the Windows wildcard "?". Proxo was a Windows-only app, so he tuned it to the tastes of Windows users. Remember, many of them have never heard of Unix or Linux or Perl!

There are so many ".*" in filters (an average of 7 per filter!) that he must have decided to condense it to the single character "*" (again, the Windows wildcard), to make one's life easier. That was a good choice.

Not having "*" for "0+ times", he changed the meaning of "+" from 1+ to 0+. That's because 0+ is much more useful than 1+. Have a look at existing filters: you will hardly find a "+{1,". Again, that sounds like a good move.
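To make the three substitutions above concrete, here is a minimal Python sketch that translates a tiny subset of Proxo-style syntax into an ordinary regexp. The function name is made up for illustration, and it only knows the rules just described ("." is literal, "?" is any char, "*" stands for ".*", postfix "+" means 0+). It ignores "++", braces, "&", commands and everything else, so it is nowhere near a real converter.

```python
import re

def proxo_wildcards_to_regex(pattern: str) -> str:
    """Translate a tiny subset of Proxo-style syntax to a Python regexp.

    Hypothetical helper: only the rules discussed in the post are handled.
    """
    out = []
    for ch in pattern:
        if ch == "?":
            out.append(".")            # "?" is the "any char" wildcard
        elif ch == "*":
            out.append(".*")           # "*" condenses the very common ".*"
        elif ch == "+":
            out.append("*")            # postfix "+" means 0 or more times
        else:
            out.append(re.escape(ch))  # everything else, including ".", is literal
    return "".join(out)

regex = proxo_wildcards_to_regex("www.yahoo.com/*.jpg")
print(regex)
print(bool(re.match(regex, "www.yahoo.com/images/logo.jpg")))
print(bool(re.match(regex, "wwwXyahoo.com/images/logo.jpg")))
```

The printed regexp should come out close to the escaped Perl form quoted earlier, which is the point: the Proxo pattern says the same thing with far less punctuation.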

About braces, I understand he imposed the "+", because he created the "++" and one must specify whether it is + or ++ that is constrained. Moreover, matching CSS styles or JavaScript blocks is simpler if you don't have to escape braces. I wouldn't have forced the star for "unbounded" though, just "{," or ",}" seemed ok to me. I'll let Proximodo users omit the star.

In html, beginning and end of string are meaningless (crlf is equivalent to space). It is not even useful in urls, since we only match at position 0, and text could be appended to a URL (#anchor for example). So Scott recycled ^ and $ for "not" in expressions (just like in []) and for commands.

For improving performance, he did not let () record data. So the user specifies which parentheses are to be recorded (using "()\0-9").
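For those coming from the other side, the closest analogy I can think of is the non-capturing group. The sketch below uses Python's re module (my choice for illustration, nothing to do with how Proxo is implemented): plain () always records, and you opt out with (?:...), whereas Proxo does the reverse and records only where you ask. The sample tag and URL are invented.

```python
import re

text = '<a href="http://example.com/pic.jpg">'

# Perl-style default: every plain () records its text, needed or not.
m_all = re.search(r'<(a|img) (href|src)="([^"]*)"', text)
print(m_all.groups())

# (?:...) groups without recording, so only the URL is stored.
m_one = re.search(r'<(?:a|img) (?:href|src)="([^"]*)"', text)
print(m_one.groups())
```

The first search stores three strings, the second stores one. In Proxo the second behaviour is the default, and you mark the one parenthesis you care about.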

Scott kept the following notations: () [] [^] | \ \t \r \n \s
But he didn't keep the following ones, and here I don't understand why:
\xhh \b \B \S \d \D
I intend to add them to Proximodo ;-)

To sum up, differences from Perl regexps are justified by user-friendliness. Scott did so much tweaking for better html matching (think about &, &&, ++, =, [#], (^), \u, \9 and \#, ' and ", \w that excludes >, commands) that, had they been extensions to perl regexp, patterns would nevertheless not look like perl regexp anymore. I think users should consider Proxo syntax as a new language, and forgive Scott for not keeping the meaning of ".*+?". I defy anyone to prove me wrong!
One more thing: I know some of you guys are used to writing perl regexps. But I doubt you are a majority. Most users never did and never will, so why constrain syntax to this particular language? Scott was right to design his syntax with html-matching in mind.

And last but not least: it was NOT possible at all to use any regexp engine in Proxo! Regexps are Non-Deterministic Finite Automata. But it is impossible to represent &, &&, \1, ', + (blind) and commands using NFA graphs. That's why Prox engines have to be depth-first search engines.
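To show what I mean by a depth-first search engine, here is a toy sketch in Python. It is not Proxo's or Proximodo's actual code: the names lit, star, seq and both are invented, and the semantics I give to the "&"-like operator (both sides must match from the same start, and the match ends where the right side ends) is an assumption made just for this example. Each node tries one alternative, goes as deep as it can, and backtracks when the rest of the pattern fails.

```python
# Each node is a function (text, start) -> iterator over possible end positions.

def lit(s):
    """Match the literal string s."""
    def m(text, i):
        if text.startswith(s, i):
            yield i + len(s)
    return m

def star():
    """Proxo-style "*": any run of characters, shortest first."""
    def m(text, i):
        for j in range(i, len(text) + 1):
            yield j
    return m

def seq(*parts):
    """Match the parts one after another, backtracking on failure."""
    def m(text, i):
        if not parts:
            yield i
            return
        first, rest = parts[0], seq(*parts[1:])
        for j in first(text, i):       # depth-first: commit to one choice...
            yield from rest(text, j)   # ...and come back here if the rest fails
    return m

def both(left, right):
    """"&"-like AND (assumed semantics): both sides match from the same start."""
    def m(text, i):
        if next(left(text, i), None) is not None:
            yield from right(text, i)
    return m

def matches(pattern, text):
    """True if the pattern matches at position 0."""
    return next(pattern(text, 0), None) is not None

# An <img ...> tag containing both "width=" and "height=", in any order.
img = seq(
    lit("<img"),
    both(seq(star(), lit("width=")), seq(star(), lit("height="))),
    star(),
    lit(">"),
)

print(matches(img, "<img height=1 width=2>"))
print(matches(img, "<img width=2>"))
```

The first tag has both attributes and matches; the second lacks "height=" and does not. Note how both simply runs two searches from the same position, which is trivial in a depth-first engine.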
kuruden;

I have run out of arguments with which to persuade you, so I'll stop now. It's your baby, go for it! Hail


Oddysey