Perl-style regular expressions
Dec. 16, 2004, 12:19 PM
Post: #1
 
Hello kuruden,

I've heard that you are developing a program similar to Proxomitron. Do you think it is possible to make Proximodo compatible with Perl-style regular expressions instead of the non-standard regular expressions that Proxomitron uses? I have made filters for prox in the past, but the prox syntax is difficult to remember when you are used to the regular expression syntax that most programming languages use.
Dec. 17, 2004, 05:02 AM
Post: #2
 
kuruden;
By Guest1216 Wrote:Could you please consider having Proxi's regexp syntax follow the PERL conventions, instead of Proxo's proprietary syntax?

Hear, hear! What that man said!!

I had forgotten all about this one; obviously I've been leading a sheltered life for too long. [angry]

As an addendum to this suggestion......

I realize that a lot of code has already been developed that would have to be highly modified in order to recognize two versions of syntax. This would also cause code bloat, make debugging much more difficult, and possibly slow down the overall execution speed a tiny bit. So, why not put in separate regexp modules for each of the two versions? That way, the user can choose his own poison at compile time, and everybody is happy.

Guest1216, for this neat suggestion, you have just won a free membership here at The UOPF!! :P Sign up today for your membership privileges, and remember, it's free! [lol]


Oddysey

I'm no longer in the rat race - the rats won't have me!
Dec. 17, 2004, 05:55 AM
Post: #3
 
That's acceptable, if offered as an option at pattern level (using a convention such as ~/perlregexp/ ). It would be dangerous at app level (all default filters and public filter packs would stop working), and at filter level it would be confusing (users would not know which fields are affected). Placing checkboxes next to every field would clutter the screen.
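
A minimal sketch of how such a pattern-level switch could be detected, assuming the ~/.../ wrapper mentioned above marks a Perl-style pattern and anything else is treated as Proxo syntax. The function name split_pattern and the returned labels are hypothetical, not part of Proximodo:

```python
def split_pattern(pattern: str):
    """Return (syntax, body) for one filter pattern.

    Assumption: a pattern written as ~/body/ is Perl-style;
    every other pattern keeps the Proxo syntax.
    """
    if len(pattern) >= 3 and pattern.startswith("~/") and pattern.endswith("/"):
        # Strip the leading "~/" and the trailing "/".
        return ("perl", pattern[2:-1])
    return ("proxo", pattern)


if __name__ == "__main__":
    print(split_pattern(r"~/www\.yahoo\.com/.*\.jpg/"))
    print(split_pattern("www.yahoo.com/*.jpg"))
```

Existing filters never start with the wrapper, so they would keep working unchanged under this scheme.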

But Perl regexps are MUCH less powerful: no "&", no commands ("$" is used for EOL anyway), only () can store text, etc. You would lose all the magic of Proxomitron. Of course, one could extend the new (optional) Perl syntax (e.g. by adding "&", "\9" and a new notation for commands), but it would not be standard Perl regexp anymore. Blending the Perl standard and the Proxo standard (that's what it is, now!) seems a bad idea: people would complain about having yet another fancy syntax on the loose...
Dec. 17, 2004, 04:18 PM
Post: #4
 
Perl-compatible regular expressions with a few necessary additions are, imho, preferable to the Proxomitron syntax, but thank you for your reply. Good luck with your project.
Dec. 17, 2004, 05:10 PM
Post: #5
 
Screw Perl-style... To put it bluntly...
Dec. 17, 2004, 05:44 PM
Post: #6
 
ProxRocks Wrote:Screw Perl-style... To put it bluntly...
ProxRocks, I cannot agree more.
Dec. 17, 2004, 06:18 PM
Post: #7
 
Guest1216 Wrote:Hello kuruden,

I've heard that you are developing a program similar to Proxomitron. Do you think it is possible to make Proximodo compatible with Perl-style regular expressions instead of the non-standard regular expressions that Proxomitron uses? I have made filters for prox in the past, but the prox syntax is difficult to remember when you are used to the regular expression syntax that most programming languages use.
I have to admit that this was the first thing I wondered when I encountered Proxo filters for the first time! As a long-time Perl and unix user, regexp was what I was used to using for string matching problems. Obviously some of the Proxo commands such as $NEST and $INEST allow for more concise and understandable expressions (vs having to construct the equivalent in regexp) because they incorporate knowledge of an XML-like language.

As kuruden says, you'd have to extend regexp to incorporate most of the proxo commands, so you couldn't simply drop in an off-the-shelf regexp implementation, unless there was a way to separate out the proxo commands from the pattern matching.
Dec. 17, 2004, 07:11 PM
Post: #8
 
Guest1216, let me guess why Proxo syntax was made so different from Perl. I don't think it was out of laziness, ignorance or stupidity. On the contrary!

I think Scott did not use Perl syntax in the first place because it is not well suited to matching urls and html. Which looks better, Proxo syntax:
http://www.yahoo.com/*.jpg or Perl syntax: www\.yahoo\.com/.*\.jpg ?

So Scott kept the dot as a normal character. Needing something for "any char", he chose the Windows wildcard "?". Proxo was a Windows-only app, so he tuned it to the tastes of Windows users. Remember, many of them had never heard of Unix or Linux or Perl!

There are so many ".*" in filters (an average of 7 per filter!) that he must have decided to condense it to the single character "*" (again, the Windows wildcard), to make one's life easier. That was a good choice.
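
To make the correspondence concrete, here is a rough sketch that rewrites only the two wildcards discussed so far into a Perl-style regexp: "*" becomes ".*", "?" becomes ".", and every other character (including the dot) is taken literally. The function name is hypothetical, and the rest of the Proxo syntax (+, ++, &, commands, and so on) is deliberately ignored, so this is an illustration, not a real converter:

```python
import re


def proxo_wildcards_to_regex(pattern: str) -> str:
    """Translate the '*' and '?' wildcards into regexp notation.

    Assumption: everything except '*' and '?' is a literal character.
    """
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")           # any run of characters
        elif ch == "?":
            out.append(".")            # exactly one character
        else:
            out.append(re.escape(ch))  # '.' and friends become literals
    return "".join(out)


if __name__ == "__main__":
    rx = proxo_wildcards_to_regex("www.yahoo.com/*.jpg")
    print(rx)
    print(bool(re.fullmatch(rx, "www.yahoo.com/images/logo.jpg")))
```

The exact escaping printed for characters such as "/" depends on the Python version, which is why the example checks the result by matching rather than by comparing strings.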

Not having "*" for "0+ times", he changed the meaning of "+" from 1+ to 0+. That's because 0+ is much more useful than 1+. Have a look at existing filters: you will hardly find a "+{1,". Again, that sounds like a good move.

About braces, I understand he imposed the "+" because he created the "++", and one must specify whether it is + or ++ that is constrained. Moreover, matching CSS styles or JavaScript blocks is simpler if you don't have to escape braces. I wouldn't have forced the star for "unbounded" though; just "{," or ",}" seemed ok to me. I'll let Proximodo users omit the star.

In html, beginning and end of string are meaningless (crlf is equivalent to space). They are not even useful in urls, since we only match at position 0, and text could be appended to a URL (#anchor for example). So Scott recycled ^ and $ for "not" in expressions (just like in []) and for commands.

To improve performance, he did not let every () record data. Instead, the user specifies which parentheses are to be recorded (using "()\0-9").

Scott kept the following notations: () [] [^] | \ \t \r \n \s
But he didn't keep the following ones, and here I don't understand why:
\xhh \b \B \S \d \D
I intend to add them to Proximodo ;-)

To sum up, the differences from Perl regexps are justified by user-friendliness. Scott did so much tweaking for better html matching (think about &, &&, ++, =, [#], (^), \u, \9 and \#, ' and ", \w that excludes >, commands) that, had they been extensions to Perl regexp, patterns would not look like Perl regexp anymore anyway. I think users should consider Proxo syntax as a new language, and forgive Scott for not keeping the meaning of ".*+?". I defy anyone to prove me wrong!
Dec. 17, 2004, 07:22 PM
Post: #9
 
One more thing: I know some of you guys are used to writing Perl regexps. But I doubt you are a majority. Most users never did and never will, so why constrain the syntax to this particular language? Scott was right to design his syntax with html matching in mind.

And last but not least: it was NOT possible at all to use any regexp engine in Proxo! Regexps are Non-Deterministic Finite Automata. But it is impossible to represent &, &&, \1, ', + (blind) and commands using NFA graphs. That's why Prox engines have to be depth-first search engines.
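
For readers unfamiliar with the term, here is a very small sketch of a depth-first (backtracking) matcher with an AND operator. It is not the Proxomitron or Proximodo engine: the names match_here and match_and are hypothetical, only literals, "?" and "*" are handled, and "&" is assumed to mean that every branch must match from the same starting position.

```python
def match_here(pat: str, text: str, pos: int):
    """Yield every end position where pat can match text starting at pos."""
    if not pat:
        yield pos
        return
    head, rest = pat[0], pat[1:]
    if head == "*":
        # Depth-first: try each possible length for '*', shortest first,
        # and backtrack to the next length when the rest fails.
        for end in range(pos, len(text) + 1):
            yield from match_here(rest, text, end)
    elif pos < len(text) and (head == "?" or head == text[pos]):
        yield from match_here(rest, text, pos + 1)


def match_and(pattern: str, text: str, pos: int = 0):
    """Return an end position if every '&'-separated branch matches, else None.

    Assumption: the overall match ends where the longest branch ends.
    """
    ends = []
    for branch in pattern.split("&"):
        end = next(match_here(branch, text, pos), None)
        if end is None:
            return None  # one failing branch fails the whole AND
        ends.append(end)
    return max(ends)


if __name__ == "__main__":
    print(match_and("*foo*&*bar*", "xx bar foo") is not None)  # both words present
    print(match_and("*foo*&*bar*", "only foo") is not None)    # "bar" is missing
```

Each branch restarts from the same position, which is the kind of "rewind and try again" step that a search-based engine handles naturally.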
Dec. 25, 2004, 03:45 AM
Post: #10
 
kuruden;

I have run out of arguments with which to persuade you, so I'll stop now. It's your baby, go for it!


Oddysey

I'm no longer in the rat race - the rats won't have me!