473,382 Members | 1,078 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,382 software developers and data experts.

Distributions, RE-verb and the like

Psyco is finished now, and it works on the x86, for Win, the new macs,
many linux boxes, etc, and it's quite useful, so maybe it can be added
to the standard Python distribution.

PyChecker (and the other similar ones that work differently) is very
useful too, and it's pure Python, so maybe it too (or something
similar) can be added to the standard distribution.

--------------------

Regular Expressions can be useful but:
- They look really an-pythonic
- Their syntax is difficult to remember
- It's not easy to understand and debug REs written by other people,
comments help a little.
- They can have hidden bugs (because of their low readability)
- they mix their opeators/syntax with the data (this is thier advantage
too), this can create problems, and makes them less general (because
you have to avoid mistaking syntax elements with the data).
- Python has already syntax to define structures, loops, etc, so
another syntax can be seen as a kind of duplication.

Such things go against lot of points of the Python Zen. So I'd like a
more pythonic syntax, easy to remember, easy to read and debug, where
data and operators are fully separated. The "reverb" library does
something like that already:
http://home.earthlink.net/~jasonrandharper/reverb.py

It compiles RE written like this:

xdigit = set(digits, char('a').to('f'), char('A').to('F'))
pat = RE((text('$') | text('0x') | text('0X')) + required(hexdigit, max
= 8) - followedBy(hexdigit))

I have already "improved" reverb a little, but I think a better and
simpler syntax can be invented by people more expert than me in REs.
Here are some alrernative syntax possibilities, I don't like them, most
of them are impossible, silly, stupid, etc, but I am sure a good syntax
can be invented.

Standard RE:
pat.pattern: (?:\$|0x|0X)[\da-gA-G]{1,8}(?![\da-gA-G])

hexdigit = set(digits, chrint('a','f'), chrint('A','F'))
pat = RE((text('$') | text('0x') | text('0X')) + repeated(hexdigit, 1,
8) - followedBy(hexdigit))

hexdigit = alt(digits, chrint('a','f'), chrint('A','F'))
pat = optional("-") + alt('$', '0x', '0X') + times(hexdigit, 1, 8) -
hexdigit

hexdigit = VR(digits, interval('a', 'f'), interval('A', 'F'))
pat = optional("-") + VR('$', '0x', '0X') + times(hexdigit, 1, 8)-
hexdigit

hexdigit = VR(digits, interval('a', 'f'), interval('A', 'F'))
pat = VR("-", min=0) + VR('$', '0x', '0X') + VR(hexdigit, min=1, max=8)
- hexdigit

hexdigit = VR( VR(digits) | interval('a', 'f') | interval('A', 'F') )
pat = VR("-", 0) + VR(VR('$') | VR('0x') | VR('0X')) + VR(hexdigit, 1,
8) - hexdigit

hexdigit = Alt(digits, interval('a', 'f'), interval('A', 'F'))
pat = VR("-", 0) + Alt('$', '0x', '0X') + VR(hexdigit, 1, 8) - hexdigit

hexdigit = Alternative(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = Optional("-") + Alternative('$', '0x', '0X') + RE(Hexdigit, 1, 8)
- Hexdigit

hexdigit = Alternative(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = RE("-", 0) + Alternative('$', '0x', '0X') + RE(Hexdigit, 1, 8) -
Hexdigit

hexdigit = RE([digits, Interval('a', 'f'), Interval('A', 'F')]) #
flatten sul primo parametro
pat = RE("-", 0) + RE(['$', '0x', '0X']) + RE(Hexdigit, 1, 8) -
Hexdigit

hexdigit = RE(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = RE("-").repeat(0) + RE('$', '0x', '0X') + Hexdigit.repeat(1, 8) -
Hexdigit

hexdigit = VRE(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = VRE("-").repeat(0) + VRE('$', '0x', '0X') + Hexdigit.repeat(1, 8)
- Hexdigit

hexdigit = Vre(digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").repeat(0) + Vre('$', '0x', '0X') + Hexdigit.repeat(1,
8) - Hexdigit

hexdigit = Vre(digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").optional() + Vre('$', '0x', '0X') +
Hexdigit.repeat(1, 8) - Hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Optional("-") + Vre('$', '0x', '0X') + Repeat(Hexdigit, 1, 8)
- Hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").optional() + Vre('$', '0x', '0X') +
Hexdigit.repeat(1, 8) - Hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f')).ignorecase()
hexnum = Vre("-").optional() + Vre('$', '0x').ignorecase() +
Hexdigit.repeat(1, 8) - Hexdigit

hexdigit = Alternative(Digits, Interval('a', 'f')).ignorecase()
hexnum = Optional("-") + Alternative('$', '0x').ignorecase() +
Repeat(Hexdigit, 1, 8) - Hexdigit
I think that once the best syntax is found, implementing a better
reverb-like module isn't too much work (my modified version of reverb
is only about 130 LOCs).

Bye,
bearophile

Dec 29 '05 #1
3 1298
Bearophile -

Well, I fear this may end up being another of those "easier to
reinvent" wheels. All of your issues with RE's are the same ones I had
with lex/yacc (and re's too) when I wrote pyparsing.

Any chance for convergence here?

(BTW, there is nothing inherently wrong with "reinventing wheels". The
metaphor is a bit flawed, since there are many different types of
wheels in the world, and not all interchangeable - consider a tractor
wheel vs. a bicycle wheel. Some legitimate/valid/justified endeavors
are wrongly indicted for "reinventing the wheel" when in fact they are
focusing on a particular niche of wheeldom, deserving of its own
specialized invention.)

-- Paul

Dec 29 '05 #2
Oh, the pyparsing rendition of your initial pat expression would be
something like:

import pyparsing as pp
pat = pp.Combine( pp.oneOf("$ 0x 0X") + pp.Word(pp.hexnums,max=8) )

Combine is needed to ensure that the leading $, 0x, or 0X is
immediately followed by 1-8 (and no more than 8) hex digits.
Otherwise, pyparsing is pretty tolerant of whitespace cropping up
wherever.

As for some of your other syntaxes:

I'm not sure what "Vre" means.

I found that "Alternative" needs to support both greedy and non-greedy
matches, so I provided Or and MatchFirst, respectively. They are also
definable using '^' and '|' operators, again respectively. Finally, I
ran into Literal("this") | Literal("that") | Literal("other") so often
that I just added a helper method oneOf that would take the string
"this that other" and build the right expression out of it. This too
is non-trivial, as you have to take care that some short literals may
mask longer ones in the list, as in oneOf("< = > <= >= !="). Just
replacing this directly with Literal("<") | Literal("=") | ... would
prevent any matching of the ">=" or "<=" literals. You could replace
with the Or (^) form, but this exhaustively checks all alternatives all
the time, a regrettable run-time performance penalty. Pyparsing's
implementation of oneOf leaves the literals in the given order, unless
a duplicate is given, or an earlier literal masks a later one - in that
case, the longer literal is moved ahead of the shorter.

I implemented Optional as a wrapper-type class, as opposed to the
..optional() method that you have given - I'd say there are tradeoffs
either way, just making the comparison.

Your "repeated" or "times" seem to map roughly to pyparsing's OneOrMore
and ZeroOrMore.

Any thought how a recursive grammar might look?

I don't find 'Interval' to be very easy on the eyes. In this case, I
stole^H^H^H^H^H borrowed the re form of "[A-Za-z0-9]", providing a
method named srange ("s" is for "string") such that srange("a-fA-F")
would return the string "abcdefABCDEF".

The other end of this process has to do with how the calling program
will process the parsed results. Once a grammar gets too deeply
nested, or has too many Optional elements, just returning a simple list
or nested list of tokens isn't enough. Pyparsing returns ParseResults
objects, which can be accessed as a list, dictionary, or object with
attributes (provided individual fields have been given names at grammar
definition time). I *have* had some complaints about ParseResults
("ParseResults are evil"), but the named access is a life-saver for
complex grammars. (Simple case, the first token for your hex number is
an optional sign - without names, you can't just access field 2, say,
of the expression, you have to first test to see if the sign was
provided or not, and then access field 2 or 3 accordingly. On the
other hand, if you had given field 2 a name, your parser would be more
robust, even you later changed your grammar to include other elements,
such as a leading, um, currency symbol or something.)

Just some fodder for your reverb considerations...

-- Paul

Dec 30 '05 #3
Paul McGuire wrote:
I don't find 'Interval' to be very easy on the eyes. In this case, I
stole^H^H^H^H^H borrowed the re form of "[A-Za-z0-9]", providing a
method named srange ("s" is for "string") such that srange("a-fA-F")
would return the string "abcdefABCDEF".


Thank you for your answers Paul. Just a note: I have called it
interval-something (like cinterval or Interval) instead of
range-something because it returns a closed interval (all letters
inclusive of both endpoints), to show its difference from the Python
range that returns a right open interval. Calling a srange("a","z") in
Python lets me think that it generates the ["a",...,"y"] range.

Bye,
bearophile

Jan 3 '06 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Nel | last post by:
I have a question related to the "security" issues posed by Globals ON. It is good programming technique IMO to initialise variables, even if it's just $foo = 0; $bar = ""; Surely it would...
4
by: Craig Bailey | last post by:
Anyone recommend a good script editor for Mac OS X? Just finished a 4-day PHP class in front of a Windows machine, and liked the editor we used. Don't recall the name, but it gave line numbers as...
1
by: Chris | last post by:
Sorry to post so much code all at once but I'm banging my head against the wall trying to get this to work! Does anyone have any idea where I'm going wrong? Thanks in advance and sorry again...
1
by: John Ryan | last post by:
What PHP code would I use to check if submitted sites to my directory actually exist?? I want to use something that can return the server code to me, ie HTTP 300 OK, or whatever. Can I do this with...
10
by: James | last post by:
What is the best method for creating a Web Page that uses both PHP and HTML ? <HTML> BLA BLA BLA BLA BLA
8
by: Beowulf | last post by:
Hi Guru's, I have a query regarding using PHP to maintain a user profiles list. I want to be able to have a form where users can fill in their profile info (Name, hobbies etc) and attach an...
7
by: Player | last post by:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello all Decided to try and teach myself Python again for my 1st language, after having a look at perl again and a few others and finally...
0
by: zratis | last post by:
Hello, I wanted to find out what MySQL administrators think about MySQL installation on RedHat and other Linux distributions with the use of RPM. RedHat reallocates all MySQL files into...
3
by: BrianGenisio | last post by:
Does anyone know if any of the Windows Distributions are currently shipping with .Net 2.0? I am pretty sure that XP SP2 does not, but I assume that Vista does. Any further insight? Thanks,...
12
by: alf | last post by:
Hi, for some reason I have to deal with custom python distributions. It turned out it is quite simple - I just install the python from python.org and all the libs needed. Then I take...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.