Distributions, RE-verb and the like

Psyco is finished now, and it works on the x86, for Win, the new macs,
many linux boxes, etc, and it's quite useful, so maybe it can be added
to the standard Python distribution.

PyChecker (and the other similar ones that work differently) is very
useful too, and it's pure Python, so maybe it too (or something
similar) can be added to the standard distribution.

--------------------

Regular Expressions can be useful but:
- They look really un-Pythonic
- Their syntax is difficult to remember
- It's not easy to understand and debug REs written by other people;
comments help a little.
- They can have hidden bugs (because of their low readability)
- They mix their operators/syntax with the data (this is their advantage
too); this can create problems, and makes them less general (because
you have to avoid mistaking syntax elements for the data).
- Python already has syntax to define structures, loops, etc., so
another syntax can be seen as a kind of duplication.
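
The point about operators being mixed with the data can be seen with the standard `re` module: a literal that happens to contain metacharacters must be escaped before it can be matched as plain data. A small sketch (the sample strings are made up for illustration):

```python
import re

price = "$1.50"   # data that happens to contain RE syntax

# Used as-is, '$' is an end-of-string anchor and '.' matches any character,
# so the pattern cannot match the literal text it was built from.
naive = re.compile(price)
print(naive.search("cost: $1.50"))          # None

# re.escape() quotes the metacharacters so the string is treated as data.
safe = re.compile(re.escape(price))
print(safe.search("cost: $1.50").group())   # $1.50
```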

Such things go against a lot of points of the Python Zen. So I'd like a
more pythonic syntax, easy to remember, easy to read and debug, where
data and operators are fully separated. The "reverb" library does
something like that already:
http://home.earthlink.net/~jasonrandharper/reverb.py

It compiles REs written like this:

hexdigit = set(digits, char('a').to('f'), char('A').to('F'))
pat = RE((text('$') | text('0x') | text('0X')) + required(hexdigit, max=8) - followedBy(hexdigit))

I have already "improved" reverb a little, but I think a better and
simpler syntax can be invented by people more expert than me in REs.
Here are some alternative syntax possibilities. I don't like them, and
most of them are impossible, silly, stupid, etc., but I am sure a good
syntax can be invented.

Standard RE:
pat.pattern: (?:\$|0x|0X)[\da-fA-F]{1,8}(?![\da-fA-F])
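
For reference, the standard RE can be tried with Python's `re` module. The sketch below spells it out with `re.VERBOSE` so each part can carry a comment; it assumes the character class is meant to be the hex digits `[\da-fA-F]`, and the test strings are made up for illustration.

```python
import re

# Same structure as the pattern above, one commented piece per line.
pat = re.compile(r"""
    (?: \$ | 0x | 0X )     # prefix: a dollar sign, 0x or 0X
    [\da-fA-F]{1,8}        # one to eight hex digits
    (?! [\da-fA-F] )       # not followed by another hex digit
""", re.VERBOSE)

for s in ["$FF", "0x1a2b3c4d", "0x123456789", "0xZZ"]:
    m = pat.match(s)
    print(s, "->", m.group() if m else None)
```

The nine-digit input is rejected outright: the negative lookahead prevents the pattern from matching just the first eight digits.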

hexdigit = set(digits, chrint('a','f'), chrint('A','F'))
pat = RE((text('$') | text('0x') | text('0X')) + repeated(hexdigit, 1, 8) - followedBy(hexdigit))

hexdigit = alt(digits, chrint('a','f'), chrint('A','F'))
pat = optional("-") + alt('$', '0x', '0X') + times(hexdigit, 1, 8) - hexdigit

hexdigit = VR(digits, interval('a', 'f'), interval('A', 'F'))
pat = optional("-") + VR('$', '0x', '0X') + times(hexdigit, 1, 8) - hexdigit

hexdigit = VR(digits, interval('a', 'f'), interval('A', 'F'))
pat = VR("-", min=0) + VR('$', '0x', '0X') + VR(hexdigit, min=1, max=8) - hexdigit

hexdigit = VR( VR(digits) | interval('a', 'f') | interval('A', 'F') )
pat = VR("-", 0) + VR(VR('$') | VR('0x') | VR('0X')) + VR(hexdigit, 1, 8) - hexdigit

hexdigit = Alt(digits, interval('a', 'f'), interval('A', 'F'))
pat = VR("-", 0) + Alt('$', '0x', '0X') + VR(hexdigit, 1, 8) - hexdigit

hexdigit = Alternative(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = Optional("-") + Alternative('$', '0x', '0X') + RE(hexdigit, 1, 8) - hexdigit

hexdigit = Alternative(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = RE("-", 0) + Alternative('$', '0x', '0X') + RE(hexdigit, 1, 8) - hexdigit

hexdigit = RE([digits, Interval('a', 'f'), Interval('A', 'F')])  # flatten on the first parameter
pat = RE("-", 0) + RE(['$', '0x', '0X']) + RE(hexdigit, 1, 8) - hexdigit

hexdigit = RE(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = RE("-").repeat(0) + RE('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = VRE(digits, Interval('a', 'f'), Interval('A', 'F'))
pat = VRE("-").repeat(0) + VRE('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").repeat(0) + Vre('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").optional() + Vre('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Optional("-") + Vre('$', '0x', '0X') + Repeat(hexdigit, 1, 8) - hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f'), Interval('A', 'F'))
hexnum = Vre("-").optional() + Vre('$', '0x', '0X') + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Vre(Vre().digits, Interval('a', 'f')).ignorecase()
hexnum = Vre("-").optional() + Vre('$', '0x').ignorecase() + hexdigit.repeat(1, 8) - hexdigit

hexdigit = Alternative(Digits, Interval('a', 'f')).ignorecase()
hexnum = Optional("-") + Alternative('$', '0x').ignorecase() + Repeat(hexdigit, 1, 8) - hexdigit

I think that once the best syntax is found, implementing a better
reverb-like module isn't too much work (my modified version of reverb
is only about 130 LOC).
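
To give a rough idea of how small such a module can be, here is a minimal sketch that compiles one of the variants above down to a standard RE. It is not reverb's code or the modified version mentioned here; the class `Pat` and the helpers `text`, `alt`, `interval`, `optional` and `times` are hypothetical names borrowed from the examples, and the operator meanings (`+` for sequence, `-` for "not followed by") are assumed from them.

```python
import re

class Pat:
    """Hypothetical pattern object wrapping a regex fragment."""
    def __init__(self, source):
        self.source = source

    def __add__(self, other):      # a + b : a followed by b
        return Pat(self.source + _wrap(other).source)

    def __sub__(self, other):      # a - b : a NOT followed by b
        return Pat(self.source + "(?!" + _wrap(other).source + ")")

    def compile(self, flags=0):
        return re.compile(self.source, flags)

def _wrap(x):
    # Plain strings are data: escape them so no syntax leaks in.
    return x if isinstance(x, Pat) else text(x)

def text(s):
    return Pat(re.escape(s))

def alt(*items):
    return Pat("(?:" + "|".join(_wrap(i).source for i in items) + ")")

def interval(first, last):
    # Closed interval of characters, both endpoints included.
    return Pat("[" + re.escape(first) + "-" + re.escape(last) + "]")

def optional(item):
    return Pat("(?:" + _wrap(item).source + ")?")

def times(item, lo, hi):
    return Pat("(?:" + _wrap(item).source + "){%d,%d}" % (lo, hi))

digits = Pat(r"\d")
hexdigit = alt(digits, interval("a", "f"), interval("A", "F"))
pat = optional("-") + alt("$", "0x", "0X") + times(hexdigit, 1, 8) - hexdigit

rx = pat.compile()
print(pat.source)
print(rx.match("-0xBEEF").group())   # -0xBEEF
print(rx.match("$123456789"))        # None: nine hex digits
```

Because every string goes through `re.escape`, data and operators stay separated, which is the property asked for earlier in this post.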

Bye,
bearophile

Dec 29 '05 #1
Bearophile -

Well, I fear this may end up being another of those "easier to
reinvent" wheels. All of your issues with RE's are the same ones I had
with lex/yacc (and re's too) when I wrote pyparsing.

Any chance for convergence here?

(BTW, there is nothing inherently wrong with "reinventing wheels". The
metaphor is a bit flawed, since there are many different types of
wheels in the world, and not all interchangeable - consider a tractor
wheel vs. a bicycle wheel. Some legitimate/valid/justified endeavors
are wrongly indicted for "reinventing the wheel" when in fact they are
focusing on a particular niche of wheeldom, deserving of its own
specialized invention.)

-- Paul

Dec 29 '05 #2
Oh, the pyparsing rendition of your initial pat expression would be
something like:

import pyparsing as pp
pat = pp.Combine( pp.oneOf("$ 0x 0X") + pp.Word(pp.hexnums, max=8) )

Combine is needed to ensure that the leading $, 0x, or 0X is
immediately followed by 1-8 (and no more than 8) hex digits.
Otherwise, pyparsing is pretty tolerant of whitespace cropping up
wherever.

As for some of your other syntaxes:

I'm not sure what "Vre" means.

I found that "Alternative" needs to support both greedy and non-greedy
matches, so I provided Or and MatchFirst, respectively. They are also
definable using '^' and '|' operators, again respectively. Finally, I
ran into Literal("this") | Literal("that") | Literal("other") so often
that I just added a helper method oneOf that would take the string
"this that other" and build the right expression out of it. This too
is non-trivial, as you have to take care that some short literals may
mask longer ones in the list, as in oneOf("< = > <= >= !="). Just
replacing this directly with Literal("<") | Literal("=") | ... would
prevent any matching of the ">=" or "<=" literals. You could replace
with the Or (^) form, but this exhaustively checks all alternatives all
the time, a regrettable run-time performance penalty. Pyparsing's
implementation of oneOf leaves the literals in the given order, unless
a duplicate is given, or an earlier literal masks a later one - in that
case, the longer literal is moved ahead of the shorter.
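
The masking problem and the reordering rule described above can be illustrated with plain `re` alternation, which is also first-match. This is only a sketch of the stated rule, not pyparsing's actual implementation; `one_of` here is a hypothetical stand-in built on the standard library.

```python
import re

def one_of(literals):
    """Hypothetical helper: first-match alternation from a space-separated
    list, keeping the given order except where an earlier literal would
    mask a later, longer one."""
    ordered = []
    for lit in literals.split():
        if lit in ordered:
            continue                      # drop duplicates
        for i, earlier in enumerate(ordered):
            if lit.startswith(earlier):   # 'earlier' would mask 'lit'
                ordered.insert(i, lit)    # move the longer literal ahead
                break
        else:
            ordered.append(lit)
    return re.compile("|".join(re.escape(lit) for lit in ordered))

naive = re.compile("<|=|>|<=|>=|!=")      # literals in the given order
fixed = one_of("< = > <= >= !=")

print(naive.match("<=").group())   # <   (the short literal wins)
print(fixed.match("<=").group())   # <=
print(fixed.match(">=").group())   # >=
```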

I implemented Optional as a wrapper-type class, as opposed to the
.optional() method that you have given - I'd say there are tradeoffs
either way, just making the comparison.

Your "repeated" or "times" seem to map roughly to pyparsing's OneOrMore
and ZeroOrMore.

Any thought how a recursive grammar might look?

I don't find 'Interval' to be very easy on the eyes. In this case, I
stole^H^H^H^H^H borrowed the re form of "[A-Za-z0-9]", providing a
method named srange ("s" is for "string") such that srange("a-fA-F")
would return the string "abcdefABCDEF".
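
A minimal sketch of that kind of expansion, using only the standard library. The function name `expand_ranges` is made up for illustration and this is not pyparsing's srange code; it just expands each `x-y` pair into the inclusive run of characters.

```python
def expand_ranges(spec):
    """Hypothetical helper: expand 'a-fA-F' into 'abcdefABCDEF'."""
    out = []
    i = 0
    while i < len(spec):
        # An 'x-y' triple is a range; both endpoints are included.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            first, last = spec[i], spec[i + 2]
            out.extend(chr(c) for c in range(ord(first), ord(last) + 1))
            i += 3
        else:
            out.append(spec[i])       # any other character stands for itself
            i += 1
    return "".join(out)

print(expand_ranges("a-fA-F"))   # abcdefABCDEF
print(expand_ranges("0-9"))      # 0123456789
```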

The other end of this process has to do with how the calling program
will process the parsed results. Once a grammar gets too deeply
nested, or has too many Optional elements, just returning a simple list
or nested list of tokens isn't enough. Pyparsing returns ParseResults
objects, which can be accessed as a list, dictionary, or object with
attributes (provided individual fields have been given names at grammar
definition time). I *have* had some complaints about ParseResults
("ParseResults are evil"), but the named access is a life-saver for
complex grammars. (Simple case, the first token for your hex number is
an optional sign - without names, you can't just access field 2, say,
of the expression, you have to first test to see if the sign was
provided or not, and then access field 2 or 3 accordingly. On the
other hand, if you had given field 2 a name, your parser would be more
robust, even if you later changed your grammar to include other elements,
such as a leading, um, currency symbol or something.)
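
The same trade-off shows up with plain `re` and named groups, which makes for a quick standard-library illustration. This is an analogy rather than pyparsing's ParseResults; the group names `sign`, `prefix` and `digits` are chosen here for the example.

```python
import re

hexnum = re.compile(
    r"(?P<sign>-)?(?P<prefix>\$|0[xX])(?P<digits>[\da-fA-F]{1,8})(?![\da-fA-F])"
)

def parse(s):
    m = hexnum.match(s)
    # A flat token list, as a simple parser might return: an absent
    # optional element is just missing, so later positions shift.
    tokens = [g for g in m.groups() if g is not None]
    return tokens, m.groupdict()

tokens_a, named_a = parse("0xFF")
tokens_b, named_b = parse("-0xFF")

print(tokens_a)    # ['0x', 'FF']       digits at index 1
print(tokens_b)    # ['-', '0x', 'FF']  digits at index 2
print(named_a["digits"], named_b["digits"])   # FF FF
```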

Just some fodder for your reverb considerations...

-- Paul

Dec 30 '05 #3
Paul McGuire wrote:
I don't find 'Interval' to be very easy on the eyes. In this case, I
stole^H^H^H^H^H borrowed the re form of "[A-Za-z0-9]", providing a
method named srange ("s" is for "string") such that srange("a-fA-F")
would return the string "abcdefABCDEF".


Thank you for your answers Paul. Just a note: I have called it
interval-something (like cinterval or Interval) instead of
range-something because it returns a closed interval (all letters
inclusive of both endpoints), to show its difference from the Python
range that returns a right-open interval. Calling srange("a","z") in
Python makes me think that it generates the ["a",...,"y"] range.
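
The difference is easy to see side by side. In this small sketch `interval` is a hypothetical closed-interval helper, compared with what the built-in `range` gives for the same endpoints:

```python
def interval(first, last):
    """Hypothetical closed interval of characters, endpoints included."""
    return [chr(c) for c in range(ord(first), ord(last) + 1)]

# The built-in range is right-open, so the last letter is left out.
half_open = [chr(c) for c in range(ord("a"), ord("z"))]
closed = interval("a", "z")

print(half_open[-1], len(half_open))   # y 25
print(closed[-1], len(closed))         # z 26
```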

Bye,
bearophile

Jan 3 '06 #4
