parser recommendation

Filipe Fernandes

I have a project that uses a proprietary format and I've been using
regex to extract information from it. I haven't hit any roadblocks
yet, but I'd like to use a parsing library rather than maintain my own
code base of complicated regexes. I've been intrigued by the parsers
available in Python, which may add some much needed flexibility.

I've briefly looked at PLY and pyparsing. There are several others,
but too many to enumerate. My understanding is that PLY (although
more difficult to use) has much more flexibility than pyparsing. I'm
basically looking to make an informed choice. Not just for this
project, but for the long haul. I'm not afraid of using a difficult
(to use or learn) parser either if it buys me something like
portability (with other languages) or flexibility.

I've been to a few websites that enumerate the parsers, but they
weren't all that helpful when it came to comparisons...

http://nedbatchelder.com/text/python-parsers.html
http://www.python.org/community/sigs...ards-standard/

I'm not looking to start a flame war... I'd just like some honest opinions.. ;)

thanks,
filipe

Jun 27 '08 #1


Paul McGuire

On Jun 3, 8:43 am, "Filipe Fernandes" <fernandes...@gmail.com> wrote:

> I've briefly looked at PLY and pyparsing. There are several others,
> but too many to enumerate. My understanding is that PLY (although
> more difficult to use) has much more flexibility than pyparsing. I'm
> basically looking to make an informed choice. Not just for this
> project, but for the long haul. I'm not afraid of using a difficult
> (to use or learn) parser either if it buys me something like
> portability (with other languages) or flexibility).

Short answer: try them both. Learning curve on pyparsing is about a
day, maybe two. And if you are already familiar with regex, PLY
should not seem too much of a stretch. PLY parsers will probably be
faster running than pyparsing parsers, but I think pyparsing parsers
will be quicker to work up and get running.

Longer answer: PLY is of the lex/yacc school of parsing libraries
(PLY=Python Lex/Yacc). Use regular expressions to define terminal
token specifications (a la lex). Then use "t_XXX" and "p_XXX" methods
to build up the parsing logic - docstrings in these methods capture
regex or BNF grammar definitions. In contrast, pyparsing is of the
combinator school of parsers. Within your Python code, you compose
your parser using '+' and '|' operations, building up the parser using
pyparsing classes such as Literal, Word, OneOrMore, Group, etc. Also,
pyparsing is 100% Python, so you won't have any portability issues
(don't know about PLY).
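To make the "combinator school" idea concrete, here is a minimal sketch in plain Python (no pyparsing required) of how overloading `+` and `|` lets small parser objects compose into larger ones. The class names `Lit`, `Pat`, `Seq` and `Alt` are hypothetical and are not pyparsing's API; pyparsing's real classes add much more (grouping, parse actions, results names, error reporting).

```python
import re


class ParseError(Exception):
    """Raised when a parser cannot match at the current position."""


def skip_ws(text, pos):
    # Leaf parsers skip leading whitespace before trying to match.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


class Parser:
    """Base class: parse(text, pos) returns (tokens, new_pos) or raises ParseError."""

    def __add__(self, other):
        return Seq(self, other)      # "a + b": match a, then b

    def __or__(self, other):
        return Alt(self, other)      # "a | b": try a, fall back to b

    def parse(self, text, pos=0):
        raise NotImplementedError


class Lit(Parser):
    """Match a fixed string."""

    def __init__(self, s):
        self.s = s

    def parse(self, text, pos=0):
        pos = skip_ws(text, pos)
        if text.startswith(self.s, pos):
            return [self.s], pos + len(self.s)
        raise ParseError(f"expected {self.s!r} at position {pos}")


class Pat(Parser):
    """Match a regular expression at the current position."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def parse(self, text, pos=0):
        pos = skip_ws(text, pos)
        m = self.regex.match(text, pos)
        if m is None:
            raise ParseError(f"expected /{self.regex.pattern}/ at position {pos}")
        return [m.group()], m.end()


class Seq(Parser):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def parse(self, text, pos=0):
        left_tokens, pos = self.left.parse(text, pos)
        right_tokens, pos = self.right.parse(text, pos)
        return left_tokens + right_tokens, pos


class Alt(Parser):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def parse(self, text, pos=0):
        try:
            return self.left.parse(text, pos)
        except ParseError:
            return self.right.parse(text, pos)


# Compose a tiny grammar: NAME '=' (NUMBER | NAME)
name = Pat(r"[A-Za-z_]\w*")
number = Pat(r"\d+")
assignment = name + Lit("=") + (number | name)

print(assignment.parse("x = 42"))   # (['x', '=', '42'], 6)
```

Each `parse` call returns the matched tokens plus the position where matching stopped, so the grammar is built entirely from ordinary Python expressions rather than from a separate grammar file.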

Here is a link to a page with a PLY and pyparsing example (although
not strictly a side-by-side comparison): http://www.rexx.com/~dkuhlman/python_201/.
For comparison, here is a pyparsing version of the PLY parser on that
page (this is a recursive grammar, not necessarily a good beginner's
example for pyparsing):
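===============
from pyparsing import (Forward, Group, Literal, OneOrMore, Optional,
                       Word, alphanums, alphas, restOfLine)

# an identifier: a letter followed by letters or digits
term = Word(alphas, alphanums)

func_call = Forward()
func_call_list = Forward()
comma = Literal(",").suppress()
# comma-separated, right-recursive list of nested calls
func_call_list << Group( func_call + Optional(comma +
                         func_call_list) )

lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
# name(...); an empty argument list yields the default [""]
func_call << Group( term + lpar +
                    Optional(func_call_list, default=[""]) + rpar )
command = func_call

prog = OneOrMore(command)

# skip '#' comments wherever they appear
comment = "#" + restOfLine
prog.ignore( comment )
===============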
With the data set given at Dave Kuhlman's web page, here is the
output:
[['aaa', ['']],
['bbb', [['ccc', ['']]]],
['ddd',
[['eee', ['']],
[['fff', [['ggg', ['']], [['hhh', ['']], [['iii', ['']]]]]]]]]]
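For comparison with the PLY side, the same recursive grammar can be sketched as a small hand-written recursive-descent parser using only the standard library. This is an illustration, not Dave Kuhlman's PLY parser: `FuncCallParser` is a hypothetical name, and the input below is made up to produce the same shape of output. Each method corresponds to one pyparsing expression above, and returning nested lists mimics what `Group` produces.

```python
import re

NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # roughly Word(alphas, alphanums)
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*|[(),]|\S")


def tokenize(text):
    text = re.sub(r"#[^\n]*", "", text)   # drop '#' comments, like prog.ignore(comment)
    return TOKEN_RE.findall(text)         # whitespace is skipped implicitly


class FuncCallParser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def term(self):
        tok = self.peek()
        if tok is None or not NAME_RE.fullmatch(tok):
            raise SyntaxError(f"expected a name, got {tok!r}")
        self.pos += 1
        return tok

    def func_call(self):
        # func_call ::= term '(' [func_call_list] ')'
        name = self.term()
        self.expect("(")
        args = [""] if self.peek() == ")" else self.func_call_list()
        self.expect(")")
        return [name, args]

    def func_call_list(self):
        # func_call_list ::= func_call [',' func_call_list]
        group = [self.func_call()]
        if self.peek() == ",":
            self.pos += 1
            group.append(self.func_call_list())
        return group

    def prog(self):
        # prog ::= func_call+ ; any leftover input is reported as an error
        calls = [self.func_call()]
        while self.peek() is not None:
            calls.append(self.func_call())
        return calls


source = """
aaa()
bbb(ccc())   # comments are ignored
ddd(eee(), fff(ggg(), hhh(), iii()))
"""
print(FuncCallParser(source).prog())
```

Writing it out by hand makes the cost of the pyparsing version visible: the tokenizer, lookahead and error handling that `Word`, `Optional` and `ignore` take care of all have to be spelled out explicitly.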

Pyparsing makes some judicious assumptions about how you will want to
parse, most significant being that whitespace can be ignored during
parsing (this *can* be overridden in the parser definition).
Pyparsing also supports token grouping (for building parse trees),
parse-time callbacks (called 'parse actions'), and assigning names
within subexpressions (called 'results names'), which really helps in
working with the tokens returned from the parsing process.

If you learn both, you may find that pyparsing is a good way to
quickly prototype a particular parsing problem, which you can then
convert to PLY for performance if necessary. The pyparsing prototype
will be an efficient way to work out what the grammar "kinks" are, so
that when you get around to PLY-ifying it, you already have a clear
picture of what the parser needs to do.

But, really, "more flexible"? I wouldn't really say that was the big
difference between the two.

Cheers,
-- Paul

(More pyparsing info at http://pyparsing.wikispaces.com.)

Jun 27 '08 #2

Filipe Fernandes

On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <pt***@austin.rr.com> wrote:

> If you learn both, you may find that pyparsing is a good way to
> quickly prototype a particular parsing problem, which you can then
> convert to PLY for performance if necessary. The pyparsing prototype
> will be an efficient way to work out what the grammar "kinks" are, so
> that when you get around to PLY-ifying it, you already have a clear
> picture of what the parser needs to do.

Thanks (both Paul and Kay) for responding. I'm still looking at Trail
in EasyExtend, and pyparsing is very nicely object oriented, but PLY
does seem to have the speed advantage, so I'm leaning towards PLY.

But I do have more questions... when reading the ply.py header (in
2.5) I found the following paragraph...

# The current implementation is only somewhat object-oriented. The
# LR parser itself is defined in terms of an object (which allows multiple
# parsers to co-exist). However, most of the variables used during table
# construction are defined in terms of global variables. Users shouldn't
# notice unless they are trying to define multiple parsers at the same
# time using threads (in which case they should have their head examined).

Now, I'm inevitably going to have to use threads... I'm not exactly
sure what the author is alluding to, but my guess is that to overcome
this limitation I need to acquire a thread lock before
"defining/creating" a parser object. Is that right?

Has anyone run into this issue? This would definitely be a
showstopper (for PLY anyway) if I couldn't create multiple parsers
because of threads. I'm not saying I need more than one, I'm just not
comfortable with that limitation.

I have a feeling I'm just misunderstanding, since it doesn't seem to
stop you from creating multiple parsers within a single process.

filipe

Jun 27 '08 #3

Kay Schluehr

On 3 Jun., 19:34, "Filipe Fernandes" <fernandes...@gmail.com> wrote:

> # The current implementation is only somewhat object-oriented. The
> # LR parser itself is defined in terms of an object (which allows multiple
> # parsers to co-exist). However, most of the variables used during table
> # construction are defined in terms of global variables. Users shouldn't
> # notice unless they are trying to define multiple parsers at the same
> # time using threads (in which case they should have their head examined).
>
> Now, I'm invariably going to have to use threads... I'm not exactly
> sure what the author is alluding to, but my guess is that to overcome
> this limitation I need to acquire a thread lock first before
> "defining/creating" a parser object before I can use it?

Nope. It just says that the parser-table construction itself relies on
global state. But you will most likely build your parser offline in a
separate run.
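A minimal sketch of Kay's point, assuming table construction is the only step that touches shared global state: build (or load) the tables once, before any threads start, and let the threads only read them. `build_tables`, `load_tables` and the cache file name are hypothetical stand-ins; PLY itself caches its generated tables in a module (by default `parsetab.py`), and this just mimics that idea with `pickle`.

```python
import os
import pickle
import tempfile
import threading

BUILD_COUNT = 0  # counts how often the expensive step actually ran


def build_tables():
    """Hypothetical stand-in for expensive table generation.

    Assume this step uses module-level scratch state and is therefore
    not safe to run from several threads at once.
    """
    global BUILD_COUNT
    BUILD_COUNT += 1
    return {"aaa": 1, "bbb": 2, "ccc": 3}


def load_tables(cache_path):
    """Reuse tables cached by an earlier run; otherwise build and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    tables = build_tables()
    with open(cache_path, "wb") as f:
        pickle.dump(tables, f)
    return tables


def parse(tokens, tables):
    # Parsing only reads the tables; all mutable state is local to the call.
    return [tables[tok] for tok in tokens]


def run_workers(tables, inputs):
    results = [None] * len(inputs)

    def worker(i):
        results[i] = parse(inputs[i], tables)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


cache = os.path.join(tempfile.mkdtemp(), "tables.pickle")
tables = load_tables(cache)    # first "run": builds and caches
tables = load_tables(cache)    # later "run": loads from the cache
print(BUILD_COUNT)                                   # 1
print(run_workers(tables, [["aaa", "ccc"], ["bbb"]]))  # [[1, 3], [2]]
```

Because the unsafe step happens once, up front, no lock is needed around parsing itself; the worker threads never run table construction.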

Jun 27 '08 #4

Paul McGuire

On Jun 3, 12:34 pm, "Filipe Fernandes" <fernandes...@gmail.com> wrote:

> On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <pt...@austin.rr.com> wrote:
> But I do have more questions... when reading the ply.py header (in
> 2.5) I found the following paragraph...
>
> # The current implementation is only somewhat object-oriented. The
> # LR parser itself is defined in terms of an object (which allows multiple
> # parsers to co-exist). However, most of the variables used during table
> # construction are defined in terms of global variables. Users shouldn't
> # notice unless they are trying to define multiple parsers at the same
> # time using threads (in which case they should have their head examined).
>
> Now, I'm invariably going to have to use threads... I'm not exactly
> sure what the author is alluding to, but my guess is that to overcome
> this limitation I need to acquire a thread lock first before
> "defining/creating" a parser object before I can use it?
>
> Has anyone ran into this issue....? This would definitely be a
> showstopper (for PLY anyway), if I couldn't create multiple parsers
> because of threads. I'm not saying I need more than one, I'm just not
> comfortable with that limitation.
>
> I have a feeling I'm just misunderstanding since it doesn't seem to
> hold you back from creating multiple parsers under a single process.
>
> filipe

You can use pyparsing from any thread, and you can create multiple
parsers each running in a separate thread, but you cannot concurrently
use one parser from two different threads. Some users work around
this by instantiating a separate parser per thread using pickle to
quickly construct the parser at thread start time.
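The per-thread workaround Paul describes might look roughly like this: build the parser once, pickle it, and have each thread unpickle its own private copy on first use via `threading.local`. `StatefulParser` is a hypothetical stand-in for a real parser object (pyparsing or otherwise); whether a given library's parser objects pickle cleanly is something to check against that library.

```python
import pickle
import threading


class StatefulParser:
    """Hypothetical parser that mutates internal state while parsing,
    which is what makes sharing one instance across threads unsafe."""

    def __init__(self, keywords):
        self.keywords = set(keywords)
        self.seen = []              # scratch state mutated by parse()

    def parse(self, text):
        self.seen = []
        for word in text.split():
            if word in self.keywords:
                self.seen.append(word)
        return list(self.seen)


# Build the parser once, then snapshot it as bytes.
TEMPLATE = pickle.dumps(StatefulParser(["if", "else", "while"]))

_local = threading.local()


def get_parser():
    """Return this thread's private parser, unpickling it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = pickle.loads(TEMPLATE)
        _local.parser = parser
    return parser


results, parsers = {}, {}
lock = threading.Lock()


def worker(name, text):
    parser = get_parser()               # private to this thread
    found = parser.parse(text)
    with lock:
        results[name] = found
        parsers[name] = parser          # keep a reference so the copies stay alive


texts = {"t0": "if x while y", "t1": "else z", "t2": "nothing here"}
threads = [threading.Thread(target=worker, args=item) for item in texts.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)   # {'t0': ['if', 'while'], 't1': ['else'], 't2': []} (key order may vary)
```

Unpickling is usually much cheaper than rebuilding a complex grammar from scratch, which is the point of the trick; each thread then owns its copy outright, so no locking is needed while parsing.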

-- Paul

Jun 27 '08 #5

Filipe Fernandes

On Jun 3, 12:34 pm, "Filipe Fernandes" <fernandes...@gmail.com> wrote:

> On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <pt...@austin.rr.com> wrote:
> But I do have more questions... when reading the ply.py header (in
> 2.5) I found the following paragraph...
>
> # The current implementation is only somewhat object-oriented. The
> # LR parser itself is defined in terms of an object (which allows multiple
> # parsers to co-exist). However, most of the variables used during table
> # construction are defined in terms of global variables. Users shouldn't
> # notice unless they are trying to define multiple parsers at the same
> # time using threads (in which case they should have their head examined).
>
> Now, I'm invariably going to have to use threads... I'm not exactly
> sure what the author is alluding to, but my guess is that to overcome
> this limitation I need to acquire a thread lock first before
> "defining/creating" a parser object before I can use it?
>
> Has anyone ran into this issue....? This would definitely be a
> showstopper (for PLY anyway), if I couldn't create multiple parsers
> because of threads. I'm not saying I need more than one, I'm just not
> comfortable with that limitation.

On Tue, Jun 3, 2008 at 1:53 PM, Kay Schluehr <ka**********@gmx.net> wrote:

> Nope. It just says that the parser-table construction itself relies on
> global state. But you will most likely build your parser offline in a
> separate run.

Thanks Kay for the context. I misunderstood completely, but your
last sentence, coupled with a few running examples, cleared things
right up.

On Tue, Jun 3, 2008 at 4:36 PM, Paul McGuire <pt***@austin.rr.com> wrote:

> You can use pyparsing from any thread, and you can create multiple
> parsers each running in a separate thread, but you cannot concurrently
> use one parser from two different threads. Some users work around
> this by instantiating a separate parser per thread using pickle to
> quickly construct the parser at thread start time.

I didn't know that pyparsing wasn't thread safe. I just assumed it
was because of its OO approach. Thanks for the workaround. I
haven't given up on pyparsing, although I'm now heavily leaning
towards PLY as an end solution, since lex and yacc parsing is available
on other platforms as well.

Thanks Kay and Paul for the advice... I'm still using the first two I
started looking at, but I'm much more confident in the choices made.

filipe

Jun 27 '08 #6

rurpy

On Jun 3, 2:55 pm, "Filipe Fernandes" <fernandes...@gmail.com> wrote:

> I haven't given up on pyparsing, although I'm now heavily leaning
> towards PLY as an end solution since lex and yacc parsing is available
> on other platforms as well.

Keep in mind that PLY's "compatibility" with YACC is functional,
not syntactical. That is, you cannot take a YACC file, replace
the actions with Python actions and feed it to PLY.

It's a shame that the Python world has no truly YACC-compatible
parser like YAPP in the Perl world.

Jun 27 '08 #7

Alan Isaac

One other possibility:
SimpleParse (for speed).
<URL:http://simpleparse.sourceforge.net/>
It is very nice.
Alan Isaac

Jun 27 '08 #8

Kay Schluehr

On 6 Jun., 01:58, Alan Isaac <ais...@american.edu> wrote:

> One other possibility:
> SimpleParse (for speed).
> <URL:http://simpleparse.sourceforge.net/>
> It is very nice.
> Alan Isaac

How does SimpleParse manage left-factorings, left-recursion and other
ambiguities?

For example according to [1] there are two non-terminals

UNICODEESCAPEDCHAR_16

and

UNICODEESCAPEDCHAR_32

with an equal initial section of 4 tokens. How does SimpleParse detect
when it has to use the second production?

[1] http://simpleparse.sourceforge.net/s..._grammars.html
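I don't know how SimpleParse itself resolves this, but the underlying issue can be illustrated with Python's `re` module, whose alternation is an ordered choice: the first branch that matches wins, even when a later branch could match more. The two patterns below are hypothetical productions sharing a `\u` plus 4-hex-digit prefix, not the actual definitions from [1].

```python
import re

# Hypothetical productions with a common prefix: "\u" + 4 hex digits,
# and "\u" + 8 hex digits.
SHORT = r"\\u[0-9A-Fa-f]{4}"
LONG = r"\\u[0-9A-Fa-f]{8}"

short_first = re.compile(f"(?:{SHORT}|{LONG})")
long_first = re.compile(f"(?:{LONG}|{SHORT})")

sample = r"\u0041BCDE"   # 8 hex digits, so LONG could consume all of it

print(short_first.match(sample).group())      # \u0041      (first branch wins)
print(long_first.match(sample).group())       # \u0041BCDE  (longer branch tried first)
print(short_first.fullmatch(sample).group())  # \u0041BCDE  (re backtracks into LONG)
```

A PEG-style ordered choice commits to the first alternative that succeeds, so there the longer production has to be listed first; Python's `re` additionally backtracks into the second branch when the rest of the pattern fails, which is what the `fullmatch` call shows.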

Jun 27 '08 #9
