Any help with PLY?

Hi folks,

I've been trying to write a PLY parser and have run into a bit of
bother.

At the moment, I have a RESERVEDWORD token which matches all reserved
words and then alters the token type to match the reserved word that
was detected. I also have an IDENTIFIER token which matches
identifiers that are not reserved words.

The problem is, if I put RESERVEDWORD before IDENTIFIER, then
identifiers that happen to begin with reserved words are wrongly lexed
as the reserved word followed by an identifier. For example, because
"if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
as the IDENTIFIER "ifollowyou".

If I put IDENTIFIER first, though, every single reserved word in the
input is lexed as an IDENTIFIER.

Is there any way I can tell PLY that it should only return a
RESERVEDWORD in the correct circumstances? If PLY can't do this, can
any of the other Python parser generators? (It seems that Lex can..)

Thanks!

Nov 22 '05 #1
Pyparsing uses the Keyword class for just this purpose. Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator ('^'),
which tries every alternative and keeps the longest ("greedy") match, as in:

    any_ = Literal("any")
    boolean_ = Literal("boolean")
    char_ = Literal("char")
    double_ = Literal("double")
    ...

    identifier = Word( alphas, alphanums + "_" ).setName("identifier")

    real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) )
                    + Optional( CaselessLiteral("E") + Word(nums+"+-",nums) ) )
    integer = ( Combine( CaselessLiteral("0x") + Word( nums+"abcdefABCDEF" ) ) |
                Word( nums+"+-", nums ) ).setName("int")

    udTypeName = delimitedList( identifier, "::", combine=True ).setName("udType")

    # have to use longest match for type, in case a user-defined
    # type name starts with a keyword type, like "stringSeq" or "longArray"
    typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
                 float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
                 wchar_ ^ wstring_ ^ udTypeName )

This way, if a user-defined type was named "stringSequence" the longest
matching expression would be returned.
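
The difference between first-match and longest-match alternation can be sketched without pyparsing at all. The following uses only the standard `re` module; the names `ALTERNATIVES`, `match_first` and `match_longest` are made up for illustration and are not pyparsing APIs, and the tie-breaking rule (earlier alternative wins) is an assumption of this sketch.

```python
import re

# Hypothetical alternatives: two keyword literals plus a general identifier.
ALTERNATIVES = [
    ("string", re.compile(r"string")),
    ("long", re.compile(r"long")),
    ("udType", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
]


def match_first(text):
    """Return (name, matched_text) for the first alternative that matches."""
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m:
            return name, m.group()
    return None


def match_longest(text):
    """Try every alternative and keep the longest match.

    On a tie the earlier alternative wins (an assumption of this sketch).
    """
    best = None
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(match_first("stringSequence"))    # stops at the literal "string"
print(match_longest("stringSequence"))  # prefers the full identifier
```

The longest-match version has to run every alternative before it can answer, whereas the first-match version can return as soon as one alternative succeeds.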

Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every path - it can just return the first matching expression. Now
with Keyword, I can define:

    any_ = Keyword("any")
    boolean_ = Keyword("boolean")
    char_ = Keyword("char")
    double_ = Keyword("double")
    ...
    typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
                 float_ | long_ | octet_ | short_ | string_ |
                 wchar_ | wstring_ | udTypeName )

Does PLY support greedy matching?
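
For what it's worth, the usual way around this in a lex-style tokenizer (and, as far as I understand it, the approach the PLY documentation suggests) is to drop the separate RESERVEDWORD rule: match every word with the single identifier pattern, then look the matched text up in a reserved-word table and change the token type on a hit. Below is a minimal sketch of that idea using only the standard `re` module rather than PLY itself; `RESERVED`, `TOKEN_RE` and `tokenize` are hypothetical names, and the keyword list is just an example.

```python
import re

# Hypothetical reserved-word table: keyword text -> token type.
RESERVED = {"if": "IF", "else": "ELSE", "while": "WHILE"}

# One pattern for all words; keywords are NOT given their own regex.
TOKEN_RE = re.compile(
    r"(?P<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*)|(?P<NUMBER>\d+)"
)


def tokenize(text):
    """Split text into (type, value) pairs, promoting reserved words."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        value = m.group()
        if kind == "IDENTIFIER":
            # The whole word is matched first; only then is it checked
            # against the reserved table, so "ifollowyou" stays intact.
            kind = RESERVED.get(value, "IDENTIFIER")
        tokens.append((kind, value))
        pos = m.end()
    return tokens


print(tokenize("if ifollowyou 42"))
```

Because the identifier pattern always consumes the whole word before the lookup happens, rule ordering between keywords and identifiers no longer matters.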

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net .)
Nov 22 '05 #2
