Bytes | Software Development & Data Engineering Community

Using PLY

Hi,

I know that PLY lex is able to do line counting. I am wondering if there
is a way to count the number of each keyword (token) in a given file?
For example, how many IF tokens, etc.?

Thanks
Maurice
Jul 18 '05 #1
On Fri, 17 Sep 2004 04:48:36 GMT, Maurice LING <ma*********@acm.org> wrote:
Hi,

I know that PLY lex is able to do line counting. I am wondering if there
is a way to count the number of each keyword (token) in a given file?
For example, how many IF tokens, etc.?

>>> import tokenize
>>> import StringIO
>>> src = StringIO.StringIO("""
... if a: foo()
... elif b: bar()
... if c: baz()
... """)
>>> sum([1 for t in tokenize.generate_tokens(src.readline) if t[1]=='if'])
2

That generates an intermediate list with a 1 for each 'if', but it's not a big
price to pay IMO.
If you have a file in the current working directory, e.g., foo.py, substitute

src = file('foo.py')

or do it in one line, like (untested):

sum([1 for t in tokenize.generate_tokens(file('foo.py').readline) if t[1]=='if'])
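(As an aside for anyone reading this later: the above is Python 2. In Python 3 the StringIO module moved to io and the file() builtin is gone, so the same count looks roughly like this sketch:)

```python
import io
import tokenize
from collections import Counter

src = io.StringIO("""
if a: foo()
elif b: bar()
if c: baz()
""")

# Count every token string in one pass; 'if' and 'elif' are distinct tokens.
counts = Counter(t.string for t in tokenize.generate_tokens(src.readline))
print(counts['if'])  # -> 2
```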

generate_tokens returns a generator that returns tuples, e.g. for the above

Rewind src and get the generator:

>>> src.seek(0)
>>> tg = tokenize.generate_tokens(src.readline)

Manually get a couple of examples:

>>> tg.next()
(53, '\n', (1, 0), (1, 1), '\n')
>>> tg.next()
(1, 'if', (2, 0), (2, 2), 'if a: foo()\n')

Rewind the StringIO object to start again, and show all the token tuples:

>>> src.seek(0)
>>> for t in tokenize.generate_tokens(src.readline): print t

...
(53, '\n', (1, 0), (1, 1), '\n')
(1, 'if', (2, 0), (2, 2), 'if a: foo()\n')
(1, 'a', (2, 3), (2, 4), 'if a: foo()\n')
(50, ':', (2, 4), (2, 5), 'if a: foo()\n')
(1, 'foo', (2, 6), (2, 9), 'if a: foo()\n')
(50, '(', (2, 9), (2, 10), 'if a: foo()\n')
(50, ')', (2, 10), (2, 11), 'if a: foo()\n')
(4, '\n', (2, 11), (2, 12), 'if a: foo()\n')
(1, 'elif', (3, 0), (3, 4), 'elif b: bar()\n')
(1, 'b', (3, 5), (3, 6), 'elif b: bar()\n')
(50, ':', (3, 6), (3, 7), 'elif b: bar()\n')
(1, 'bar', (3, 8), (3, 11), 'elif b: bar()\n')
(50, '(', (3, 11), (3, 12), 'elif b: bar()\n')
(50, ')', (3, 12), (3, 13), 'elif b: bar()\n')
(4, '\n', (3, 13), (3, 14), 'elif b: bar()\n')
(1, 'if', (4, 0), (4, 2), 'if c: baz()\n')
(1, 'c', (4, 3), (4, 4), 'if c: baz()\n')
(50, ':', (4, 4), (4, 5), 'if c: baz()\n')
(1, 'baz', (4, 6), (4, 9), 'if c: baz()\n')
(50, '(', (4, 9), (4, 10), 'if c: baz()\n')
(50, ')', (4, 10), (4, 11), 'if c: baz()\n')
(4, '\n', (4, 11), (4, 12), 'if c: baz()\n')
(0, '', (5, 0), (5, 0), '')
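(Those numeric type codes differ between Python versions, but the stdlib token module maps them back to symbolic names. For example, in Python 3 syntax, where StringIO lives in io:)

```python
import io
import token
import tokenize

src = io.StringIO("if a: foo()\n")
# Pair each token's symbolic name with its text.
names = [(token.tok_name[t.type], t.string)
         for t in tokenize.generate_tokens(src.readline)]
print(names[:3])  # [('NAME', 'if'), ('NAME', 'a'), ('OP', ':')]
```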

HTH

Regards,
Bengt Richter
Jul 18 '05 #2
huy
Maurice LING wrote:
Hi,

I know that PLY lex is able to do line counting. I am wondering if there
is a way to count the number of each keywords (tokens) in a given file?
For example, how many IF tokens etc?

Thanks
Maurice

PLY can do much more than line counting. Build an AST, then just count
your IF tokens. If you can use PLY to recognise your IF tokens, then you
can easily count them.

It's kind of vague what you are wanting to do. Is it source code you are
parsing, or text with keywords?

Huy
Jul 18 '05 #3
> >>> import tokenize
> >>> import StringIO
> >>> src = StringIO.StringIO("""

...


The tokenize module would definitely be simpler if it's Python code
that he happens to be parsing. If it's not Python code, then there's
still a reason to use PLY.

------------------------------------------

Here's a kludgy but quick solution - modify the LexToken class in
lex.py to keep track of the number of type occurrences.

class LexToken(object):  # change to new-style class
    type_count = {}  # store the count here
    def __setattr__(self, key, value):
        if key == 'type':
            # when the type attribute is assigned, increment the counter
            if value not in self.type_count:
                self.type_count[value] = 1
            else:
                self.type_count[value] += 1
        object.__setattr__(self, key, value)

    # ... and proceed with the original definition of LexToken

    def __str__(self):
        return "LexToken(%s,%r,%d)" % \
            (self.type, self.value, self.lineno)
    def __repr__(self):
        return str(self)
    def skip(self, n):
        try:
            self._skipn += n
        except AttributeError:
            self._skipn = n
-----------------------------------------

After you've run the lexer, lex.LexToken.type_count will then contain the
number of occurrences of each token type.

-----------------------------------------

(Caveats- 1. I haven't tested this code. 2. I've got PLY 1.3;
syntax may have changed in newer versions. In fact, I hope it's
changed; while PLY works very well, its usage could be way more
pythonic)
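(To see why the __setattr__ hook fires at all, here is the same trick in isolation - a stand-alone toy class, not PLY's actual LexToken, just to show that every assignment to .type passes through __setattr__ and bumps a shared class-level counter:)

```python
class Token(object):
    type_count = {}  # class-level dict shared by every instance

    def __setattr__(self, key, value):
        # Python calls __setattr__ for every 'tok.attr = ...' assignment,
        # so setting .type passes through here and we can count it.
        if key == 'type':
            self.type_count[value] = self.type_count.get(value, 0) + 1
        object.__setattr__(self, key, value)

# Simulate a lexer creating tokens and assigning their types.
for t in ('IF', 'NAME', 'IF', 'LPAREN'):
    tok = Token()
    tok.type = t

print(Token.type_count)  # {'IF': 2, 'NAME': 1, 'LPAREN': 1}
```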
Jul 18 '05 #4

The tokenize module would definitely be simpler if it's Python code
that he happens to be parsing. If it's not Python code, then there's
still a reason to use PLY..

Thanks. I'm not parsing Python code for sure, so that is a good
reason to use PLY.

Another thing that I am quite puzzled by is the yacc part of PLY. Most
of the examples show calculators, and the yacc part does the
calculations, such as:

def p_expression_group(self, p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]

this is a bad example, I know. But how do I get it to output some
intermediate representation, like an AST, or an intermediate code
(byte-code type)?

Is

def p_expression_group(self, p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]
    print "byte_x" + p[0]

or something like this legal?

I hope that I am clear about what I am trying to say.

Thanks in advanced
Maurice
Jul 18 '05 #5
PLY can do much more than line counting. Build an AST, then just count
your IF tokens. If you can use PLY to recognise your IF tokens, then you
can easily count them.

How do I build an AST with PLY? I'm trying to find some examples of that
but have been unsuccessful.

It's kind of vague what you are wanting to do. Is it source code you are
parsing, or text with keywords?

I'm trying to parse what looks like a 4GL source code.
Jul 18 '05 #6
Here's a kludgy but quick solution - modify the LexToken class in
lex.py to keep track of the number of type occurrences.
[code snipped - see post #4]


I may be an idiot here, but I don't quite see how LexToken.__setattr__ is
called. There seems to be a gap in my logic.

Please assist.

Thanks
Maurice
Jul 18 '05 #7
Maurice LING wrote:
....
Another thing that I am quite puzzled by is the yacc part of PLY. Most
of the examples show calculators, and the yacc part does the
calculations, such as:

def p_expression_group(self, p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]

this is a bad example, I know.
Simple examples of lex/yacc type things tend to have this though.
But how do I get it to output some
intermediate representation, like an AST, or an intermediate code
(byte-code type)?

Is

def p_expression_group(self, p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]
    print "byte_x" + p[0]

or something like this legal?
It's legal, but probably not what you want.

Normally you have Lex --(token) --> Parse --(AST)--> Something Interesting.

If Something Interesting is simple, you can do that instead at the AST stage
which is what the examples do.

If you wanted to modify example/calc/calc.py in the PLY distribution to
return an AST to play with, you would change its rules to store the parsed
structure rather than do the work. Taking the route of minimal change to
try and make it obvious what I've changed:

def p_statement_assign(p):
    'statement : NAME EQUALS expression'
    p[0] = ["assignment", p[1], p[3]]  # names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    p[0] = ["expr_statement", p[1]]  # print p[1]

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    p[0] = ["binop_expr", p[2], p[1], p[3]]  # long if/elif evaluation

def p_expression_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    p[0] = ["uminus_expr", p[2]]  # p[0] = -p[2]

def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = ["expression", p[2]]  # p[0] = p[2]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = ["number", p[1]]  # p[0] = p[1]

def p_expression_name(p):
    'expression : NAME'
    p[0] = ["name", p[1]]  # p[0] = names[p[1]], with error handling

A sample AST this could generate would be:

[ "assignment ",
["name", "BOB" ],
["expression ",
["binop_expr ",
"*",
["number", 7],
["number", 9]
]
]
]
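(To convince yourself this nested-list AST carries the same information as the original evaluating rules, you can walk it with a small recursive function. This is just a sketch: the node tags follow the rule names above, and it assumes the binop operator is stored as its token text, e.g. '*':)

```python
def evaluate(node):
    """Recursively evaluate a nested-list AST like the one above."""
    kind = node[0]
    if kind == "number":
        return node[1]
    if kind == "expression":   # grouping just unwraps its child
        return evaluate(node[1])
    if kind == "uminus_expr":
        return -evaluate(node[1])
    if kind == "binop_expr":
        op = node[1]
        a, b = evaluate(node[2]), evaluate(node[3])
        if op == '+': return a + b
        if op == '-': return a - b
        if op == '*': return a * b
        if op == '/': return a / b
    raise ValueError("unknown node type: %r" % kind)

ast = ["assignment",
      ["name", "BOB"],
      ["expression",
       ["binop_expr", "*", ["number", 7], ["number", 9]]]]

# Evaluate just the right-hand side of the assignment:
print(evaluate(ast[2]))  # -> 63
```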

In example/calc/calc.py this value would be returned here:

while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    AST = yacc.parse(s)  #### <------ HERE!

(NB, slight change to the line marked ####)

This is a very boring, not very interesting, not that great AST, but it should
hopefully get you started. You should be able to see that by traversing
this tree you could get the same result as the original code, or could spit
out code that performs this functionality. Often it's nice to have some
simplification of the tree as well, since this sort of thing can be rather
unwieldy for realistic languages.

It's also worth noting that the calc.py example is very toy in that it
matches single lines using the parser rather than collections of lines (i.e.
the parser has no conception of a piece of code containing more than one
statement).
I'm trying to parse what looks like a 4GL source code.


FWIW, start small - start with matching the simplest expressions you can, and
work forward from there (unless you're lucky enough to already have an LALR(1)
or SLR(1) grammar for it suitable for PLY). Test-first style coding
for grammars feels intuitively wrong, but seems to work really well in
practice - just make sure that after making every test work, you check the
result in to CVS/your favourite version control system :-)

One other tip you might find useful - rather than sending the lexer whole
files as PLY seems to expect, do line handling yourself and send it lines
instead - it works much more like Flex/lex that way.
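(A sketch of what that line-at-a-time driver might look like. PLY's real lexer exposes input() and token(); to keep this runnable without PLY installed, a trivial regex stand-in with the same two-method shape is used here:)

```python
import re
from collections import Counter

class FakeLexer(object):
    """Stand-in with PLY's input()/token() shape; real PLY returns
    LexToken objects, this just returns the matched words."""
    def input(self, data):
        self._toks = iter(re.findall(r'\w+', data))
    def token(self):
        return next(self._toks, None)  # PLY also returns None at end

def lex_by_line(lexer, text):
    """Feed the lexer one line at a time and tally every token."""
    counts = Counter()
    for line in text.splitlines():
        lexer.input(line)
        while True:
            tok = lexer.token()
            if tok is None:
                break
            counts[tok] += 1
    return counts

counts = lex_by_line(FakeLexer(), "IF x THEN\nIF y THEN\n")
print(counts['IF'])  # -> 2
```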

Regards,
Michael.

Jul 18 '05 #8
Here's a kludgy but quick solution - modify the LexToken class in
lex.py to keep track of the number of type occurrences.
[code snipped - see post #4]


Thank you, it works well. I think this should be included in the next
release.

I am able to do a "print lex.LexToken.type_count" after each token, and
it did show the incremental counts of each token, except t_ignore.

Thanks again
maurice
Jul 18 '05 #9
Michael Sparks wrote:
[explanation snipped - see post #8]
I'm trying to parse what looks like a 4GL source code.

FWIW, start small - start with matching the simplest expressions you can and
work forward from there (unless you're lucky enough to have a LALR(1) or
SLR(1) grammar for it suitable for PLY already). Test first style coding
for grammars feels intuitively wrong, but seems to work really well in
practice - just make sure that after making every test work check in the
result to CVS/your favourite version control system :-)


I've worked out my grammar in BNF, so I hope it is context free.
One other tip you might find useful - rather than sending the lexer whole
files as PLY seems to expect, do line handling yourself and send it lines
instead - it works much more like Flex/lex that way.

Regards,
Michael.

Thank you, this really helped my understanding.

maurice
Jul 18 '05 #10
