speed


I implemented a lexer in Pylly and compared it to the version I
had written in Flex. Processing 219062 lines took 0.9 seconds in
C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
ratio of 393 to 1.
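
A quick check of the arithmetic behind that ratio, as a sketch (the variable names are illustrative and do not come from any tool mentioned in this thread):

```python
# Figures as reported in the post: 219062 lines, 0.9 s vs. 5 min 54 s.
lines = 219062
flex_seconds = 0.9
pylly_seconds = 5 * 60 + 54  # 354 seconds

# Slowdown factor of the Python lexer relative to the C lexer.
ratio = pylly_seconds / flex_seconds

# Throughput of each lexer in lines per second.
flex_rate = lines / flex_seconds
pylly_rate = lines / pylly_seconds

print(f"ratio: {ratio:.0f} to 1")
print(f"Flex:  {flex_rate:,.0f} lines/s")
print(f"Pylly: {pylly_rate:,.0f} lines/s")
```

Put as throughput, the Python lexer handles several hundred lines per second and the C lexer a few hundred thousand.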

Is this normal for Python, or does Flex produce better parsers
than Pylly? I have been looking at the code produced by Flex to
see if I could translate it to Python automatically. But it has a
lot of goto statements, and I haven't figured out how to
translate those to Python efficiently.

What are the average times used for text processing of Python
compared to C?

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #1
On Thu, Aug 19, 2004 at 03:37:26PM +0200, Peter Kleiweg wrote:

I implemented a lexer in Pylly and compared it to the version I
had written in Flex. Processing 219062 lines took 0.9 seconds in
C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
ratio of 393 to 1.

Is this normal for Python, or does Flex produce better parsers
than Pylly? I have been looking at the code produced by Flex to
see if I could translate it to Python automatically. But it has a
lot of goto statements, and I haven't figured out how to
translate those to Python efficiently.


flex has an option to generate code without the gotos...

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
Don't read everything you believe.


Jul 18 '05 #2
John Lenton wrote:

flex has an option to generate code without the gotos...


I have the latest version. I can't find it, not as run time
option, not as build option.

--
Peter Kleiweg L:NL,af,da,de,en,ia,nds,no,sv,(fr,it) S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/~kleiweg/ls.html

Jul 18 '05 #3
On Thu, Aug 19, 2004 at 04:16:24PM +0200, Peter Kleiweg wrote:
John Lenton wrote:

flex has an option to generate code without the gotos...


I have the latest version. I can't find it, not as run time
option, not as build option.


hmm! you're right... I wonder what lexer it was, then? I definitely
have a weak ref to the option in my head, but the owner has been gc'ed
:(

--
John Lenton (jo**@grulic.org.ar) -- Random fortune:
There was a phone call for you.


Jul 18 '05 #4
Peter Kleiweg <in*************@nl.invalid> wrote:
I implemented a lexer in Pylly and compared it to the version I
had written in Flex. Processing 219062 lines took 0.9 seconds in
C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
ratio of 393 to 1.

Is this normal for Python, or does Flex produce better parsers
than Pylly? I have been looking at the code produced by Flex to
see if I could translate it to Python automatically. But it has a
lot of goto statements, and I haven't figured out how to
translate those to Python efficiently.

What are the average times used for text processing of Python
compared to C?


I don't know Pylly, but I guess it generates a parser using
a finite automaton -- just like lex/flex, except it handles
every single character in Python, whereas lex/flex will lead
to compiled C code. That would explain the speed difference.
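
To make that concrete, here is a hypothetical sketch (not Pylly's actual output, which is not shown in this thread) of the same small task done two ways: a two-state automaton stepped once per character in Python, and a single regular expression whose scanning loop runs inside the C-implemented `re` module. The function names and the word-counting task are assumptions chosen only for illustration.

```python
import re
import string

LETTERS = frozenset(string.ascii_letters)

def count_words_dfa(text):
    """Count runs of ASCII letters with a two-state automaton.

    Every character costs several interpreted operations:
    a membership test, a state comparison, and an assignment.
    """
    OUTSIDE, IN_WORD = 0, 1
    state = OUTSIDE
    count = 0
    for ch in text:
        if ch in LETTERS:
            if state == OUTSIDE:
                count += 1  # entering a new word
            state = IN_WORD
        else:
            state = OUTSIDE
    return count

WORD = re.compile(r"[A-Za-z]+")

def count_words_re(text):
    """Same result, but the per-character loop runs in C inside re."""
    return len(WORD.findall(text))

sample = "foo bar, baz42qux"
print(count_words_dfa(sample), count_words_re(sample))
```

Both functions give the same count; timing them on a large input with the standard `timeit` module is a simple way to see how much the per-character interpreter overhead costs on a given machine.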

When I have to parse something in Python, I try to do that
using things like string.split(), string.find(), the "re"
module etc. Those things are written in C, therefore they
are fast enough for most applications. There are also some
modules for specialized cases, such as "ConfigParser" and
"shlex". See the Python Library Reference.
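
As a minimal sketch of that approach, the tokenizer below combines several patterns into one compiled regular expression with named groups, so the matching itself happens in C. The token names and patterns are hypothetical, chosen only for illustration; they are not taken from the original poster's lexer.

```python
import re

# Hypothetical token set; order matters, earlier alternatives win.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),   # whitespace, dropped below
    ("ERROR",  r"."),     # any other single character
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(text):
    """Yield (kind, value) pairs for each token in text."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(
                f"unexpected character {match.group()!r} "
                f"at offset {match.start()}"
            )
        yield kind, match.group()

for token in tokenize("x = 42 + y1"):
    print(token)
```

Python still runs one loop iteration per token, but no longer one per character, which is usually where most of the time goes.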

Best regards
Oliver

--
Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany

``All that we see or seem is just a dream within a dream.''
(E. A. Poe)
Jul 18 '05 #5
Hi,

On Thu, Aug 19, 2004 at 03:37:26PM +0200, Peter Kleiweg wrote:

I implemented a lexer in Pylly and compared it to the version I
had written in Flex. Processing 219062 lines took 0.9 seconds in
C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
ratio of 393 to 1.

Is this normal for Python, or does Flex produce better parsers
than Pylly? I have been looking at the code produced by Flex to
see if I could translate it to Python automatically. But it has a
lot of goto statements, and I haven't figured out how to
translate those to Python efficiently.

Don't try to translate the generated code to Python. Python code is
(almost) always slower than C code, because C is compiled to machine
code, while Python has to be interpreted by the VM. Besides, Python does
a lot of run-time checks.

Try PLY, <http://systems.cs.uchicago.edu/ply/>. If you have
experience with flex/yacc in C, this module should be easy to use.

You can also play with Psyco (a JIT compiler for x86) or even with
Pyrex.

But, IMHO, if you have to process very big files, don't do it in
Python. Instead, write a simple C module that uses your Flex parser
and creates Python objects from that information. It should be trivial
if you have experience with the C API. :-)

What are the average times used for text processing of Python
compared to C?


IMO, Python is a powerful language for almost everything, but in some
cases it is a poor fit. One of these cases is intensive computing (like
parsing a big file). Use the right tool =)

--
Ayose Cazorla León
Debian GNU/Linux - setepo
Jul 18 '05 #6

Another Python parser generator to look into is SimpleParse/mxTextTools

<http://simpleparse.sourceforge.net/>

We use it to parse and process large log files. In our case, a typical
grammar contains over 250 productions and parsing a log file of 180
Klines (100 MB) takes approx 3 min. Processing the result from the
parse step requires an additional 3 mins. This on a 2.4 GHz Xeon
machine running RedHat 8.

Obviously these figures are very grammar- and application-specific. Your
mileage may vary.

/Jean Brouwers

PS) A good reference is David Mertz' book "Text Processing in Python"

<http://www.informit.com/title/0321112547>

or several articles on (t)his web page

<http://gnosis.cx/publish/tech_index_cp.html>


In article <ma**************************************@python.org>, Ayose
<ay***********@hispalinux.es> wrote:
<http://systems.cs.uchicago.edu/ply/>.

Jul 18 '05 #7
At some point, Ayose <ay***********@hispalinux.es> wrote:
On Thu, Aug 19, 2004 at 03:37:26PM +0200, Peter Kleiweg wrote:

I implemented a lexer in Pylly and compared it to the version I
had written in Flex. Processing 219062 lines took 0.9 seconds in
C (from Flex), and 5 minutes 54 seconds in Python (from Pylly), a
ratio of 393 to 1.

Is this normal for Python, or does Flex produce better parsers
than Pylly? I have been looking at the code produced by Flex to
see if I could translate it to Python automatically. But it has a
lot of goto statements, and I haven't figured out how to
translate those to Python efficiently.

...
But, IMHO, if you have to process very big files, don't do it with
python. Instead, write a simple C-module, which uses your Flex parser
and creates python objects with that information. It should be trivial
if you have experience with the C API. :-)


Or have a look at FlexModule at
http://www.cs.utexas.edu/users/mcgui...ware/fbmodule/
which makes it really simple without experience with the C API.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
Jul 18 '05 #8
