Pyparsing Question

Ant
Hi all,

I have a question on PyParsing. I am trying to create a parser for a
hierarchical todo list format, but have hit a stumbling block. I have
parsers for the header of the list (title and description), and the body
(recursive descent on todo items).

Individually they are working fine, combined they throw an exception.
The code follows:

#!/usr/bin/python
# parser.py
import pyparsing as pp

def grammar():
    underline = pp.Word("=").suppress()
    dotnum = pp.Combine(pp.Word(pp.nums) + ".")
    textline = pp.Combine(pp.Group(pp.Word(pp.alphas, pp.printables) +
                                   pp.restOfLine))
    number = pp.Group(pp.OneOrMore(dotnum))

    headtitle = textline
    headdescription = pp.ZeroOrMore(textline)
    head = pp.Group(headtitle + underline + headdescription)

    taskname = pp.OneOrMore(dotnum) + textline
    task = pp.Forward()
    subtask = pp.Group(dotnum + task)
    task << (taskname + pp.ZeroOrMore(subtask))
    maintask = pp.Group(pp.LineStart() + task)

    parser = pp.OneOrMore(maintask)

    return head, parser

text = """
My Title
========

Text on a longer line of several words.
More test
and more.

"""

text2 = """

1. Task 1
     1.1. Subtask
         1.1.1. More tasks.
     1.2. Another subtask
2. Task 2
     2.1. Subtask again"""

head, parser = grammar()

print head.parseString(text)
print parser.parseString(text2)

comb = head + pp.OneOrMore(pp.LineStart() + pp.restOfLine) + parser
print comb.parseString(text + text2)

#===================================================================

Now the first two print statements output the parse tree as I would
expect, but the combined parser fails with an exception:

Traceback (most recent call last):
  File "parser.py", line 50, in ?
    print comb.parseString(text + text2)
..
.. [Stacktrace snipped]
..
    raise exc
pyparsing.ParseException: Expected start of line (at char 81), (line:9, col:1)

Any help appreciated!

Cheers,

--
Ant.
Jun 27 '08 #1
On May 16, 6:43 am, Ant <ant...@gmail.com> wrote:
> Hi all,
>
> I have a question on PyParsing. I am trying to create a parser for a
> hierarchical todo list format, but have hit a stumbling block. I have
> parsers for the header of the list (title and description), and the body
> (recursive descent on todo items).

LineStart *really* wants to be parsed at the beginning of a line.
Your textline reads up to but not including the LineEnd. Try making
these changes.

1. Change textline to:

    textline = pp.Combine(
        pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
        pp.LineEnd().suppress()

2. Change comb to:

comb = head + parser

With these changes, my version of your code runs ok.
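
A minimal sketch of why the extra LineEnd matters, assuming a pyparsing release that still accepts the camelCase method names used in this thread. The sample string and the names `line` and `line_with_eol` are illustrative only and are not part of the original grammar. It uses scanString to report where each match stops.

```python
import pyparsing as pp

sample = "My Title\n========\n"

# restOfLine matches up to, but not including, the newline.
line = pp.Word(pp.alphas) + pp.restOfLine
tokens, start, end = next(line.scanString(sample))
stops_on_newline = sample[end] == "\n"

# Appending a suppressed LineEnd consumes the newline as well, so the
# next expression begins at the start of the following line.
line_with_eol = line + pp.LineEnd().suppress()
tokens_eol, start_eol, end_eol = next(line_with_eol.scanString(sample))

print("without LineEnd, match ends at", end)
print("with LineEnd, match ends at", end_eol)
```

With the newline left unconsumed, a following LineStart is asked to match in the middle of a line rather than at its beginning, which is consistent with the "Expected start of line" error in the question.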

-- Paul
Jun 27 '08 #2
On May 16, 6:43 am, Ant <ant...@gmail.com> wrote:
> [full quote of the original post snipped]

I hold that the + operator should be overloaded for strings to include
newlines. Python 3.0 print has parentheses around it; wouldn't it
make sense to take them out?
Jun 27 '08 #3
Ant
Hi Paul,
> LineStart *really* wants to be parsed at the beginning of a line.
> Your textline reads up to but not including the LineEnd. Try making
> these changes.
>
> 1. Change textline to:
>
>     textline = pp.Combine(
>         pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
>         pp.LineEnd().suppress()

Ah - so restOfLine excludes the actual line ending does it?

> 2. Change comb to:
>
>     comb = head + parser

Yes - I'd got this originally. I added the garbage to try to fix the
problem and forgot to take it back out! Thanks for the advice - it works
fine now, and will provide a base for extending the list format.

Thanks,

Ant...
Jun 27 '08 #4
On May 16, 10:45 am, Ant <ant...@gmail.com> wrote:
> [full quote of the previous post snipped]

There is a possibility that spirals can come from doubles, which could
be non-trivially useful, in par. in the Java library. I won't see a
cent. Can anyone start a thread to spin letters, and see what the
team looks like? I want to animate spinners. It's across
dimensions. (per something.) Swipe a cross in a fluid. I'm draw
crosses. Animate cubes to draw crosses. I.e. swipe them.

Jun 27 '08 #5
I am just getting into python, and know little about it, and am
posting to ask on what beaches the salt water crocodiles hang out.

1. Looks to me that python will not scale to very large programs,
partly because of the lack of static typing, but mostly because there
is no distinction between creating a new variable and utilizing an
existing variable, so the interpreter fails to catch typos and name
collisions. I am inclined to suspect that when a successful small
python program turns into a large python program, it rapidly reaches
ninety percent complete, and remains ninety percent complete forever.

2. It is not clear to me how a python web application scales. Python
is inherently single threaded, so one will need lots of python
processes on lots of computers, with the database software handling
parallel accesses to the same or related data. One could organize it
as one python program for each url, and one python process for each
http request, but that involves a lot of overhead starting up and
shutting down python processes. Or one could organize it as one
python program for each url, but if one gets a lot of http requests
for one url, a small number of python processes will each sequentially
handle a large number of those requests. What I am really asking is:
Are there python web frameworks that scale with hardware and how do
they handle scaling?

Please don't read this as "Python sucks, everyone should program in
machine language expressed as binary numbers". I am just asking where
the problems are.
--
----------------------
We have the right to defend ourselves and our property, because
of the kind of animals that we are. True law derives from this
right, not from the arbitrary power of the omnipotent state.

http://www.jim.com/ James A. Donald
Jun 27 '08 #6
On Tue, 20 May 2008 10:47:50 +1000, James A. Donald wrote:
> 1. Looks to me that python will not scale to very large programs,
> partly because of the lack of static typing, but mostly because there
> is no distinction between creating a new variable and utilizing an
> existing variable, so the interpreter fails to catch typos and name
> collisions. I am inclined to suspect that when a successful small
> python program turns into a large python program, it rapidly reaches
> ninety percent complete, and remains ninety percent complete forever.

I find this frustrating too, but not to the extent that I choose a
different language. pylint helps but it's not as good as a nice, strict
compiler.

> 2. It is not clear to me how a python web application scales. Python
> is inherently single threaded, so one will need lots of python
> processes on lots of computers, with the database software handling
> parallel accesses to the same or related data. One could organize it
> as one python program for each url, and one python process for each
> http request, but that involves a lot of overhead starting up and
> shutting down python processes. Or one could organize it as one
> python program for each url, but if one gets a lot of http requests
> for one url, a small number of python processes will each sequentially
> handle a large number of those requests. What I am really asking is:
> Are there python web frameworks that scale with hardware and how do
> they handle scaling?

This sounds like a good match for Apache with mod_python.

Reid
Jun 27 '08 #7
On Mon, May 19, 2008 at 8:47 PM, James A. Donald <ja****@echeque.com> wrote:
> I am just getting into python, and know little about it, and am
> posting to ask on what beaches the salt water crocodiles hang out.
>
> 1. Looks to me that python will not scale to very large programs,
> partly because of the lack of static typing, but mostly because there
> is no distinction between creating a new variable and utilizing an
> existing variable, so the interpreter fails to catch typos and name
> collisions. I am inclined to suspect that when a successful small
> python program turns into a large python program, it rapidly reaches
> ninety percent complete, and remains ninety percent complete forever.

I can assure you that in practice this is not a problem. If you do
proper unit testing then you will catch many, if not all, of the
errors that static typing catches. Tools like PyLint, PyFlakes and
pep8.py will also catch many of those mistakes.

> 2. It is not clear to me how a python web application scales. Python
> is inherently single threaded, so one will need lots of python
> processes on lots of computers, with the database software handling
> parallel accesses to the same or related data. One could organize it
> as one python program for each url, and one python process for each
> http request, but that involves a lot of overhead starting up and
> shutting down python processes. Or one could organize it as one
> python program for each url, but if one gets a lot of http requests
> for one url, a small number of python processes will each sequentially
> handle a large number of those requests. What I am really asking is:
> Are there python web frameworks that scale with hardware and how do
> they handle scaling?

What is the difference if you have a process with 10 threads or 10
separate processes running in parallel? Apache is a good example of a
server that may be configured to use multiple processes to handle
requests. And from what I hear it scales just fine.

I think you are looking at the problem wrong. The fundamentals are the
same between threads and processes. You simply have a pool of workers
that handle requests. Any process is capable of handling any request.
The key to scalability is that the processes are persistent and not
forked for each request.
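
The worker-pool idea can be sketched roughly as follows, assuming a modern Python with `concurrent.futures` (which postdates this thread). The names `handle_request` and `serve` are hypothetical. Threads are used here only to keep the example short; `ProcessPoolExecutor` offers the same executor interface for a pool of processes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Pretend to serve one request; report which worker handled it."""
    return request_id, threading.current_thread().name


def serve(num_requests, num_workers):
    """Dispatch requests to a fixed pool of long-lived workers."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(handle_request, range(num_requests)))


if __name__ == "__main__":
    results = serve(num_requests=20, num_workers=4)
    workers = {name for _, name in results}
    print(len(results), "requests handled by", len(workers), "workers")
```

The workers are created once and reused, so no worker is started or torn down per request, and any worker may pick up any request.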

> Please don't read this as "Python sucks, everyone should program in
> machine language expressed as binary numbers". I am just asking where
> the problems are.

The only real problem I have had with process pools is that sharing
resources is harder. It is harder to create things like connection
pools.
--
David
http://www.traceback.org
Jun 27 '08 #8
On May 19, 8:47 pm, James A. Donald <jam...@echeque.com> wrote:
> 1. Looks to me that python will not scale to very large programs,
> partly because of the lack of static typing, but mostly because there
> is no distinction between creating a new variable and utilizing an
> existing variable, so the interpreter fails to catch typos and name
> collisions.

This factor is scale-neutral. You can expect the number of such bugs
to be proportional to the lines of code.

It might not scale up well if you engage in poor programming practices
(for example, importing lots of unqualified globals with tiny,
undescriptive names directly into every module's namespace), but if
you do that you have worse problems than accidental name collisions.

> I am inclined to suspect that when a successful small
> python program turns into a large python program, it rapidly reaches
> ninety percent complete, and remains ninety percent complete forever.

Unlike most C++/Java/VB/Whatever programs which finish and ship, and
are never patched or improved or worked on ever again?

> 2. It is not clear to me how a python web application scales. Python
> is inherently single threaded,

No it isn't.

It has some limitations in threading, but many programs make good use
of threads nonetheless. In fact for something like a web app Python's
threading limitations are relatively unimportant, since they tend to
be I/O-bound under heavy load.
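
A rough sketch of the I/O-bound case, assuming a modern Python with `concurrent.futures`. `time.sleep` stands in for a blocking network or database call, and the names `fake_io` and `timed_run` are hypothetical. Exact timings vary by machine, so none are claimed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def timed_run(num_tasks, delay, num_workers):
    """Run the tasks on a thread pool and return the elapsed wall time."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(fake_io, [delay] * num_tasks))
    return time.perf_counter() - started


if __name__ == "__main__":
    serial = timed_run(num_tasks=4, delay=0.2, num_workers=1)
    threaded = timed_run(num_tasks=4, delay=0.2, num_workers=4)
    print("one worker: %.2fs, four workers: %.2fs" % (serial, threaded))
```

Because the waiting threads do not hold the interpreter lock, the four waits overlap instead of running back to back.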

[snip rest]
Carl Banks
Jun 27 '08 #9
>> 1. Looks to me that python will not scale to very large programs,
>> partly because of the lack of static typing, but mostly because there
>> is no distinction between creating a new variable and utilizing an
>> existing variable,

Ben Finney
> This seems quite a non sequitur. How do you see a connection between
> these properties and "will not scale to large programs"?

The larger the program, the greater the likelihood of inadvertent name
collisions creating rare and irreproducible interactions between
different and supposedly independent parts of the program that each
work fine on their own, and supposedly cannot possibly interact.

> These errors are a small subset of possible errors. If writing a large
> program, an automated testing suite is essential, and can catch far
> more errors than the compiler can hope to catch. If you run a static
> code analyser, you'll be notified of unused names and other simple
> errors that are often caught by static-declaration compilers.

That is handy, but the larger the program, the bigger the problem with
names that are overused, rather than unused.

--
----------------------
We have the right to defend ourselves and our property, because
of the kind of animals that we are. True law derives from this
right, not from the arbitrary power of the omnipotent state.

http://www.jim.com/ James A. Donald
Jun 27 '08 #10
