pyparsing and svg

Donn Ingle

Hi - I have been trying, but I need some help:
Here's my test code to parse an SVG path element, can anyone steer me right?

d1="""
M 209.12237 , 172.2415
L 286.76739 , 153.51369
L 286.76739 , 275.88534
L 209.12237 , 294.45058
L 209.12237 , 172.2415
z """

#Try it with no enters
d1="""M 209.12237,172.2415 L 286.76739,153.51369 L 286.76739,275.88534 L
209.12237,294.45058 L 209.12237,172.2415 z """

#Try it with no spaces
d1="""M209.12237,172.2415L286.76739,153.51369L286. 76739,275.88534L209.12237,294.45058L209.12237,172. 2415z"""

#For later, has more commands
d2="""
M 269.78326 , 381.27104
C 368.52151 , 424.27023
90.593578 , -18.581883
90.027729 , 129.28708
C 89.461878 , 277.15604
171.04501 , 338.27184
269.78326 , 381.27104
z """

## word :: M, L, C, Z
## number :: group of numbers
## dot :: "."
## comma :: ","
## couple :: number dot number comma number dot number
## command :: word
##
## phrase :: command couple

from pyparsing import Word, Literal, alphas, nums, Optional, OneOrMore

command = Word("MLCZ")
comma = Literal(",")
dot = Literal(".")
float = nums + dot + nums
couple = float + comma + float

phrase = OneOrMore(command + OneOrMore(couple | ( couple + couple ) ) )

print phrase

print phrase.parseString(d1.upper())

Thanks
\d

Nov 8 '07 #1

Subscribe Post Reply

3964

Paul McGuire

On Nov 8, 3:14 am, Donn Ingle <donn.in...@gmail.comwrote:

float = nums + dot + nums

Should be:

float = Combine(Word(nums) + dot + Word(nums))

nums is a string that defines the set of numeric digits for composing
Word instances. nums is not an expression by itself.

For that matter, I see in your later tests that some values have a
leading minus sign, so you should really go with:

float = Combine(Optional("-") + Word(nums) + dot + Word(nums))

Some other comments:

1. Read up on the Word class, you are not using it quite right.

command = Word("MLCZ")

will work with your test set, but it is not the form I would choose.
Word(characterstring) will match any "word" made up of the characters
in the input string. So Word("MLCZ") will match
M
L
C
Z
MM
LC
MCZL
MMLCLLZCZLLM

I would suggest instead using:

command = Literal("M") | "L" | "C" | "Z"

or

command = oneOf("M L C Z")

2. Change comma to

comma = Literal(",").suppress()

The comma is important to the parsing process, but the ',' token is
not much use in the returned set of matched tokens, get rid of it (by
using suppress).

3. Group your expressions, such as

couple = Group(float + comma + float)

It will really simplify getting at the resulting parsed tokens.
4. What is the purpose of (couple + couple)? This is sufficient:

phrase = OneOrMore(command + Group(OneOrMore(couple)) )

(Note use of Group to return the coord pairs as a sublist.)
5. Results names!

phrase = OneOrMore(command("command") + Group(OneOrMore(couple))
("coords") )

will allow you to access these fields by name instead of by index.
This will make your parser code *way* more readable.
-- Paul

Nov 8 '07 #2

Arkanes

Paul McGuire wrote:

On Nov 8, 3:14 am, Donn Ingle <donn.in...@gmail.comwrote:

>float = nums + dot + nums

Should be:

float = Combine(Word(nums) + dot + Word(nums))

nums is a string that defines the set of numeric digits for composing
Word instances. nums is not an expression by itself.

For that matter, I see in your later tests that some values have a
leading minus sign, so you should really go with:

float = Combine(Optional("-") + Word(nums) + dot + Word(nums))

I have a working path data parser (in pyparsing) at
http://code.google.com/p/wxpsvg.

Parsing the numeric values initially gave me a lot of trouble - I
translated the BNF in the spec literally and there was a *ton* of
backtracking going on with every numeric value. I ended up using a more
generous grammar, and letting pythons float() reject invalid values.

I couldn't get repeating path elements (like M 100 100 200 200, which is
the same as M 100 100 M 200 200) working right in the grammar, so I
expand those with post-processing.

The parser itself can be seen at
http://wxpsvg.googlecode.com/svn/trunk/svg/pathdata.py

Some other comments:

1. Read up on the Word class, you are not using it quite right.

command = Word("MLCZ")

will work with your test set, but it is not the form I would choose.
Word(characterstring) will match any "word" made up of the characters
in the input string. So Word("MLCZ") will match
M
L
C
Z
MM
LC
MCZL
MMLCLLZCZLLM

I would suggest instead using:

command = Literal("M") | "L" | "C" | "Z"

or

command = oneOf("M L C Z")

2. Change comma to

comma = Literal(",").suppress()

The comma is important to the parsing process, but the ',' token is
not much use in the returned set of matched tokens, get rid of it (by
using suppress).

3. Group your expressions, such as

couple = Group(float + comma + float)

It will really simplify getting at the resulting parsed tokens.
4. What is the purpose of (couple + couple)? This is sufficient:

phrase = OneOrMore(command + Group(OneOrMore(couple)) )

(Note use of Group to return the coord pairs as a sublist.)
5. Results names!

phrase = OneOrMore(command("command") + Group(OneOrMore(couple))
("coords") )

will allow you to access these fields by name instead of by index.
This will make your parser code *way* more readable.
-- Paul

Nov 8 '07 #3

Similar topics

Saving search results in a dictionary

by: Lukas Holcik | last post by:

Hi everyone! How can I simply search text for regexps (lets say <a href="(.*?)">(.*?)</a>) and save all URLs(1) and link contents(2) in a dictionary { name : URL}? In a single pass if it could....

Python

trouble pyparsing

by: the.theorist | last post by:

Hey, I'm trying my hand and pyparsing a log file (named l.log): FIRSTLINE PROPERTY1 DATA1 PROPERTY2 DATA2 PROPERTYS LIST ID1 data1 ID2 data2

Python

Parsing files -- pyparsing to the rescue?

by: rh0dium | last post by:

Hi all, I have a file which I need to parse and I need to be able to break it down by sections. I know it's possible but I can't seem to figure this out. The sections are broken by <> with...

Python

PyParsing and Headaches

by: Bytter | last post by:

Hi, I'm trying to construct a parser, but I'm stuck with some basic stuff... For example, I want to match the following: letter = "A"..."Z" | "a"..."z" literal = letter+ include_bool := "+"...

Python

pyparsing Catch-22

by: 7stud | last post by:

To the developer: 1) I went to the pyparsing wiki to download the pyparsing module and try it 2) At the wiki, there was no index entry in the table of contents for Downloads. After searching...

Python

Help With PyParsing of output from win32pdhutil.ShowAllProcesses()

by: Steve | last post by:

Hi All (especially Paul McGuire!) Could you lend a hand in the grammar and paring of the output from the function win32pdhutil.ShowAllProcesses()? This is the code that I have so far (it is...

Python

help with pyparsing

by: Neal Becker | last post by:

I'm just trying out pyparsing. I get stack overflow on my first try. Any help? #/usr/bin/python from pyparsing import Word, alphas, QuotedString, OneOrMore, delimitedList first_line = ''...

Python

Is pyparsing really a recursive descent parser?

by: Just Another Victim of the Ambient Morality | last post by:

Is pyparsing really a recursive descent parser? I ask this because there are grammars it can't parse that my recursive descent parser would parse, should I have written one. For instance: ...

Python

pyparsing question

by: hubritic | last post by:

I am trying to parse data that looks like this: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION 2BFA76F6 1208230607 T S SYSPROC SYSTEM SHUTDOWN BY USER...

Python

ANN: pyparsing 1.5.1 released

by: Paul McGuire | last post by:

I've just uploaded to SourceForge and PyPI the latest update to pyparsing, version 1.5.1. It has been a couple of months since 1.5.0 was released, and a number of bug-fixes and enhancements have...

Python

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

C# / C Sharp