
Ok, I'm quite new to Python

But I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
fields:

//Text File:

*Version 200
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}

*MATERIAL_LIST{
*MATERIAL_COUNT 0
}
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
*MESH_FACE_LIST {
*MESH_FACE 1 2 3
}
}
}
/* ... More GEOMOBJECTS ...*/

But I have no idea what the best way to do this is.
Any thoughts?

Many Thanks

Mike


Jul 18 '05 #1
4 Replies


Michael wrote:
> But I'm a good C++ programmer.
>
> What I want to do is parse a text file and store the information in relevant
> fields:

Well, if you have (or know how to write) an EBNF grammar, SimpleParse
would likely be ideal for this. See the VRML97 sample grammar in
SimpleParse (or even the VRML97 loader in OpenGLContext for a more
real-world example).

The primary value of SimpleParse for this kind of thing is that it's fast
compared to most other Python parser generators while still being easy
to use. If you're loading largish models (say, tens of MB), the speed can
be quite useful. (It was originally written specifically to produce a
fast VRML97 parser, by the way.)

If you're loading *huge* models (hundreds of MB), you may need to go for a
C/C++ extension to convert directly from an on-disk buffer to objects,
but try it with the Python versions first. Even with hundreds of MB, you
can write SimpleParse grammars fast enough to parse them quite quickly;
it just requires a little more care with how you structure your productions.

> but I have no idea what the best way to do this is?
> Any thoughts??

Mostly it's just a matter of what you feel comfortable with. There's
quite a range of Python text-processing tools available. See the text
"Text Processing in Python" (available in both dead-tree and online
format) for extensive treatment of various approaches, from writing your
own recursive descent parsers through using one of the parser-generators.

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com

Jul 18 '05 #2


"Michael" <sl***********@hotmail.com> wrote in message
news:ck**********@hercules.btinternet.com...
> But I'm a good C++ programmer.
>
> What I want to do is parse a text file and store the information in
> relevant fields:


A more useful-to-the-reader and possibly more fruitful-to-you subject line
would have been something like 'Need help parsing text files'.

tjr

Jul 18 '05 #3

Well, here's an sre-based scanner and recursive-descent parser based on
my understanding of the grammar you gave.

Using a real scanner and parser may or may not be a better choice, but
it's not hard in Python to create a scanner and write a
recursive-descent parser for a simple grammar.

Jeff

------------------------------------------------------------------------
# This code is in the public domain
# Written for Python 2: it relies on the sre module, print statements,
# basestring and the iterator .next() method.
class PeekableIterator:
    """Iterator wrapper that can look one item ahead without consuming it."""
    def __init__(self, s):
        self.s = iter(s)
        self._peek = []

    def atend(self):
        try:
            self.peek()
        except StopIteration:
            return True
        return False

    def peek(self):
        if not self._peek: self._peek = [self.s.next()]
        return self._peek[0]

    def next(self):
        if self._peek:
            return self._peek.pop()
        return self.s.next()

    def __iter__(self): return self

# Scanner callbacks: each receives the scanner and the matched text.
def tok(scanner, s):
    return s

def num(scanner, s):
    try:
        return int(s)
    except ValueError:
        return float(s)

import sre
# Patterns are tried in order; an action of None discards the match.
scanner = sre.Scanner([
    (r"/\*(?:[^*]|[*]+[^/])*\*/", None),    # /* ... */ comments
    (r"\*?[A-Za-z_][A-Za-z0-9_]*", tok),    # names, optionally starred
    (r"//.*$", None),                       # // comments
    (r"[0-9]*\.[0-9]+|[0-9]+\.?", num),     # numbers
    (r"[{}]", tok),                         # braces
    (r'"(?:[^\\"]|\\.)*"', tok),            # quoted strings
    (r"[ \t\r\n]*", None),                  # whitespace
], sre.MULTILINE)

class Node:
    def __init__(self, name):
        self.name = name
        self.contents = []
    def add(self, v): self.contents.append(v)
    def __str__(self):
        sc = " ".join(map(repr, self.contents))
        return "<%s: %s>" % (self.name, sc)
    __repr__ = __str__

# Parse the nodes inside a { ... } block, consuming the closing brace.
def parse_nodes(t):
    n = []
    while 1:
        if t.peek() == "}":
            t.next()
            break
        n.append(parse_node(t))
    return n

# Collect a node's contents: an optional block, then plain values up to
# the next "}" or starred name.
def parse_contents(n, t):
    if t.atend(): return
    if t.peek() == "{":
        t.next()
        for n1 in parse_nodes(t):
            n.add(n1)
    while 1:
        if t.atend(): break
        p = t.peek()
        if p == "}": break
        if isinstance(p, basestring) and p.startswith("*"): break
        n.add(t.next())

def parse_node(t):
    n = Node(t.next())
    parse_contents(n, t)
    return n

def parse_top(t):
    while not t.atend():
        yield parse_node(t)


import sys
def main(source = sys.stdin):
    tokens, rest = scanner.scan(source.read())
    if rest:
        print "Garbage at end of file:", repr(rest)
    for n in parse_top(PeekableIterator(tokens)):
        print n

if __name__ == '__main__': main()
------------------------------------------------------------------------
$ python michael.py < michael.txt # and reindented for show
<*Version: 200>
<*SCENE: <AMBIENT_COLOUR: 0.0 0.0 0.0>>
<*MATERIAL_LIST: <*MATERIAL_COUNT: 0>>
<*GEOMOBJECT:
    <*NODE_NAME: '"obj1"'>
    <*MESH:
        <*MESH_VERTEX_LIST:
            <*MESH_VERTEX: 0 0 0 0>
            <*MESH_VERTEX: 1 0 1 2>>
        <*MESH_FACE_LIST: <*MESH_FACE: 1 2 3>>>>
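
For anyone trying this on Python 3: the `sre` module no longer exists there, but the same `Scanner` class is still reachable as `re.Scanner`. It is undocumented, so treat its availability as an assumption. Below is a minimal sketch of just the scanning step, using the same patterns as above (with the whitespace rule tightened to `+`) and a short made-up `sample` string standing in for the real file:

```python
import re

def tok(scanner, s):
    # Keep names, braces and quoted strings as plain text.
    return s

def num(scanner, s):
    # Prefer int; fall back to float for values such as "0.0".
    try:
        return int(s)
    except ValueError:
        return float(s)

# re.Scanner is undocumented; patterns are tried in order and an
# action of None throws the matched text away.
scanner = re.Scanner([
    (r"/\*(?:[^*]|[*]+[^/])*\*/", None),    # /* ... */ comments
    (r"\*?[A-Za-z_][A-Za-z0-9_]*", tok),    # names, optionally starred
    (r"//.*$", None),                       # // comments
    (r"[0-9]*\.[0-9]+|[0-9]+\.?", num),     # numbers
    (r"[{}]", tok),                         # braces
    (r'"(?:[^\\"]|\\.)*"', tok),            # quoted strings
    (r"[ \t\r\n]+", None),                  # whitespace
], re.MULTILINE)

# Hypothetical sample input, not the full file from the question.
sample = """// header comment
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}
*MESH_VERTEX 1 0 1 2
"""

# scan() returns the token list plus whatever text it could not match.
tokens, rest = scanner.scan(sample)
print(tokens)
print(repr(rest))
```

The parser half would need the matching changes as well (`next(it)` instead of `it.next()`, `str` instead of `basestring`, and `print()` as a function).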



Jul 18 '05 #4

On Wed, 13 Oct 2004 00:03:41 +0000 (UTC), "Michael" <sl***********@hotmail.com> wrote:
> But I'm a good C++ programmer.
>
> What I want to do is parse a text file and store the information in relevant
                       ^^^^^^^^^^^^^^^^^[1]  ^^^^^[2]  ^^^^^^^^^^^[3] ^^^^^^^^-
> fields:
  -^^^^^[4]
[1] ok
[2] where?
[3] which
[4] relevant to what?
[5] ;-)
//Text File:

*Version 200
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}

*MATERIAL_LIST{
*MATERIAL_COUNT 0
}
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
*MESH_FACE_LIST {
*MESH_FACE 1 2 3
}
}
}
/* ... More GEOMOBJECTS ...*/
> but I have no idea what the best way to do this is?
                                          ^^^^^^^[1]
[1] do what?
> Any thoughts??

I'd probably start with stripping out the tokens with a regular expression
and then process the list to build a tree that you can then walk. To start:
>>> data = """\
... *Version 200
... *SCENE {
... AMBIENT_COLOUR 0.0 0.0 0.0
... }
...
... *MATERIAL_LIST{
... *MATERIAL_COUNT 0
... }
... *GEOMOBJECT {
... *NODE_NAME "obj1"
... *MESH {
... *MESH_VERTEX_LIST{
... *MESH_VERTEX 0 0 0 0
... *MESH_VERTEX 1 0 1 2
... }
... *MESH_FACE_LIST {
... *MESH_FACE 1 2 3
... }
... }
... }
... """
>>> import re
>>> rxs = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')
>>> tokens = rxs.findall(data)
>>> tokens
['*Version', '200', '*SCENE', '{', 'AMBIENT_COLOUR', '0.0', '0.0', '0.0', '}',
 '*MATERIAL_LIST', '{', '*MATERIAL_COUNT', '0', '}', '*GEOMOBJECT', '{',
 '*NODE_NAME', '"obj1"', '*MESH', '{', '*MESH_VERTEX_LIST', '{',
 '*MESH_VERTEX', '0', '0', '0', '0', '*MESH_VERTEX', '1', '0', '1', '2', '}',
 '*MESH_FACE_LIST', '{', '*MESH_FACE', '1', '2', '3', '}', '}', '}']

IWT that isolates the basic info of interest. It should not be hard to make a tree or
extract what suits your purposes, but I'm not going to guess what those are ;-)
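
One possible way to turn that flat token list into a tree is sketched below for Python 3. The names `is_value` and `build_tree` are made up for illustration, and the sketch assumes two things about the format: braces are balanced, and anything that is not a number or a quoted string starts a new entry. Each entry comes out as a `(name, values, children)` tuple:

```python
import re

# Same tokenizing expression as in the session above.
TOKEN_RX = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')

def is_value(token):
    # Assumption: values are numbers or quoted strings; everything
    # else is treated as the name of a new entry.
    return token[0] in '"0123456789.'

def build_tree(tokens):
    """Group a flat token list into nested (name, values, children) tuples."""
    pos = 0

    def parse_entries():
        nonlocal pos
        entries = []
        while pos < len(tokens) and tokens[pos] != '}':
            name = tokens[pos]
            pos += 1
            # Plain values that follow the name directly.
            values = []
            while pos < len(tokens) and is_value(tokens[pos]):
                values.append(tokens[pos])
                pos += 1
            # Optional { ... } block holding nested entries.
            children = []
            if pos < len(tokens) and tokens[pos] == '{':
                pos += 1
                children = parse_entries()
                pos += 1  # step over the closing '}'
            entries.append((name, values, children))
        return entries

    return parse_entries()

# Shortened, hypothetical sample in the same format as the question.
sample = '''*Version 200
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
}
}
'''

tree = build_tree(TOKEN_RX.findall(sample))
for entry in tree:
    print(entry)
```

The values stay as strings here; converting them with `int()` or `float()` and copying them into whatever fields are "relevant" would be a separate walk over the tree.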

Regards,
Bengt Richter
Jul 18 '05 #5
