Bytes | Software Development & Data Engineering Community

OK, I'm quite new to Python, but I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
fields:

//Text File:

*Version 200
*SCENE {
    AMBIENT_COLOUR 0.0 0.0 0.0
}

*MATERIAL_LIST{
    *MATERIAL_COUNT 0
}
*GEOMOBJECT {
    *NODE_NAME "obj1"
    *MESH {
        *MESH_VERTEX_LIST{
            *MESH_VERTEX 0 0 0 0
            *MESH_VERTEX 1 0 1 2
        }
        *MESH_FACE_LIST {
            *MESH_FACE 1 2 3
        }
    }
}
/* ... More GEOMOBJECTS ...*/

But I have no idea what the best way to do this is.
Any thoughts?

Many Thanks

Mike


Jul 18 '05 #1
Michael wrote:
> But i'm a good c++ programmer.
>
> What i want to do is parse a text file and store the information in relevant
> fields:

Well, if you have (or know how to write) an EBNF grammar, SimpleParse
would likely be ideal for this. See the VRML97 sample grammar in
SimpleParse (or even the VRML97 loader in OpenGLContext for a more
real-world example).

Primary value of SimpleParse for this kind of thing is that it's fast
compared to most other Python parser generators while still being easy
to use. If you're loading largish (say 10s of MBs) models the speed can
be quite useful. (It was originally written explicitly to produce a
fast VRML97 parser (btw)).

If you're loading *huge* models (100s of MBs), you may need to go for a
C/C++ extension to directly convert from an on-disk buffer to objects,
but try it with the Python versions first. Even with 100s of MBs, you
can write SimpleParse grammars fast enough to parse them quite quickly;
it just requires a little more care with how you structure your productions.

> but I have no idea what the best way to do this is?
> Any thoughts??

Mostly it's just a matter of what you feel comfortable with. There's
quite a range of Python text-processing tools available. See the text
"Text Processing in Python" (available in both dead-tree and online
format) for extensive treatment of various approaches, from writing your
own recursive descent parsers through using one of the parser-generators.

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com

Jul 18 '05 #2

"Michael" <sl***********@hotmail.com> wrote in message
news:ck**********@hercules.btinternet.com...
> But i'm a good c++ programmer.
>
> What i want to do is parse a text file and store the information in
> relevant fields:


A more useful-to-the-reader and possibly more fruitful-to-you subject line
would have been something like 'Need help parsing text files'.

tjr

Jul 18 '05 #3
Well, here's an sre-based scanner and recursive-descent parser based on
my understanding of the grammar you gave.

Using a real scanner and parser may or may not be a better choice, but
it's not hard in Python to create a scanner and write a
recursive-descent parser for a simple grammar.

Jeff

------------------------------------------------------------------------
# This code is in the public domain
import re
import sys


class PeekableIterator:
    """Wrap an iterable and allow one item of lookahead."""
    def __init__(self, s):
        self.s = iter(s)
        self._peek = []

    def atend(self):
        try:
            self.peek()
        except StopIteration:
            return True
        return False

    def peek(self):
        if not self._peek:
            self._peek = [next(self.s)]
        return self._peek[0]

    def __next__(self):
        if self._peek:
            return self._peek.pop()
        return next(self.s)

    def __iter__(self):
        return self


# Scanner callbacks: each receives the scanner and the matched text.
def tok(scanner, s):
    return s


def num(scanner, s):
    try:
        return int(s)
    except ValueError:
        return float(s)


# re.Scanner is the (undocumented) current home of the old sre.Scanner.
scanner = re.Scanner([
    (r"/\*(?:[^*]|[*]+[^/])*\*/", None),    # /* block comment */
    (r"\*?[A-Za-z_][A-Za-z0-9_]*", tok),    # keyword, optionally starred
    (r"//.*$", None),                       # // line comment
    (r"[0-9]*\.[0-9]+|[0-9]+\.?", num),     # number
    (r"[{}]", tok),                         # braces
    (r'"(?:[^\\"]|\\.)*"', tok),            # quoted string
    (r"[ \t\r\n]*", None),                  # whitespace
], re.MULTILINE)


class Node:
    def __init__(self, name):
        self.name = name
        self.contents = []

    def add(self, v):
        self.contents.append(v)

    def __str__(self):
        sc = " ".join(map(repr, self.contents))
        return "<%s: %s>" % (self.name, sc)

    __repr__ = __str__


def parse_nodes(t):
    """Parse the nodes inside a { ... } block, consuming the closing brace."""
    n = []
    while True:
        if t.atend():
            raise ValueError("unexpected end of input: missing '}'")
        if t.peek() == "}":
            next(t)
            break
        n.append(parse_node(t))
    return n


def parse_contents(n, t):
    """Attach an optional { ... } block and any trailing values to node n."""
    if t.atend():
        return
    if t.peek() == "{":
        next(t)
        for n1 in parse_nodes(t):
            n.add(n1)
    while True:
        if t.atend():
            break
        p = t.peek()
        if p == "}":
            break
        # A starred keyword begins the next node.
        if isinstance(p, str) and p.startswith("*"):
            break
        n.add(next(t))


def parse_node(t):
    n = Node(next(t))
    parse_contents(n, t)
    return n


def parse_top(t):
    while not t.atend():
        yield parse_node(t)


def main(source=sys.stdin):
    tokens, rest = scanner.scan(source.read())
    if rest:
        print("Garbage at end of file:", repr(rest))
    for n in parse_top(PeekableIterator(tokens)):
        print(n)


if __name__ == '__main__':
    main()
------------------------------------------------------------------------
$ python michael.py < michael.txt # and reindented for show
<*Version: 200>
<*SCENE: <AMBIENT_COLOUR: 0.0 0.0 0.0>>
<*MATERIAL_LIST: <*MATERIAL_COUNT: 0>>
<*GEOMOBJECT:
    <*NODE_NAME: '"obj1"'>
    <*MESH:
        <*MESH_VERTEX_LIST:
            <*MESH_VERTEX: 0 0 0 0>
            <*MESH_VERTEX: 1 0 1 2>>
        <*MESH_FACE_LIST: <*MESH_FACE: 1 2 3>>>>
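
To get from a tree like this to the "relevant fields" asked about in the original post, one option is to walk the nodes and pull out what you need. What follows is only a minimal sketch, not part of the parser above. It assumes a Node shaped like the one in the parser (a name plus a contents list), and it assumes that the first number after *MESH_VERTEX is the vertex index and the next three are coordinates, which the thread never states. The helper names find_all and mesh_vertices are made up for illustration, and the tree is built by hand so the example runs on its own.

```python
class Node:
    """Minimal stand-in for the parser's Node: a name plus a list of contents."""
    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


def find_all(node, name):
    """Yield node itself and every descendant node whose name matches."""
    if node.name == name:
        yield node
    for child in node.contents:
        if isinstance(child, Node):
            yield from find_all(child, name)


def mesh_vertices(geomobject):
    """Return {index: (x, y, z)} for each *MESH_VERTEX under the given node.

    Assumption: contents are laid out as [index, x, y, z].
    """
    vertices = {}
    for v in find_all(geomobject, "*MESH_VERTEX"):
        index, x, y, z = v.contents
        vertices[index] = (x, y, z)
    return vertices


# Hand-built equivalent of the *GEOMOBJECT tree from the sample file.
geom = Node("*GEOMOBJECT", [
    Node("*NODE_NAME", ['"obj1"']),
    Node("*MESH", [
        Node("*MESH_VERTEX_LIST", [
            Node("*MESH_VERTEX", [0, 0, 0, 0]),
            Node("*MESH_VERTEX", [1, 0, 1, 2]),
        ]),
        Node("*MESH_FACE_LIST", [Node("*MESH_FACE", [1, 2, 3])]),
    ]),
])

print(mesh_vertices(geom))  # {0: (0, 0, 0), 1: (0, 1, 2)}
```

The same find_all walk could collect *MESH_FACE entries or the *NODE_NAME; which fields matter depends on what the data is for.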



Jul 18 '05 #4
On Wed, 13 Oct 2004 00:03:41 +0000 (UTC), "Michael" <sl***********@hotmail.com> wrote:
> But i'm a good c++ programmer.
>
> What i want to do is parse a text file and store the information in relevant
                       ^^^^^^^^^^^^^^^^^[1]  ^^^^^[2]  ^^^^^^^^^^^[3] ^^^^^^^^-
> fields:
  -^^^^^[4]
[1] ok
[2] where?
[3] which
[4] relevant to what?
[5] ;-)
> //Text File:
>
> *Version 200
> *SCENE {
> AMBIENT_COLOUR 0.0 0.0 0.0
> }
>
> *MATERIAL_LIST{
> *MATERIAL_COUNT 0
> }
> *GEOMOBJECT {
> *NODE_NAME "obj1"
> *MESH {
> *MESH_VERTEX_LIST{
> *MESH_VERTEX 0 0 0 0
> *MESH_VERTEX 1 0 1 2
> }
> *MESH_FACE_LIST {
> *MESH_FACE 1 2 3
> }
> }
> }
> /* ... More GEOMOBJECTS ...*/
> but I have no idea what the best way to do this is?
                                          ^^^^^^^[1]
[1] do what?
> Any thoughts??

I'd probably start with stripping out the tokens with a regular expression
and then process the list to build a tree that you can then walk. To start:
>>> data = """\
... *Version 200
... *SCENE {
... AMBIENT_COLOUR 0.0 0.0 0.0
... }
...
... *MATERIAL_LIST{
... *MATERIAL_COUNT 0
... }
... *GEOMOBJECT {
... *NODE_NAME "obj1"
... *MESH {
... *MESH_VERTEX_LIST{
... *MESH_VERTEX 0 0 0 0
... *MESH_VERTEX 1 0 1 2
... }
... *MESH_FACE_LIST {
... *MESH_FACE 1 2 3
... }
... }
... }
... """
>>> import re
>>> rxs = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')
>>> tokens = rxs.findall(data)
>>> tokens
['*Version', '200', '*SCENE', '{', 'AMBIENT_COLOUR', '0.0', '0.0', '0.0', '}',
 '*MATERIAL_LIST', '{', '*MATERIAL_COUNT', '0', '}', '*GEOMOBJECT', '{',
 '*NODE_NAME', '"obj1"', '*MESH', '{', '*MESH_VERTEX_LIST', '{',
 '*MESH_VERTEX', '0', '0', '0', '0', '*MESH_VERTEX', '1', '0', '1', '2', '}',
 '*MESH_FACE_LIST', '{', '*MESH_FACE', '1', '2', '3', '}', '}', '}']

I would think that isolates the basic info of interest. It should not be hard to make a tree or
extract what suits your purposes, but I'm not going to guess what those are ;-)
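
For what it's worth, turning that flat token list into a tree could look something like the following. This is only a sketch under assumptions, not a definitive reading of the format: it treats any token starting with a digit, a dot or a double quote as a value and anything else as a keyword, and it represents the tree as nested lists of (name, value) pairs. The names TOKEN_RE, to_value and build are made up for illustration; the regular expression is the one from the session above.

```python
import re

# Same token pattern as in the interactive session above.
TOKEN_RE = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')


def to_value(tok):
    """Strip quotes from string tokens; convert numeric tokens to int or float."""
    if tok.startswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        return float(tok)


def build(tokens, pos=0):
    """Parse tokens from pos up to the matching '}' or the end of the list.

    Returns (entries, next_pos). Each entry is a (name, value) pair, where
    value is a list of scalars or, for a braced block, a nested entry list.
    """
    entries = []
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "{":
            value, pos = build(tokens, pos + 1)
            pos += 1  # step over the closing '}'
        else:
            value = []
            # Assumption: values start with a digit, a dot or a double quote.
            while pos < len(tokens) and tokens[pos][0] in '0123456789."':
                value.append(to_value(tokens[pos]))
                pos += 1
        entries.append((name, value))
    return entries, pos


sample = '''
*Version 200
*SCENE {
AMBIENT_COLOUR 0.0 0.0 0.0
}
*GEOMOBJECT {
*NODE_NAME "obj1"
*MESH {
*MESH_VERTEX_LIST{
*MESH_VERTEX 0 0 0 0
*MESH_VERTEX 1 0 1 2
}
}
}
'''

tree, _ = build(TOKEN_RE.findall(sample))
for name, value in tree:
    print(name, value)
```

Keeping (name, value) pairs in a list rather than a dict preserves repeated keywords such as *MESH_VERTEX, which a dict keyed by name would overwrite.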

Regards,
Bengt Richter
Jul 18 '05 #5
