Ok, I'm quite new to Python

Michael

But I'm a good C++ programmer.

What I want to do is parse a text file and store the information in relevant
fields:

//Text File:

*Version 200
*SCENE {
    AMBIENT_COLOUR 0.0 0.0 0.0
}

*MATERIAL_LIST{
    *MATERIAL_COUNT 0
}
*GEOMOBJECT {
    *NODE_NAME "obj1"
    *MESH {
        *MESH_VERTEX_LIST{
            *MESH_VERTEX 0 0 0 0
            *MESH_VERTEX 1 0 1 2
        }
        *MESH_FACE_LIST {
            *MESH_FACE 1 2 3
        }
    }
}
/* ... More GEOMOBJECTS ...*/
but I have no idea what the best way to do this is?
Any thoughts??

Many Thanks

Mike

Jul 18 '05 #1

Mike C. Fletcher

Michael wrote:

> But I'm a good C++ programmer.
>
> What I want to do is parse a text file and store the information in relevant
> fields:

Well, if you have (or know how to write) an EBNF grammar, SimpleParse
would likely be ideal for this. See the VRML97 sample grammar in
SimpleParse (or even the VRML97 loader in OpenGLContext for a more
real-world example).

The primary value of SimpleParse for this kind of thing is that it's fast
compared to most other Python parser generators while still being easy
to use. If you're loading largish models (say, tens of MBs), the speed can
be quite useful. (It was originally written explicitly to produce a
fast VRML97 parser, btw.)

If you're loading *huge* models (hundreds of MBs), you may need to go for a
C/C++ extension to convert directly from an on-disk buffer to objects,
but try the Python versions first. Even with hundreds of MBs, you
can write SimpleParse grammars that parse them quite quickly;
it just requires a little more care with how you structure your productions.
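
As a rough illustration of the "try the Python version first" advice, here is a plain-re tokenizer for the sample format. It is not SimpleParse; it is adapted from the "Writing a Tokenizer" pattern in the Python re documentation: one precompiled alternation of named groups, with m.lastgroup telling you which branch matched, consumed lazily through finditer so no full token list is built. The token names and iter_tokens are illustrative assumptions, and the whole input is still held in memory as one string.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token set for the sample file; order matters, since the
# first alternative that matches at a given position wins.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),              # /* ... */ and // ...
    ("STRING", r'"[^"]*"'),                          # "obj1"
    ("NAME", r"\*?[A-Za-z_]\w*"),                    # *MESH, AMBIENT_COLOUR
    ("NUMBER", r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)"),   # 200, 0.0, .5
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SKIP", r"\s+"),                                # whitespace
    ("ERROR", r"."),                                 # anything else
]
MASTER_RE = re.compile(
    "|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC), re.DOTALL
)


def iter_tokens(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs one at a time instead of building a list."""
    for m in MASTER_RE.finditer(text):
        kind = m.lastgroup          # name of the alternative that matched
        if kind in ("COMMENT", "SKIP"):
            continue
        if kind == "ERROR":
            raise SyntaxError("unexpected %r at offset %d" % (m.group(), m.start()))
        yield kind, m.group()
```

For example, list(iter_tokens('*Version 200')) gives [('NAME', '*Version'), ('NUMBER', '200')]. A parser can pull from the generator as it goes, which keeps peak memory lower than materialising every token first.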

> but I have no idea what the best way to do this is?
> Any thoughts??

Mostly it's just a matter of what you feel comfortable with. There's
quite a range of Python text-processing tools available. See the book
"Text Processing in Python" (available in both dead-tree and online
formats) for an extensive treatment of various approaches, from writing your
own recursive descent parsers to using one of the parser generators.
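
For the "write your own recursive descent parser" route, a minimal sketch might look like this. The EBNF in the comments is only a guess at the format, inferred from the sample file (for instance, it assumes a keyword is followed either by a {...} block or by plain values, never both), and tokenize, Parser and the (name, payload) tuple shape are hypothetical names chosen for illustration. Each grammar rule maps onto one method, which is the usual shape of a hand-written recursive descent parser.

```python
import re

# Guessed grammar for the sample file (EBNF):
#   file  = { entry } ;
#   entry = NAME , ( block | { value } ) ;
#   block = "{" , { entry } , "}" ;
#   value = NUMBER | STRING ;
TOKEN_RE = re.compile(
    r'/\*.*?\*/|//[^\n]*'                                  # comments: no group 1
    r'|([{}]|"[^"]*"|\*?[A-Za-z_]\w*|-?[0-9]*\.?[0-9]+)',  # real tokens
    re.DOTALL,
)


def tokenize(text):
    # finditer silently skips whitespace (and any unrecognised characters).
    return [m.group(1) for m in TOKEN_RE.finditer(text) if m.group(1) is not None]


def is_name(tok):
    return tok[0] == "*" or tok[0] == "_" or tok[0].isalpha()


def convert(tok):
    if tok.startswith('"'):
        return tok[1:-1]                    # strip the quotes
    try:
        return int(tok)
    except ValueError:
        return float(tok)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_file(self):                   # file = { entry }
        entries = []
        while self.peek() is not None:
            entries.append(self.parse_entry())
        return entries

    def parse_entry(self):                  # entry = NAME ( block | { value } )
        name = self.take()
        if not is_name(name):
            raise SyntaxError("expected a keyword, got %r" % (name,))
        if self.peek() == "{":
            return (name, self.parse_block())
        values = []
        while True:
            tok = self.peek()
            if tok is None or tok in ("{", "}") or is_name(tok):
                break
            values.append(convert(self.take()))
        return (name, values)

    def parse_block(self):                  # block = "{" { entry } "}"
        self.take()                         # consume "{"
        children = []
        while self.peek() != "}":
            children.append(self.parse_entry())
        self.take()                         # consume "}"
        return children
```

Parser(tokenize(text)).parse_file() returns a list such as [('*Version', [200]), ('*SCENE', [('AMBIENT_COLOUR', [0.0, 0.0, 0.0])]), ...]; a block's payload is a list of child entries, anything else is a list of values.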

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com

Jul 18 '05 #2

Terry Reedy

"Michael" <sl***********@hotmail.com> wrote in message
news:ck**********@hercules.btinternet.com...

> But I'm a good C++ programmer.
>
> What I want to do is parse a text file and store the information in
> relevant fields:

A more useful-to-the-reader and possibly more fruitful-to-you subject line
would have been something like 'Need help parsing text files'.

tjr

Jul 18 '05 #3

Jeff Epler

Well, here's an sre-based scanner and recursive-descent parser based on
my understanding of the grammar you gave.

Using a real scanner and parser may or may not be a better choice, but
it's not hard in Python to create a scanner and write a
recursive-descent parser for a simple grammar.

Jeff

------------------------------------------------------------------------
# This code is in the public domain
import re
import sys


class PeekableIterator:
    """Wrap an iterable so the next item can be inspected without consuming it."""
    def __init__(self, s):
        self.s = iter(s)
        self._peek = []

    def atend(self):
        try:
            self.peek()
        except StopIteration:
            return True
        return False

    def peek(self):
        if not self._peek:
            self._peek = [next(self.s)]
        return self._peek[0]

    def __next__(self):
        if self._peek:
            return self._peek.pop()
        return next(self.s)

    def __iter__(self):
        return self


# Scanner actions: keywords, braces and strings are kept as-is,
# numbers are converted to int or float.
def tok(scanner, s):
    return s

def num(scanner, s):
    try:
        return int(s)
    except ValueError:
        return float(s)

# Patterns are tried in order; an action of None discards the match.
scanner = re.Scanner([
    (r"/\*(?:[^*]|[*]+[^/])*\*/", None),    # /* block comments */
    (r"\*?[A-Za-z_][A-Za-z0-9_]*", tok),    # *KEYWORD or bare NAME
    (r"//.*$", None),                       # // line comments
    (r"[0-9]*\.[0-9]+|[0-9]+\.?", num),     # numbers
    (r"[{}]", tok),                         # braces
    (r'"(?:[^\\"]|\\.)*"', tok),            # quoted strings
    (r"[ \t\r\n]*", None),                  # whitespace
], re.MULTILINE)


class Node:
    def __init__(self, name):
        self.name = name
        self.contents = []

    def add(self, v):
        self.contents.append(v)

    def __str__(self):
        sc = " ".join(map(repr, self.contents))
        return "<%s: %s>" % (self.name, sc)
    __repr__ = __str__


def parse_nodes(t):
    # Parse child nodes up to and including the closing "}".
    n = []
    while True:
        if t.peek() == "}":
            next(t)
            break
        n.append(parse_node(t))
    return n


def parse_contents(n, t):
    if t.atend():
        return
    if t.peek() == "{":
        next(t)
        for n1 in parse_nodes(t):
            n.add(n1)
    # Collect plain values until "}" or the next *KEYWORD.
    while True:
        if t.atend():
            break
        p = t.peek()
        if p == "}":
            break
        if isinstance(p, str) and p.startswith("*"):
            break
        n.add(next(t))


def parse_node(t):
    n = Node(next(t))
    parse_contents(n, t)
    return n


def parse_top(t):
    while not t.atend():
        yield parse_node(t)


def main(source=sys.stdin):
    tokens, rest = scanner.scan(source.read())
    if rest:
        print("Garbage at end of file:", repr(rest))
    for n in parse_top(PeekableIterator(tokens)):
        print(n)


if __name__ == '__main__':
    main()
------------------------------------------------------------------------
$ python michael.py < michael.txt  # and reindented for show
<*Version: 200>
<*SCENE: <AMBIENT_COLOUR: 0.0 0.0 0.0>>
<*MATERIAL_LIST: <*MATERIAL_COUNT: 0>>
<*GEOMOBJECT:
    <*NODE_NAME: '"obj1"'>
    <*MESH:
        <*MESH_VERTEX_LIST:
            <*MESH_VERTEX: 0 0 0 0>
            <*MESH_VERTEX: 1 0 1 2>>
        <*MESH_FACE_LIST: <*MESH_FACE: 1 2 3>>>>

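Getting from a Node tree like the one above to "relevant fields" is then a matter of walking it. A sketch under stated assumptions: the Node class below is a cut-down stand-in for the one in the code above (only name and contents), GeomObject, children and to_geomobject are hypothetical names, and the code assumes, without confirmation from the original post, that the first number after *MESH_VERTEX is a vertex index followed by x, y, z coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class Node:
    """Cut-down stand-in for the Node class above: a name plus contents."""
    def __init__(self, name, contents=()):
        self.name = name
        self.contents = list(contents)


@dataclass
class GeomObject:                       # hypothetical target structure
    name: str = ""
    vertices: Dict[int, Tuple] = field(default_factory=dict)
    faces: List[Tuple] = field(default_factory=list)


def children(node, name):
    """Return the child Nodes of `node` called `name`, skipping plain values."""
    return [c for c in node.contents if isinstance(c, Node) and c.name == name]


def to_geomobject(node):
    obj = GeomObject()
    for nn in children(node, "*NODE_NAME"):
        obj.name = nn.contents[0].strip('"')      # the scanner keeps the quotes
    for mesh in children(node, "*MESH"):
        for vlist in children(mesh, "*MESH_VERTEX_LIST"):
            for v in children(vlist, "*MESH_VERTEX"):
                # Assumption: vertex index first, then the coordinates.
                index, *coords = v.contents
                obj.vertices[index] = tuple(coords)
        for flist in children(mesh, "*MESH_FACE_LIST"):
            for f in children(flist, "*MESH_FACE"):
                obj.faces.append(tuple(f.contents))
    return obj
```

Using a dataclass per record type is roughly the Python counterpart of declaring a C++ struct for each block you care about; blocks you don't look up are simply ignored.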

Jul 18 '05 #4

Bengt Richter

On Wed, 13 Oct 2004 00:03:41 +0000 (UTC), "Michael" <sl***********@hotmail.com> wrote:

> But I'm a good C++ programmer.
>
> What I want to do is parse a text file and store the information in relevant
                       ^^^^^^^^^^^^^^^^^[1]  ^^^^^[2]  ^^^^^^^^^^^[3] ^^^^^^^^-
> fields:
 -^^^^^^[4]
[1] ok
[2] where?
[3] which?
[4] relevant to what?
[5] ;-)
> //Text File:
>
> *Version 200
> *SCENE {
>     AMBIENT_COLOUR 0.0 0.0 0.0
> }
>
> *MATERIAL_LIST{
>     *MATERIAL_COUNT 0
> }
> *GEOMOBJECT {
>     *NODE_NAME "obj1"
>     *MESH {
>         *MESH_VERTEX_LIST{
>             *MESH_VERTEX 0 0 0 0
>             *MESH_VERTEX 1 0 1 2
>         }
>         *MESH_FACE_LIST {
>             *MESH_FACE 1 2 3
>         }
>     }
> }
> /* ... More GEOMOBJECTS ...*/
> but I have no idea what the best way to do this is?
                                          ^^^^^^^[1]
[1] do what?
> Any thoughts??

I'd probably start by stripping out the tokens with a regular expression
and then processing the list to build a tree that you can then walk. To start:

>>> data = """\
... *Version 200
... *SCENE {
... AMBIENT_COLOUR 0.0 0.0 0.0
... }
...
... *MATERIAL_LIST{
... *MATERIAL_COUNT 0
... }
... *GEOMOBJECT {
... *NODE_NAME "obj1"
... *MESH {
... *MESH_VERTEX_LIST{
... *MESH_VERTEX 0 0 0 0
... *MESH_VERTEX 1 0 1 2
... }
... *MESH_FACE_LIST {
... *MESH_FACE 1 2 3
... }
... }
... }
... """
>>> import re
>>> rxs = re.compile(r'([{}]|"[^"]*"|[*A-Z_a-z]+|[0-9.]+)')
>>> tokens = rxs.findall(data)
>>> tokens
['*Version', '200', '*SCENE', '{', 'AMBIENT_COLOUR', '0.0', '0.0', '0.0', '}',
 '*MATERIAL_LIST', '{', '*MATERIAL_COUNT', '0', '}', '*GEOMOBJECT', '{',
 '*NODE_NAME', '"obj1"', '*MESH', '{', '*MESH_VERTEX_LIST', '{',
 '*MESH_VERTEX', '0', '0', '0', '0', '*MESH_VERTEX', '1', '0', '1', '2', '}',
 '*MESH_FACE_LIST', '{', '*MESH_FACE', '1', '2', '3', '}', '}', '}']

I would think that isolates the basic info of interest. It should not be hard to make a tree or
extract what suits your purposes, but I'm not going to guess what those are ;-)
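
Turning a flat token list like that into a tree can be done without recursion by keeping a stack of open blocks. A minimal sketch, assuming (as in the sample) that any token starting with a letter, "_" or "*" is a keyword, that "{" opens a block belonging to the preceding keyword, and that every other token is a value of the most recent keyword; the dict layout and the names is_keyword and build_tree are purely illustrative.

```python
def is_keyword(tok):
    # Assumption: keywords start with a letter, "_" or "*"; everything else
    # (numbers, quoted strings) is a value.
    return tok[0].isalpha() or tok[0] in "_*"


def build_tree(tokens):
    """Turn a flat token list into nested dicts using an explicit stack."""
    root = {"name": None, "values": [], "children": []}
    stack = [root]      # stack[-1] is the block we are currently inside
    current = None      # most recent keyword node; receives following values
    for tok in tokens:
        if tok == "{":
            if current is None:
                raise ValueError("'{' without a preceding keyword")
            stack.append(current)       # the keyword's block is now open
            current = None
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
            current = None
        elif is_keyword(tok):
            current = {"name": tok, "values": [], "children": []}
            stack[-1]["children"].append(current)
        else:
            if current is None:
                raise ValueError("value %r without a keyword" % (tok,))
            current["values"].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return root["children"]
```

Called on the tokens list from the session, build_tree returns four top-level nodes (*Version, *SCENE, *MATERIAL_LIST, *GEOMOBJECT); values stay as strings, so numeric conversion is left to whatever walks the tree.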

Regards,
Bengt Richter

Jul 18 '05 #5
