
parsing a dbIII file

Hello everybody, I'm new to Python (...I work with COBOL...)

I have to parse a file (that is a dbIII file) whose structure looks like
this:
|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|

Is there anything in Python that parses this stuff?
Thanks a lot
korovev

Aug 7 '07 #1
6 Replies


ko*******@gmail.com wrote:
> Hello everybody, I'm new to Python (...I work with COBOL...)
>
> I have to parse a file (that is a dbIII file) whose structure looks like
> this:
> |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
>
> Is there anything in Python that parses this stuff?
> Thanks a lot
> korovev

That's not a standard dBaseIII data file though, correct? It looks more
like something that was produced *from* a dBaseIII file.

If the format is similar to Excel's CSV format then the csv module from
Python's standard library may well be what you want. Otherwise there are
parsers at all levels - one called PyParsing is quite popular, and I am
sure other readers will have their own suggestions.

I am not sure whether the pipe bars actually appear in your data file,
so it is difficult to know quite exactly what to suggest, but I would
play with the file in an interactive interpreter session first to see
whether csv can do the job.

Good luck with your escape from COBOL!

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Aug 7 '07 #2

On 7 Aug, 09:21, Steve Holden <st...@holdenweb.com> wrote:
> That's not a standard dBaseIII data file though, correct? It looks more
> like something that was produced *from* a dBaseIII file.

Yep... unfortunately it is not...

> Good luck with your escape from COBOL!

I'm not escaping for now... Actually I'd like to use COBOL for the rest
of my life (as a programmer) ;-)
But thanks anyway!

korovev

Aug 7 '07 #3

ko*******@gmail.com wrote:
> Hello everybody, I'm new to Python (...I work with COBOL...)
>
> I have to parse a file (that is a dbIII file) whose structure looks like
> this:
> |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|

There are a number of relatively simple options that come to mind, including
regular expressions:

##### BEGIN CODE #####

import re

#
# dbIII.txt:
# |string|, |string|, |string|, |string|, |,1,2,3,4|, |other string|
#
# Each match is one |...| field; anything outside the pipes is skipped.
with open('dbIII.txt') as handle:
    for line in handle:
        for match in re.finditer(r'\|\s*([^|]+)\s*\|,*', line):
            for each in match.groups():
                print(each)

##### END CODE #####
Without knowing what you need to do with the data, it's hard to suggest a better
method for parsing it. The above should work, provided that the data is always
in the format | data | with no pipe symbols in between the ones used as separators.

HTH,

-Jay
Aug 7 '07 #4
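
The original sample also has bare numbers sitting between the piped strings, and a pattern that only looks for |...| pairs passes over those. A minimal sketch of one way to pick up both kinds of field with a single regular expression follows. It assumes fields never contain a pipe and that unquoted fields are plain non-negative integers; the names FIELD and split_fields are made up for illustration.

```python
import re

# One alternative per field kind: |quoted text| or a bare run of digits.
FIELD = re.compile(r'\|([^|]*)\||(\d+)')

def split_fields(line):
    """Return the fields of one line; bare numbers become ints."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, number = match.groups()
        if number is not None:
            fields.append(int(number))
        else:
            fields.append(quoted)
    return fields

sample = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
print(split_fields(sample))
```

Because the scan runs left to right and a quoted field is consumed whole, digits and commas inside the pipes stay part of that string. Decimals, negative numbers, or escaped pipes would need a richer pattern or a real parser.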

On Aug 7, 2:21 am, Steve Holden <st...@holdenweb.com> wrote:
> korove...@gmail.com wrote:
> > Hello everybody, I'm new to Python (...I work with COBOL...)
> > I have to parse a file (that is a dbIII file) whose structure looks like
> > this:
> > |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|

As Steve mentioned pyparsing, here is a pyparsing version for cracking
your data:

from pyparsing import Word, nums, QuotedString, delimitedList

data = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"

integer = Word(nums)
# change unquoteResults to True to omit '|' chars from results
string = QuotedString("|", unquoteResults=False)
itemList = delimitedList(integer | string)

# parse the data and print out the results as a simple list
print(itemList.parseString(data).asList())

# add a parse action to convert integer strings to actual integers
integer.setParseAction(lambda t: int(t[0]))

# reparse the data and now get converted integers in results
print(itemList.parseString(data).asList())

Prints:

['|string|', '|string|', '|string that may contain commas inside|', '1', '2', '3', '|other string|']
['|string|', '|string|', '|string that may contain commas inside|', 1, 2, 3, '|other string|']

-- Paul

Aug 7 '07 #5

On 8/7/07, ko*******@gmail.com <ko*******@gmail.com> wrote:
> I have to parse a file (that is a dbIII file) whose structure looks like
> this:
> |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|

The CSV module is probably the easiest way to go:

>>> data = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
>>> import csv
>>> reader = csv.reader([data], quotechar="|", skipinitialspace=True)
>>> for row in reader:
...     print(row)
...
['string', 'string', 'string that may contain commas inside', '1', '2', '3', 'other string']

--
Jerry
Aug 7 '07 #6
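
Building on the csv.reader call above, here is a rough sketch of reading several records from a file-like object and turning the numeric fields into integers. The helper name read_rows is invented for this example, and io.StringIO stands in for a real file so the snippet runs on its own. Note one assumption: csv drops the quoting information, so a piped field made only of digits (for example |123|) would also come back as an int.

```python
import csv
import io

def read_rows(stream):
    """Yield each record as a list, converting all-digit fields to int."""
    reader = csv.reader(stream, quotechar="|", skipinitialspace=True)
    for row in reader:
        yield [int(field) if field.isdigit() else field for field in row]

# Stand-in for open("dbIII.txt", newline=""); the file name is hypothetical.
sample = io.StringIO(
    "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|\n"
    "|a|, |b|, |c, d|, 10, 20, 30, |e|\n"
)
rows = list(read_rows(sample))
for row in rows:
    print(row)
```

With a real file, passing the open file object to read_rows should work the same way, since csv.reader accepts any iterable of lines.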

On 7 Aug, 17:47, "Jerry Hill" <malaclyp...@gmail.com> wrote:
> On 8/7/07, korove...@gmail.com <korove...@gmail.com> wrote:
> > I have to parse a file (that is a dbIII file) whose structure looks like
> > this:
> > |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
>
> The CSV module is probably the easiest way to go:
>
> >>> data = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
> >>> import csv
> >>> reader = csv.reader([data], quotechar="|", skipinitialspace=True)
> >>> for row in reader:
> ...     print(row)
> ...
> ['string', 'string', 'string that may contain commas inside', '1', '2', '3', 'other string']
>
> --
> Jerry

You were all right; I should have mentioned that I have to load the data
into MySQL... So actually the best way to do it is with csv.reader: I tried
it and it works!

Thanks very much!

Aug 8 '07 #7
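
For the database step mentioned in the last post, a hedged sketch of feeding csv.reader rows into a parameterized INSERT might look like the following. sqlite3 is used here purely as a stand-in so the example runs without a server; the table name records and its column names are hypothetical. With a MySQL driver such as MySQLdb or PyMySQL the connection setup differs and the placeholder is %s rather than ?.

```python
import csv
import sqlite3

data = ["|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"]
rows = list(csv.reader(data, quotechar="|", skipinitialspace=True))

# sqlite3 is a stand-in so the sketch runs anywhere; the table and
# column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (s1 TEXT, s2 TEXT, s3 TEXT,"
    " n1 INTEGER, n2 INTEGER, n3 INTEGER, s4 TEXT)"
)

# Parameterized insert: the driver does the quoting, so commas or
# quote characters inside a field need no special handling.
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT s3, n1, n2, n3 FROM records").fetchall()
print(stored)
```

Passing the values as parameters instead of building the SQL string by hand is the main design point; it carries over to any DB-API driver.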
