Hello everybody, I'm new to Python (...I work with COBOL...)
I have to parse a file (that is a dbIII file) whose structure looks like
this:
|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
Is there anything in Python that parses this stuff?
Thanks a lot
korovev
ko*******@gmail.com wrote:
> Hello everybody, I'm new to Python (...I work with COBOL...)
> I have to parse a file (that is a dbIII file) whose structure looks like
> this:
> |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
> Is there anything in Python that parses this stuff?
> Thanks a lot
> korovev
That's not a standard dBaseIII data file though, correct? It looks more
like something that was produced *from* a dBaseIII file.
If the format is similar to Excel's CSV format then the csv module from
Python's standard library may well be what you want. Otherwise there are
parsers at all levels - one called pyparsing is quite popular, and I am
sure other readers will have their own suggestions.
I am not sure whether the pipe bars actually appear in your data file,
so it is difficult to know exactly what to suggest, but I would
play with the file in an interactive interpreter session first to see
whether csv can do the job.
Good luck with your escape from COBOL!
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------
On 7 Aug, 09:21, Steve Holden <st...@holdenweb.com> wrote:
> That's not a standard dBaseIII data file though, correct? It looks more
> like something that was produced *from* a dBaseIII file.
Yep... unfortunately it is not...
> Good luck with your escape from COBOL!
I'm not escaping for now... Actually I'd like to use COBOL for the rest
of my life (as a programmer) ;-)
But thanks anyway!
korovev
ko*******@gmail.com wrote:
> Hello everybody, I'm new to Python (...I work with COBOL...)
> I have to parse a file (that is a dbIII file) whose structure looks like
> this:
> |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
There are a number of relatively simple options that come to mind, including
regular expressions:
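##### BEGIN CODE #####
import re
#
# dbIII.txt:
# |string|, |string|, |string|, |string|, |,1,2,3,4|, |other string|
#
# Iterate over the file object directly; xreadlines() is deprecated.
with open('dbIII.txt') as handle:
    for line in handle:
        # Capture the text between each pair of pipes; the lazy +? lets
        # the trailing \s* strip whitespace before the closing pipe.
        for match in re.finditer(r'\|\s*([^|]+?)\s*\|,*', line):
            print(match.group(1))
##### END CODE #####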
Without knowing what you need to do with the data, it's hard to suggest a better
method for parsing it. The above should work, provided that the data is always
in the format | data | with no pipe symbols in between the ones used as separators.
HTH,
-Jay
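A note on the regex approach above: the pattern only captures pipe-delimited fields, so bare numbers such as the 1, 2, 3 in the original sample line would be skipped. The following is a minimal sketch of one way to pick up both kinds of field, assuming numbers are plain integers and strings never contain a pipe. The names FIELD and parse_line are made up for illustration.

```python
import re

# Hypothetical pattern: either a |quoted| string or a bare integer.
# Assumes no pipe characters inside the quoted strings.
FIELD = re.compile(r'\|([^|]*)\||(-?\d+)')

def parse_line(line):
    """Return the fields of one line, converting bare integers to int."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, number = match.groups()
        if number is not None:
            fields.append(int(number))   # unquoted numeric field
        else:
            fields.append(quoted)        # text between the pipes
    return fields

sample = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
print(parse_line(sample))
```

Because the quoted alternative is tried first at each pipe, digits and commas inside a quoted string stay part of that string.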
On Aug 7, 2:21 am, Steve Holden <st...@holdenweb.com> wrote:
> korove...@gmail.com wrote:
> > Hello everybody, I'm new to Python (...I work with COBOL...)
> > I have to parse a file (that is a dbIII file) whose structure looks like
> > this:
> > |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
As Steve mentioned pyparsing, here is a pyparsing version for cracking
your data:
from pyparsing import Word, nums, QuotedString, delimitedList
data = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
integer = Word(nums)
# change unquoteResults to True to omit '|' chars from results
string = QuotedString("|", unquoteResults=False)
itemList = delimitedList(integer | string)
# parse the data and print out the results as a simple list
print(itemList.parseString(data).asList())
# add a parse action to convert integer strings to actual integers
integer.setParseAction(lambda t: int(t[0]))
# reparse the data and now get converted integers in results
print(itemList.parseString(data).asList())
Prints:
['|string|', '|string|', '|string that may contain commas inside|', '1', '2', '3', '|other string|']
['|string|', '|string|', '|string that may contain commas inside|', 1, 2, 3, '|other string|']
-- Paul
On 8/7/07, ko*******@gmail.com <ko*******@gmail.com> wrote:
> I have to parse a file (that is a dbIII file) whose structure looks like
> this:
> |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
The CSV module is probably the easiest way to go:
>>> data = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
>>> import csv
>>> reader = csv.reader([data], quotechar="|", skipinitialspace=True)
>>> for row in reader:
...     print(row)
...
['string', 'string', 'string that may contain commas inside', '1', '2', '3', 'other string']
--
Jerry
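The session above returns every field as a string. If the unquoted fields should come back as numbers, one option is the csv module's QUOTE_NONNUMERIC mode, which makes the reader convert every non-quoted field to float. Below is a minimal sketch under the assumption that every unquoted field really is numeric (anything else raises ValueError); read_records is a made-up helper name.

```python
import csv

def read_records(lines):
    """Yield one list per line; unquoted fields become numbers.

    `lines` can be any iterable of strings, including an open file.
    """
    reader = csv.reader(lines, quotechar="|", skipinitialspace=True,
                        quoting=csv.QUOTE_NONNUMERIC)
    for row in reader:
        # The reader hands back floats; turn whole values into ints.
        yield [int(v) if isinstance(v, float) and v.is_integer() else v
               for v in row]

sample = ["|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"]
for record in read_records(sample):
    print(record)
```

To read from disk instead of a list, pass an open file, for example `with open('dbIII.txt', newline='') as handle:` followed by `for record in read_records(handle):`, where the file name is only a placeholder.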
On 7 Aug, 17:47, "Jerry Hill" <malaclyp...@gmail.com> wrote:
> On 8/7/07, korove...@gmail.com <korove...@gmail.com> wrote:
> > I have to parse a file (that is a dbIII file) whose structure looks like
> > this:
> > |string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|
> The CSV module is probably the easiest way to go:
> >>> data = "|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"
> >>> import csv
> >>> reader = csv.reader([data], quotechar="|", skipinitialspace=True)
> >>> for row in reader:
> ...     print(row)
> ...
> ['string', 'string', 'string that may contain commas inside', '1', '2', '3', 'other string']
> --
> Jerry
You were all right, I should have mentioned that I must put the data into
MySQL... So actually the best way to do it is with csv.reader: I tried
it and it works!
Thanks very much!
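For readers wondering what the csv-to-database step might look like, here is a rough sketch. It uses the standard library's sqlite3 as a stand-in so the example runs anywhere; this is not the poster's actual MySQL setup. The table name, column names and load_rows helper are all invented for illustration. With a MySQL driver the connection call would differ, and drivers such as MySQLdb generally expect %s placeholders rather than ?.

```python
import csv
import sqlite3

def load_rows(lines, connection):
    """Parse pipe-quoted lines with csv.reader and insert them."""
    reader = csv.reader(lines, quotechar="|", skipinitialspace=True)
    rows = [tuple(row) for row in reader]
    # Parameterized insert: the driver handles quoting of the values.
    connection.executemany(
        "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)

# In-memory SQLite database standing in for the real MySQL server.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE records (name TEXT, kind TEXT, note TEXT, "
    "a INTEGER, b INTEGER, c INTEGER, extra TEXT)")

sample = ["|string|, |string|, |string that may contain commas inside|, 1, 2, 3, |other string|"]
count = load_rows(sample, connection)
print(count)
```

Passing the values as parameters, rather than building the SQL string by hand, means commas or quotes inside the text fields cannot break the statement.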