Parsing by Line Data

Having slight trouble conceptualizing a way to write this script. The
problem is that I have a bunch of lines in a file, for example:

01A\n
02B\n
01A\n
02B\n
02C\n
01A\n
02B\n
..
..
..

The lines beginning with '01' are the 'header' records, whereas the
lines beginning with '02' are detail. There can be several detail lines
to a header.

I'm looking for a way to put each '01' line and its subsequent '02' lines
into one list, and to start another list when the next '01' record is found.

How would you do this? I'm used to using 'readlines()' to pull the file
data line by line, but in this case, determining the break-point will
need to be done by reading the '01' from the line ahead. Would you need
to read the whole file into a string and use a regex to break where a
'\n01' is found?
Jul 18 '05 #1
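def gen_records(src):
    # Collect each '01' header and its following detail lines into one list.
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header closes the previous record, if there is one.
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    # Emit the last record once the input is exhausted.
    if rec:
        yield rec

# open() replaces the old file() builtin, which no longer exists in Python 3.
with open('input-file') as inf:
    for record in gen_records(inf):
        do_something_to_list(record)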
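
To try the generator without a real file, here is a small self-contained sketch. It repeats gen_records so it runs on its own and feeds it the sample lines from the question. The io.StringIO stand-in for the file and the stripping of newlines for display are assumptions made for illustration only.

```python
import io

def gen_records(src):
    """Yield one list per '01' header line plus the '02' lines after it."""
    rec = []
    for line in src:
        if line.startswith('01'):
            if rec:
                yield rec      # previous record is complete
            rec = [line]
        else:
            rec.append(line)
    if rec:
        yield rec              # flush the final record

# Sample data copied from the question; StringIO stands in for the file.
sample = io.StringIO("01A\n02B\n01A\n02B\n02C\n01A\n02B\n")
records = [[line.rstrip('\n') for line in rec] for rec in gen_records(sample)]
print(records)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

No look-ahead is needed: the record in progress is only handed out when the next '01' line (or the end of the input) shows up.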

Eddie
Jul 18 '05 #2
First let me preface my remarks by saying I am not much of a programmer,
so this may not be the best way to solve this, but I would use a
dictionary, something like this (untested):

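# myfile is the path to the input file
with open(myfile, 'r') as myinput:
    lines = myinput.readlines()

mydict = {}
index = -1    # header number; becomes 0 at the first '01' line
counter = 0   # position within the current header's group

for l in lines:
    if l[0:2] == '01':
        # New header: advance to the next group and restart the position.
        index += 1
        counter = 0
    # Key each line by (header number, position); l[2:] drops the record type.
    mydict[(index, counter)] = l[2:]
    counter += 1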

You can easily extract the data with a nested loop.
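
The nested loop might look roughly like the sketch below. It rebuilds the dictionary from the sample lines in the question so that it runs on its own. The names groups and group, and the rstrip of the trailing newline, are my own additions for illustration, and the first input line is assumed to be an '01' header.

```python
# Build the (header number, position) -> data mapping from sample lines.
lines = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]

mydict = {}
index = -1
counter = 0
for l in lines:
    if l[0:2] == '01':
        index += 1
        counter = 0
    mydict[(index, counter)] = l[2:].rstrip('\n')
    counter += 1

# Nested loop: the outer loop walks the headers, the inner loop walks the
# positions under one header until a key is missing.
groups = []
for i in range(index + 1):
    group = []
    j = 0
    while (i, j) in mydict:
        group.append(mydict[(i, j)])
        j += 1
    groups.append(group)

print(groups)
# [['A', 'B'], ['A', 'B', 'C'], ['A', 'B']]
```

Position 0 in each group is the header's data. Because l[2:] drops the '01'/'02' prefix, the record type is only recoverable from the position.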

Bill
Jul 18 '05 #3
Eddie Corns wrote:
...
def gen_records(src):
...

Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)
Jul 18 '05 #4
Bill Dandreta wrote:
...
I would use a dictionary something like this (untested):
...

Thanks Bill. Will use this script in place of Eddie's if Python is older
than 2.2 on our AIX box.
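
For what it's worth, a list-of-lists version that avoids 'yield' altogether could be sketched as follows. The function name group_records is made up for illustration, and the code is written against current Python; it has not been tried on an old interpreter.

```python
def group_records(lines):
    """Return a list of records; each record is a header plus its details."""
    records = []
    for line in lines:
        if line.startswith('01') or not records:
            # A header (or the very first line) opens a new list.
            records.append([line])
        else:
            # A detail line joins the most recent record.
            records[-1].append(line)
    return records

sample = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]
print(group_records(sample))
```

Unlike the generator, this holds every record in memory at once, which is fine for small files.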

Thanks again.
Jul 18 '05 #5
python1 <py*****@spamless.net>
(news:ca*********@enews3.newsguy.com) wrote:
I'm looking for a way to put the '01' and subsequent '02' line data
into one list, and breaking into another list when the next '01'
record is found.

I'd probably do something like

    records = ('\n' + open('foo.data').read()).split('\n01')

You can later do

    structured = [record.split('\n') for record in records]

to get a list of lists. Note that the '01' prefix is stripped from every
header line and records[0] is an empty string when the file starts with a
header; there may be other flaws, but I guess the concept is clear.
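
On the regex idea from the question, one possible variant is sketched below. It splits on a zero-width lookahead so the '01' prefix stays in place. Splitting on an empty match needs Python 3.7 or later, and the sample text is an assumption standing in for the file contents.

```python
import re

text = "01A\n02B\n01A\n02B\n02C\n01A\n02B\n"

# Split at the start of every line that begins with '01'. The lookahead
# (?=01) matches without consuming, so each header keeps its prefix.
chunks = re.split(r'(?m)^(?=01)', text)

# Drop the empty string that precedes the first header, then break each
# chunk into its lines.
structured = [chunk.splitlines() for chunk in chunks if chunk]
print(structured)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

This still reads the whole file into memory, so for large inputs reading line by line is likely the better fit.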
Jul 18 '05 #6
