Parsing by Line Data

python1

Having slight trouble conceptualizing a way to write this script. The
problem is that I have a bunch of lines in a file, for example:

01A\n
02B\n
01A\n
02B\n
02C\n
01A\n
02B\n
..
..
..

The lines beginning with '01' are the 'header' records, whereas the
lines beginning with '02' are detail. There can be several detail lines
to a header.

I'm looking for a way to put the '01' line and the '02' lines that follow it into
one list, starting a new list when the next '01' record is found.

How would you do this? I'm used to using 'readlines()' to pull the file
data line by line, but in this case, determining the break-point will
need to be done by reading the '01' from the line ahead. Would you need
to read the whole file into a string and use a regex to break where a
'\n01' is found?
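Reading the whole file into a string and splitting it with a regular expression does work. The sketch below assumes the file is small enough to hold in memory. It splits on a zero-width lookahead, so nothing is consumed and each '01' stays with its record. Splitting on an empty match like this needs Python 3.7 or later. `regex_split_records` is a hypothetical helper name, and the sample string stands in for `open('input-file').read()`.

```python
import re

# Zero-width match at the start of any line beginning with '01'.
# (?m) makes ^ match after every newline, not only at the start of the text.
HEADER_START = re.compile(r'(?m)^(?=01)')


def regex_split_records(text):
    """Split text into records, each starting at an '01' header line."""
    # Splitting on an empty match requires Python 3.7+.
    chunks = HEADER_START.split(text)
    # The match at position 0 leaves an empty first chunk; drop it and
    # break each remaining chunk into its individual lines.
    return [chunk.splitlines() for chunk in chunks if chunk]


if __name__ == '__main__':
    sample = '01A\n02B\n01A\n02B\n02C\n01A\n02B\n'
    for record in regex_split_records(sample):
        print(record)
```

Any detail lines that come before the first header would end up in a record of their own.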

Jul 18 '05 #1


Eddie Corns


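def gen_records(src):
    rec = []
    for line in src:
        if line.startswith('01'):
            # A header starts a new record: emit the previous one, if any.
            if rec:
                yield rec
            rec = [line]
        else:
            # Detail line: attach it to the current record.
            rec.append(line)
    # The last record has no following header, so emit it explicitly.
    if rec:
        yield rec

with open('input-file') as inf:
    for record in gen_records(inf):
        do_something_to_list(record)  # placeholder for your own processing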
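The generator avoids the read-ahead problem from the question. Instead of peeking at the next line, it closes the current record when the next header arrives. The sketch below runs it on the sample data from the question. It uses `io.StringIO` as a stand-in for the open file, because iterating it yields lines with their trailing newlines, just as a real file does.

```python
import io


def gen_records(src):
    """Yield lists of lines, each list starting at an '01' header line."""
    rec = []
    for line in src:
        if line.startswith('01'):
            # New header: hand back the record collected so far, if any.
            if rec:
                yield rec
            rec = [line]
        else:
            # Detail line: belongs to the current record.
            rec.append(line)
    # Emit the final record once the input is exhausted.
    if rec:
        yield rec


# io.StringIO stands in for a real file opened in text mode.
sample = io.StringIO('01A\n02B\n01A\n02B\n02C\n01A\n02B\n')
records = list(gen_records(sample))
print(records)
```

Each record is a list of the raw lines with the newline still attached. Call `line.rstrip('\n')` before storing a line if you don't want the newline. Because records are produced one at a time, only the current record is held in memory, unlike with `readlines()`.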

Eddie

Jul 18 '05 #2

Bill Dandreta


First, let me preface my remarks by saying I am not much of a programmer,
so this may not be the best way to solve this, but I would use a
dictionary something like this (untested):

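myinput = open(myfile, 'r')    # myfile: path to the input file
lines = myinput.readlines()
myinput.close()

mydict = {}
index = -1
counter = 0

for l in lines:
    if l[0:2] == '01':
        # Header line: start a new record and restart the line count.
        index += 1
        counter = 0
    # Key each line's data (minus its two-character type code) by
    # (record number, line number within the record).
    mydict[(index, counter)] = l[2:]
    counter += 1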

You can easily extract the data with a nested loop.
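Here is one possible sketch of that nested loop. It assumes the dictionary layout above: keys `(record number, line number)` and values holding each line without its two-character code. The counter advances after every line, so a header and its first detail line never share a key. `build_dict` and `extract_records` are hypothetical helper names. The sample lines are passed in directly so the example runs on its own.

```python
def build_dict(lines):
    """Index each line's data by (record number, line number in record)."""
    mydict = {}
    index = -1
    counter = 0
    for l in lines:
        if l[0:2] == '01':
            # Header line: start a new record.
            index += 1
            counter = 0
        mydict[(index, counter)] = l[2:]
        counter += 1
    return mydict


def extract_records(mydict):
    """Rebuild a list of records (lists of data strings) with a nested loop."""
    records = []
    index = 0
    # Outer loop: walk record numbers until one has no line 0.
    while (index, 0) in mydict:
        record = []
        counter = 0
        # Inner loop: walk this record's lines until the next number is missing.
        while (index, counter) in mydict:
            record.append(mydict[(index, counter)])
            counter += 1
        records.append(record)
        index += 1
    return records


lines = ['01A\n', '02B\n', '01A\n', '02B\n', '02C\n', '01A\n', '02B\n']
for rec in extract_records(build_dict(lines)):
    print(rec)
```

Detail lines that appear before the first header get record number -1 and are skipped by the outer loop, which starts at 0.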

Bill

Jul 18 '05 #3

python1


Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)

Jul 18 '05 #4

python1


Thanks Bill. I'll use this script in place of Eddie's if the Python on our
AIX box is older than 2.2.

Thanks again.
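Generators arrived in Python 2.2, where they still needed `from __future__ import generators`, so an older interpreter has no `yield`. The sketch below does the same grouping as Eddie's generator without it: it builds and returns a list of records instead. The trade-off is that all records are held in memory at once. `get_records` is a hypothetical name.

```python
def get_records(src):
    """Group lines into records without using yield.

    Returns a list of lists; each inner list starts with an '01' header
    line followed by its detail lines.
    """
    records = []
    rec = []
    for line in src:
        if line[:2] == '01':
            # New header: save the record collected so far, if any.
            if rec:
                records.append(rec)
            rec = [line]
        else:
            # Detail line: belongs to the current record.
            rec.append(line)
    # Save the final record.
    if rec:
        records.append(rec)
    return records


sample = ['01A\n', '02B\n', '01A\n', '02B\n', '02C\n', '01A\n', '02B\n']
print(get_records(sample))
```

On an interpreter that old, pass `open(name).readlines()` rather than the file object itself. Iterating directly over a file object also dates from 2.2.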

Jul 18 '05 #5

Mitja

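I'd probably do something like
records = ('\n' + open('foo.data').read()).split('\n01')

You can later do
structured = [record.split('\n') for record in records]
to get a list of lists. The split strips '01' from the start of every record,
records[0] is the empty string before the first header, and if the file ends
with a newline the last record ends with an empty string. There may be other
flaws, but I guess the concept is clear.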
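Those rough edges of the split approach can be handled in a few extra lines. The sketch below assumes the whole file fits in memory and takes the text directly rather than reading `foo.data`. It drops the piece before the first header, puts back the '01' that the separator consumed, and uses `splitlines()` so the final newline leaves no trailing empty string. `split_records` is a hypothetical name.

```python
def split_records(text):
    """Split text at each '01' header line; return a list of line lists."""
    # Prefixing a newline lets the very first header match the separator too.
    pieces = ('\n' + text).split('\n01')
    structured = []
    # pieces[0] holds whatever preceded the first header (normally empty).
    for piece in pieces[1:]:
        # The separator consumed the '01', so restore it before splitting;
        # splitlines() does not produce a trailing '' for a final newline.
        structured.append(('01' + piece).splitlines())
    return structured


sample = '01A\n02B\n01A\n02B\n02C\n01A\n02B\n'
for record in split_records(sample):
    print(record)
```

Any detail lines before the first header are discarded along with `pieces[0]`.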

Jul 18 '05 #6

Similar topics

parsing

by: Todd Moyer | last post by:

I would like to use Python to parse a *python-like* data description language. That is, it would have it's own keywords, but would have a syntax like Python. For instance: Ob1 ('A'): Ob2...

Python

read/write stream - parsing

by: Rafal 'Raf256' Maj | last post by:

Hi, I need to parse a file. This means reading from it as from std::istream. But - sometimes I also need to put-back some text I read before. What type of string can I use for that? Something...

C / C++

Parsing XML Tags Help

by: ralphNOSPAM | last post by:

Is there a function or otherwise some way to pull out the target text within an XML tag? For example, in the XML tag below, I want to pull out 'CALIFORNIA'. ...

PHP

Parsing doubles in vb.NET

by: SL33PY | last post by:

Hi, I'm having a problem parsing strings (comming from a flat text input file) to doubles. the code: currentImportDetail.Result = CType(line.Substring(7, 8).Trim(" "), System.Double) What...

Visual Basic .NET

Stream from FTP directly to MySQL while parsing CSV

by: Eric Anderson | last post by:

I have some files that sit on a FTP server. These files contain data stored in a tab-separated format. I need to download these files and insert/update them in a MySQL database. My current basic...

PHP

Parsing Baseball Stats

by: ankitdesai | last post by:

I would like to parse a couple of tables within an individual player's SHTML page. For example, I would like to get the "Actual Pitching Statistics" and the "Translated Pitching Statistics"...

Python

Performance File Parsing

by: Thomas Kowalski | last post by:

Hi, I have to parse a plain, ascii text file (on local HD). Since the file might be many millions lines long I want to improve the efficiency of my parsing process. The resulting data structure...

C / C++

Need help with parsing a multilined log file into objects

by: Paulers | last post by:

Hello, I have a log file that contains many multi-line messages. What is the best approach to take for extracting data out of each message and populating object properties to be stored in an...

Visual Basic .NET

parsing with std::istringstream.

by: Dave Townsend | last post by:

Hi, I have to read some memory data from a stream. This would be in the following format, for example: 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 that is i have 8 values on a line, separated...

C / C++

Parsing a file with iterators

by: Luis Zarrabeitia | last post by:

I need to parse a file, text file. The format is something like that: TYPE1 metadata data line 1 data line 2 .... data line N TYPE2 metadata data line 1 ....

Python
