Having slight trouble conceptualizing a way to write this script. The
problem is that I have a bunch of lines in a file, for example:
01A\n
02B\n
01A\n
02B\n
02C\n
01A\n
02B\n
..
..
..
The lines beginning with '01' are the 'header' records, whereas the
lines beginning with '02' are detail. There can be several detail lines
to a header.
I'm looking for a way to put the '01' and subsequent '02' line data into
one list, and breaking into another list when the next '01' record is found.
How would you do this? I'm used to using 'readlines()' to pull the file
data line by line, but in this case, determining the break-point will
need to be done by reading the '01' from the line ahead. Would you need
to read the whole file into a string and use a regex to break where a
'\n01' is found?
python1 <py*****@spamless.net> writes: [...]
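def gen_records(src):
    # Yield one list per record: an '01' header line plus its '02' detail lines.
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header closes the previous record, if there is one.
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    # Emit the last record, which has no following header to trigger it.
    if rec:
        yield rec

# file() no longer exists in Python 3; open() works in both 2 and 3.
inf = open('input-file')
for record in gen_records(inf):
    do_something_to_list(record)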
Eddie
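To see what the generator approach produces, here is a minimal self-contained sketch run against the sample lines from the question. The in-memory list `sample` is an assumption standing in for the open file, and the final `print` loop is only for illustration.

```python
def gen_records(src):
    """Yield lists of lines, starting a new list at each '01' header."""
    rec = []
    for line in src:
        if line.startswith('01'):
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    if rec:
        yield rec

# Sample data from the question; a list of lines stands in for the file.
sample = ['01A\n', '02B\n', '01A\n', '02B\n', '02C\n', '01A\n', '02B\n']
records = list(gen_records(sample))

for record in records:
    # Strip the newlines only for display.
    print([line.rstrip('\n') for line in record])
```

Because the generator only ever holds one record at a time, there is no need to look ahead or read the whole file into memory: the arrival of the next '01' line is what closes the previous record.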
python1 wrote: [...]
First, let me preface my remarks by saying I am not much of a programmer,
so this may not be the best way to solve this, but I would use a
dictionary, something like this (untested):
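# myfile is not defined in this post; it is assumed to hold the input file's path.
myinput = open(myfile, 'r')
lines = myinput.readlines()
myinput.close()
mydict = {}
index = -1
for l in lines:
    if l[0:2] == '01':
        # Header line: start a new record and reset the position counter.
        counter = 0
        index += 1
        mydict[(index, counter)] = l[2:]
    else:
        # Detail line: store it at the next position in the current record.
        mydict[(index, counter)] = l[2:]
    # Advance after every line so a detail never overwrites its header.
    counter += 1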
You can easily extract the data with a nested loop.
Bill
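The nested-loop extraction mentioned in this post could look roughly like the sketch below. It is only one possible reading: the sample `lines` list stands in for the file contents, and the name `grouped` is introduced here for illustration. Note that `l[2:]` drops the '01'/'02' prefix, so only the payload of each line is kept.

```python
# Sample data from the question stands in for the lines read from the file.
lines = ['01A\n', '02B\n', '01A\n', '02B\n', '02C\n', '01A\n', '02B\n']

# Build the dictionary keyed by (record index, position within record).
mydict = {}
index = -1
counter = 0
for l in lines:
    if l[0:2] == '01':
        counter = 0
        index += 1
    mydict[(index, counter)] = l[2:]
    counter += 1

# Nested loop: walk record indexes, then positions, until a key is missing.
grouped = []
i = 0
while (i, 0) in mydict:
    rec = []
    j = 0
    while (i, j) in mydict:
        rec.append(mydict[(i, j)].rstrip('\n'))
        j += 1
    grouped.append(rec)
    i += 1

print(grouped)
```

The tuple keys make it easy to look up a single line directly, at the cost of having to rebuild the grouping when whole records are needed.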
Eddie Corns wrote: [...]
Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)
Bill Dandreta wrote: [...]
Thanks Bill. Will use this script in place of Eddie's if the Python on
our AIX box is older than 2.2.
Thanks again.
python1 <py*****@spamless.net>
(news:ca*********@enews3.newsguy.com) wrote: [...]
I'd probably do something like
records = ('\n'+open('foo.data').read()).split('\n01')
You can later do
structured=[record.split('\n') for record in records]
to get a list of lists. The split strips the '01' from every record and
leaves an empty first element, and there may be other flaws, but I guess
the concept is clear.
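A hedged sketch of the same split idea with those flaws patched up might look like this. The string `data` is an assumption standing in for the file contents, and restoring the '01' prefix is a choice made here for illustration, not something the post specifies.

```python
# The string stands in for the result of reading the whole file.
data = '01A\n02B\n01A\n02B\n02C\n01A\n02B\n'

# Prepending '\n' lets the first header match the same '\n01' separator.
chunks = ('\n' + data).split('\n01')

# chunks[0] is whatever precedes the first header (empty here), so skip it.
# Put back the '01' that the split consumed, and use splitlines() so a
# trailing newline does not produce an empty last entry.
structured = [('01' + chunk).splitlines() for chunk in chunks[1:]]

print(structured)
```

This reads the whole file into memory at once, which is fine for small inputs but is the main trade-off against a line-by-line approach.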