file reading by record separator (not line by line)

Dear all,
I would like to read a really huge file that looks like this:

>name1....
line_11
line_12
line_13
....
>name2 ...
line_21
line_22
....
etc

where line_ij is just free-form text on that line.

How can I read the file so that every time I do a "read()" I get exactly
one record, up to the next ">"?

many thanks
Lee

May 31 '07 #1
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

May 31 '07 #2

Something like:

    name = None
    lines = []
    for line in open('yourfilename.txt'):
        if line.startswith('>'):
            # A new header line ends the previous record, so print it.
            if name is not None:
                print 'Here is the record', name
                print lines
                print
            name = line.rstrip('\r\n')
            lines = []
        else:
            lines.append(line.rstrip('\r\n'))


May 31 '07 #3
Lee Sander wrote:
I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for an example).

    def chunkreader(f):
        name = None
        lines = []
        while True:
            line = f.readline()
            if not line:
                break
            if line[0] == '>':
                # A header line closes the previous record, if there is one.
                if name is not None:
                    yield name, lines
                name = line[1:].rstrip()
                lines = []
            else:
                lines.append(line)
        # Emit the final record, which has no header line after it.
        if name is not None:
            yield name, lines

    if __name__ == '__main__':
        from StringIO import StringIO
        s = \
    """>name1
    line1
    line2
    line3
    >name2
    line 4
    line 5
    line 6"""
        f = StringIO(s)
        for name, lines in chunkreader(f):
            print '***', name
            print ''.join(lines)

    $ python test.py
    *** name1
    line1
    line2
    line3

    *** name2
    line 4
    line 5
    line 6
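
The "implement the buffering yourself" idea mentioned above can be sketched roughly as follows. This is a Python 3 illustration, not code from the thread: the name `read_records` and the `chunk_size` parameter are made up for the example, and it assumes that empty records can be skipped. It reads fixed-size chunks and splits on the separator wherever it occurs, so the '>' does not have to be at the start of a line.

```python
import io

def read_records(f, sep='>', chunk_size=64 * 1024):
    """Yield the text between successive separators, reading fixed-size chunks.

    Only the current chunk plus the unfinished record is held in memory.
    Empty pieces (for example the one before a leading separator) are skipped.
    """
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:          # read() returns '' at end of file
            break
        buf += chunk
        parts = buf.split(sep)
        # The last piece may be incomplete, so keep it for the next round.
        buf = parts.pop()
        for part in parts:
            if part:
                yield part
    # Whatever is left after the last separator is the final record.
    if buf:
        yield buf

# Tiny chunk size here only to exercise records that span several chunks.
demo = io.StringIO(">name1\nline1\nline2\n>name2\nline 4\n")
for record in read_records(demo, chunk_size=4):
    print(repr(record))
```

With a real file this would be used as `for record in read_records(open('yourfilename.txt')): ...`. Each record still starts with its name line, so splitting the name from the body is left to the caller. Note that a single very large record is re-split on every chunk, so this sketch favours simplicity over speed.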

--

Regards,
Tijs
May 31 '07 #4
aspineux wrote:
something like

    name = None
    lines = []
    for line in open('yourfilename.txt'):
        if line.startswith('>'):
            if name is not None:
                print 'Here is the record', name
                print lines
                print
            name = line.rstrip('\r\n')
            lines = []
        else:
            lines.append(line.rstrip('\r\n'))

That would miss the last chunk: a record is only printed when the next '>' line
is read, so the final record is never output.
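
To make the missing-last-chunk point concrete, here is a small Python 3 sketch. It is not from the thread: `collect_records` and its `flush_last` switch are hypothetical names used only to compare the two behaviours, and it collects records in a list purely for demonstration (for a really huge file you would yield them one at a time instead).

```python
def collect_records(lines, flush_last=True):
    """Group lines into (name, body) records; a line starting with '>' opens a record.

    flush_last=False mimics the quoted loop, which emits a record only when
    the next '>' line is seen.
    """
    records = []
    name, body = None, []
    for line in lines:
        if line.startswith('>'):
            if name is not None:
                records.append((name, body))
            name, body = line.rstrip('\r\n'), []
        else:
            body.append(line.rstrip('\r\n'))
    # The last record has no following '>' line, so it must be emitted here.
    if flush_last and name is not None:
        records.append((name, body))
    return records

sample = ['>name1\n', 'line_11\n', 'line_12\n', '>name2\n', 'line_21\n']
print(len(collect_records(sample, flush_last=False)))  # 1
print(len(collect_records(sample)))                    # 2
```

The only difference between the two runs is the check after the loop, which is the same step the generator version earlier in the thread performs after its `while` loop.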

--

Regards,
Tijs
May 31 '07 #5
There was just recently a thread with an `itertools.groupby()` solution.
Something like this:

    from itertools import groupby, imap
    from operator import itemgetter

    def mark_records(lines):
        # Tag each line with a counter that increases at every '>' header,
        # so all lines of one record share the same number.
        counter = 0
        for line in lines:
            if line.startswith('>'):
                counter += 1
            yield (counter, line)

    def iter_records(lines):
        fst = itemgetter(0)
        snd = itemgetter(1)
        for dummy, record_lines in groupby(mark_records(lines), fst):
            yield imap(snd, record_lines)

    def main():
        source = """\
    >name1....
    line_11
    line_12
    line_13
    ....
    >name2 ...
    line_21
    line_22
    ....""".splitlines()

        for record in iter_records(source):
            print 'Start of record...'
            for line in record:
                print ':', line

    if __name__ == '__main__':
        main()
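
For readers on Python 3, where `itertools.imap` no longer exists, the same grouping idea might look like the sketch below. It is an adaptation, not the poster's code. One deliberate change: each record is returned as a list, because a `groupby()` group shares its underlying iterator with the `groupby` object and is no longer usable once the next group is requested.

```python
from itertools import groupby
from operator import itemgetter

def mark_records(lines):
    """Pair each line with a counter that goes up at every '>' header line."""
    counter = 0
    for line in lines:
        if line.startswith('>'):
            counter += 1
        yield counter, line

def iter_records(lines):
    """Yield each record as a list of its lines, header line first."""
    for _, group in groupby(mark_records(lines), key=itemgetter(0)):
        # Materialise the group before groupby() advances to the next one.
        yield [line for _, line in group]

source = ['>name1....', 'line_11', 'line_12', '>name2 ...', 'line_21']
for record in iter_records(source):
    print(record)
```

Any lines that appear before the first '>' keep the counter value 0 and therefore come out as a record of their own. Memory use is bounded by the largest single record, since only one group is held as a list at a time.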

Ciao,
Marc 'BlackJack' Rintsch
May 31 '07 #6
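I would do something like this (not tested):

    def get_a_record(f, sep):
        ret_rec = ''
        while True:
            char = f.read(1)
            # read(1) returns '' at end of file; stop there as well,
            # otherwise the loop never terminates on the last record.
            if not char or char == sep:
                break
            ret_rec += char
        return ret_rec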

- Hendrik

Jun 1 '07 #7
