file reading by record separator (not line by line)

Dear all,
I would like to read a really huge file that looks like this:

>name1....
line_11
line_12
line_13
....
>name2 ...
line_21
line_22
....
etc

where line_ij is just free-form text on that line.

How can I read the file so that every time I do a "read()" I get exactly
one record, up to the next ">"?

many thanks
Lee

May 31 '07 #1


Lee Sander

I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

May 31 '07 #2

aspineux

Something like:

    name = None
    lines = []
    for line in open('yourfilename.txt'):
        if line.startswith('>'):
            # A new header line: print the record collected so far.
            if name is not None:
                print 'Here is the record', name
                print lines
                print
            name = line.rstrip('\r\n')
            lines = []
        else:
            lines.append(line.rstrip('\r\n'))

May 31 '07 #3

Tijs

Lee Sander wrote:

I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for example).

    def chunkreader(f):
        name = None
        lines = []
        while True:
            line = f.readline()
            if not line:
                break
            if line[0] == '>':
                # A new header: emit the record collected so far.
                if name is not None:
                    yield name, lines
                name = line[1:].rstrip()
                lines = []
            else:
                lines.append(line)
        # Emit the last record, which has no header after it.
        if name is not None:
            yield name, lines

    if __name__ == '__main__':
        from StringIO import StringIO
        s = \
    """>name1
    line1
    line2
    line3
    >name2
    line 4
    line 5
    line 6"""
        f = StringIO(s)
        for name, lines in chunkreader(f):
            print '***', name
            print ''.join(lines)

    $ python test.py
    *** name1
    line1
    line2
    line3

    *** name2
    line 4
    line 5
    line 6

--

Regards,
Tijs

May 31 '07 #4
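
For the case mentioned above where '>' is not always at the start of a line, the "do your own buffering" idea could be sketched roughly as follows. This is only an illustration, written for Python 3 (unlike the Python 2 snippets in this thread); the name read_records and the chunk_size parameter are made up for the example. It reads fixed-size chunks, splits on the separator, and keeps the unfinished tail in a buffer until more data arrives, so memory use is bounded by the largest record plus one chunk.

```python
import io


def read_records(f, sep=">", chunk_size=64 * 1024):
    """Yield the text between separators, reading f in fixed-size chunks.

    Hypothetical helper: f is any text-mode file object with read(n).
    """
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Every piece except the last is a complete record; the last piece
        # may continue in the next chunk, so it stays in the buffer.
        *complete, buf = buf.split(sep)
        for record in complete:
            if record:  # skip the empty piece before the first separator
                yield record
    # Whatever is left at end of file is the final record.
    if buf:
        yield buf


if __name__ == "__main__":
    sample = io.StringIO(">name1\nline_11\nline_12\n>name2\nline_21\n")
    # A tiny chunk size is used only to exercise the buffering.
    for record in read_records(sample, chunk_size=8):
        name, _, body = record.partition("\n")
        print(repr(name), repr(body))
```

Note that this sketch silently drops empty records (two separators in a row), which may or may not be what you want.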

Tijs

aspineux wrote:

something like

    name = None
    lines = []
    for line in open('yourfilename.txt'):
        if line.startswith('>'):
            if name is not None:
                print 'Here is the record', name
                print lines
                print
            name = line.rstrip('\r\n')
            lines = []
        else:
            lines.append(line.rstrip('\r\n'))

That would miss the last chunk.

--

Regards,
Tijs

May 31 '07 #5
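
The reason the last chunk is missed is that the loop only outputs a record when it sees the next '>' line, and the final record has no such line after it. One possible fix is to flush once more after the loop ends. A minimal sketch in Python 3, with the made-up name records_with_flush, assuming that lines before the first '>' header should simply be ignored:

```python
def records_with_flush(lines):
    """Yield (name, body_lines) for each '>'-headed record in lines."""
    name = None
    body = []
    for line in lines:
        if line.startswith(">"):
            if name is not None:
                yield name, body
            name = line[1:].rstrip("\r\n")
            body = []
        elif name is not None:
            # Lines before the first header are ignored (an assumption).
            body.append(line.rstrip("\r\n"))
    # Flush the record still being collected when the input ends.
    if name is not None:
        yield name, body


if __name__ == "__main__":
    sample = [">name1\n", "line_11\n", "line_12\n", ">name2\n", "line_21\n"]
    for name, body in records_with_flush(sample):
        print(name, body)
```

Since a file object iterates line by line, passing an open file instead of the sample list should work the same way without loading the whole file.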

Marc 'BlackJack' Rintsch

Lee Sander wrote:

how can i read file so that every time i do a "read()" i get exactly
one record up to the next ">"

There was just recently a thread with a `itertools.groupby()` solution.
Something like this:

    from itertools import groupby, imap
    from operator import itemgetter

    def mark_records(lines):
        # Tag every line with the number of the record it belongs to.
        counter = 0
        for line in lines:
            if line.startswith('>'):
                counter += 1
            yield (counter, line)

    def iter_records(lines):
        fst = itemgetter(0)
        snd = itemgetter(1)
        # Consecutive lines with the same record number form one record.
        for dummy, record_lines in groupby(mark_records(lines), fst):
            yield imap(snd, record_lines)

    def main():
        source = """\
    >name1....
    line_11
    line_12
    line_13
    ...
    >name2 ...
    line_21
    line_22
    ...""".splitlines()

        for record in iter_records(source):
            print 'Start of record...'
            for line in record:
                print ':', line

    if __name__ == '__main__':
        main()

Ciao,
Marc 'BlackJack' Rintsch

May 31 '07 #6
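
For anyone on Python 3, where itertools.imap no longer exists, the groupby() idea above could be sketched like this. The name iter_record_lists is made up for the example. Each record is materialized as a list, because groupby() invalidates a group as soon as the outer iterator advances; that costs one record's worth of memory but avoids surprises if a record is not consumed right away.

```python
from itertools import groupby


def mark_records(lines):
    """Pair every line with a counter that increases at each '>' header."""
    counter = 0
    for line in lines:
        if line.startswith(">"):
            counter += 1
        yield counter, line


def iter_record_lists(lines):
    """Yield each record as a list of its lines, header line first."""
    for _, group in groupby(mark_records(lines), key=lambda pair: pair[0]):
        yield [line for _, line in group]


if __name__ == "__main__":
    source = [">name1....", "line_11", "line_12", ">name2 ...", "line_21"]
    for record in iter_record_lists(source):
        print("Start of record...")
        for line in record:
            print(":", line)
```

Any lines before the first '>' header get counter 0 and come out as a group of their own.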

Hendrik van Rooyen

"Lee Sander" <le..e@gmail.com> wrote:

I wanted to also say that this file is really huge, so I cannot
just do a read() and then split on ">" to get a record
thanks
lee

I would do something like this (not tested):

    def get_a_record(f, sep):
        ret_rec = ''
        while True:
            char = f.read(1)
            # Stop at the separator, or at end of file (read() returns '').
            if not char or char == sep:
                break
            ret_rec += char
        return ret_rec

- Hendrik

Jun 1 '07 #7
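
A character-at-a-time reader like the one above needs some way to tell "empty record" apart from "end of file", otherwise the caller cannot know when to stop. One possible sketch in Python 3, using the made-up name read_record, returns None once the file is exhausted and collects characters in a list instead of growing a string. Expect this to be slow on a really huge file, since it makes one read() call per character.

```python
import io


def read_record(f, sep=">"):
    """Return the text up to the next sep (exclusive), or None at end of file."""
    chars = []
    while True:
        char = f.read(1)
        if not char:
            # End of file: return what was collected, or None if nothing was.
            return "".join(chars) if chars else None
        if char == sep:
            return "".join(chars)
        chars.append(char)


if __name__ == "__main__":
    f = io.StringIO(">name1\nline_11\n>name2\nline_21\n")
    # iter(callable, sentinel) keeps calling read_record until it returns None.
    for record in iter(lambda: read_record(f), None):
        if record:  # the piece before the first '>' is empty
            print(repr(record))
```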
