Python's CSV reader

I'm fairly new to Python and am working on parsing some delimited text
files. I noticed that there's a nice CSV reading/writing module
included in the libraries.

My data files, however, are odd in that they are composed of lines with
alternating formats. (Essentially the rows are a header record and a
corresponding detail record on the next line. Each line type has a
different number of fields.)

Can the CSV module be coerced to read two line formats at once or am I
better off using read and split?

Thanks for your insight,
Stephan

Aug 4 '05 #1
Stephan wrote:
> Can the CSV module be coerced to read two line formats at once or am I
> better off using read and split?


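Well, readlines/split really isn't bad, so long as the file fits
comfortably in memory:

import csv

fi = open(filename)          # filename: path to your data file
lines = fi.readlines()
evens = iter(lines[0::2])    # lines 1, 3, 5, ... (header records)
odds = iter(lines[1::2])     # lines 2, 4, 6, ... (detail records)
csv1 = csv.reader(evens)
csv2 = csv.reader(odds)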

The trick is that the "csvfile" argument to csv.reader doesn't have to be a
real file; it just has to be an iterator that returns strings, one line at a
time. If the file's too big to fit in memory, you could piece together a pair
of iterators that call readline() on the file appropriately.
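
One way such a pair of lazy iterators might be pieced together is sketched below. This is a Python 3 sketch, not code from the post: the helper name alternating_readers and the sample data are made up for illustration, and it uses itertools.tee and itertools.islice rather than explicit read calls.

```python
import csv
import io
from itertools import islice, tee

def alternating_readers(lines, **fmtparams):
    """Return (csv1, csv2): readers over lines 1, 3, 5, ... and lines 2, 4, 6, ..."""
    a, b = tee(lines)               # two independent views of one line iterator
    evens = islice(a, 0, None, 2)   # 1st, 3rd, 5th, ... line
    odds = islice(b, 1, None, 2)    # 2nd, 4th, 6th, ... line
    return csv.reader(evens, **fmtparams), csv.reader(odds, **fmtparams)

# Hypothetical sample standing in for a large file object returned by open(...)
sample = io.StringIO("h1,h2\nd1,d2,d3\nh3,h4\nd4,d5,d6\n")
csv1, csv2 = alternating_readers(sample)
pairs = list(zip(csv1, csv2))
```

Note that tee buffers whatever one iterator has consumed and the other has not, so memory stays small only if the two readers are advanced in step, as zip does here.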
Aug 4 '05 #2
Stephan wrote:
> Can the CSV module be coerced to read two line formats at once or am I
> better off using read and split?


Yes, it can:

import csv
import sys

reader = csv.reader(sys.stdin)

while True:
    try:
        names = reader.next()
        values = reader.next()
    except StopIteration:
        break
    print dict(zip(names, values))

Python offers an elegant way to do the same using the zip() or
itertools.izip() function:

import csv
import sys
from itertools import izip

reader = csv.reader(sys.stdin)

for names, values in izip(reader, reader):
    print dict(izip(names, values))
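
The pairing works because both arguments are the same iterator, so each step pulls two consecutive rows from it. A small Python 3 sketch of the same idea follows (in Python 3 the built-in zip is lazy, so izip is not needed); the sample data is invented for illustration.

```python
import csv
import io

# Hypothetical input: each names line is followed by its values line.
sample = io.StringIO("name,city\nAnn,Oslo\ncolor,size,weight\nred,L,2kg\n")

reader = csv.reader(sample)

# zip() asks the same reader for a row twice per step: first names, then values.
records = [dict(zip(names, values)) for names, values in zip(reader, reader)]
```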

Now let's add some minimal error checking, and we are done:

import csv
import sys
from itertools import izip, chain

def check_orphan():
    # Only reached when a names line has no values line after it.
    raise Exception("Unexpected end of input")
    yield None  # never executed; makes this function a generator

reader = csv.reader(sys.stdin)
for names, values in izip(reader, chain(reader, check_orphan())):
    if len(names) != len(values):
        if len(names) > len(values):
            raise Exception("More names than values")
        else:
            raise Exception("More values than names")
    print dict(izip(names, values))
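
For comparison, here is a Python 3 sketch of the same two checks. It swaps the chain/check_orphan trick for itertools.zip_longest with a sentinel, which is a different technique from the one above; the function name paired_records and the sentinel are assumptions made for this example.

```python
import csv
import io
from itertools import zip_longest

_MISSING = object()  # sentinel: marks a names row that has no values row

def paired_records(lines):
    """Yield one dict per (names, values) pair of consecutive CSV rows."""
    reader = csv.reader(lines)
    for names, values in zip_longest(reader, reader, fillvalue=_MISSING):
        if values is _MISSING:
            raise ValueError("Unexpected end of input")
        if len(names) != len(values):
            raise ValueError("%d names but %d values" % (len(names), len(values)))
        yield dict(zip(names, values))

# Hypothetical well-formed input
good = list(paired_records(io.StringIO("a,b\n1,2\nc\n3\n")))
```

An input with an odd number of rows, or a pair whose field counts differ, raises ValueError instead of silently dropping data.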

Peter

Aug 4 '05 #3
Stephan wrote:
> I'm fairly new to Python and am working on parsing some delimited text
> files. I noticed that there's a nice CSV reading/writing module
> included in the libraries.
>
> My data files, however, are odd in that they are composed of lines with
> alternating formats. (Essentially the rows are a header record and a
> corresponding detail record on the next line. Each line type has a
> different number of fields.)
>
> Can the CSV module be coerced to read two line formats at once or am I
> better off using read and split?
>
> Thanks for your insight,
> Stephan


The csv module should be suitable. The reader just takes each line,
parses it, then returns a list of strings. It doesn't matter if
different lines have different numbers of fields.
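
A quick way to see this is the Python 3 sketch below, which feeds csv.reader two rows of different lengths. The data is made up for illustration.

```python
import csv
import io

# Hypothetical record: a 2-field header line followed by a 4-field detail line.
sample = io.StringIO("A100,2005-08-04\nwidget,3,9.99,blue\n")

rows = list(csv.reader(sample))
field_counts = [len(row) for row in rows]   # one list of strings per line
```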

To get an idea of what I mean, try something like the following
(untested):

import csv

reader = csv.reader(open(filename))  # filename: path to the data file

while True:

    # Read the next "header" line; if there isn't one, exit the loop.
    # reader.next() raises StopIteration at end of file.
    try:
        header = reader.next()
    except StopIteration:
        break
    if not header: break  # blank line

    # Assume that there is a "detail" line if the preceding
    # "header" line exists
    detail = reader.next()

    # Print the parsed data
    print '-' * 40
    print "Header (%d fields): %s" % (len(header), header)
    print "Detail (%d fields): %s" % (len(detail), detail)

You could wrap this up into a class which returns (header, detail) pairs
and does better error handling, but the above code should illustrate the
basics.
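
A rough Python 3 sketch of what such a wrapper class might look like is given here. The class name PairReader, its interface, and the choice of ValueError are all assumptions for illustration, not anything from the csv module.

```python
import csv
import io

class PairReader:
    """Iterate over (header, detail) row pairs from an iterable of lines."""

    def __init__(self, lines, **fmtparams):
        self._reader = csv.reader(lines, **fmtparams)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from this call ends the iteration normally.
        header = next(self._reader)
        try:
            detail = next(self._reader)
        except StopIteration:
            # A header with nothing after it is treated as an error.
            raise ValueError(
                "header on line %d has no detail line" % self._reader.line_num
            ) from None
        return header, detail

# Hypothetical sample input
pairs = list(PairReader(io.StringIO("a,b\n1,2,3\nc,d\n4,5,6\n")))
```

Keyword arguments such as delimiter="|" are passed straight through to csv.reader.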

--
Andrew McLean
Aug 4 '05 #4
Thank you all for these interesting examples and methods!

Supposing I want to use DictReader to bring in the CSV lines and tie
them to field names (again, with alternating lines having different
fields), should I use two DictReaders as in Christopher's example
or is there a better way?

--
Stephan

Aug 4 '05 #5
Stephan wrote:
> Thank you all for these interesting examples and methods!

You're welcome.

> Supposing I want to use DictReader to bring in the CSV lines and tie
> them to field names (again, with alternating lines having different
> fields), should I use two DictReaders as in Christopher's example
> or is there a better way?


For a clean design you would need not just two DictReader instances, but one
DictReader for every two lines.
However, with the current DictReader implementation, the following works,
too:

import csv
import sys

reader = csv.DictReader(sys.stdin)

for record in reader:
    print record
    reader.fieldnames = None

Peter

Aug 4 '05 #6
Stephan wrote:
> Thank you all for these interesting examples and methods!


You are welcome. One point: I think there have been at least two
different interpretations of precisely what your task is.

I had assumed that all the different "header" lines contained data for
the same fields in the same order, and similarly that all the "detail"
lines contained data for the same fields in the same order.

However, I think Peter has answered on the basis that you have records
consisting of pairs of lines, the first line being a header containing
field names specific to that record with the second line containing the
corresponding data.

It would help if you let us know which (if any) was correct.

--
Andrew McLean
Aug 5 '05 #7
Andrew McLean wrote:
> You are welcome. One point: I think there have been at least two
> different interpretations of precisely what your task is.
>
> I had assumed that all the different "header" lines contained data for
> the same fields in the same order, and similarly that all the "detail"
> lines contained data for the same fields in the same order.


Indeed, you are correct. Peter's version is interesting in its own
right, but not precisely what I had in mind. However, from his example
I saw what I was missing: I didn't realize that you could reassign the
DictReader field names on the fly. Here is a rudimentary example of my
working code and the data it can parse.

-------------------------------------
John|Smith
Beef|Potatos|Dinner Roll|Ice Cream
Susan|Jones
Chicken|Peas|Biscuits|Cake
Roger|Miller
Pork|Salad|Muffin|Cookies
-------------------------------------

import csv

HeaderFields = ["First Name", "Last Name"]
DetailFields = ["Entree", "Side Dish", "Starch", "Dessert"]

reader = csv.DictReader(open("testdata.txt"), [], delimiter="|")

while True:
    try:
        # Read next "header" line (if there isn't one then exit the loop)
        reader.fieldnames = HeaderFields
        header = reader.next()

        # Read the next "detail" line
        reader.fieldnames = DetailFields
        detail = reader.next()

        # Print the parsed data
        print '-' * 40
        print "Header (%d fields): %s" % (len(header), header)
        print "Detail (%d fields): %s" % (len(detail), detail)

    except StopIteration: break

Regards,
-Stephan

Aug 8 '05 #8
Stephan wrote:
> DictReader field names on the fly. Here is a rudimentary example of my
> working code and the data it can parse.
>
> -------------------------------------
> John|Smith
> Beef|Potatos|Dinner Roll|Ice Cream
> Susan|Jones
> Chicken|Peas|Biscuits|Cake
> Roger|Miller
> Pork|Salad|Muffin|Cookies
> -------------------------------------


That sample data would have been valuable information in your original post.
Here's what becomes of your code if you apply the "zip trick" from my first
post (yes, I am sometimes stubborn):

import itertools
import csv

HeaderFields = ["First Name", "Last Name"]
DetailFields = ["Entree", "Side Dish", "Starch", "Dessert"]

instream = open("testdata.txt")

# Both readers pull lines from the same file object, so they take turns.
heads = csv.DictReader(instream, HeaderFields, delimiter="|")
details = csv.DictReader(instream, DetailFields, delimiter="|")

for header, detail in itertools.izip(heads, details):
    print "Header (%d fields): %s" % (len(header), header)
    print "Detail (%d fields): %s" % (len(detail), detail)
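
For anyone trying this on Python 3, a minimal sketch of the same approach follows. It assumes an in-memory io.StringIO in place of testdata.txt, holding the first two records of the sample data, and uses the built-in zip instead of itertools.izip.

```python
import csv
import io

HEADER_FIELDS = ["First Name", "Last Name"]
DETAIL_FIELDS = ["Entree", "Side Dish", "Starch", "Dessert"]

# In-memory stand-in for testdata.txt (an assumption for this example)
instream = io.StringIO(
    "John|Smith\n"
    "Beef|Potatos|Dinner Roll|Ice Cream\n"
    "Susan|Jones\n"
    "Chicken|Peas|Biscuits|Cake\n"
)

# Two DictReaders share one stream, so they read alternate lines.
heads = csv.DictReader(instream, HEADER_FIELDS, delimiter="|")
details = csv.DictReader(instream, DETAIL_FIELDS, delimiter="|")

records = [(dict(h), dict(d)) for h, d in zip(heads, details)]
for header, detail in records:
    print("Header (%d fields): %s" % (len(header), header))
    print("Detail (%d fields): %s" % (len(detail), detail))
```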

Peter

Aug 8 '05 #9
