Pythonic use of CSV module to skip headers?

Ramon Felciano

Hi --

I'm using the csv module to parse a tab-delimited file and wondered
whether there was a more elegant way to skip a possible header line.
I'm doing

line = 0
reader = csv.reader(file(filename))
for row in reader:
    line = line + 1
    if ignoreFirstLine and line == 1:
        continue
    # do something with row

The only thing I could think of was to specialize the default reader
class with an extra skipHeaderLine constructor parameter so that its
next() method can skip the first line appropriately. Is there any other
cleaner way to do it w/out subclassing the stdlib?

Thanks!

Ramon

Jul 18 '05 #1

Steve Holden

Ramon Felciano wrote:

> Hi --
>
> I'm using the csv module to parse a tab-delimited file and wondered
> whether there was a more elegant way to skip a possible header line.
> I'm doing
>
>     line = 0
>     reader = csv.reader(file(filename))
>     for row in reader:
>         line = line + 1
>         if ignoreFirstLine and line == 1:
>             continue
>         # do something with row
>
> The only thing I could think of was to specialize the default reader
> class with an extra skipHeaderLine constructor parameter so that its
> next() method can skip the first line appropriately. Is there any other
> cleaner way to do it w/out subclassing the stdlib?
>
> Thanks!
>
> Ramon

How about

line = 0
reader = csv.reader(file(filename))
headerline = reader.next()
for row in reader:
    line = line+1
    # do something with row

regards
Steve
--
http://www.holdenweb.com
http://pydish.holdenweb.com
Holden Web LLC +1 800 494 3119

Jul 18 '05 #2

Marc 'BlackJack' Rintsch

In <76**************************@posting.google.com >, Ramon Felciano
wrote:

> Hi --
>
> I'm using the csv module to parse a tab-delimited file and wondered
> whether there was a more elegant way to skip a possible header line.
> I'm doing
>
>     line = 0
>     reader = csv.reader(file(filename))
>     for row in reader:
>         line = line + 1
>         if ignoreFirstLine and line == 1:
>             continue
>         # do something with row

What about:

reader = csv.reader(file(filename))
reader.next() # Skip header line.
for row in reader:
    # do something with row

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #3
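
For readers on Python 3, where `reader.next()` and `file()` no longer exist, the same idea can be sketched with the built-in `next()`. The helper name `read_rows` and its `skip_header` flag are illustrative assumptions, not part of the csv module; the tab delimiter follows the original question.

```python
import csv
import io


def read_rows(lines, skip_header=True, delimiter="\t"):
    """Return the data rows from an iterable of lines, optionally dropping the first row."""
    reader = csv.reader(lines, delimiter=delimiter)
    if skip_header:
        # The None default keeps an empty input from raising StopIteration.
        next(reader, None)
    return list(reader)


# In-memory stand-in for a tab-delimited file with one header line.
sample = io.StringIO("name\tage\nalice\t30\nbob\t25\n")
rows = read_rows(sample)
```

Skipping through the reader, rather than through the file, means the header is parsed as a CSV record first, so a header containing a quoted field with an embedded newline is still dropped as a single row.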

Peter Otten

Ramon Felciano wrote:

> I'm using the csv module to parse a tab-delimited file and wondered
> whether there was a more elegant way to skip a possible header line.
> I'm doing
>
>     line = 0
>     reader = csv.reader(file(filename))
>     for row in reader:
>         line = line + 1
>         if ignoreFirstLine and line == 1:
>             continue
>         # do something with row
>
> The only thing I could think of was to specialize the default reader
> class with an extra skipHeaderLine constructor parameter so that its
> next() method can skip the first line appropriately. Is there any other
> cleaner way to do it w/out subclassing the stdlib?

>>> import csv
>>> f = file("tmp.csv")
>>> f.next()
'# header\n'
>>> for row in csv.reader(f):
...     print row
...
['a', 'b', 'c']
['1', '2', '3']

This way the reader need not mess with the header at all.

Peter

Jul 18 '05 #4
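
A rough Python 3 rendering of this file-level approach might look like the following. The function name `rows_after_first_line` is a made-up helper, and `io.StringIO` stands in for the `tmp.csv` file from the session above.

```python
import csv
import io


def rows_after_first_line(f, **reader_kwargs):
    """Drop one raw line from f, then parse whatever remains as CSV."""
    next(f, None)  # consume the header line before csv ever sees it
    return list(csv.reader(f, **reader_kwargs))


# Same contents as the tmp.csv used in the interactive session.
f = io.StringIO("# header\na,b,c\n1,2,3\n")
data = rows_after_first_line(f)
```

One trade-off to keep in mind: this skips exactly one physical line, so it suits comment-style headers like the one shown, but a header record that spans several lines because of a quoted newline would only be partly removed.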

Skip Montanaro

Ramon> I'm using the csv module to parse a tab-delimited file and
Ramon> wondered whether there was a more elegant way to skip a possible
Ramon> header line.

Assuming the header line has descriptive titles, I prefer the DictReader
class. Unfortunately, it requires you to specify the titles in its
constructor. My usual idiom is the following:

f = open(filename, "rb") # don't forget the 'b'!
reader = csv.reader(f)
titles = reader.next()
reader = csv.DictReader(f, titles)
for row in reader:
    ...

The advantage of the DictReader class is that you get dictionaries keyed by
the titles instead of lists. The code to manipulate them is more readable
and insensitive to changes in the order of the columns. On the down side,
if the titles aren't always named the same you lose.

Skip

Jul 18 '05 #6
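
As a sketch of how this two-step idiom could be written for Python 3, with the delimiter passed to both constructors: the wrapper name `dict_rows` is an assumption made for illustration, and `io.StringIO` replaces the real file.

```python
import csv
import io


def dict_rows(f, delimiter="\t"):
    """Read the title row with a plain reader, then hand the rest to DictReader."""
    # The plain reader consumes only the first record, leaving f positioned
    # at the first data row.
    titles = next(csv.reader(f, delimiter=delimiter))
    return list(csv.DictReader(f, fieldnames=titles, delimiter=delimiter))


f = io.StringIO("name\tage\nalice\t30\n")
records = dict_rows(f)
```

With a real file in Python 3, the csv documentation recommends opening it in text mode with `newline=""` instead of the `"rb"` mode used above.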

Michael Hoffman

Skip Montanaro wrote:

> Assuming the header line has descriptive titles, I prefer the DictReader
> class. Unfortunately, it requires you to specify the titles in its
> constructor. My usual idiom is the following:

I deal so much with tab-delimited CSV files that I found it useful to
create a subclass of csv.DictReader to deal with this, so I can just write:

for row in tabdelim.DictReader(file(filename)):
    ...

I think this is a lot easier than trying to remember this cumbersome
idiom every single time.
--
Michael Hoffman

Jul 18 '05 #7

Nick Coghlan

Michael Hoffman wrote:

> I deal so much with tab-delimited CSV files that I found it useful to
> create a subclass of csv.DictReader to deal with this, so I can just write:
>
>     for row in tabdelim.DictReader(file(filename)):
>         ...
>
> I think this is a lot easier than trying to remember this cumbersome
> idiom every single time.

Python 2.4 makes the fieldnames parameter optional:
"If the fieldnames parameter is omitted, the values in the first row of the
csvfile will be used as the fieldnames."

i.e. the following should work fine in 2.4:

for row in csv.DictReader(file(filename)):
    print sorted(row.items())

Cheers,
Nick.

Jul 18 '05 #8
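
The behaviour quoted from the documentation carries over to Python 3. A minimal sketch, assuming tab-delimited input as in the original question and using `io.StringIO` in place of a file:

```python
import csv
import io

f = io.StringIO("name\tage\nalice\t30\nbob\t25\n")

# No fieldnames argument: the first row supplies the keys.
reader = csv.DictReader(f, delimiter="\t")
records = [sorted(row.items()) for row in reader]
fieldnames = reader.fieldnames
```

Each entry in `records` is a sorted list of (title, value) pairs, mirroring the `sorted(row.items())` call above.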

Skip Montanaro

Skip> Assuming the header line has descriptive titles, I prefer the
Skip> DictReader class. Unfortunately, it requires you to specify the
Skip> titles in its constructor. My usual idiom is the following:

Michael> I deal so much with tab-delimited CSV files that I found it
Michael> useful to create a subclass of csv.DictReader to deal with
Michael> this, so I can just write:

Michael> for row in tabdelim.DictReader(file(filename)):
Michael> ...

Michael> I think this is a lot easier than trying to remember this
Michael> cumbersome idiom every single time.

I'm not sure what the use of TABs as delimiters has to do with the OP's
problem. In my example I flubbed and failed to specify the delimiter to the
constructors (comma is the default delimiter).

You can create a subclass of DictReader that plucks the first line out as a
set of titles:

class SmartDictReader(csv.DictReader):
    def __init__(self, f, *args, **kwds):
        # The plain reader needs the file too, or it has nothing to read.
        rdr = csv.reader(f, *args, **kwds)
        titles = rdr.next()
        csv.DictReader.__init__(self, f, titles, *args, **kwds)

Is that what you were suggesting? I don't find the couple extra lines of
code in my original example all that cumbersome to type though.

Skip

Jul 18 '05 #9

Michael Hoffman

Skip Montanaro wrote:

> I'm not sure what the use of TABs as delimiters has to do with the OP's
> problem.

Not much. :) I just happen to use tabs more often than commas, so my
subclass defaults to

> You can create a subclass of DictReader that plucks the first line out as a
> set of titles:
>
>     class SmartDictReader(csv.DictReader):
>         def __init__(self, f, *args, **kwds):
>             rdr = csv.reader(f, *args, **kwds)
>             titles = rdr.next()
>             csv.DictReader.__init__(self, f, titles, *args, **kwds)
>
> Is that what you were suggesting?

Exactly.

> I don't find the couple extra lines of
> code in my original example all that cumbersome to type though.

If you started about half of the programs you write with those extra
lines, you might <wink>. I'm a strong believer in OnceAndOnlyOnce.

Thanks to Nick Coghlan for pointing out that I no longer need do this in
Python 2.4.
--
Michael Hoffman

Jul 18 '05 #10
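
Once the first row is picked up automatically, a tab-defaulting subclass along the lines Michael describes shrinks to a few lines. The following Python 3 sketch is a guess at the shape of such a helper, not his actual `tabdelim` module; the class name `TabDictReader` is hypothetical.

```python
import csv
import io


class TabDictReader(csv.DictReader):
    """DictReader that defaults to tab-delimited input (hypothetical helper)."""

    def __init__(self, f, fieldnames=None, **kwds):
        # Callers can still override the delimiter explicitly.
        kwds.setdefault("delimiter", "\t")
        super().__init__(f, fieldnames=fieldnames, **kwds)


f = io.StringIO("name\tage\nalice\t30\n")
first = next(TabDictReader(f))
```

Accepting only keyword arguments for the reader options avoids the ambiguity of forwarding positional arguments to both a plain reader and the DictReader constructor.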

Skip Montanaro

Skip> I don't find the couple extra lines of code in my original example
Skip> all that cumbersome to type though.

Michael> If you started about half of the programs you write with those
Michael> extra lines, you might <wink>. I'm a strong believer in
Michael> OnceAndOnlyOnce.

You're right of course. I do use csv a lot, but only from a couple
specialized programs.

Skip

Jul 18 '05 #11
