Bytes IT Community

CSV with comments

In csv.reader, is there any way to skip lines that start with '#', or
empty lines? I would like to add comments to my CSV file.

Jul 18 '06 #1
11 Replies



GinTon wrote:
> In csv.reader, is there any way to skip lines that start with '#', or
> empty lines? I would like to add comments to my CSV file.

To skip comments I use a dirty trick:

    reader = csv.reader(open(csv_file))
    for csv_line in reader:
        if csv_line[0].startswith('#'):
            continue

But this doesn't handle blank lines.

I think the csv module should be able to skip comments and blank lines
automatically.
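For reference, a minimal Python 3 sketch of a row-level version of the trick above. It relies on csv.reader yielding an empty list for a blank line, so testing `not row` before looking at `row[0]` avoids the IndexError. The function name `read_rows` and the sample data are illustrative only.

```python
import csv
import io

def read_rows(fileobj):
    """Yield parsed rows, skipping blank lines and rows whose first field starts with '#'."""
    for row in csv.reader(fileobj):
        if not row:                  # blank line: csv.reader yields []
            continue
        if row[0].startswith("#"):   # comment line
            continue
        yield row

data = "name,qty\n# a comment, with a comma\n\napple,3\n"
for row in read_rows(io.StringIO(data, newline="")):
    print(row)
```

Because the filtering happens after parsing, a comment is still parsed as CSV first, so a stray double quote inside a comment could swallow the following lines. A whitespace-only line also comes back as a one-field row and is not skipped by this version.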

Jul 18 '06 #2

GinTon wrote:
> GinTon wrote:
>> In csv.reader, is there any way to skip lines that start with '#', or
>> empty lines? I would like to add comments to my CSV file.
>
> To skip comments I use a dirty trick:
>
>     reader = csv.reader(open(csv_file))
>     for csv_line in reader:
>         if csv_line[0].startswith('#'):
>             continue
>
> But this doesn't handle blank lines.
>
> I think the csv module should be able to skip comments and blank lines
> automatically.

Write an iterator that filters lines to your liking and use it as input
to csv.reader:

    def CommentStripper (iterator):
        for line in iterator:
            if line [:1] == '#':
                continue
            if not line.strip ():
                continue
            yield line

    reader = csv.reader (CommentStripper (open (csv_file)))

CommentStripper is actually quite useful for other files too. Of course,
the test has to change depending on whether a comment starts
- on the first character
- on the first non-blank character
- anywhere in the line
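A Python 3 sketch of those three variants, using a hypothetical `mode` parameter (not part of the csv module) to pick the rule; blank lines are always dropped. The "anywhere" variant is deliberately naive: it also truncates at a '#' that sits inside a quoted field.

```python
import csv
import io

def comment_stripper(lines, mode="first"):
    """Yield lines, dropping blank lines and comments.

    mode (hypothetical parameter):
      "first"    - comment only if '#' is the very first character
      "nonblank" - comment if '#' is the first non-blank character
      "anywhere" - everything from the first '#' onward is removed
    """
    for line in lines:
        if mode == "first":
            if line.startswith("#"):
                continue
        elif mode == "nonblank":
            if line.lstrip().startswith("#"):
                continue
        elif mode == "anywhere":
            # Naive: also cuts at a '#' inside a quoted field.
            line = line.split("#", 1)[0]
        else:
            raise ValueError("unknown mode: %r" % mode)
        if not line.strip():   # blank (or now-empty) line
            continue
        yield line

sample = "x,1\n  # indented comment\ny,2 # trailing note\n"
for row in csv.reader(comment_stripper(io.StringIO(sample), mode="nonblank")):
    print(row)
```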

Daniel
Jul 18 '06 #3

> In csv.reader, is there any way to skip lines that start with '#', or
> empty lines?
Nope. When we wrote the module we weren't aware of any "spec" that
specified comments or blank lines. You can easily write a file wrapper to
filter them out though:

    class BlankCommentCSVFile:
        def __init__(self, fp):
            self.fp = fp

        def __iter__(self):
            return self

        def next(self):
            line = self.fp.next()
            if not line.strip() or line[0] == "#":
                return self.next()
            return line

Use it like so:

    reader = csv.reader(BlankCommentCSVFile(open("somefile.csv")))
    for row in reader:
        print row
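A Python 3 version of the same wrapper, as a sketch: Python 3 iterators implement `__next__` rather than `next`, and this variant loops instead of calling itself, so a long run of skipped lines cannot hit the recursion limit. The sample data is made up; for a real file you would pass `open("somefile.csv", newline="")` instead of the `io.StringIO` object.

```python
import csv
import io

class BlankCommentCSVFile:
    """Iterator wrapper that drops blank lines and lines starting with '#'."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        return self

    def __next__(self):
        # Loop instead of recursing; StopIteration from next() ends iteration.
        while True:
            line = next(self.fp)
            if line.strip() and not line.startswith("#"):
                return line

data = "# header comment\n\nx,1\ny,2\n"
reader = csv.reader(BlankCommentCSVFile(io.StringIO(data, newline="")))
for row in reader:
    print(row)
```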

Skip
Jul 18 '06 #4

On 19/07/2006 5:34 AM, sk**@pobox.com wrote:
>> In csv.reader, is there any way to skip lines that start with '#', or
>> empty lines?
>
> Nope. When we wrote the module we weren't aware of any "spec" that
> specified comments or blank lines. You can easily write a file wrapper to
> filter them out though:
>
>     class BlankCommentCSVFile:
>         def __init__(self, fp):
>             self.fp = fp
>
>         def __iter__(self):
>             return self
>
>         def next(self):
>             line = self.fp.next()
>             if not line.strip() or line[0] == "#":
>                 return self.next()

This is recursive. Unlikely of course, but if the file contained a large
number of empty lines, might this not cause the recursion limit to be
exceeded?

>             return line
>
> Use it like so:
>
>     reader = csv.reader(BlankCommentCSVFile(open("somefile.csv")))
>     for row in reader:
>         print row

Hi Skip,

Is there any reason to prefer this approach to Daniel's, apart from
being stuck with an older (pre-yield) version of Python?

A file given to csv.reader is supposed to be opened with "rb" so that
newlines embedded in data fields can be handled properly, and also
(according to a post by Andrew McNamara (IIRC)) for DIY emulation of
"rU". It is not apparent how well this all hangs together when a filter
is interposed, nor whether there are any special rules about what the
filter must/mustn't do. Perhaps a few lines for the docs?

Cheers,
John

Jul 18 '06 #5


John> This is recursive. Unlikely of course, but if the file contained a
John> large number of empty lines, might this not cause the recursion
John> limit to be exceeded?

Sure, but I was lazy. ;-)

Skip
Jul 19 '06 #6

Whoops, missed the second part.

John> Is there any reason to prefer this approach to Daniel's, apart
John> from being stuck with an older (pre-yield) version of Python?

No, it's just what I came up with off the top of my head.

John> A file given to csv.reader is supposed to be opened with "rb" so
John> that newlines embedded in data fields can be handled properly, and
John> also (according to a post by Andrew McNamara (IIRC)) for DIY
John> emulation of "rU". It is not apparent how well this all hangs
John> together when a filter is interposed, nor whether there are any
John> special rules about what the filter must/mustn't do. Perhaps a few
John> lines for the docs?

Yeah, I was also aware of that. In the common case though it's not too big
a deal. If the OP is editing a CSV file manually it probably isn't too
complex (no newlines inside fields, for example).
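To make John's concern concrete, here is a Python 3 sketch (in Python 3 the csv documentation asks for files to be opened with `newline=''`, the rough counterpart of "rb" here). A line-level filter sees physical lines, not records, so a blank line inside a quoted multi-line field is silently dropped. The `comment_stripper` function and the data are illustrative.

```python
import csv
import io

def comment_stripper(lines):
    """Line-level filter: drop blank lines and lines starting with '#'."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line

# One record whose quoted second field spans three physical lines,
# the middle one blank.
data = 'id,note\n1,"first\n\nthird"\n'

plain = list(csv.reader(io.StringIO(data, newline="")))
filtered = list(csv.reader(comment_stripper(io.StringIO(data, newline=""))))
print(plain)
print(filtered)
```

Without the filter, the note field keeps its blank line; with it, the field quietly loses one newline. As Skip says, this only matters when fields contain embedded newlines.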

Skip

Jul 19 '06 #7

Daniel Dittmar <da************@sap.corp> wrote:
>     if line [:1] == '#':
What's wrong with line[0] == '#' ? (For one thing, it's fractionally
faster than [:1].)

--
\S -- si***@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Jul 19 '06 #8

Sion Arrowsmith wrote:
> Daniel Dittmar <da************@sap.corp> wrote:
>>     if line [:1] == '#':
>
> What's wrong with line[0] == '#' ? (For one thing, it's fractionally
> faster than [:1].)

line[0] assumes that the line isn't empty. If the input iterator is a file,
that will hold true, but if you were ever to reuse CommentStripper on
a list of strings without trailing newlines, it would break at
the first empty string.
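A quick Python 3 illustration of the point: slicing and startswith() are both safe on an empty string, while indexing raises. The variable name `line` is just for the example.

```python
line = ""   # e.g. an element of a list of strings with no trailing newline

print(line[:1] == "#")        # slicing an empty string gives "", never an error
print(line.startswith("#"))   # also safe on an empty string
try:
    print(line[0] == "#")
except IndexError as exc:     # indexing an empty string raises
    print("line[0] failed:", exc)
```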

Personally I would use:

if line.startswith('#'):

which takes about three times as long to execute but I think reads more
clearly.

timeit.py -s "line=' hello world'" "line[:1]=='#'"
1000000 loops, best of 3: 0.236 usec per loop

timeit.py -s "line=' hello world'" "line[0]=='#'"
1000000 loops, best of 3: 0.218 usec per loop

timeit.py -s "line=' hello world'" "line.startswith('#')"
1000000 loops, best of 3: 0.639 usec per loop
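The same comparison can be run from Python 3 with the standard timeit module (the command-line equivalent is `python -m timeit -s "..." "..."`). This sketch only reproduces the measurement; absolute numbers, and possibly the ranking, will differ by machine and Python version, so none are quoted here. The helper name `compare` is illustrative.

```python
import timeit

STATEMENTS = ("line[:1] == '#'", "line[0] == '#'", "line.startswith('#')")

def compare(number=1_000_000):
    """Return microseconds per loop for each statement (best of 3 runs)."""
    setup = "line = ' hello world'"
    results = {}
    for stmt in STATEMENTS:
        best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
        results[stmt] = best / number * 1e6
    return results

for stmt, usec in compare().items():
    print(f"{stmt:<22} {usec:.3f} usec per loop")
```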
Jul 19 '06 #9

Sion Arrowsmith wrote:
> Daniel Dittmar <da************@sap.corp> wrote:
>>     if line [:1] == '#':
>
> What's wrong with line[0] == '#' ? (For one thing, it's fractionally
> faster than [:1].)

For that matter, what's wrong with

line.startswith('#')

which expresses the intent rather better as well.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Jul 19 '06 #10

Sion Arrowsmith wrote:
> Daniel Dittmar <da************@sap.corp> wrote:
>>     if line [:1] == '#':
>
> What's wrong with line[0] == '#' ? (For one thing, it's fractionally
> faster than [:1].)

Matter of taste. Occasionally, I use line iterators that strip the '\n'
from the end of each line, so empty lines have to be handled. Of course,
in my example code, one could have moved the check for the empty line
before the check for the comment.
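A Python 3 sketch of that reordering: with the empty-line check first, line[0] is safe even when an upstream iterator has already stripped the newlines. `stripped_lines` is a hypothetical stand-in for such an iterator.

```python
import csv

def stripped_lines(lines):
    """Hypothetical helper: yield each line without its trailing newline."""
    for line in lines:
        yield line.rstrip("\r\n")

def comment_stripper(lines):
    for line in lines:
        if not line.strip():    # check for empty/blank lines first...
            continue
        if line[0] == "#":      # ...so indexing is safe here
            continue
        yield line

lines = ["a,b\n", "\n", "# note\n", "c,d"]
for row in csv.reader(comment_stripper(stripped_lines(lines))):
    print(row)
```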

Daniel
Jul 19 '06 #11

And which method is best: Daniel's generator or Skip's wrapper class?

Jul 20 '06 #12
