
finding out the number of rows in a CSV file

anyone know how I would find out how many rows are in a csv file?

I can't find a method which does this on csv.reader.

Thanks in advance
Aug 27 '08 #1
On Aug 27, 12:16 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> anyone know how I would find out how many rows are in a csv file?
>
> I can't find a method which does this on csv.reader.
>
> Thanks in advance

You have to iterate each row and count them -- there's no other way
without supporting information (since each row length is naturally
variable, you can't even use the file size as an indicator).

Something like:

row_count = sum(1 for row in csv.reader( open('filename.csv') ) )

hth
Jon.
Aug 27 '08 #2
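For readers on Python 3, the counting approach from the post above might look like the following sketch. The helper name `count_rows` and the sample data are made up for illustration; with a real file one would typically pass `open(path, newline="")` inside a `with` block, since the csv documentation recommends `newline=""` for file objects.

```python
import csv
import io


def count_rows(csv_file):
    """Count parsed CSV rows one at a time, without building a list."""
    return sum(1 for _ in csv.reader(csv_file))


# In-memory stand-in for a real file (hypothetical sample data).
sample = io.StringIO("blah,blah\r\nwaffle\r\nq,w,e,r,t,y\r\n", newline="")
row_count = count_rows(sample)
print(row_count)  # 3
```

Because the generator expression consumes one row at a time, memory use stays flat however many rows the file has.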
2008/8/27 SimonPalmer <si**********@gmail.com>:
> anyone know how I would find out how many rows are in a csv file?
>
> I can't find a method which does this on csv.reader.

len(list(csv.reader(open('my.csv'))))

--
Cheers,
Simon B.
si***@brunningonline.net
http://www.brunningonline.net/simon/blog/
Aug 27 '08 #3
On Aug 27, 12:29 pm, "Simon Brunning" <si...@brunningonline.net> wrote:
> len(list(csv.reader(open('my.csv'))))

Not the best of ideas if the row size or number of rows is large!
Manufacture a list, then discard to get its length -- ouch!
Aug 27 '08 #4
2008/8/27 Jon Clements <jo****@googlemail.com>:
>> len(list(csv.reader(open('my.csv'))))
> Not the best of ideas if the row size or number of rows is large!
> Manufacture a list, then discard to get its length -- ouch!

I do try to avoid premature optimization. ;-)

--
Cheers,
Simon B.
Aug 27 '08 #5
On Aug 27, 12:41 pm, Jon Clements <jon...@googlemail.com> wrote:
> Not the best of ideas if the row size or number of rows is large!
> Manufacture a list, then discard to get its length -- ouch!

Thanks to everyone for their suggestions.

In my case the number of rows is never going to be that large (<200),
so it is a practical, if slightly inelegant, solution.
Aug 27 '08 #6
On Aug 27, 12:50 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> In my case the number of rows is never going to be that large (<200)
> so it is a practical if slightly inelegant solution

Actually, not resolved...

After reading the file through the csv.reader to get the length, I cannot
iterate over the rows again. How do I reset the row iterator?
Aug 27 '08 #7
On Aug 27, 12:48 pm, "Simon Brunning" <si...@brunningonline.net> wrote:
> I do try to avoid premature optimization. ;-)

:)
Aug 27 '08 #8
On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> after reading the file through the csv.reader for the length I cannot
> iterate over the rows. How do I reset the row iterator?

If you're sure that the number of rows is always less than 200, slightly
modify Simon Brunning's example and do:

rows = list( csv.reader(open('filename.csv')) )
row_count = len(rows)
for row in rows:
    pass  # do something with each row

Aug 27 '08 #9
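A Python 3 sketch of the read-once idea from the post above: parse the file into a list a single time, then take its length and loop over it as often as needed. The sample data and the `widths` list are made up for illustration; with a real file the `io.StringIO` object would be replaced by `open(path, newline="")`.

```python
import csv
import io

# In-memory stand-in for a small CSV file (hypothetical sample data).
sample = io.StringIO("blah,blah\r\nwaffle\r\nq,w,e,r,t,y\r\n", newline="")

rows = list(csv.reader(sample))  # parse once, keep every row in memory
row_count = len(rows)            # the count is known before processing starts

# The list can be iterated repeatedly; no reader needs to be "reset".
widths = [len(row) for row in rows]
print(row_count, widths)  # 3 [2, 1, 6]
```

This trades memory for convenience, which is reasonable for files of a few hundred rows.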
On Aug 27, 9:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> after reading the file through the csv.reader for the length I cannot
> iterate over the rows.

OK, I'll bite: Why do you think you need to know the number of rows in
advance?

> How do I reset the row iterator?

You don't. You throw it away and get another one. You need to seek to
the beginning of the file first. E.g.:

C:\junk>type foo.csv
blah,blah
waffle
q,w,e,r,t,y

C:\junk>type csv2iters.py
import csv
f = open('foo.csv', 'rb')
rdr = csv.reader(f)
n = 0
for row in rdr:
    n += 1
print n, f.tell()
f.seek(0)
rdr = csv.reader(f)
for row in rdr:
    print row

C:\junk>csv2iters.py
3 32
['blah', 'blah']
['waffle']
['q', 'w', 'e', 'r', 't', 'y']

HTH,
John

Aug 27 '08 #10
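The script above is Python 2. A rough Python 3 equivalent of the same rewind-and-reread technique is sketched below; the temporary file and its contents are assumptions made so the example is self-contained, and they mirror the foo.csv sample from the post.

```python
import csv
import os
import tempfile

# Create a small sample file (hypothetical data mirroring foo.csv).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("blah,blah\r\nwaffle\r\nq,w,e,r,t,y\r\n")

with open(path, newline="") as f:
    n = sum(1 for _ in csv.reader(f))  # first pass: count the rows
    f.seek(0)                          # rewind the underlying file object
    rows = list(csv.reader(f))         # second pass: a fresh reader

os.remove(path)
print(n)     # 3
print(rows)  # [['blah', 'blah'], ['waffle'], ['q', 'w', 'e', 'r', 't', 'y']]
```

The reader itself is never reset; only the file position is, and a new reader is built on top of it.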
On Aug 27, 1:15 pm, John Machin <sjmac...@lexicon.net> wrote:
> OK, I'll bite: Why do you think you need to know the number of rows in
> advance?
>
> You don't. You throw it away and get another one. You need to seek to
> the beginning of the file first.

This is all good, and thanks for your time. I need the number of rows
because of the nature of the data and what I do with it on reading. I
need to initialise some data structures and that is *much* more
efficient if I know in advance the number of rows of data. The cost
of reading the file is probably less than incrementally extending my
internal structures because of their complexity.

To be honest these are all good solutions and I think I have a view
of csv reading that comes from different technologies plus lack of
experience with Python, which just means that I don't know where to
look for answers.

Very happy that I can now proceed.
Aug 27 '08 #11
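The poster does not say what the internal structures are, so the following is only a guess at the general shape of the two-pass approach described above: count first, size a container from the count, then fill it. The `totals` list and the numeric sample data are purely hypothetical.

```python
import csv
import io

# Hypothetical numeric data standing in for the real file contents.
text = "1,2\r\n3,4\r\n5,6\r\n"

# Pass 1: count the rows so the container can be sized up front.
n = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
totals = [0] * n  # preallocated, one slot per row

# Pass 2: fill the preallocated slots in order.
for i, row in enumerate(csv.reader(io.StringIO(text, newline=""))):
    totals[i] = sum(int(field) for field in row)

print(n, totals)  # 3 [3, 7, 11]
```

For plain Python lists, appending is usually cheap enough that the first pass buys little; it matters more for structures that are expensive to resize.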
TYR
Use csv.DictReader to get a dict for each row (with the column headings
as the keys and the fields as the values), collect them into a list, and
then take len() of that list?

Aug 27 '08 #12
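A small sketch of the csv.DictReader suggestion above, assuming the file has a header line. The column names and data here are invented for the example. Note that the header line becomes the field names and is therefore not included in the count.

```python
import csv
import io

# Hypothetical file with a header line followed by two data rows.
sample = io.StringIO("name,qty\r\napple,3\r\npear,5\r\n", newline="")

records = list(csv.DictReader(sample))  # one mapping per data row
record_count = len(records)             # the header line is not counted

print(record_count)        # 2
print(records[0]["name"])  # apple
```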
Jon Clements wrote:
> On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
>> after reading the file through the csv.reader for the length I cannot
>> iterate over the rows. How do I reset the row iterator?
>
> If you're sure that the number of rows is always less than 200.

Or 2000. Or 20000...

Actually any number that doesn't make your machine fall into a coma will do.

> Slightly modify Simon Brunning's example and do:
>
> rows = list( csv.reader(open('filename.csv')) )
> row_count = len(rows)
> for row in rows:
>     # do something

Peter
Aug 27 '08 #13
Jon Clements wrote:
> On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
>> after reading the file through the csv.reader for the length I cannot
>> iterate over the rows. How do I reset the row iterator?

A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).

This is similar to what Jon Clements said, but you don't need the csv
module.

num_rows = sum(1 for line in open("myfile.csv"))

As other posters have said, there is no free lunch. When you use
csv.reader, it reads the lines, so once it's finished you're at the
end of the file.

Aug 27 '08 #14
John S wrote:
> A CSV file is just a text file. Don't use csv.reader for counting rows
> -- it's overkill. You can just read the file normally, counting lines
> (lines == rows).

Wrong. A field may have embedded newlines:

>>> import csv
>>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
>>> sum(1 for row in csv.reader(open("tmp.csv")))
1
>>> sum(1 for line in open("tmp.csv"))
11

Peter
Aug 27 '08 #15
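The same point can be reproduced on Python 3 without touching the disk. This is a sketch using `io.StringIO` as a stand-in for the tmp.csv file in the session above; the variable names are illustrative only.

```python
import csv
import io

# Write one row whose single field contains ten embedded newlines.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["a" + "\n" * 10 + "b"])
text = buf.getvalue()

# Parsed as CSV, the quoted field keeps its newlines: one logical row.
csv_rows = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))

# Read as plain text, every newline ends a physical line.
physical_lines = sum(1 for _ in io.StringIO(text, newline=""))

print(csv_rows, physical_lines)  # 1 11
```

Counting lines therefore matches counting rows only when no field contains a line break.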
John S wrote:
>> after reading the file through the csv.reader for the length I cannot
>> iterate over the rows. How do I reset the row iterator?
>
> A CSV file is just a text file. Don't use csv.reader for counting rows
> -- it's overkill. You can just read the file normally, counting lines
> (lines == rows).

$ more sample.csv
"Except
when it
isn't."
>>> import csv
>>> len(list(csv.reader(open('sample.csv'))))
1
>>> len(list(open('sample.csv')))
3

</F>

Aug 27 '08 #16
Peter Otten wrote:
> John S wrote:
>> A CSV file is just a text file. Don't use csv.reader for counting rows
>> -- it's overkill. You can just read the file normally, counting lines
>> (lines == rows).
>
> Wrong. A field may have embedded newlines:
>
> >>> import csv
> >>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
> >>> sum(1 for row in csv.reader(open("tmp.csv")))
> 1
> >>> sum(1 for line in open("tmp.csv"))
> 11

Well..... a semantics problem here.
A blank line is just an EOL by itself. Yes.
I may want to count these. Could be indicative of a problem.
Besides, sum(1 for line in ... if line.strip()) handles the problem if
I'm not counting blanks and still avoids tossing, re-opening etc...

Again - it's how you look at it, but I don't want EOLs in my dbase
fields. csv was designed to 'dump' database fields into text for those
not affording a database program and/or to convert between database
programs. By the way - has anyone seen a good spreadsheet dumper? One
that dumps the underlying formulas and such along with the display
value? That would greatly facilitate portability, wouldn't it? (Yeah -
the receiving end would have to be able to read it. But it would be a
start - yes?) Everyone got the point? Just because it gets abused doesn't
mean .... Are we back on track? Number of lines equals number of
reads - which is what was requested. No bytes magically disappearing. No
sleight of hand, no one dictating how to or what with ....

The good part is everyone who reads this now knows two ways to approach
the problem and the pros/cons of each. No losers.

Steve
no******@hughes.net
Aug 27 '08 #17
On Aug 28, 7:51 am, norseman <norse...@hughes.net> wrote:
> Well..... a semantics problem here.
>
> A blank line is just an EOL by itself. Yes.

Or a line containing blanks. Yes what?

> I may want to count these. Could be indicative of a problem.

If you use the csv module to read the file, a "blank line" will come
out as a row with one field, the contents of which you can check.

> Besides, sum(1 for line in ... if line.strip()) handles the problem if
> I'm not counting blanks and still avoids tossing, re-opening etc...

What is "tossing", apart from the English slang meaning?
What re-opening?

> Again - it's how you look at it, but I don't want EOLs in my dbase
> fields.

<rant>
Most people don't want them, but many do have them, as well as Ctrl-Zs
and NBSPs and dial-up line noise (and umlauts/accents/suchlike
inserted by the temporarily-employed backpacker to ensure that her
compatriots' names and addresses were spelled properly) ... and the IT
department fervently believes the content is ASCII even though they
have done absolutely SFA to ensure that.
</rant>

> csv was designed to 'dump' database fields into text for those
> not affording a database program and/or to convert between database
> programs. By the way - has anyone seen a good spreadsheet dumper? One
> that dumps the underlying formulas and such along with the display
> value? That would greatly facilitate portability, wouldn't it? (Yeah -
> the receiving end would have to be able to read it. But it would be a
> start - yes?) Everyone got the point? Just because it gets abused doesn't
> mean .... Are we back on track? Number of lines equals number of
> reads - which is what was requested. No bytes magically disappearing. No
> sleight of hand, no one dictating how to or what with ....
>
> The good part is everyone who reads this now knows two ways to approach
> the problem and the pros/cons of each. No losers.

IMHO it is very hard to discern from all that ramble what the alleged
problem is, let alone what are the ways to approach it.
Aug 28 '08 #18
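For anyone wanting to check how blank lines come through the csv module, here is a small Python 3 sketch with invented sample data. As far as I can tell, with current CPython a completely empty line is returned as an empty list, while a line holding only spaces is returned as a row with one field; the `non_blank` count is one possible way to skip both.

```python
import csv
import io

# Hypothetical data: a normal row, an empty line, a spaces-only line, a row.
text = "a,b\r\n\r\n   \r\nc\r\n"

rows = list(csv.reader(io.StringIO(text, newline="")))
print(rows)  # [['a', 'b'], [], ['   '], ['c']]

# Count only rows that have at least one non-whitespace field.
non_blank = sum(1 for row in rows if any(field.strip() for field in row))
print(non_blank)  # 2
```

Filtering on the parsed rows rather than on raw lines keeps quoted fields with embedded newlines intact.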
