finding out the number of rows in a CSV file

Does anyone know how I would find out how many rows are in a CSV file?

I can't find a method on csv.reader that does this.

Thanks in advance
Aug 27 '08 #1
On Aug 27, 12:16 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> anyone know how I would find out how many rows are in a csv file?
>
> I can't find a method which does this on csv.reader.
>
> Thanks in advance

You have to iterate over the rows and count them -- there's no other way
without supporting information (since row lengths naturally vary, you
can't even use the file size as an indicator).

Something like:

row_count = sum(1 for row in csv.reader(open('filename.csv')))

hth
Jon.
Aug 27 '08 #2
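
For readers on Python 3, the same counting idea might look like the sketch below. The helper name `count_rows` and the sample data are assumptions made up for illustration; opening the file with `newline=''` follows the csv module's documented recommendation.

```python
import csv
import os
import tempfile


def count_rows(path):
    """Count CSV records by streaming through the reader, one row at a time."""
    # newline='' lets the csv module deal with line endings inside quoted fields.
    with open(path, newline='') as f:
        return sum(1 for _ in csv.reader(f))


# Hypothetical sample file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.csv', newline='', delete=False) as tmp:
    csv.writer(tmp).writerows([['blah', 'blah'], ['waffle'], ['q', 'w', 'e']])
    sample_path = tmp.name

row_count = count_rows(sample_path)
os.remove(sample_path)
print(row_count)
```

Nothing is kept in memory beyond the current row, which is the point of counting this way rather than building a list.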
2008/8/27 SimonPalmer <si**********@gmail.com>:
> anyone know how I would find out how many rows are in a csv file?
>
> I can't find a method which does this on csv.reader.

len(list(csv.reader(open('my.csv'))))

--
Cheers,
Simon B.
si***@brunningonline.net
http://www.brunningonline.net/simon/blog/
Aug 27 '08 #3
On Aug 27, 12:29 pm, "Simon Brunning" <si...@brunningonline.net> wrote:
> len(list(csv.reader(open('my.csv'))))

Not the best of ideas if the row size or number of rows is large!
Manufacture a list, then discard to get its length -- ouch!
Aug 27 '08 #4
2008/8/27 Jon Clements <jo****@googlemail.com>:
>> len(list(csv.reader(open('my.csv'))))
> Not the best of ideas if the row size or number of rows is large!
> Manufacture a list, then discard to get its length -- ouch!

I do try to avoid premature optimization. ;-)

--
Cheers,
Simon B.
Aug 27 '08 #5
On Aug 27, 12:41 pm, Jon Clements <jon...@googlemail.com> wrote:
> Not the best of ideas if the row size or number of rows is large!
> Manufacture a list, then discard to get its length -- ouch!

Thanks to everyone for their suggestions.

In my case the number of rows is never going to be that large (<200),
so it is a practical if slightly inelegant solution.
Aug 27 '08 #6
On Aug 27, 12:50 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> Thanks to everyone for their suggestions.
>
> In my case the number of rows is never going to be that large (<200)
> so it is a practical if slightly inelegant solution

Actually not resolved...

After reading the file through the csv.reader for the length, I cannot
iterate over the rows. How do I reset the row iterator?
Aug 27 '08 #7
On Aug 27, 12:48 pm, "Simon Brunning" <si...@brunningonline.net> wrote:
> I do try to avoid premature optimization. ;-)

:)
Aug 27 '08 #8
On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> actually not resolved...
>
> after reading the file through the csv.reader for the length I cannot
> iterate over the rows. How do I reset the row iterator?

If you're sure that the number of rows is always less than 200.

Slightly modify Simon Brunning's example and do:

rows = list(csv.reader(open('filename.csv')))
row_count = len(rows)
for row in rows:
    pass  # do something with each row


Aug 27 '08 #9
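
A self-contained Python 3 sketch of the read-once approach described above. The in-memory data stands in for 'filename.csv' and is an assumption for the demo; with a real file you would pass `open(path, newline='')` to `csv.reader` instead.

```python
import csv
import io

# Hypothetical in-memory data standing in for 'filename.csv'.
source = io.StringIO("blah,blah\nwaffle\nq,w,e,r,t,y\n", newline='')

rows = list(csv.reader(source))   # read every row exactly once
row_count = len(rows)             # the count is known before any processing

widths = []
for row in rows:                  # the list can be iterated as often as needed
    widths.append(len(row))

print(row_count, widths)
```

The trade-off is memory: every row is held at once, which is harmless for a couple of hundred rows.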
On Aug 27, 9:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
> actually not resolved...
>
> after reading the file through the csv.reader for the length I cannot
> iterate over the rows.

OK, I'll bite: Why do you think you need to know the number of rows in
advance?

> How do I reset the row iterator?

You don't. You throw it away and get another one. You need to seek to
the beginning of the file first. E.g.:

C:\junk>type foo.csv
blah,blah
waffle
q,w,e,r,t,y

C:\junk>type csv2iters.py
import csv

# Python 2 script (print statement, binary-mode file)
f = open('foo.csv', 'rb')
rdr = csv.reader(f)
n = 0
for row in rdr:
    n += 1
print n, f.tell()
f.seek(0)  # rewind the file, then build a fresh reader
rdr = csv.reader(f)
for row in rdr:
    print row

C:\junk>csv2iters.py
3 32
['blah', 'blah']
['waffle']
['q', 'w', 'e', 'r', 't', 'y']

HTH,
John

Aug 27 '08 #10
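
A rough Python 3 equivalent of the rewind-and-reread technique, as a sketch only. The `io.StringIO` object is an assumed stand-in for a file opened with `open('foo.csv', newline='')`; the `seek(0)` call and the second `csv.reader` work the same way on a real text file.

```python
import csv
import io

# Hypothetical stand-in for open('foo.csv', newline='').
f = io.StringIO("blah,blah\nwaffle\nq,w,e,r,t,y\n", newline='')

n = sum(1 for _ in csv.reader(f))      # first pass: count, which exhausts the reader

f.seek(0)                              # rewind the underlying file object
rows = [row for row in csv.reader(f)]  # second pass with a fresh reader

print(n)
print(rows)
```

The reader itself has no reset method; it is the file position that gets rewound, and a new reader is created on top of it.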
On Aug 27, 1:15 pm, John Machin <sjmac...@lexicon.net> wrote:
> OK, I'll bite: Why do you think you need to know the number of rows in
> advance?
>
> You don't. You throw it away and get another one. You need to seek to
> the beginning of the file first.

This is all good, and thanks for your time. I need the number of rows
because of the nature of the data and what I do with it on reading. I
need to initialise some data structures and that is *much* more
efficient if I know in advance the number of rows of data. The cost
of reading the file is probably less than incrementally extending my
internal structures because of their complexity.

To be honest these are all good solutions, and I think I have a view
of csv reading that comes from different technologies plus lack of
experience with Python, which just means that I don't know where to
look for answers.

Very happy that I can now proceed.
Aug 27 '08 #11
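
The count-then-allocate pattern described in this post could be sketched as follows in Python 3. The data, the `totals` list, and the per-row sum are all hypothetical; the poster's actual structures are not shown in the thread.

```python
import csv
import io

# Hypothetical data; a real file would be opened with open(path, newline='').
f = io.StringIO("1,2\n3,4\n5,6\n", newline='')

n_rows = sum(1 for _ in csv.reader(f))   # pass 1: count the rows

totals = [0] * n_rows                    # allocate the structure once, at full size

f.seek(0)                                # rewind before the second pass
for i, row in enumerate(csv.reader(f)):  # pass 2: fill the preallocated slots
    totals[i] = sum(int(cell) for cell in row)

print(totals)
```

Whether two passes beat growing the structure incrementally depends on the structure; for a plain Python list the difference is usually small.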
TYR
Wrap csv.DictReader in list() to get a list of dicts (you get one for
each row, with the column headings as the keys and the fields as the
values) and then take len() of that list?

Aug 27 '08 #12
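
A minimal Python 3 sketch of the DictReader suggestion, using made-up column names and data. Note that `csv.DictReader` is itself an iterator, so it has to be wrapped in `list()` before `len()` can be applied, and the header line is consumed as field names rather than counted as a row.

```python
import csv
import io

# Hypothetical data with a header line.
f = io.StringIO("name,qty\nwidget,3\ngadget,5\n", newline='')

records = list(csv.DictReader(f))   # one dict per data row, keyed by the header
row_count = len(records)            # the header line is not counted

print(row_count)
print(records[0]['name'], records[1]['qty'])
```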
Jon Clements wrote:
> On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
>> after reading the file through the csv.reader for the length I cannot
>> iterate over the rows. How do I reset the row iterator?
>
> If you're sure that the number of rows is always less than 200.

Or 2000. Or 20000...

Actually any number that doesn't make your machine fall into a coma will do.

> Slightly modify Simon Brunning's example and do:
>
> rows = list( csv.reader(open('filename.csv')) )
> row_count = len(rows)
> for row in rows:
>     # do something

Peter
Aug 27 '08 #13
[OP] Jon Clements wrote:
> On Aug 27, 12:54 pm, SimonPalmer <simon.pal...@gmail.com> wrote:
>> after reading the file through the csv.reader for the length I cannot
>> iterate over the rows. How do I reset the row iterator?

A CSV file is just a text file. Don't use csv.reader for counting rows
-- it's overkill. You can just read the file normally, counting lines
(lines == rows).

This is similar to what Jon Clements said, but you don't need the csv
module.

num_rows = sum(1 for line in open("myfile.csv"))

As other posters have said, there is no free lunch. When you use
csv.reader, it reads the lines, so once it's finished you're at the
end of the file.

Aug 27 '08 #14
John S wrote:
> A CSV file is just a text file. Don't use csv.reader for counting rows
> -- it's overkill. You can just read the file normally, counting lines
> (lines == rows).

Wrong. A field may have embedded newlines:

>>> import csv
>>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
>>> sum(1 for row in csv.reader(open("tmp.csv")))
1
>>> sum(1 for line in open("tmp.csv"))
11

Peter
Aug 27 '08 #15
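
The same record-versus-line distinction can be reproduced in Python 3 without touching the disk. This is a sketch using an in-memory buffer (an assumption for the demo) in place of tmp.csv:

```python
import csv
import io

buf = io.StringIO(newline='')
csv.writer(buf).writerow(["a" + "\n" * 10 + "b"])   # one field holding 10 newlines

buf.seek(0)
record_count = sum(1 for _ in csv.reader(buf))   # counts CSV records

buf.seek(0)
line_count = sum(1 for _ in buf)                 # counts physical lines

print(record_count, line_count)
```

The writer quotes the field, so the reader sees a single record even though the text spans eleven physical lines.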
John S wrote:
> A CSV file is just a text file. Don't use csv.reader for counting rows
> -- it's overkill. You can just read the file normally, counting lines
> (lines == rows).

$ more sample.csv
"Except
when it
isn't."

>>> import csv
>>> len(list(csv.reader(open('sample.csv'))))
1
>>> len(list(open('sample.csv')))
3

</F>

Aug 27 '08 #16
Peter Otten wrote:
> John S wrote:
>> A CSV file is just a text file. Don't use csv.reader for counting rows
>> -- it's overkill. You can just read the file normally, counting lines
>> (lines == rows).
>
> Wrong. A field may have embedded newlines:
>
> >>> import csv
> >>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
> >>> sum(1 for row in csv.reader(open("tmp.csv")))
> 1
> >>> sum(1 for line in open("tmp.csv"))
> 11

Well..... a semantics problem here.
A blank line is just an EOL by itself. Yes.
I may want to count these. Could be indicative of a problem.
Besides, sum(1 for line in ... if line.strip()) handles the problem if I'm
not counting blanks and still avoids tossing, re-opening etc...

Again - it's how you look at it, but I don't want EOLs in my dbase
fields. csv was designed to 'dump' database fields into text for those
not affording a database program and/or to convert between database
programs. By the way - has anyone seen a good spreadsheet dumper? One
that dumps the underlying formulas and such along with the display
value? That would greatly facilitate portability, wouldn't it? (Yeah -
the receiving end would have to be able to read it. But it would be a start
- yes?) Everyone got the point? Just because it gets abused doesn't
mean .... Are we back on track? Number of lines equals number of
reads - which is what was requested. No bytes magically disappearing. No
sleight of hand, no one dictating how to or what with ....

The good part is everyone who reads this now knows two ways to approach
the problem and the pros/cons of each. No losers.

Steve
no******@hughes.net
Aug 27 '08 #17
On Aug 28, 7:51 am, norseman <norse...@hughes.net> wrote:
> Well..... a semantics problem here.
>
> A blank line is just an EOL by itself. Yes.

Or a line containing blanks. Yes what?

> I may want to count these. Could be indicative of a problem.

If you use the csv module to read the file, a "blank line" will come
out as a row with one field, the contents of which you can check.

> Besides, sum(1 for line in ... if line.strip()) handles the problem if I'm
> not counting blanks and still avoids tossing, re-opening etc...

What is "tossing", apart from the English slang meaning?
What re-opening?

> Again - it's how you look at it, but I don't want EOLs in my dbase
> fields.

<rant>
Most people don't want them, but many do have them, as well as Ctrl-Zs
and NBSPs and dial-up line noise (and umlauts/accents/suchlike
inserted by the temporarily-employed backpacker to ensure that her
compatriots' names and addresses were spelled properly) ... and the IT
department fervently believes the content is ASCII even though they
have done absolutely SFA to ensure that.
</rant>

> csv was designed to 'dump' database fields into text for those
> not affording a database program and/or to convert between database
> programs. By the way - has anyone seen a good spreadsheet dumper? One
> that dumps the underlying formulas and such along with the display
> value? That would greatly facilitate portability, wouldn't it? (Yeah -
> the receiving end would have to be able to read it. But it would be a start
> - yes?) Everyone got the point? Just because it gets abused doesn't
> mean .... Are we back on track? Number of lines equals number of
> reads - which is what was requested. No bytes magically disappearing. No
> sleight of hand, no one dictating how to or what with ....
>
> The good part is everyone who reads this now knows two ways to approach
> the problem and the pros/cons of each. No losers.

IMHO it is very hard to discern from all that ramble what the alleged
problem is, let alone what are the ways to approach it.
Aug 28 '08 #18
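
For anyone wanting to check how blank lines surface through the csv module, here is a small Python 3 sketch with made-up input. In current Python 3, a completely empty line comes back as an empty list, while a line containing only spaces comes back as a row with one field; either kind can be filtered out after parsing.

```python
import csv

# Hypothetical input: one empty line and one line of spaces between data rows.
lines = ["a,b", "", "   ", "c"]
rows = list(csv.reader(lines))

# Keep only rows that contain at least one non-blank field.
non_empty = [row for row in rows if any(cell.strip() for cell in row)]

print(rows)
print(len(rows), len(non_empty))
```

Filtering on the parsed rows rather than on raw lines keeps embedded newlines inside quoted fields from being miscounted.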
