Bytes | Software Development & Data Engineering Community

Getting values out of a CSV

How do I access the value in the second row in the first position of a
CSV? Or the 3rd row, in the fifth position?

a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
r,s,t,v,w,x,y,z

I'd want to get at "j" and "w". I know I can do

import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row[0]

to get the first value in EVERY row, but I don't want that. Thanks for
the help.

Jul 13 '07 #1
On Fri, 13 Jul 2007 05:59:53 +0300, <Ca********@gmail.com> wrote:
>
How do I access the value in the second row in the first position of a
CSV? Or the 3rd row, in the fifth position?

a,b,c,d,e,f,g,h,i
j,k,l,m,n,o,p,q,r
r,s,t,v,w,x,y,z

I'd want to get at "j" and "w". I know I can do

import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row[0]

to get the first value in EVERY row, but I don't want that. Thanks for
the help.
data = [row for row in csv.reader(open('some.csv', 'rb'))]

then you can access like so:
>>> data[1][4]
'n'
>>> data[0][0]
'a'
>>> data[2][0]
'r'
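In modern Python 3 the same load-then-index approach looks slightly different: files are opened in text mode with `newline=''`, as the csv docs recommend. A sketch (the sample file is written first so the snippet is self-contained):

```python
import csv

# Create the sample file from the original post so the snippet runs as-is.
with open("some.csv", "w", newline="") as f:
    f.write("a,b,c,d,e,f,g,h,i\nj,k,l,m,n,o,p,q,r\nr,s,t,v,w,x,y,z\n")

# Read every row into a list so any cell can be indexed as data[row][col].
with open("some.csv", newline="") as f:
    data = list(csv.reader(f))

print(data[1][0])  # second row, first position -> j
print(data[2][4])  # third row, fifth position -> w
```

Indexing is zero-based, so "second row, first position" is `data[1][0]`.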

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Jul 13 '07 #2
On Fri, 13 Jul 2007 02:10:17 -0300, Daniel <no@no.no> wrote:
data = [row for row in csv.reader(open('some.csv', 'rb'))]
Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))
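If only one specific row is needed, the whole file doesn't have to be materialized at all; `itertools.islice` can skip straight to it. A Python 3 sketch, again assuming the three-row `some.csv` from the original question (created here so the snippet is self-contained):

```python
import csv
import itertools

# Create the sample file so the snippet runs as-is.
with open("some.csv", "w", newline="") as f:
    f.write("a,b,c,d,e,f,g,h,i\nj,k,l,m,n,o,p,q,r\nr,s,t,v,w,x,y,z\n")

with open("some.csv", newline="") as f:
    # islice(reader, 1, 2) yields only the second row (index 1);
    # earlier rows are read and discarded, later rows are never parsed.
    row = next(itertools.islice(csv.reader(f), 1, 2))

print(row[0])  # -> j
```

This is mainly useful for large files, where building the full list would waste memory.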

--
Gabriel Genellina

Jul 13 '07 #3
On Fri, 13 Jul 2007 08:51:25 +0300, Gabriel Genellina
<ga*******@yahoo.com.ar> wrote:
>data = [row for row in csv.reader(open('some.csv', 'rb'))]

Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))
Clearer? Maybe, but list comprehensions are clearer (at least to me).

Faster? No. List comprehensions are faster.
Jul 13 '07 #4
On 7/12/07, Daniel <no@no.no> wrote:
On Fri, 13 Jul 2007 08:51:25 +0300, Gabriel Genellina
<ga*******@yahoo.com.ar> wrote:
data = [row for row in csv.reader(open('some.csv', 'rb'))]
Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Clearer? Maybe, but list comprehensions are clearer (at least to me).

Faster? No. List comprehensions are faster.
kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.
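The same comparison can be scripted with the `timeit` module directly, which makes it easy to repeat runs and take the minimum (a sketch; the iterable is a plain in-memory list here to avoid I/O noise, and absolute numbers will vary by machine):

```python
import timeit

# A cheap, I/O-free iterable so we time only list construction.
setup = "data_src = list(range(10000))"

# min() of several repeats is the conventional way to reduce scheduling noise.
t_list = min(timeit.repeat("list(iter(data_src))", setup=setup,
                           number=1000, repeat=3))
t_comp = min(timeit.repeat("[x for x in data_src]", setup=setup,
                           number=1000, repeat=3))

print("list():        %.4f s" % t_list)
print("comprehension: %.4f s" % t_comp)
```

Whichever wins, the gap is small enough that readability should decide.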

--
Kelvie
Jul 13 '07 #5
Daniel wrote:
On Fri, 13 Jul 2007 08:51:25 +0300, Gabriel Genellina
<ga*******@yahoo.com.ar> wrote:
>Note that every time you see [x for x in ...] with no condition, you
can write list(...) instead - more clear, and faster.

Faster? No. List Comprehensions are faster.
Why do you think that?
--
Michael Hoffman
Jul 13 '07 #6
Note that every time you see [x for x in ...] with no condition, you can
write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Faster? No. List Comprehensions are faster.

kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.
$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv", "rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a difference, but I know that list comps in Python are very heavily optimised.
Jul 13 '07 #7
On Fri, 13 Jul 2007 15:05:29 +0300, Daniel wrote:
>Note that every time you see [x for x in ...] with no condition, you can
>write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Faster? No. List Comprehensions are faster.

kelvie@valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
100 loops, best of 3: 7.5 msec per loop
kelvie@valour pdfps $ python -m timeit -c 'data = [line for line in open("make.ps")]'
100 loops, best of 3: 9.2 msec per loop

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I, too, think it's faster to just use list() instead of 'line for line
in iterable', as it seems kind of redundant.

$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv", "rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a difference, but I know that list comps in Python are very heavily optimised.
Does the machine use power saving features like SpeedStep or something similar, i.e. does the processor always run at 100% speed, or is it dynamically stepped when there's load? And do both tests read the data from cache, or did the very first loop have to fetch the CSV file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in csv.reader(open("test.csv", "rb"))]'
1000 loops, best of 3: 1.27 msec per loop

$ python -m timeit -n 1000 -c 'import csv; data = list(csv.reader(open("test.csv", "rb")))'
1000 loops, best of 3: 1.25 msec per loop

Ciao,
Marc 'BlackJack' Rintsch
Jul 13 '07 #8
On Fri, 13 Jul 2007 16:18:38 +0300, Marc 'BlackJack' Rintsch
<bj****@gmx.net> wrote:
>$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv", "rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a difference, but I know that list comps in Python are very heavily optimised.

Does the machine use power saving features like SpeedStep or something similar, i.e. does the processor always run at 100% speed, or is it dynamically stepped when there's load? And do both tests read the data from cache, or did the very first loop have to fetch the CSV file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in csv.reader(open("test.csv", "rb"))]'
1000 loops, best of 3: 1.27 msec per loop

$ python -m timeit -n 1000 -c 'import csv; data = list(csv.reader(open("test.csv", "rb")))'
1000 loops, best of 3: 1.25 msec per loop
No SpeedStep - tried a few repeats just in case the files were cached; consistently 35 usec for the comprehension, 40 usec for list().

Python 2.5.1 on Linux, 1.2 GHz

Even replacing the csv lookup with a straight variable declaration:
[range(10)*3], same results

Weird.

Python
Jul 13 '07 #9
Hrm. Repeating the test several more times, it seems that the value
fluctuates, sometimes one's faster than the other, and sometimes
they're the same.

Perhaps the minute difference between the two is statistically
insignificant? Or perhaps the mechanism underlying both (i.e. the
implementation) is the same?

On 7/13/07, Daniel <no@no.no> wrote:
On Fri, 13 Jul 2007 16:18:38 +0300, Marc 'BlackJack' Rintsch
<bj****@gmx.net> wrote:
$ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv", "rb")))'
10000 loops, best of 3: 44 usec per loop
$ python -m timeit -c 'import csv; data = [row for row in csv.reader(open("some.csv", "rb"))]'
10000 loops, best of 3: 37 usec per loop

I don't know why there seems to be a difference, but I know that list comps in Python are very heavily optimised.
Does the machine use power saving features like SpeedStep or something similar, i.e. does the processor always run at 100% speed, or is it dynamically stepped when there's load? And do both tests read the data from cache, or did the very first loop have to fetch the CSV file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in csv.reader(open("test.csv", "rb"))]'
1000 loops, best of 3: 1.27 msec per loop

$ python -m timeit -n 1000 -c 'import csv; data = list(csv.reader(open("test.csv", "rb")))'
1000 loops, best of 3: 1.25 msec per loop

No SpeedStep - tried a few repeats just in case the files were cached; consistently 35 usec for the comprehension, 40 usec for list().

Python 2.5.1 on Linux, 1.2 GHz

Even replacing the csv lookup with a straight variable declaration:
[range(10)*3], same results

Weird.

Python

--
Kelvie
Jul 13 '07 #10
On Fri, 13 Jul 2007 09:05:29 -0300, Daniel <no@no.no> wrote:
>Note that every time you see [x for x in ...] with no condition, you can
>write list(...) instead - more clear, and faster.

data = list(csv.reader(open('some.csv', 'rb')))

Faster? No. List Comprehensions are faster.

On my system just putting into a list is faster. I think this is
because you don't need to assign each line to the variable 'line' each
time in the former case.

I don't know why there seems to be a difference, but I know that list comps in Python are very heavily optimised.
In principle both ways have to create and populate a list, and a list
comprehension surely is better than a loop using append() - but it still
has to create and bind the intermediate variable on each iteration.
I think that testing with a csv file can't show the difference between
both ways of creating the list because of the high overhead due to csv
processing.
Using another example, with no I/O involved (a generator for the first
10000 fibonacci numbers):
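The fibo module itself isn't shown in the post; a minimal generator matching that description (first 10000 Fibonacci numbers, an assumption on my part) might look like:

```python
def fibo(n=10000):
    """Yield the first n Fibonacci numbers: 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# Both spellings consume the generator and build the same list.
assert list(fibo(10)) == [x for x in fibo(10)]
print(list(fibo(10)))  # -> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```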

C:\TEMP>python -m timeit -s "import fibo" "list(fibo.fibo())"
10 loops, best of 3: 39.4 msec per loop

C:\TEMP>python -m timeit -s "import fibo" "[x for x in fibo.fibo()]"
10 loops, best of 3: 40.7 msec per loop

(Generating fewer values shows larger differences - anyway, they're not dramatic.)

So, as always, one should measure in each specific case whether an optimization is worth the pain - and if csv files are involved, I'd say the critical points are elsewhere, not in how one creates the list of rows.

--
Gabriel Genellina

Jul 14 '07 #11
