Bytes IT Community

Import .csv to numpy instead of list to numpy

P: 4
Currently, when I import a data file, I create a list and then use numpy.array(lst) to convert the list into a numpy array.

Ideally, however, I would like to avoid the intermediate data type and import the .csv file directly into a numpy array.

I have tried numpy.genfromtxt(); however, that does not keep the matrix structure I am attempting to preserve (it creates a single vector instead of maintaining the distinct rows and columns from the .csv).
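
A likely cause, stated as an assumption because the exact call is not shown: numpy.genfromtxt splits on whitespace by default, so each comma-separated line is read as a single field. The sketch below passes delimiter="," explicitly; the in-memory sample data is made up for illustration and stands in for the real file.

```python
import io

import numpy

# Hypothetical sample data standing in for the .csv file.
f = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")

# delimiter="," tells genfromtxt to split each line into columns,
# so the rows and columns of the file are preserved.
numArray = numpy.genfromtxt(f, delimiter=",")

print(numArray.shape)
```

A file name can be passed in place of the StringIO object.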

My import method basically looks like this:
    import csv

    # Python 3: open in text mode with newline='' as the csv module recommends.
    f = open('fileName.csv', newline='')
    rdr = csv.reader(f, delimiter=',')
    lst = []
    for row in rdr:
        lst.append(row)
I also tried adding something like this:
    numArray = numpy.array([])
    for row in rdr:
        numpy.append(numArray, row)
However, the result is just an empty numpy array at the completion of the loop.
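
For reference, a probable explanation: numpy.append does not modify its first argument. It returns a new array, so the result has to be assigned back. In addition, a csv.reader can be iterated only once, so a second loop over an already consumed rdr sees no rows. The sketch below uses invented in-memory data and shows both the discarded and the kept return value.

```python
import csv
import io

import numpy

# Hypothetical in-memory data standing in for the .csv file.
f = io.StringIO("1,2,3\n4,5,6\n")
rdr = csv.reader(f)

unchanged = numpy.array([])
numArray = numpy.array([])
for row in rdr:
    values = [float(x) for x in row]
    numpy.append(unchanged, values)            # return value discarded: no effect
    numArray = numpy.append(numArray, values)  # keep the returned copy

# Without an axis argument, numpy.append flattens, so numArray is 1-D here.
print(unchanged.size, numArray.shape)
```

Because each call copies the whole array, collecting the rows in a list and converting once is usually the cheaper option.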

I have also tried numpy.row_stack; however, it seems to me that the number of columns must be known a priori to use it, and I don't know how to use csv.reader() to determine the number of columns.
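
One way to get the column count, sketched under the assumption that every row has the same number of fields: read the first row with next() and take its length. The data is invented for illustration, and numpy.vstack is used here in place of row_stack, which has long been an alias for it.

```python
import csv
import io

import numpy

# Hypothetical in-memory data standing in for the .csv file.
f = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")
rdr = csv.reader(f)

first = next(rdr)    # consume the first row
ncols = len(first)   # number of columns in the file

numArray = numpy.empty((0, ncols))  # zero rows, ncols columns
for row in [first] + list(rdr):
    # Stack each converted row underneath the rows collected so far.
    numArray = numpy.vstack([numArray, [float(x) for x in row]])

print(ncols, numArray.shape)
```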

Any ideas or assistance you could provide would be greatly appreciated!
Jun 5 '11 #1
7 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
The following bypasses the intermediate assignment to lst and typecasts each str element to float.
    import csv
    import numpy

    # The with block closes the file automatically.
    with open("array.txt", newline="") as f:
        # list() is needed on Python 3, where map() returns an iterator.
        numArray = numpy.array([list(map(float, item)) for item in csv.reader(f)])
Jun 6 '11 #2

P: 4
Thanks bvdet!

If I have different types in each column (say, str and float), is there an easy way for me to use the map functionality in the list comprehension you provided to account for the different data types?

Thanks for your help!
Jun 6 '11 #3

bvdet
Expert Mod 2.5K+
P: 2,851
Encapsulate the creation of the array in a function with the data type as an argument.
    def create_array(fileObj, dataType):
        # list() keeps this working on Python 3, where map() returns an iterator.
        return numpy.array([list(map(dataType, item)) for item in csv.reader(fileObj)])
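
A possible usage sketch, assuming Python 3, where map() returns an iterator and each row is therefore wrapped in list(). The in-memory data is invented and stands in for an open file.

```python
import csv
import io

import numpy


def create_array(fileObj, dataType):
    # list() materializes each mapped row so numpy sees nested sequences.
    return numpy.array([list(map(dataType, item)) for item in csv.reader(fileObj)])


# Hypothetical data; the same text is parsed once as int and once as float.
ints = create_array(io.StringIO("1,2\n3,4\n"), int)
floats = create_array(io.StringIO("1,2\n3,4\n"), float)

print(ints.dtype.kind, floats.dtype.kind)
```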
Jun 6 '11 #4

bvdet
Expert Mod 2.5K+
P: 2,851
Oops, I misread your previous message. I am no expert on NumPy, but a NumPy array holds a single data type for all of its elements. Why not create a list of lists or your own container object?
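
As a hedged aside: every array does have exactly one dtype, but NumPy allows that dtype to be a structured one with named fields of different types. A minimal sketch, assuming a date column stored as text and a float column; the field names and the data are invented for illustration.

```python
import csv
import io

import numpy

# Hypothetical data: a date string and a float on each line.
f = io.StringIO("2011-06-05,1.5\n2011-06-06,2.5\n")

# One structured dtype with two named fields of different types.
dt = numpy.dtype([("date", "U10"), ("value", "f8")])

# Structured arrays are built from a list of tuples, one tuple per row.
rows = [(date, float(value)) for date, value in csv.reader(f)]
records = numpy.array(rows, dtype=dt)

print(records["date"])
print(records["value"].sum())
```

Each field is then accessed by name instead of by column index.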
Jun 7 '11 #5

P: 4
I started off using a list of lists. However, I need to pass pairs of vectors (a date column along with a float column) into another function for processing.

With a numpy array, this consists of:
    numpyArray[:,0:2]
It doesn't seem like there's as clean a way to do this with lists when I'm reading in an entire matrix of data. That's why I was thinking that a numpy array was probably the correct data type for me to use.
Jun 7 '11 #6

bvdet
Expert Mod 2.5K+
P: 2,851
You can create a class to contain your list of lists. If you have __getitem__ and __iter__ overloads, the information you want to pass could be accessed like this:
    [item[0:2] for item in arrayObj]
It should work the same for a list of lists.
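
A minimal sketch of such a container, with the class name Table and the sample rows invented for illustration. It only wraps a list of lists and forwards indexing and iteration to it.

```python
class Table:
    """Thin wrapper around a list of rows (hypothetical name)."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __getitem__(self, index):
        # Delegate indexing and slicing to the underlying list of rows.
        return self.rows[index]

    def __iter__(self):
        # Iterate row by row, like a plain list of lists.
        return iter(self.rows)


arrayObj = Table([["2011-06-05", 1.5, 10.0], ["2011-06-06", 2.5, 20.0]])

# First two columns of every row, using the comprehension from the post.
pairs = [item[0:2] for item in arrayObj]
print(pairs)
```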
Jun 7 '11 #7

P: 4
I'm fairly new to Python (and to programming), so I don't feel all that comfortable creating classes right now. Thanks so much for your input and suggestions.
Jun 7 '11 #8
