reading a specific column from file

Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

I've found quite interesting the linecache module but unfortunately
that is (to my knowledge) only working on lines, not columns.

Any suggestion?

Thanks and regards
Francesco
Jan 11 '08 #1
On 2008-01-11, cesco <fd**********@gmail.com> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?
>
> I've found quite interesting the linecache module but unfortunately
> that is (to my knowledge) only working on lines, not columns.
>
> Any suggestion?

The csv module may do what you want.
Jan 11 '08 #2
On Jan 11, 2:15 pm, cesco <fd.calabr...@gmail.com> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?
>
> I've found quite interesting the linecache module but unfortunately
> that is (to my knowledge) only working on lines, not columns.
>
> Any suggestion?
>
> Thanks and regards
> Francesco

for i, each_line in enumerate(open('input_file.txt')):
    try:
        column_3 = each_line.split('\t')[2].strip()
    except IndexError:
        print('Not enough columns on line %i of file.' % (i + 1))
        continue

    # placeholder for whatever processing you need
    do_something_with_column_3(column_3)
Jan 11 '08 #3
cesco wrote:
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?

use the "split" method and plain old indexing:

for line in open("file.txt"):
    columns = line.split("\t")
    print(columns[2])  # indexing starts at zero

also see the "csv" module, which can read all sorts of
comma/semicolon/tab-separated spreadsheet-style files.

> I've found quite interesting the linecache module

the "linecache" module seems to be quite popular on comp.lang.python
these days, but it's designed for a very specific purpose (displaying
Python code in tracebacks), and is a really lousy way to read text files
in the general case. please unlearn.

</F>

Jan 11 '08 #4
A.T.Hofkamp wrote:
> On 2008-01-11, cesco <fd**********@gmail.com> wrote:
>> Hi,
>>
>> I have a file containing four columns of data separated by tabs (\t)
>> and I'd like to read a specific column from it (say the third). Is
>> there any simple way to do this in Python?
>>
>> I've found quite interesting the linecache module but unfortunately
>> that is (to my knowledge) only working on lines, not columns.
>>
>> Any suggestion?
>
> the csv module may do what you want.

Here's an example:
>>> import csv
>>> print(open("tmp.csv").read())
alpha beta gamma delta
one two three for
>>> records = csv.reader(open("tmp.csv"), delimiter="\t")
>>> [record[2] for record in records]
['gamma', 'three']
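
For use in a script rather than at the interactive prompt, the same idea might look like the minimal sketch below (Python 3). The function name read_column and the decision to skip rows that are too short are assumptions made for illustration, not something taken from the posts above.

```python
import csv


def read_column(path, index, delimiter="\t"):
    """Return the values at position `index` from every row of the file."""
    values = []
    # newline="" is what the csv documentation recommends for files
    # that are handed to csv.reader
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) > index:  # assumption: silently skip short rows
                values.append(row[index])
    return values
```

With the tmp.csv file shown above, read_column("tmp.csv", 2) should return the same ['gamma', 'three'] list as the interactive session.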

Peter
Jan 11 '08 #5
On Jan 11, 4:15 am, cesco <fd.calabr...@gmail.com> wrote:
> Hi,
>
> I have a file containing four columns of data separated by tabs (\t)
> and I'd like to read a specific column from it (say the third). Is
> there any simple way to do this in Python?

You say you would like to "read" a specific column. I wonder whether you
meant reading all the data and then separating out the third column, or
whether you really meant doing disk I/O only for the third column and
thereby making the read faster. The second seems more interesting but
much harder, and I wonder if anyone has any ideas. As for just
filtering out the third column, you have been given many suggestions
already.

Regards,
Ivan Novick
http://www.0x4849.net
Jan 11 '08 #6
Here is another suggestion:

col = 2 # third column
filename = '4columns.txt'
third_column = [line[:-1].split('\t')[col] for line in open(filename,
'r')]

third_column now contains a list of items in the third column.

This solution is great for small files (up to a couple of thousand
lines). For larger files, performance could be a problem, so you might
need a different solution.
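
For larger files, one possible alternative is to yield the values one at a time instead of building a list, so memory use stays flat however long the file is. This is only a sketch: the name iter_column is made up for illustration, skipping short lines is an assumption, and the file is still read from start to finish.

```python
def iter_column(path, col, sep="\t"):
    """Lazily yield column `col` from each line of a delimited text file."""
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split(sep)
            if len(fields) > col:  # assumption: skip lines with too few columns
                yield fields[col]
```

It can be consumed in a loop, for example `for value in iter_column('4columns.txt', 2): ...`, or turned back into a list with list(...) when the file is small enough.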
Jan 17 '08 #7
On Jan 17, 8:47 pm, Hai Vu <wuh...@gmail.com> wrote:
> Here is another suggestion:
>
> col = 2 # third column
> filename = '4columns.txt'
> third_column = [line[:-1].split('\t')[col] for line in open(filename,
> 'r')]
>
> third_column now contains a list of items in the third column.
>
> This solution is great for small files (up to a couple of thousand
> lines). For larger files, performance could be a problem, so you might
> need a different solution.

Using the maxsplit arg could speed it up a little:

line[:-1].split('\t', col+1)[col]
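
A quick illustration of what maxsplit does here, as a sketch with a made-up sample line: with col = 2 the line is cut at most col + 1 = 3 times, so the wanted field is complete and everything after it is left unsplit in the last element.

```python
col = 2  # third column
line = "a\tb\tc\td\te\tf\n"  # made-up sample line with six fields

full = line[:-1].split("\t")              # splits at every tab
limited = line[:-1].split("\t", col + 1)  # stops after col + 1 splits

print(full)     # ['a', 'b', 'c', 'd', 'e', 'f']
print(limited)  # ['a', 'b', 'c', 'd\te\tf']
```

Both lists hold the same value at index col; the limited version simply does less work on lines with many trailing columns.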

Jan 17 '08 #8
