Problems with csv module

Hello,
I've one problem using the csv module.
The code:

self.reader = csv.reader(f, delimiter = ",")

works perfectly. But when I use a variable for delimiter:

self.reader = csv.reader(f, delimiter = Adelimiter)

I get the traceback:
File "/home/florian/visualizer/ConfigReader.py", line 13, in __init__
self.reader = csv.reader(f, delimiter = Adelimiter)
TypeError: bad argument type for built-in operation
The command

print "Adelimiter: ", Adelimiter, len(Adelimiter)

prints

Adelimiter: , 1

So I think Adelimiter is ok?!

What is wrong there?

It is Python 2.3.5.

Thx,

Florian
Jul 19 '05 #1

[Florian]
I've one problem using the csv module.
The code:

self.reader = csv.reader(f, delimiter = ",")

works perfectly. But when I use a variable for delimiter:

self.reader = csv.reader(f, delimiter = Adelimiter)

I get the traceback:
File "/home/florian/visualizer/ConfigReader.py", line 13, in __init__
self.reader = csv.reader(f, delimiter = Adelimiter)
TypeError: bad argument type for built-in operation


Is this your problem?:
>>> Adelimiter = u','
>>> reader = csv.reader(f, delimiter=Adelimiter)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> print type(Adelimiter)
<type 'unicode'>

--
Richie Hindle
ri****@entrian.com

Jul 19 '05 #2
Richie Hindle wrote:

[Florian]
I've one problem using the csv module.
The code:

self.reader = csv.reader(f, delimiter = ",")

works perfectly. But when I use a variable for delimiter:

self.reader = csv.reader(f, delimiter = Adelimiter)

I get the traceback:
File "/home/florian/visualizer/ConfigReader.py", line 13, in __init__
self.reader = csv.reader(f, delimiter = Adelimiter)
TypeError: bad argument type for built-in operation


Is this your problem?:
>>> Adelimiter = u','
>>> reader = csv.reader(f, delimiter=Adelimiter)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
>>> print type(Adelimiter)
<type 'unicode'>


Yes, that's my problem.

You mean that csv.reader can't work with unicode as the delimiter parameter?
Sorry, I don't quite follow what you're saying...

Florian
Jul 19 '05 #3

[Florian]
You mean that csv.reader can't work with unicode as the delimiter parameter?


Exactly. http://www.python.org/doc/2.3.5/lib/module-csv.html says:

"Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters. Accordingly,
all input should generally be printable ASCII to be safe. These restrictions
will be removed in the future. "

That note is still there in the current development docs, so it looks like
it hasn't yet been fixed.
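
As a rough illustration of the same kind of type mismatch, here is a minimal sketch for Python 3. This is an assumption on the editor's part: the thread is about Python 2.3.5, where the roles are reversed (a plain `str` delimiter works and a `unicode` one is rejected). In Python 3 the text type is accepted and `bytes` is refused. The helper name `accepts_delimiter` is hypothetical.

```python
import csv

def accepts_delimiter(delimiter):
    """Return True if csv.reader accepts this delimiter, False if it raises TypeError."""
    try:
        # An empty list is enough input; the delimiter is checked when the reader is built.
        csv.reader([], delimiter=delimiter)
    except TypeError:
        return False
    return True

text_ok = accepts_delimiter(",")    # text delimiter
bytes_ok = accepts_delimiter(b",")  # bytes delimiter
print(text_ok, bytes_ok)
```

Either way, printing the delimiter is not enough to spot the problem; checking its type is.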

--
Richie Hindle
ri****@entrian.com

Jul 19 '05 #4
Richie Hindle wrote:

[Florian]
You mean that csv.reader can't work with unicode as the delimiter
parameter?


Exactly. http://www.python.org/doc/2.3.5/lib/module-csv.html says:

"Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should generally be printable ASCII to be safe.
These restrictions will be removed in the future. "

That note is still there in the current development docs, so it looks like
it hasn't yet been fixed.


Uhh.. thanks!

How can I convert Unicode to ASCII?

Thx,

Florian
Jul 19 '05 #5

[Florian]
How can I convert Unicode to ASCII?


You're writing code using Unicode and you don't know how to convert it
to ASCII? You need to do some reading. Here are a few links - Google can
provide many more:

http://docs.python.org/tut/node5.htm...00000000000000
http://diveintopython.org/xml_processing/unicode.html
http://www.jorendorff.com/articles/unicode/python.html

The short answer to your question is this:
>>> U = u'My string'
>>> A = U.encode('ascii')
>>> print U, type(U), A, type(A)
My string <type 'unicode'> My string <type 'str'>

but you should really do some reading.
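
A hedged sketch of the same conversion with a clearer failure mode, written for Python 3 (an assumption; in the Python 2 session above the result is a `str`, in Python 3 it is `bytes`). `encode('ascii')` raises `UnicodeEncodeError` when the text is not pure ASCII, and the hypothetical helper `to_ascii` below turns that into a message naming the offending character.

```python
def to_ascii(text):
    """Encode text as ASCII bytes; raise ValueError naming the first non-ASCII character."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that could not be encoded.
        bad = text[exc.start]
        raise ValueError("non-ASCII character %r at index %d" % (bad, exc.start))

delimiter = to_ascii(",")
print(repr(delimiter))
```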

--
Richie Hindle
ri****@entrian.com

Jul 19 '05 #6
You mean that csv.reader can't work with unicode as the delimiter
parameter?


Richie> Exactly....

Richie> "Note: This version of the csv module doesn't support Unicode
Richie> input....

Richie> That note is still there in the current development docs, so it
Richie> looks like it hasn't yet been fixed.

I can confirm this. While the note as written focused on csv files encoded
using non-ASCII codecs, it also holds true for the API. Manipulating
Unicode from C isn't as simple as from Python, and none of us with our
fingerprints on the csv module code had much, if any, Unicode
experience.

Skip

Jul 19 '05 #7
Richie Hindle wrote:
[Florian]
You mean that csv.reader can't work with unicode as the delimiter parameter?


Exactly. http://www.python.org/doc/2.3.5/lib/module-csv.html says:

"Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters. Accordingly,
all input should generally be printable ASCII to be safe. These restrictions
will be removed in the future. "

That note is still there in the current development docs, so it looks like
it hasn't yet been fixed.


does the CSV format even support Unicode-encoded data streams?

(in contrast to, say, Latin-1 or UTF-8 encoded string fields)

this is a very common XML confusion, where people think that just
because a file format can be used to store Unicode data, a parser for
that format ought to be able to parse Unicode strings...

</F>

Jul 19 '05 #8

Fredrik> does the CSV format even support Unicode-encoded data streams?

Based on the requests I've seen here and on the cs*@mojam.com mailing list,
it appears people are certainly generating CSV files which contain
Unicode-encoded data.

Skip
Jul 19 '05 #9
Skip Montanaro wrote:
Fredrik> does the CSV format even support Unicode-encoded data streams?

Based on the requests I've seen here and on the cs*@mojam.com mailing list,
it appears people are certainly generating CSV files which contain
Unicode-encoded data.


in what encodings?

is the encoding specified inside the file? if so, how?

(it should be noted that the phrase "Unicode-encoded data" that I
used doesn't make much sense, even in the original context. what
I meant to say was that CSV, as far as I know, isn't defined as a
stream of Unicode characters, but rather as a stream of bytes in an
ASCII-compatible encoding. this means that you can use e.g.
ISO-8859-1 or UTF-8 for string values, but not that you can encode
the whole thing as, say, UTF-16 or UCS-4).
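
The byte-level point can be sketched as follows, in Python 3 (an assumption; the sample row is made up for illustration). In UTF-8, every byte of a multi-byte character is 0x80 or above, so the byte 0x2C only ever means a comma. In UTF-16, the same byte can appear inside another character's code unit, so splitting the raw bytes on a comma gives the wrong answer.

```python
# Two fields: U+012C (I with breve) and "cafe" with an acute accent.
row = "\u012c,caf\u00e9"

# UTF-8: splitting the bytes on the comma byte yields exactly the two fields.
utf8_parts = [part.decode("utf-8") for part in row.encode("utf-8").split(b",")]

# UTF-16-LE: U+012C is stored as the bytes 2C 01, so a byte-level split
# sees an extra "comma" and produces three fragments instead of two fields.
utf16_parts = row.encode("utf-16-le").split(b",")

print(utf8_parts, len(utf16_parts))
```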

</F>

Jul 19 '05 #10
Based on the requests I've seen here and on the cs*@mojam.com mailing
list, it appears people are certainly generating CSV files which
contain Unicode-encoded data.


Fredrik> in what encodings?

I've seen hints about iso-8859-1/iso-8859-15 and mention that Excel 2000
supports utf-8. Whether Excel can dump csv files in utf-8 or not, I don't
know, though I'd suppose so.

Fredrik> is the encoding specified inside the file? if so, how?

Not that I'm aware of. AFAIK, you just have to know the file's encoding.
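
A small sketch of what "you just have to know the file's encoding" means in practice, written for Python 3 (an assumption; the sample bytes and the helper name `read_rows` are hypothetical). The same parsing code gives correct text, an error, or silently wrong text depending on whether the encoding guess matches the data.

```python
import csv
import io

def read_rows(data, encoding):
    """Decode raw CSV bytes with the given encoding, then parse them."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))

latin1_data = "caf\u00e9,1\r\n".encode("iso-8859-1")
utf8_data = "caf\u00e9,1\r\n".encode("utf-8")

# Correct guess: the accented character survives.
right = read_rows(latin1_data, "iso-8859-1")

# Wrong guess, loud failure: the Latin-1 byte 0xE9 is not valid UTF-8 here.
try:
    read_rows(latin1_data, "utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# Wrong guess, silent failure: UTF-8 bytes read as Latin-1 give mojibake.
mojibake = read_rows(utf8_data, "iso-8859-1")
```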

Skip
Jul 19 '05 #11
On Wed, 11 May 2005 14:08:08 -0500, Skip Montanaro <sk**@pobox.com>
wrote:
>> Based on the requests I've seen here and on the cs*@mojam.com mailing
>> list, it appears people are certainly generating CSV files which
>> contain Unicode-encoded data.

Fredrik> in what encodings?

I've seen hints about iso-8859-1/iso-8859-15 and mention that Excel 2000
supports utf-8.


I have Excel 2002 and have done some experimentation. It "supports"
utf-8 only to the extent that most times it doesn't mangle the data
(i.e. you can save it again without loss); you just can't make any
sense out of what's on the screen. Specifically:

open a file with CSV extension: Excel assumes blindly that it's
encoded according to your locale (e.g. cp1252).

open a file with TXT extension: Excel gives you the option of
specifying one of a large number of *legacy* encodings -- yes,
that's correct, utf-* are not on the list!

NOTE: the above applies even if you have a utf-8-encoded BOM at the
start of the file.

This behaviour appears to be Excel-specific; MS Word, Wordpad and even
the humble Notepad recognise the utf-8-encoded BOM and display
sensibly (with a Unicode font, of course).

Whether Excel can dump csv files in utf-8 or not, I don't
know, though I'd suppose so.


Unfortunately, your supposition is incorrect. There is no way of
specifying the encoding directly. The nearest available options are:

(1) csv : encoded in your locale-specific legacy encoding. "illegal"
characters are silently replaced by "?" on Windows and (I deduce)
underscore on a Macintosh.
(2) text : ditto
(3) Unicode text: utf-16 -- it *does* subsequently open these
correctly i.e. silently detects the encoding and displays properly.


Jul 19 '05 #12
On Wed, 11 May 2005 20:02:25 +0200, "Fredrik Lundh"
<fr*****@pythonware.com> wrote:
Skip Montanaro wrote:
Fredrik> does the CSV format even support Unicode-encoded data streams?

Based on the requests I've seen here and on the cs*@mojam.com mailing list,
it appears people are certainly generating CSV files which contain
Unicode-encoded data.


in what encodings?

is the encoding specified inside the file? if so, how?

(it should be noted that the phrase "Unicode-encoded data" that I
used doesn't make much sense, even in the original context. what
I meant to say was that CSV, as far as I know, isn't defined as a
stream of Unicode characters, but rather as a stream of bytes in an
ASCII-compatible encoding. this means that you can use e.g.
ISO-8859-1 or UTF-8 for string values, but not that you can encode
the whole thing as, say, UTF-16 or UCS-4).


The CSV format is not defined at all, AFAIK.

Empirically, writing CSV works more-or-less like this, for each row:
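# sketch, untested; assumes fields (one row's values, already strings),
# delimiter and quote_char are defined
control_chars = '\r\n' # or maybe more or maybe just '\n'
out_list = []
for field in fields:
    if quote_char in field:
        # double every embedded quote_char, then wrap the field in quotes
        out_field = quote_char + \
            field.replace(quote_char, quote_char + quote_char) + \
            quote_char
    elif delimiter in field or [c for c in control_chars if c in field]:
        # quote fields holding the delimiter or a control character
        out_field = quote_char + field + quote_char
    else:
        out_field = field
    out_list.append(out_field)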

then you write delimiter.join(out_list) followed by "\r\n"

So there is no reason at all why a writer and a reader couldn't use
the above quoting mechanism to transfer columnar data containing
Unicode -- they just have to agree on the encoding, control
characters, quote_char, delimiter, and line terminator.
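
A hedged round-trip sketch of that agreement, using the csv module in Python 3 (an assumption; the thread predates it). The particular choices here, a semicolon delimiter, a single-quote quote_char, and UTF-8, are arbitrary and only for illustration; `dump` and `load` are hypothetical helper names.

```python
import csv
import io

# The parameters both sides agree on (arbitrary example values).
AGREED = dict(delimiter=";", quotechar="'", lineterminator="\r\n")
ENCODING = "utf-8"

def dump(rows):
    """Writer side: quote fields as needed, then encode the text to bytes."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **AGREED).writerows(rows)
    return buf.getvalue().encode(ENCODING)

def load(data):
    """Reader side: decode with the agreed encoding, then parse."""
    text = data.decode(ENCODING)
    return list(csv.reader(io.StringIO(text, newline=""), **AGREED))

rows = [["na\u00efve", "semi;colon"], ["it's", "line\r\nbreak"]]
assert load(dump(rows)) == rows
```

If either side changes one of the agreed parameters without telling the other, the round trip stops being lossless.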

Excel (see my other post in this thread) provides a writing ("save as
Unicode text") and reading mechanism which uses u'\t' as the
delimiter, u'\r\n' as the line terminator, u'\"' as the quote_char,
and utf-16 as the encoding. I haven't done an exhaustive check to see
what its definition of control_chars would be.
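
Reading data in that shape could look roughly like the sketch below, in Python 3 (an assumption). The sample bytes are built by hand to match the description above, not taken from a real Excel export, and `read_unicode_text` is a hypothetical helper name.

```python
import csv
import io

def read_unicode_text(data):
    """Parse tab-delimited, double-quoted, UTF-16 encoded bytes into rows."""
    # The "utf-16" codec uses the leading BOM to pick the byte order and strips it.
    text = data.decode("utf-16")
    return list(csv.reader(io.StringIO(text, newline=""),
                           delimiter="\t", quotechar='"'))

# Hand-made sample: a header row, then a row whose second field contains a tab.
sample = 'name\tnote\r\ncaf\u00e9\t"tab\there"\r\n'.encode("utf-16")
rows = read_unicode_text(sample)
print(rows)
```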
Jul 19 '05 #13
John Machin <sj******@lexicon.net> writes:
The CSV format is not defined at all, AFAIK.
Just for the record, <URL:
http://www.ietf.org/internet-drafts/...ime-csv-05.txt>. You'll also see
application that deal with the application/csv MIME type.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Jul 19 '05 #14
On Wed, 11 May 2005 23:52:56 -0500, Mike Meyer <mw*@mired.org> wrote:
John Machin <sj******@lexicon.net> writes:
The CSV format is not defined at all, AFAIK.
Just for the record, <URL:
http://www.ietf.org/internet-drafts/...ime-csv-05.txt

" Bletch" ,""" Bletch""", "Bletch "

Fortunately, being a draft, it's not "for the record".

Anyway, thanks for pointing that out.

. You'll also see application that deal with the application/csv MIME
type.


I'm sorry; my parser's on the fritz: I'll see (when? where?) what
application(s?) that deal with the (proposed?) MIME type?
Jul 19 '05 #15
"Fredrik Lundh" <fr*****@pythonware.com> wrote:

does the CSV format even support Unicode-encoded data streams?


Since there is no RFC or ISO standard for CSV, I'd say the answer was
"yes".

I just tried it with Excel, which is probably as close as we can get to the
canonical csv application. It can read a UTF-16 csv file, but it
mishandles it. It doesn't split at the commas. It treats each line as a
single cell.
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Jul 19 '05 #16
