codecs, csv issues

I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This doesn't happen for the default encoding (=None).

2) csv.writer doesn't seem to work as expected when passed the file
object returned by codecs.open(); it behaves as if the encoding were ascii:

import codecs, csv
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
# this works fine
print >>f, s
# this doesn't
csv.writer(f).writerow([s])
f.close()

Traceback (most recent call last):
....
csv.writer(f).writerow([s])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
position 0: ordinal not in range(128)

Is this the expected behavior, or are these bugs?

George
Aug 22 '08 #1
George Sakkis wrote:
Is this the expected behavior or are these bugs?

Looking into the documentation

"""
Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
the examples in section 9.1.5. These restrictions will be removed in the
future.
"""

and into the source code

    if encoding is not None and \
       'b' not in mode:
        # Force opening of the file in binary mode
        mode = mode + 'b'
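
The effect of that branch can be checked with a small sketch. This is only an illustration, written so that it should run on Python 2.6+ as well as Python 3; the helper name write_with_codecs and the scratch file path are made up for the example, and the warning filter is there only because newer Python 3 releases may flag codecs.open() as deprecated.

```python
import codecs
import os
import tempfile
import warnings


def write_with_codecs(path, text):
    """Write text through codecs.open() and return the raw bytes on disk."""
    with warnings.catch_warnings():
        # Newer Python 3 releases may warn that codecs.open() is deprecated.
        warnings.simplefilter("ignore", DeprecationWarning)
        f = codecs.open(path, 'w', encoding='utf8')
    try:
        f.write(text)
    finally:
        f.close()
    with open(path, 'rb') as raw:
        return raw.read()


# Hypothetical scratch file, used only for this check.
path = os.path.join(tempfile.mkdtemp(), 'tmp.txt')
data = write_with_codecs(path, u'\u0391\u03b8\u03ae\u03bd\u03b1\n')

# The '\n' reaches the disk untouched on every platform, because the
# underlying file was opened in binary mode.
print(repr(data))
```

If the file had been opened in text mode on Windows, the last byte pair would be '\r\n'; here it stays a single '\n'.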

I'd be willing to say that both are implementation limitations.
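
For the csv part, the note quoted above suggests feeding the module UTF-8 rather than unicode objects. A minimal sketch of that idea follows, assuming every cell is a unicode string; write_utf8_rows and the scratch file name are hypothetical, and the Python 3 branch is included only so the same snippet runs on both major versions.

```python
import csv
import io
import os
import sys
import tempfile


def write_utf8_rows(path, rows):
    """Write rows of unicode strings to path as UTF-8 encoded CSV."""
    if sys.version_info[0] == 2:
        # Python 2: the csv module only handles byte strings, so encode
        # every cell first and write to a plain binary file.
        with open(path, 'wb') as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([cell.encode('utf-8') for cell in row])
    else:
        # Python 3: csv accepts text; newline='' leaves the line
        # terminator to the csv module.
        with io.open(path, 'w', encoding='utf-8', newline='') as f:
            csv.writer(f).writerows(rows)


# Hypothetical scratch file, used only for this check.
path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')
write_utf8_rows(path, [[u'\u0391\u03b8\u03ae\u03bd\u03b1', u'Athens']])

with open(path, 'rb') as raw:
    written = raw.read()
print(repr(written))
```

Note that csv.writer ends each row with '\r\n' by default, whatever the platform, so the newline question from issue 1 does not come up for the csv output.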

Peter
Aug 22 '08 #2
On Aug 22, 11:52 pm, George Sakkis <george.sak...@gmail.com> wrote:
I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This is documented behaviour:
"""
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done
on reading and writing.
"""
Aug 22 '08 #3