codecs, csv issues

I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This doesn't happen for the default encoding (=None).

2) csv.writer doesn't seem to work as expected when being passed a
codecs object; it treats it as if encoding is ascii:

import codecs, csv
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
# this works fine
print >>f, s
# this doesn't
csv.writer(f).writerow([s])
f.close()

Traceback (most recent call last):
  ...
    csv.writer(f).writerow([s])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
position 0: ordinal not in range(128)

Is this the expected behavior, or are these bugs?

George
Aug 22 '08 #1
George Sakkis wrote:
> Is this the expected behavior or are these bugs?

Looking into the documentation

"""
Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
the examples in section 9.1.5. These restrictions will be removed in the
future.
"""

and into the source code

if encoding is not None and \
   'b' not in mode:
    # Force opening of the file in binary mode
    mode = mode + 'b'

I'd be willing to say that both are implementation limitations.

Peter
Aug 22 '08 #2
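
Following the documentation note quoted above, one way around the csv limitation is to hand csv.writer UTF-8 encoded byte strings instead of unicode objects. The following is only a minimal sketch, assuming every cell is a unicode string. The helper name write_unicode_rows and the file name are made up for illustration, and the Python 3 branch is there only so the snippet runs on either major version.

```python
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


def write_unicode_rows(path, rows, encoding='utf-8'):
    """Write rows of unicode strings to a CSV file (hypothetical helper)."""
    if PY2:
        # Python 2: csv works on byte strings, so open the file in binary
        # mode and encode each cell before handing it to the writer.
        f = open(path, 'wb')
        try:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([cell.encode(encoding) for cell in row])
        finally:
            f.close()
    else:
        # Python 3: csv accepts text directly; newline='' lets the csv
        # module control the line terminator itself.
        with io.open(path, 'w', encoding=encoding, newline='') as f:
            csv.writer(f).writerows(rows)


s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
csv_path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')
write_unicode_rows(csv_path, [[s]])

# Read the raw bytes back to see what actually landed on disk.
with open(csv_path, 'rb') as f:
    data = f.read()
```

The file then holds the UTF-8 bytes of the string followed by the csv module's default '\r\n' line terminator, so no UnicodeEncodeError is raised.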
On Aug 22, 11:52 pm, George Sakkis <george.sak...@gmail.com> wrote:
> I'm trying to use codecs.open() and I see two issues when I pass
> encoding='utf8':
>
> 1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
> platform-specific byte(s).
This is documented behaviour:
"""
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done
on reading and writing.
"""
Aug 22 '08 #3
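
Since codecs.open() always writes in binary mode, newline translation has to come from somewhere else. One option, assuming Python 2.6 or later where the io module is available, is io.open(), which encodes text and also translates '\n' on writing. The sketch below passes newline='\r\n' explicitly so the result is the same on every platform. Leaving newline at its default would use the platform's own line separator instead. The file name is made up for illustration.

```python
import io
import os
import tempfile

s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
txt_path = os.path.join(tempfile.mkdtemp(), 'tmp.txt')

# Text-mode file: encodes to UTF-8 and rewrites each '\n' as '\r\n'.
with io.open(txt_path, 'w', encoding='utf-8', newline='\r\n') as f:
    f.write(s + u'\n')
    f.write(s + u'\n')

# Read the raw bytes back to check the line endings on disk.
with open(txt_path, 'rb') as f:
    data = f.read()
```

Note that io.open() in text mode accepts only unicode strings for writing, so under Python 2 plain str values would have to be decoded first.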
