codecs, csv issues

I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This doesn't happen for the default encoding (=None).

2) csv.writer doesn't seem to work as expected when being passed a
codecs object; it treats it as if encoding is ascii:

import codecs, csv
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
# this works fine
print >>f, s
# this doesn't
csv.writer(f).writerow([s])
f.close()

Traceback (most recent call last):
....
csv.writer(f).writerow([s])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
position 0: ordinal not in range(128)

Is this the expected behavior or are these bugs ?

George
Aug 22 '08 #1
George Sakkis wrote:
I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This doesn't happen for the default encoding (=None).

2) csv.writer doesn't seem to work as expected when being passed a
codecs object; it treats it as if encoding is ascii:

import codecs, csv
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
# this works fine
print >>f, s
# this doesn't
csv.writer(f).writerow([s])
f.close()

Traceback (most recent call last):
...
csv.writer(f).writerow([s])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
position 0: ordinal not in range(128)

Is this the expected behavior or are these bugs?

Looking into the documentation

"""
Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
the examples in section 9.1.5. These restrictions will be removed in the
future.
"""

and into the source code

if encoding is not None and \
   'b' not in mode:
    # Force opening of the file in binary mode
    mode = mode + 'b'

I'd be willing to say that both are implementation limitations.

Peter
Aug 22 '08 #2
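
For readers on a newer interpreter, here is a minimal sketch of writing the same Greek string through csv without the UnicodeEncodeError. It assumes Python 3, where the csv module accepts text directly; that is not the version discussed in this thread. The helper names write_rows_utf8 and read_rows_utf8 and the temporary file path are made up for illustration.

```python
import csv
import io
import os
import tempfile


def write_rows_utf8(path, rows):
    """Write rows as UTF-8 encoded CSV (hypothetical helper, Python 3 only)."""
    # newline='' leaves line endings to the csv module, which uses
    # '\r\n' as its default line terminator.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        csv.writer(f).writerows(rows)


def read_rows_utf8(path):
    """Read the rows back as lists of text strings (hypothetical helper)."""
    with io.open(path, 'r', encoding='utf-8', newline='') as f:
        return list(csv.reader(f))


# Same string as in the original post.
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')

write_rows_utf8(path, [[s, u'plain ascii']])
rows = read_rows_utf8(path)

# Raw bytes on disk, to confirm the encoding really is UTF-8.
with open(path, 'rb') as fh:
    raw = fh.read()
```

On the Python 2 csv module quoted above, the documented route is different: encode each field to UTF-8 bytes yourself before handing it to the writer.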
On Aug 22, 11:52 pm, George Sakkis <george.sak...@gmail.com> wrote:
I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This is documented behaviour:
"""
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using
8-bit values. This means that no automatic conversion of '\n' is done
on reading and writing.
"""
Aug 22 '08 #3
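
A small sketch of what that note means in practice: codecs.open() writes '\n' exactly as given, while a text-mode file opened through io.open() translates it according to its newline argument. The io.open() comparison is an addition, not something from this thread, and the file names are placeholders. The warning filter is only a guard in case a newer interpreter flags codecs.open() as deprecated.

```python
import codecs
import io
import os
import tempfile
import warnings

s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
tmpdir = tempfile.mkdtemp()


def raw_bytes(path):
    """Return the file's contents as bytes, with no decoding or translation."""
    with open(path, 'rb') as fh:
        return fh.read()


# 1) codecs.open(): the underlying file is binary, so '\n' stays '\n'
#    on every platform.
codecs_path = os.path.join(tmpdir, 'codecs.txt')
with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    f = codecs.open(codecs_path, 'w', encoding='utf8')
try:
    f.write(s + u'\n')
finally:
    f.close()

# 2) io.open(): text mode, '\n' is translated on write. newline='\r\n'
#    is passed explicitly here so the result is the same everywhere;
#    the default (newline=None) would use os.linesep instead.
io_path = os.path.join(tmpdir, 'io.txt')
with io.open(io_path, 'w', encoding='utf8', newline='\r\n') as f:
    f.write(s + u'\n')

codecs_bytes = raw_bytes(codecs_path)
io_bytes = raw_bytes(io_path)
```

If platform line endings are wanted together with an encoding, the second form is one way to get them; with codecs.open() the line terminator has to be written out by hand.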
