codecs, csv issues

George Sakkis

I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
# write the string twice, one per line
print >>f, s
print >>f, s
f.close()

This doesn't happen for the default encoding (=None).

2) csv.writer doesn't seem to work as expected when passed a file
object from codecs.open(); it behaves as if the encoding were ascii:

import codecs, csv
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
# this works fine
print >>f, s
# this doesn't
csv.writer(f).writerow([s])
f.close()

Traceback (most recent call last):
....
csv.writer(f).writerow([s])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0391' in
position 0: ordinal not in range(128)

Is this the expected behavior, or are these bugs?

George

Aug 22 '08 #1


Peter Otten

Looking into the documentation

"""
Note: This version of the csv module doesn't support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe; see
the examples in section 9.1.5. These restrictions will be removed in the
future.
"""

and into the source code

if encoding is not None and \
   'b' not in mode:
    # Force opening of the file in binary mode
    mode = mode + 'b'

I'd be willing to say that both are implementation limitations.
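
Following the documentation note quoted above, one possible workaround for the csv issue can be sketched as follows. On Python 2 it encodes each unicode field to UTF-8 and hands csv.writer a plain binary file; on Python 3, where the csv module works with text, it opens the file in text mode with newline=''. The helper name write_unicode_row and the temporary path are hypothetical, chosen only for illustration.

```python
import csv
import io
import os
import sys
import tempfile

s = u'\u0391\u03b8\u03ae\u03bd\u03b1'

def write_unicode_row(path, row):
    """Write one row of unicode strings to path as UTF-8 encoded CSV."""
    if sys.version_info[0] >= 3:
        # Python 3: csv accepts text; newline='' lets csv control line endings
        with io.open(path, 'w', encoding='utf-8', newline='') as f:
            csv.writer(f).writerow(row)
    else:
        # Python 2: csv expects byte strings, so encode each field first
        # (assumes every cell is a unicode object)
        with open(path, 'wb') as f:
            csv.writer(f).writerow([cell.encode('utf-8') for cell in row])

path = os.path.join(tempfile.mkdtemp(), 'tmp.csv')
write_unicode_row(path, [s])

# Read the raw bytes back to see what was actually written
with io.open(path, 'rb') as f:
    raw = f.read()
```

The file ends up containing the UTF-8 bytes of the field followed by the csv module's default line terminator.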

Peter

Aug 22 '08 #2

John Machin

On Aug 22, 11:52 pm, George Sakkis <george.sak...@gmail.com> wrote:

I'm trying to use codecs.open() and I see two issues when I pass
encoding='utf8':

1) Newlines are hardcoded to LINEFEED (ascii 10) instead of the
platform-specific byte(s).

import codecs
f = codecs.open('tmp.txt', 'w', encoding='utf8')
s = u'\u0391\u03b8\u03ae\u03bd\u03b1'
print >>f, s
print >>f, s
f.close()

This is documented behaviour:
"""
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of '\n' is done on
reading and writing.
"""

Aug 22 '08 #3
