Python 2.1 / 2.3: xreadlines not working with codecs.open

Eric Brunel

Hi all,

I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open does not seem to be taken into account by xreadlines which also returns byte-strings instead of unicode strings.

For example, if a file foo.txt contains some text encoded in latin1:

import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in f.xreadlines()]

['\xe9\xe0\xe7\xf9\n']

But:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
f.readlines()

[u'\ufffd\ufffd']

The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.

I tested with Python 2.1 and 2.3 on Linux and Windows: same result (I don't have Python 2.4 installed here).

Can anybody confirm the problem? Is this a bug? I searched this newsgroup and the known Python bugs, but the problem does not seem to have been reported yet.

TIA
--
python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"

Jul 19 '05 #1


Eric Brunel

On Thu, 23 Jun 2005 14:23:34 +0200, Eric Brunel <er*********@despammed.com> wrote:

Hi all,

I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open does not seem to be taken into account by xreadlines which also returns byte-strings instead of unicode strings.

For example, if a file foo.txt contains some text encoded in latin1:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in f.xreadlines()]

['\xe9\xe0\xe7\xf9\n']

But:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
f.readlines()

[u'\ufffd\ufffd']

The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.

Replying to myself. One more funny thing:

import codecs, xreadlines
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in xreadlines.xreadlines(f)]

[u'\ufffd\ufffd']

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this happens in Python 2.3, but also in Python 2.1, where the implementation for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's escaping me here... Reading the source didn't help.

At least, it does provide a workaround...
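For readers on an interpreter where the xreadlines module is no longer available, here is a rough sketch of the same idea: stick to what the codec reader implements itself. It assumes Python 3, uses io.BytesIO as a stand-in for foo.txt, and builds the reader with codecs.getreader instead of codecs.open; none of that comes from the post above.

```python
import codecs
import io

# Stand-in for foo.txt: the four latin-1 bytes from the example above.
raw = io.BytesIO(b'\xe9\xe0\xe7\xf9\n')

# Wrap the byte stream in a decoding reader, with the same codec and
# error handler as in the post (utf-8, 'replace').
reader = codecs.getreader('utf-8')(raw, errors='replace')

# Iterating over the reader goes through its own readline(), so every
# line comes back decoded rather than as raw bytes.
lines = [line for line in reader]
print(lines)
```

The point is only that iteration and readlines() are implemented by the reader and therefore decode; anything the reader does not implement is a candidate for the problem described in this thread.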
--
python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"

Jul 19 '05 #2

Peter Otten

Eric Brunel wrote:

I just found a problem in the xreadlines method/module when used with
codecs.open: the codec specified in the open does not seem to be taken
into account by xreadlines which also returns byte-strings instead of
unicode strings.

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this
happens in Python 2.3, but also in Python 2.1, where the implementation
for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's
escaping me here... Reading the source didn't help.

codecs.StreamReaderWriter seems to delegate everything it doesn't implement
itself to the underlying file instance which is ignorant of the encoding.
The culprit:

def __getattr__(self, name,
                getattr=getattr):

    """ Inherit all other methods from the underlying stream.
    """
    return getattr(self.stream, name)

At least, it does provide a workaround...

Note that the xreadlines module hasn't made it into Python 2.4.
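The delegation described above can be reproduced in isolation. The following is a minimal sketch, assuming Python 3; DecodingWrapper is a hypothetical stand-in written for illustration, not the real codecs.StreamReaderWriter, and io.BytesIO replaces the file.

```python
import io


class DecodingWrapper:
    """Hypothetical stand-in for a codec wrapper around a byte stream."""

    def __init__(self, stream, encoding, errors='strict'):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def readlines(self):
        # Implemented by the wrapper itself, so the lines are decoded.
        return [line.decode(self.encoding, self.errors)
                for line in self.stream.readlines()]

    def __getattr__(self, name):
        # Only reached when normal lookup fails: every method the wrapper
        # does not define is handed to the raw stream, with no decoding.
        return getattr(self.stream, name)


data = b'\xe9\xe0\xe7\xf9\n'

decoded = DecodingWrapper(io.BytesIO(data), 'latin-1').readlines()
delegated = DecodingWrapper(io.BytesIO(data), 'latin-1').readline()

print(decoded)    # text, produced by the wrapper's own readlines()
print(delegated)  # raw bytes, produced by the underlying stream's readline()
```

Here readline() plays the role f.xreadlines() plays in the thread: the wrapper never defined it, so the call silently falls through to the undecoded stream.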

Peter

Jul 19 '05 #3

Richard Brodie

"Eric Brunel" <er*********@despammed.com> wrote in message news:op**************@eb.pragmadev...

Replying to myself. One more funny thing:
import codecs, xreadlines
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in xreadlines.xreadlines(f)]

[u'\ufffd\ufffd']

You've specified utf-8 as the encoding instead of iso8859-1,
by the way.
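A quick way to see the difference, sketched for Python 3 with the four bytes from the example held in a variable instead of read from foo.txt:

```python
# The latin-1 encoded bytes shown in the original post.
data = b'\xe9\xe0\xe7\xf9\n'

# Decoding as utf-8 with 'replace' cannot interpret these bytes, so they
# are swapped for U+FFFD replacement characters.
as_utf8 = data.decode('utf-8', 'replace')

# Decoding with the encoding the file was actually written in recovers
# the accented characters.
as_latin1 = data.decode('iso8859-1')

print(repr(as_utf8))
print(repr(as_latin1))
```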

Jul 19 '05 #4
