Python 2.1 / 2.3: xreadlines not working with codecs.open

Hi all,

I just found a problem with the xreadlines method/module when used with codecs.open: xreadlines does not seem to take the codec specified in the open call into account, and it returns byte strings instead of unicode strings.

For example, if a file foo.txt contains some text encoded in latin1:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in f.xreadlines()]

['\xe9\xe0\xe7\xf9\n']

But:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
f.readlines()

[u'\ufffd\ufffd']

With readlines, the latin1 characters are correctly "dumped" (replaced, since they are not valid utf-8), but with xreadlines they come back unchanged as latin1-encoded byte strings.

I tested with Python 2.1 and 2.3 on Linux and Windows, with the same result (I don't have Python 2.4 installed here).

Can anybody confirm the problem? Is this a bug? I searched this newsgroup and the known Python bugs, but the problem does not seem to have been reported yet.

TIA
--
python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"
Jul 19 '05 #1
On Thu, 23 Jun 2005 14:23:34 +0200, Eric Brunel <er*********@despammed.com> wrote:
Hi all,

I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open does not seem to be taken into account by xreadlines which also returns byte-strings instead of unicode strings.

For example, if a file foo.txt contains some text encoded in latin1:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in f.xreadlines()]

['\xe9\xe0\xe7\xf9\n']

But:
import codecs
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
f.readlines()

[u'\ufffd\ufffd']

The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.


Replying to myself. One more funny thing:
import codecs, xreadlines
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in xreadlines.xreadlines(f)]

[u'\ufffd\ufffd']

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this happens in Python 2.3, but also in Python 2.1, where the implementation for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's escaping me here... Reading the source didn't help.

At least, it does provide a workaround...
--
python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"
Jul 19 '05 #2
Eric Brunel wrote:
I just found a problem in the xreadlines method/module when used with
codecs.open: the codec specified in the open does not seem to be taken
into account by xreadlines which also returns byte-strings instead of
unicode strings.

So f.xreadlines does not work, but xreadlines.xreadlines(f) does. And this
happens in Python 2.3, but also in Python 2.1, where the implementation
for f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's
escaping me here... Reading the source didn't help.

codecs.StreamReaderWriter seems to delegate everything it doesn't implement
itself to the underlying file instance which is ignorant of the encoding.
The culprit:

    def __getattr__(self, name,
                    getattr=getattr):

        """ Inherit all other methods from the underlying stream.
        """
        return getattr(self.stream, name)

At least, it does provide a workaround...


Note that the xreadlines module hasn't made it into Python 2.4.
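
The delegation described above can be reproduced with a small stand-in class. This is only a sketch in current Python 3, not the real codecs implementation: DecodingWrapper is a hypothetical name, and io.BytesIO stands in for the underlying file. It decodes in the one method it overrides and hands every other attribute to the raw stream, which is the same pattern as the __getattr__ shown above.

```python
import io


class DecodingWrapper:
    """Hypothetical stand-in for a decoding wrapper; not the real codecs class."""

    def __init__(self, stream, encoding, errors="strict"):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def readlines(self):
        # Defined on the wrapper, so lines are decoded before being returned.
        return [line.decode(self.encoding, self.errors)
                for line in self.stream.readlines()]

    def __getattr__(self, name):
        # Only called when normal lookup fails: any method not defined above
        # is fetched from the raw stream, which knows nothing about encodings.
        return getattr(self.stream, name)


raw = b"\xe9\xe0\xe7\xf9\n"  # four accented letters encoded as latin-1

decoded = DecodingWrapper(io.BytesIO(raw), "latin-1").readlines()
delegated = DecodingWrapper(io.BytesIO(raw), "latin-1").readline()

print(type(decoded[0]).__name__)  # str: readlines() is overridden
print(type(delegated).__name__)   # bytes: readline() fell through to the stream
```

Any method the wrapper forgets to override silently bypasses the decoding step, which matches the byte strings returned by f.xreadlines() in the original report.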

Peter

Jul 19 '05 #3

"Eric Brunel" <er*********@despammed.com> wrote in message news:op**************@eb.pragmadev...

Replying to myself. One more funny thing:
import codecs, xreadlines
f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
[l for l in xreadlines.xreadlines(f)]

[u'\ufffd\ufffd']


You've specified utf-8 as the encoding instead of iso8859-1,
by the way.
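
To make the encoding remark concrete, here is a minimal sketch in current Python 3 (the built-in open is used instead of codecs.open, and the temporary file is an assumption standing in for foo.txt). It reads the same latin-1 bytes once as utf-8 with the 'replace' error handler and once as latin-1.

```python
import os
import tempfile

raw = b"\xe9\xe0\xe7\xf9\n"  # four accented letters encoded as latin-1

# Write the bytes to a throwaway file that plays the role of foo.txt.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Wrong codec: the bytes are not valid utf-8, so they become U+FFFD.
    with open(path, encoding="utf-8", errors="replace") as f:
        as_utf8 = f.readlines()
    # Matching codec: every byte maps to the intended character.
    with open(path, encoding="latin-1") as f:
        as_latin1 = f.readlines()
finally:
    os.remove(path)

print(as_utf8)
print(as_latin1)
```

The replacement characters in the first result are the expected outcome of decoding latin-1 data as utf-8, not a sign that the reader is broken.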
Jul 19 '05 #4
