cStringIO unicode weirdness

Python 2.5 (r25:51908, Oct 6 2006, 15:24:43)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO, cStringIO
>>> StringIO.StringIO('a').getvalue()
'a'
>>> cStringIO.StringIO('a').getvalue()
'a'
>>> StringIO.StringIO(u'a').getvalue()
u'a'
>>> cStringIO.StringIO(u'a').getvalue()
'a\x00\x00\x00'
>>>
I would have thought StringIO and cStringIO would return the
same result for this ascii-encodeable string. Worse:
>>> StringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
u'a'

does the right thing, but
>>> cStringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
u'a\x00\x00\x00'

looks bogus. Am I misunderstanding something?
Jun 18 '07 #1
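What cStringIO appears to have done here is copy the raw in-memory representation of the unicode object rather than encode it. That reading is our assumption, not something stated in the thread: a UCS-4 ("wide") Python 2 build on little-endian x86 Linux, as Ubuntu shipped, stores each code point in 4 bytes, which is exactly 'a' followed by three NULs. A minimal Python 3 sketch reproducing that layout with the standard utf-32-le codec (the helper name wide_build_bytes is hypothetical):

```python
def wide_build_bytes(text):
    """Hypothetical helper: the 4-bytes-per-code-point, little-endian layout
    that a UCS-4 Python 2 build is assumed to keep in memory for unicode."""
    return text.encode('utf-32-le')  # the '-le' variant writes no byte-order mark

raw = wide_build_bytes('a')
assert raw == b'a\x00\x00\x00'                 # same bytes cStringIO returned
assert raw.decode('utf-32-le') == 'a'          # the matching codec recovers the text
assert raw.decode('utf-8') == 'a\x00\x00\x00'  # UTF-8 keeps each NUL as a character
```

On a narrow (UCS-2) build the same copy would presumably have produced 'a\x00' instead (compare 'a'.encode('utf-16-le')), so the exact garbage would depend on how Python was compiled.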
On Jun 19, 8:56 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:

> I would have thought StringIO and cStringIO would return the
> same result for this ascii-encodeable string.
Looks like a bug to me.
> Worse:
> >>> StringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
> u'a'
>
> does the right thing, but
> >>> cStringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
> u'a\x00\x00\x00'
>
> looks bogus. Am I misunderstanding something?
Not worse, no more bogus than before. Note that an explicit design
feature of utf8 is that ASCII characters (ord(c) < 128) are unchanged
by the transformation.
>>> 'a\x00\x00\x00'.encode('utf-8')
# IMPLICIT conversion to unicode (effectively .decode('ascii')), then
# encoding as utf8
'a\x00\x00\x00' # no change to original buggy result
>>> 'a\x00\x00\x00'.decode('utf-8')
u'a\x00\x00\x00' # as expected
>>>
Jun 18 '07 #2
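John's point can be checked directly: every byte of 'a\x00\x00\x00' is below 128, so ASCII and UTF-8 map it to the same characters and back, and the encode/decode chain cannot undo the damage. A minimal Python 3 sketch; bytes and str are separate types there, so the chain is written decode-then-encode, and utf8_round_trip is a hypothetical name:

```python
def utf8_round_trip(raw):
    """Hypothetical helper: decode bytes as UTF-8, then encode them again."""
    return raw.decode('utf-8').encode('utf-8')

buggy = b'a\x00\x00\x00'

# Every byte is below 128, i.e. plain ASCII (iterating bytes yields ints)...
assert all(b < 128 for b in buggy)
# ...and ASCII and UTF-8 decode those bytes identically...
assert buggy.decode('ascii') == buggy.decode('utf-8') == 'a\x00\x00\x00'
# ...so the round trip hands back exactly what went in.
assert utf8_round_trip(buggy) == buggy
```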
Paul Rubin wrote:

> I would have thought StringIO and cStringIO would return the
> same result for this ascii-encodeable string. Worse:
You would be wrong. The behavior of StringIO and cStringIO is
different under certain circumstances, and those differences are
intended. One is when they are confronted with unicode, as you
saw. Another is when they are given an initializer...
>>> cs = cStringIO.StringIO('a')
>>> cs.write('b')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'cStringIO.StringI' object has no attribute 'write'
>>> s = StringIO.StringIO('a')
>>> s.write('b')
There is a Summer of Code project that is working towards making them
behave the same, but the results will need to wait until Python 2.6
and/or 3.0. Note that there are a few "closed, won't fix" bug reports
regarding these exact same issues in the Python bug tracker at SourceForge.

- Josiah
Jun 19 '07 #3
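For comparison, Python 3 dropped both modules in favor of io.StringIO (text) and io.BytesIO (bytes), and neither has a read-only variant like the StringI object Josiah shows: an initialized buffer stays writable. One detail worth knowing is that the stream starts at position 0, so a plain write overwrites the initial value instead of appending; as far as we can tell from its source, the pure-Python StringIO.StringIO in Josiah's example behaves the same way. A minimal sketch (append_to_buffer is a hypothetical helper):

```python
import io

def append_to_buffer(initial, extra):
    """Hypothetical helper: append `extra` to a BytesIO initialized with `initial`."""
    buf = io.BytesIO(initial)
    buf.seek(0, io.SEEK_END)  # initialized streams start at position 0
    buf.write(extra)
    return buf.getvalue()

# Without the seek, the write lands at position 0 and replaces the 'a'.
overwritten = io.BytesIO(b'a')
overwritten.write(b'b')
assert overwritten.getvalue() == b'b'

assert append_to_buffer(b'a', b'b') == b'ab'

# io.StringIO follows the same rule for text.
text_buf = io.StringIO('a')
text_buf.seek(0, io.SEEK_END)
text_buf.write('b')
assert text_buf.getvalue() == 'ab'
```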
Josiah Carlson <jo************@sbcglobal.net> writes:
> You would be wrong. The behavior of StringIO and cStringIO is
> different under certain circumstances, and those differences are
> intended. One is when they are confronted with unicode, as you
> saw. Another is when they are given an initializer...
The doc says there's only supposed to be a difference if the unicode
can't be represented as ascii. That is not the case with the example
I posted.
> There is a Summer of Code project that is working towards making them
> behave the same, but the results will need to wait until Python 2.6
> and/or 3.0. Note that there are a few "closed, won't fix" bug
> reports regarding these exact same issues in the Python bug tracker at
> SourceForge.
Thanks, this helps. At minimum the 2.5 docs should be updated to
explain the issues.
Jun 19 '07 #4
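The documented rule Paul cites (a difference only for unicode that cannot be represented as ASCII) suggests a portable workaround on 2.5: encode before constructing, e.g. cStringIO.StringIO(u'a'.encode('utf-8')), since str input behaves identically in both modules per the first session. That is our inference, not something confirmed in the thread. Python 3's io module makes the same step mandatory: io.BytesIO rejects a str with TypeError instead of copying its internals. A minimal sketch (to_byte_buffer is a hypothetical helper):

```python
import io

def to_byte_buffer(text, encoding='utf-8'):
    """Hypothetical helper: encode text explicitly before building a byte
    buffer, instead of relying on the buffer to map text to bytes."""
    return io.BytesIO(text.encode(encoding))

# io.BytesIO refuses a str outright.
try:
    io.BytesIO('a')
    rejected = False
except TypeError:
    rejected = True
assert rejected

assert to_byte_buffer('a').getvalue() == b'a'
# Non-ASCII text works too, because the encoding is chosen explicitly.
assert to_byte_buffer('\u00e9').getvalue() == b'\xc3\xa9'
# io.StringIO is the text counterpart and keeps the str unchanged.
assert io.StringIO('a').getvalue() == 'a'
```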
