strange behaviour of str()

Hello,

I'm wondering about the following behaviour of str() with strings
containing non-ASCII characters:

str(u'foo') returns 'foo' as expected.

str('lää') returns 'lää' as expected.

str(u'lää') raises UnicodeEncodeError

Is this behaviour sane? Possibly, but it is not documented at all. Somehow
you'd expect str() to never fail. I'm hesitant to send a bug
report to a third-party application that fails because it relies
on this.

Cheers,
Juho Vuori
Aug 31 '05 #1


Juho Vuori wrote:
> str(u'lää') raises UnicodeEncodeError
>
> Is this behaviour sane? Possibly, but not documented at all.
str() on a Unicode string attempts to convert the string to an 8-bit
string using Python's default encoding, which is ASCII. "ä" is not
an ASCII character.
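As a rough illustration in Python 3 terms (where str is already Unicode, so str() itself no longer fails here), the Python 2 conversion can be approximated by encoding explicitly with ASCII. This is a sketch under that assumption; `py2_str` is a hypothetical helper, not a real API.

```python
def py2_str(text):
    """Hypothetical helper: approximate Python 2's str(unicode_obj),
    which encodes with the default encoding (normally ASCII)."""
    return text.encode("ascii")


print(py2_str("foo"))  # b'foo'

try:
    py2_str("lää")
except UnicodeEncodeError as exc:
    # 'ä' (U+00E4) lies outside ASCII's 0..127 range, at index 1
    print("UnicodeEncodeError:", exc.reason, "at position", exc.start)
```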

If this problem appears in a third-party program, that program has
not been properly internationalized.

> Somehow you'd expect str() to never fail.


Except for id() and type(), virtually all builtins can fail. If you want
to convert something to a string no matter what it contains, repr() is
a better choice. If you want to convert Unicode strings to a given
byte encoding, you have to use the encode method.
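A minimal Python 3 sketch of both options for a built-in string. Note the assumption that Python 3's ascii() is the closest match to Python 2's repr(u'...'), since Python 3's repr() keeps printable non-ASCII characters. The `errors` argument of encode() selects what happens when the codec cannot represent a character.

```python
text = "lää"

# repr() gives a printable representation and does not raise for strings
print(repr(text))   # 'lää'
# ascii() escapes non-ASCII characters, similar to Python 2's repr(u'...')
print(ascii(text))  # 'l\xe4\xe4'

# encode() with an explicit codec produces bytes
print(text.encode("utf-8"))  # b'l\xc3\xa4\xc3\xa4'

# Error handlers avoid UnicodeEncodeError when the codec lacks a character
print(text.encode("ascii", errors="replace"))           # b'l??'
print(text.encode("ascii", errors="backslashreplace"))  # b'l\\xe4\\xe4'
```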

</F>

Aug 31 '05 #2

> Hello,
>
> I'm wondering about the following behaviour of str() with strings
> containing non-ASCII characters:
>
> str(u'foo') returns 'foo' as expected.
>
> str('lää') returns 'lää' as expected.
>
> str(u'lää') raises UnicodeEncodeError


This does not work because you need an encoder to convert
unicode to str, and str() does not know a priori which encoder
to use. There are many ways to encode a unicode string
into a classic byte-based string.

You have to proceed as follows:

    s = u"äää"
    print s.encode("latin-1")

    äää

Try "utf-8" and "utf-16" instead of "latin-1".
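A small Python 3 sketch of why the codec choice matters: the same text yields different byte sequences under each codec, and only decoding with the matching codec restores it. `encoded_sizes` is a hypothetical helper; the "utf-16" size includes the 2-byte byte order mark Python prepends.

```python
def encoded_sizes(text, codecs=("latin-1", "utf-8", "utf-16")):
    """Hypothetical helper: encode text with each codec, check the round
    trip with the same codec, and return the encoded length in bytes."""
    sizes = {}
    for codec in codecs:
        data = text.encode(codec)
        if data.decode(codec) != text:
            raise ValueError("round trip failed for " + codec)
        sizes[codec] = len(data)
    return sizes


print(encoded_sizes("äää"))  # {'latin-1': 3, 'utf-8': 6, 'utf-16': 8}

# Decoding with the wrong codec does not raise here; it silently gives mojibake
print("äää".encode("utf-8").decode("latin-1"))  # Ã¤Ã¤Ã¤
```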

Greetings, Uwe.


Aug 31 '05 #3
