Bytes | Software Development & Data Engineering Community
printing unicode strings

Can anyone tell me why I can print out the individual variables in the
following code, but when I print them out combined into a single
string, I get an error?

symbol = u'ibm'
price = u'4 \xbd' # 4 1/2

print "%s" % symbol
print "%s" % price.encode("utf-8")
print "%s %s" % (symbol, price.encode("utf-8") )

--output:--
ibm
4 1/2
File "pythontest.py", line 6, in ?
print "%s %s" % (symbol, price.encode("utf-8") )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
2: ordinal not in range(128)

Jul 24 '07 #1
7stud wrote:
> Can anyone tell me why I can print out the individual variables in the
> following code, but when I print them out combined into a single
> string, I get an error?
>
> symbol = u'ibm'
> price = u'4 \xbd' # 4 1/2
>
> print "%s" % symbol
> print "%s" % price.encode("utf-8")
> print "%s %s" % (symbol, price.encode("utf-8") )
>
> --output:--
> ibm
> 4 1/2
> File "pythontest.py", line 6, in ?
> print "%s %s" % (symbol, price.encode("utf-8") )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 2: ordinal not in range(128)

For format % args, if the format string or any argument is a unicode string,
the result will be unicode, too. This means that any byte strings involved
have to be decoded first, and Python uses the default ascii codec for that.
In your example

    print "%s %s" % (symbol, price.encode("utf-8") )

symbol is a unicode string, so Python tries to decode "%s %s" and
"4 \xc2\xbd" (the result of price.encode("utf-8")). The latter contains
non-ascii bytes, so the decoding fails.
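
The implicit conversion described above can be made explicit. The sketch below is an illustration rather than code from the thread: the helper name implicit_decode is hypothetical, and the `except ... as` syntax assumes Python 2.6 or later (the snippet also runs on Python 3, where the same decode fails in the same way). It shows the ascii decode failing on the UTF-8 bytes, and an explicit UTF-8 decode succeeding before the strings are combined.

```python
symbol = u"ibm"
price_bytes = u"4 \xbd".encode("utf-8")  # the byte string '4 \xc2\xbd'


def implicit_decode(data):
    """Hypothetical helper: mimic the ascii decode Python 2 applies to a
    byte string argument when the formatted result has to be unicode."""
    return data.decode("ascii")


try:
    implicit_decode(price_bytes)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # byte 0xc2 sits at index 2, outside the ascii range

# Decoding with the codec that was used for encoding avoids the implicit
# ascii step, because every operand is already unicode.
combined = u"%s %s" % (symbol, price_bytes.decode("utf-8"))
```

If the bytes have to stay encoded for some reason, the alternative is to keep every operand a byte string instead, so that no unicode result is ever requested.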

Solution: use unicode throughout and let the print statement do the
encoding.
>>> symbol = u"ibm"
>>> price = u"4 \xbd"
>>> print u"%s %s" % (symbol, price)
ibm 4 ½

Sometimes, e.g. if you redirect stdout, the above can fail, because
sys.stdout.encoding is then None. Here's a workaround that falls back to
utf8 in such cases.

import sys
if sys.stdout.encoding is None:
    import codecs
    sys.stdout = codecs.lookup("utf8").streamwriter(sys.stdout)
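
To see what such a wrapper does without touching the real sys.stdout, here is a minimal sketch that is not part of the original reply. It assumes an in-memory io.BytesIO buffer as a stand-in for a redirected stdout, and uses codecs.getwriter("utf8"), which looks up the same stream writer class as codecs.lookup("utf8").streamwriter. It assumes Python 2.6 or later and also runs on Python 3.

```python
import codecs
import io

# Stand-in for a redirected stdout: a sink that only accepts bytes.
raw = io.BytesIO()

# Wrap the byte sink so that unicode strings written to it are encoded
# as UTF-8 on the way out.
writer = codecs.getwriter("utf8")(raw)

symbol = u"ibm"
price = u"4 \xbd"

# Everything stays unicode until the writer encodes it at the boundary.
writer.write(u"%s %s\n" % (symbol, price))

written = raw.getvalue()  # the UTF-8 bytes that reached the sink
```

The point of the design is that encoding happens in exactly one place, at the output boundary, so the rest of the program never mixes byte strings with unicode strings.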

Peter

Jul 24 '07 #2
On Jul 25, 6:56 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> Can anyone tell me why I can print out the individual variables in the
> following code, but when I print them out combined into a single
> string, I get an error?
>
> symbol = u'ibm'
> price = u'4 \xbd' # 4 1/2
>
> print "%s" % symbol
> print "%s" % price.encode("utf-8")
> print "%s %s" % (symbol, price.encode("utf-8") )
>
> --output:--
> ibm
> 4 1/2
> File "pythontest.py", line 6, in ?
> print "%s %s" % (symbol, price.encode("utf-8") )
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 2: ordinal not in range(128)

Because the first part is Unicode and the second part (after encoding
in utf8) is str.

It is trying to convert the second part to Unicode, using the default
codec (ascii), which of course must fail:
>>> price = u"4 \xbd"
>>> price.encode("utf8")
'4 \xc2\xbd'
>>> price.encode("utf8").decode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)

Jul 24 '07 #3
Thanks.

Jul 25 '07 #4
