"Jerry Hill" <ma*********@gmail.com> wrote in message
news:ma***********************************@python.org...
On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <ti***************@gmail.com>
wrote:
> If I say units=unicode("°"), I get
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
> ordinal not in range(128)
> If I try x=unicode.decode(x,'utf-8'), I get
> TypeError: descriptor 'decode' requires a 'unicode' object but received
> a 'str'
> What is the correct way to interpret these symbols that come to me as a
> string?
Part of it depends on where you're getting them from. If they are in
your source code, just define them like this:
>>> units = u"°"
>>> print units
°
>>> print repr(units)
u'\xb0'
If they're coming from an external source, you have to know the
encoding they're being sent in. Then you can decode them into
unicode, like this:
>>> units = "°"
>>> unicode_units = units.decode('Latin-1')
>>> print repr(unicode_units)
u'\xb0'
>>> print unicode_units
°
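The 0xc2 byte in the original error message suggests the degree sign may have arrived as UTF-8 (where it is the two bytes 0xc2 0xb0) rather than Latin-1, so the codec passed to decode() matters. A minimal sketch, assuming the raw bytes are as shown; the variable names are made up for illustration, and byte literals are used so the snippet should run on Python 2.6+ as well as 3.3+:

```python
# Hypothetical raw input: the degree sign as encoded by two different codecs.
raw_utf8 = b"\xc2\xb0"    # UTF-8 encoding of U+00B0
raw_latin1 = b"\xb0"      # Latin-1 encoding of U+00B0

# Decoding with the matching codec yields the same one-character text.
degree_from_utf8 = raw_utf8.decode("utf-8")
degree_from_latin1 = raw_latin1.decode("latin-1")

# Decoding UTF-8 bytes as Latin-1 does not raise, but it produces two
# characters instead of one, which is easy to miss.
mojibake = raw_utf8.decode("latin-1")

print(repr(degree_from_utf8), repr(degree_from_latin1), repr(mojibake))
```

Because Latin-1 maps every byte to some character, decoding with it never fails; that is convenient but also means a wrong guess goes unnoticed until the text is displayed.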
--
Jerry
Even with source code you have to know the encoding. For pre-3.x, Python
defaults to ASCII encoding for source files:
test.py contains:
units = u"°"
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xb0' in file test.py on line 1, but no
encoding declared; see
http://www.python.org/peps/pep-0263.html for details
The encoding of the source file can be declared:
# coding: latin-1
units = u"°"
>>> import test
>>> test.units
u'\xb0'
>>> print test.units
°
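The same effect can be reproduced without creating a file, which may help when experimenting. This is only a sketch: it assumes that compile() applies the PEP 263 coding declaration when handed the raw source bytes, and the "test.py" name passed to it is just a label, not a real file. The namespace dictionary is a hypothetical stand-in for the imported module:

```python
# Source bytes as they would sit on disk: a Latin-1 file (degree sign is
# the single byte 0xb0) with a matching coding declaration on line 1.
source = b"# coding: latin-1\nunits = u'\xb0'\n"

namespace = {}
# compile() reads the coding declaration from the bytes and decodes
# the rest of the source accordingly before building the code object.
exec(compile(source, "test.py", "exec"), namespace)

print(repr(namespace["units"]))
```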
Make sure the declared encoding matches the one the file was actually saved
in! Here the file was saved in latin-1, but declared as utf8:
# coding: utf8
units = u"°"
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0:
unexpected code byte
>>>
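The failure above comes down to one byte: 0xb0 is a complete character in latin-1 but cannot start a character in UTF-8. A small sketch of that mismatch on its own, outside of any source file (the variable names are illustrative only):

```python
# The degree sign as it was saved in the latin-1 file: a single byte.
latin1_bytes = u"\xb0".encode("latin-1")

# Decoding that byte as UTF-8 fails, because 0xb0 is only valid as a
# continuation byte in UTF-8, never as the first byte of a character.
try:
    latin1_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The exact wording of the error message varies between Python versions, but the byte value and position it reports should match the traceback above.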
--
Mark