know if that behaves differently?)
>>> import codecs
>>> u'\ud800'  # part of a surrogate pair
u'\ud800'
>>> codecs.utf_16_be_encode(_)[0]
'\xd8\x00'
>>> codecs.utf_16_be_decode(_)[0]
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 0-1:
unexpected end of data
If the resulting bytes can't be recognized as UTF-16, then surely the
codec shouldn't have allowed them to be encoded in the first place? I
could understand it if it were trying to decode the bytes into (native)
UTF-32.
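(For what it's worth, CPython 3 removes this asymmetry by rejecting the
lone surrogate at encode time as well; the old permissive behaviour is
only available by explicitly opting in with the 'surrogatepass' error
handler. A small sketch of that, in Python 3 so str rather than
unicode:)

```python
s = '\ud800'  # a lone high surrogate, half of a surrogate pair

# A strict UTF-16-BE encode now refuses the lone surrogate outright,
# instead of producing bytes it cannot decode back:
try:
    s.encode('utf-16-be')
except UnicodeEncodeError as e:
    print('encode refused:', e.reason)

# 'surrogatepass' restores the permissive behaviour; the bytes then
# round-trip only through a matching 'surrogatepass' decode:
raw = s.encode('utf-16-be', 'surrogatepass')
print(raw)  # b'\xd8\x00', the same bytes as in the session above
assert raw.decode('utf-16-be', 'surrogatepass') == s

# A strict decode of those bytes still fails, as shown above:
try:
    raw.decode('utf-16-be')
except UnicodeDecodeError:
    print('strict decode failed, as in the session above')
```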
On a similar note, if you are using UTF-32 natively, are you allowed to
have raw surrogate escape sequences (paired or otherwise) in unicode
literals?
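(As one data point for a natively wide build: CPython 3 stores full
code points, and it does accept raw surrogate escapes, paired or
otherwise, in literals; they are only rejected when you try to encode
them. A sketch:)

```python
lone = '\ud800'
pair = '\ud800\udc00'  # a "paired" escape sequence in a literal

assert len(lone) == 1
assert len(pair) == 2  # the pair is NOT joined into one character

# The surrogates are only rejected at encode time:
try:
    lone.encode('utf-8')
except UnicodeEncodeError:
    print('lone surrogate refused by UTF-8')
```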
Thanks
John