Hi And.
In article <11**********************@u72g2000cwu.googlegroups .com>,
an********@doxdesk.com writes:
and-google> Akihiro KAYAMA wrote:
and-google> > As the character set is wider than UTF-16(U+10FFFF), I can't use
and-google> > Python's native unicode string class.
and-google>
and-google> Have you tried using Python compiled in Wide Unicode mode
and-google> (--enable-unicode=ucs4)? You get native UTF-32/UCS-4 strings then,
and-google> which should be enough for most purposes.
From my quick survey, Python's Unicode support is intentionally
restricted to the UTF-16 range (U+0000...U+10FFFF), regardless of
the --enable-unicode=ucs4 option.
Python 2.4.1 (#2, Sep 3 2005, 22:35:47)
[GCC 2.95.4 20020320 [FreeBSD]] on freebsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> u"\U0010FFFF"
u'\U0010ffff'
>>> len(u"\U0010FFFF")
1
>>> u"\U00110000"
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
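
The same limit can also be checked from code instead of with string
literals. This is only a minimal sketch, written for a current Python 3
interpreter rather than the 2.4.1 build shown above (in Python 2 the
corresponding built-in is unichr, not chr). The helper name
is_representable is hypothetical, not part of Python.

```python
import sys

def is_representable(code_point):
    """Return True if chr() accepts code_point on the running interpreter.

    Hypothetical helper for illustration only.
    """
    try:
        chr(code_point)
    except (ValueError, OverflowError):
        # chr() rejects anything outside range(0x110000).
        return False
    return True

# Highest code point the native string type can hold.
highest = sys.maxunicode

print(hex(highest))
print(is_representable(0x10FFFF))   # last code point inside the limit
print(is_representable(0x110000))   # first code point beyond the limit
```

If the character set needs values above sys.maxunicode, they cannot be
stored in a native string, whichever build option was used.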
A simple patch to unicodeobject.c that disables the Unicode range check
could solve this, but I don't want to maintain a specialized Python
binary for my project.
-- kayama