character does not result in the same character (see the example at the end).
I have no extensive knowledge of Unicode, but I believe that this
must be a problem of the Unicode 3.2 specification and not Python's.
However, I haven't found out how decomp_data (in unicodedata_db.h)
is built, nor did I find much more information about the specifics of
Unicode 3.2, so I thought I'd post here in case someone more
knowledgeable could take a look.
If it turns out to be a problem with Python, I'll open a bug report
(and volunteer to work on it).
*** Example ***
>>> import unicodedata as ud
>>> def report(utext):
...     for uchar in utext:
...         print ord(uchar), ud.name(uchar)
...
>>> u1 = u'\N{greek small letter alpha with oxia}'
>>> report(u1)
8049 GREEK SMALL LETTER ALPHA WITH OXIA
>>> u2 = ud.normalize('NFD', u1)
>>> report(u2)
945 GREEK SMALL LETTER ALPHA
769 COMBINING ACUTE ACCENT
>>> u3 = ud.normalize('NFC', u2)
>>> report(u3)
940 GREEK SMALL LETTER ALPHA WITH TONOS
*** End of Example ***
I can understand the confusion: if, as far as I have found, there is no
COMBINING GREEK TONOS or COMBINING TONOS ACCENT in the Unicode table,
then when decomposing one has to use the 'oxeia' (acute) accent...
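
One way to check where the round trip changes the character is to look
at the raw decomposition field that unicodedata exposes for each code
point. The following is only a minimal sketch, written in Python 3
syntax (unlike the Python 2 session above); the helper name describe is
made up for illustration. As far as I can tell, the OXIA character maps
to the single code point 03AC (the TONOS character), and only that one
maps to alpha plus the combining acute, which would explain why NFC
composes back to TONOS rather than OXIA.

```python
import unicodedata as ud

def describe(char):
    """Return (code point, name, raw decomposition field) for one character.

    'describe' is a hypothetical helper, not part of unicodedata.
    """
    return (ord(char), ud.name(char), ud.decomposition(char))

oxia = '\N{GREEK SMALL LETTER ALPHA WITH OXIA}'    # U+1F71
tonos = '\N{GREEK SMALL LETTER ALPHA WITH TONOS}'  # U+03AC

# Show the canonical decomposition mapping stored for each character.
for ch in (oxia, tonos):
    print('%04X %s -> %r' % describe(ch))

# Decompose fully, then recompose, as in the example session.
round_trip = ud.normalize('NFC', ud.normalize('NFD', oxia))
print(round_trip == oxia, round_trip == tonos)
```

If the database behaves as I expect, the first line shows '03AC' for
the OXIA character and the second shows '03B1 0301' for the TONOS one,
and the last line prints False True.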
--
TZOTZIOY, I speak England very best,
"Tssss!" --Brad Pitt as Achilles in unprecedented Ancient Greek