combining marks; it gives inconsistent results for some regular expressions
when they are matched against canonically equivalent forms of the same string:
>>> sys.version
'2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)]'
>>> re.match('\w',unicodedata.normalize('NFD',u'\xf1'),re.UNICODE).group(0)
u'n'
>>> re.match('\w',unicodedata.normalize('NFC',u'\xf1'),re.UNICODE).group(0)
u'\xf1'
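In the above example, u'\xf1' is n-with-tilde (ñ). NFC happens to be a
no-op here, and NFD decomposes it into u'n\u0303', which splits the tilde
out as a separate combining mark (U+0303, COMBINING TILDE). Since that mark
is not treated as a word character, \w on the NFD form matches only the bare
u'n'.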
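For anyone wanting to check this on a current interpreter, here is a minimal reproduction sketch, assuming Python 3 (where str is Unicode and re applies Unicode rules by default), not the Python 2.4 used in the session above. The helper match_word_with_marks is hypothetical and not part of re: it is one possible workaround that extends a single \w match over any immediately following characters whose Unicode category starts with "M" (combining marks).

```python
import re
import unicodedata

# Sketch assuming Python 3: str is Unicode and re uses Unicode rules by default.
N_TILDE = "\u00f1"  # LATIN SMALL LETTER N WITH TILDE

nfc = unicodedata.normalize("NFC", N_TILDE)  # unchanged: already precomposed
nfd = unicodedata.normalize("NFD", N_TILDE)  # "n" followed by U+0303 COMBINING TILDE

print(ascii(nfc), ascii(nfd))                    # '\xf1' 'n\u0303'
print(unicodedata.category(nfd[1]))              # Mn (nonspacing mark)

# Same pattern, canonically equivalent inputs, different matches:
print(ascii(re.match(r"\w", nfc).group(0)))      # '\xf1'
print(ascii(re.match(r"\w", nfd).group(0)))      # 'n' (the mark is not a \w character)

WORD_CHAR = re.compile(r"\w")


def match_word_with_marks(text, pos=0):
    """Hypothetical helper: match one \\w character at pos, plus any
    combining marks (Unicode category M*) that immediately follow it.
    Returns the matched substring, or None if text[pos] is not \\w."""
    m = WORD_CHAR.match(text, pos)
    if m is None:
        return None
    end = m.end()
    # Walk forward over combining marks so they stay attached to their base.
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[m.start():end]


print(ascii(match_word_with_marks(nfd)))         # 'n\u0303'
print(ascii(match_word_with_marks("q\u0303x")))  # 'q\u0303'
```

Normalizing input to NFC before matching is the simpler workaround when a precomposed character exists, but it does not help for sequences such as u'q\u0303', which have no precomposed form; that is why the sketch walks the combining marks explicitly.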
Is this a limitation by design, or a bug? If the latter, is it already
known, or scheduled to be fixed?