unicodedata implementation

Hi,

[Originally posted this to the dev list, but the moderator advised
posting here first]

I'm looking into implementing this module for Jython, and I'm trying
to understand the contracts promised by the various methods. Please
bear in mind that this means I'm probably targeting the CPython 2.3
implementation, although I would obviously be quite happy if my
implementation didn't need too much extra work to fit the 2.5
functionality!

As someone has previously posted [1], the documentation is a little
thin and they were pointed at the Unicode specification [2]. I've done
a little reading there, and have a little knowledge now, which is
always dangerous. There are still gaps, and I was hoping someone here
might be able to point out what I'm missing.

My problem is described here [3], but I'll summarise it and add a little.

2468;CIRCLED DIGIT NINE;No;0;EN;<circle> 0039;;9;9;N;;;;;

(UnicodeData.txt [4] for Unicode 3.2.0 [5] entry for code-point 0x2468)

verify(unicodedata.decimal(u'\u2468',None) is None)
verify(unicodedata.digit(u'\u2468') == 9)
verify(unicodedata.numeric(u'\u2468') == 9.0)

That works fine, and I can see in the UnicodeData.txt file (the
mirrored property N towards the end is a fine marker; go back three
fields and then start working forward from there) that the decimal
property isn't defined, the digit property is 9 and the numeric
property is also 9.

However, this next bit is what confuses me:

325F;CIRCLED NUMBER THIRTY FIVE;No;0;ON;<circle> 0033 0035;;;35;N;;;;;

(UnicodeData.txt for Unicode 3.2.0 entry for code-point 0x325F)

verify(unicodedata.decimal(u'\u325F',None) is None)
verify(unicodedata.digit(u'\u325F', None) is None)
verify(unicodedata.numeric(u'\u325F') == 35.0)

The last one fails - ValueError: not a numeric character.

Now, again looking at the UnicodeData.txt entry and the mirrored N
property, working back three fields and going forward from there shows
that the decimal property isn't set, the digit property isn't set and
the numeric property appears to be 35.
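The field-counting described above can be sketched in code. This is only an illustration, assuming the usual semicolon-separated UnicodeData.txt layout (field 6 = decimal, 7 = digit, 8 = numeric, 9 = mirrored); the helper name numeric_fields is made up for the example, and it is written for a current Python 3 rather than the 2.3 discussed in this thread.

```python
from fractions import Fraction

def numeric_fields(record):
    """Return (decimal, digit, numeric) for one UnicodeData.txt line.

    Hypothetical helper: empty fields are reported as None.
    """
    fields = record.strip().split(";")
    decimal = int(fields[6]) if fields[6] else None
    digit = int(fields[7]) if fields[7] else None
    # The numeric field may hold a fraction such as "1/2", so go via Fraction.
    numeric = float(Fraction(fields[8])) if fields[8] else None
    return decimal, digit, numeric

# The two records quoted in this post.
CIRCLED_NINE = "2468;CIRCLED DIGIT NINE;No;0;EN;<circle> 0039;;9;9;N;;;;;"
CIRCLED_35 = "325F;CIRCLED NUMBER THIRTY FIVE;No;0;ON;<circle> 0033 0035;;;35;N;;;;;"

print(numeric_fields(CIRCLED_NINE))  # (None, 9, 9.0)
print(numeric_fields(CIRCLED_35))    # (None, None, 35.0)
```

Read this way, the data file gives 0x325F a numeric value of 35 and nothing for decimal or digit, which matches the manual reading of the record.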

So from my understanding of the Unicode (3.2.0) spec, the code point
0x325F has a numeric property with a value of 35, but the Python (2.3
and 2.4 - I haven't put 2.5 onto my box yet) implementation of
unicodedata disagrees, presumably for good reason.

I can't see where I'm going wrong.

Cheers,

James

[1] http://groups.google.com/group/comp....bdda27be118836
[2] http://www.unicode.org/
[3] http://eternusuk.blogspot.com/2007/0...-overview.html
[4] http://www.unicode.org/Public/3.2-Up...Data-3.2.0.txt
[5] http://www.unicode.org/Public/3.2-Up...ata-3.2.0.html
Feb 18 '07 #1
James Abley wrote:
> So from my understanding of the Unicode (3.2.0) spec, the code point
> 0x325F has a numeric property with a value of 35, but the Python (2.3
> and 2.4 - I haven't put 2.5 onto my box yet) implementation of
> unicodedata disagrees, presumably for good reason.
>
> I can't see where I'm going wrong.

You might not be wrong at all. CPython has a hard-coded list for the
numeric mapping (see Objects/unicodectype.c), and that hadn't been
updated even when the rest of the character database was updated.
Patch #1494554 corrected this and updated the numeric properties to
Unicode 4.1, for Python 2.5.
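
One way to check whether a given interpreter carries the updated table is to ask it for all three properties of both characters. This is a rough sketch in Python 3 syntax (an assumption, since this thread is about 2.x); numeric_triple is a hypothetical helper name, while the unicodedata calls and their optional default argument are the standard ones.

```python
import unicodedata

def numeric_triple(ch):
    """Return (decimal, digit, numeric) for ch, with None where a property is unset."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# Version of the Unicode database compiled into this interpreter.
print(unicodedata.unidata_version)

print(numeric_triple("\u2468"))  # CIRCLED DIGIT NINE
print(numeric_triple("\u325f"))  # CIRCLED NUMBER THIRTY FIVE
```

On an interpreter with the corrected mapping, the second call should report 35.0 as the numeric value rather than falling back to None.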

A patch that generates this function, instead of maintaining it by
hand, is still pending.

HTH,
Martin
Feb 22 '07 #2