Scott David Daniels wrote:
>>> int(u"\N{DEVANAGARI DIGIT SEVEN}")
7
OK, that much I have handled. I am fiddling with direct-to-number
conversions and wondering about cases like
>>> int(u"\N{DEVANAGARI DIGIT SEVEN}" + XXX + u"\N{DEVANAGARI DIGIT SEVEN}")
int() passes NULL as the error mode, which is equivalent to "strict".
So if the string contains an unencodable character, you get a UnicodeError.
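
To illustrate the strict behaviour, here is a small sketch written for
Python 3. The helper try_int and the choice of EURO SIGN as the
unencodable character are my own assumptions for the example, not
something from the interpreter source. The exact exception type may
differ between versions, so the sketch catches ValueError, of which
UnicodeError is a subclass.

```python
# Sketch only: try_int is a hypothetical helper, not part of Python.
SEVEN = "\N{DEVANAGARI DIGIT SEVEN}"
EURO = "\N{EURO SIGN}"  # not a digit, and outside Latin-1


def try_int(text):
    """Return int(text), or None if the conversion is rejected."""
    try:
        return int(text)
    except ValueError:  # UnicodeError is a subclass of ValueError
        return None


print(try_int(SEVEN + SEVEN))         # two decimal digits: accepted
print(try_int(SEVEN + EURO + SEVEN))  # stray character: rejected
```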
I don't really understand how the "ignore" or "something_else"
cases get caused by python source [where they come from]. Are they
only there for C-program access?
Neither. This code is dead.
In the "ignore" case, no output is produced at all, for the unencodable
character; this is the same way that '?' would be treated (it is
also unencodable).
If I understand you correctly -- I can consider the digit stream to stop
as soon as I hit a non-digit (except for handling bases 11-36).
No. In "ignore" mode, a codec doesn't stop at the unencodable character.
Instead, it skips it, continuing with the next character.
I mistakenly said that this would happen to '?' (question mark) also;
this is incorrect: PyUnicode_EncodeDecimal copies all Latin-1 characters
to the output, latin-1-encoded. So '?' would appear in the output,
even in "ignore" mode.
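
The difference between the two modes can be modelled in pure Python.
This is only a rough model of the behaviour described in this message,
not the C implementation: the name encode_decimal_model, its signature,
and the exception message are assumptions made for the sketch.

```python
import unicodedata


def encode_decimal_model(text, errors="strict"):
    """Hypothetical model of the behaviour described above.

    Decimal digits become ASCII digits, other Latin-1 characters are
    copied through unchanged, and anything else is unencodable.
    """
    out = bytearray()
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(ord("0") + digit)   # any decimal digit -> ASCII
        elif ord(ch) < 256:
            out.append(ord(ch))            # Latin-1, including '?'
        elif errors == "ignore":
            continue                       # skip it and carry on
        else:
            raise UnicodeError("unencodable character %r" % ch)
    return bytes(out)


seven = "\N{DEVANAGARI DIGIT SEVEN}"
euro = "\N{EURO SIGN}"
print(encode_decimal_model(seven + "?" + seven, "ignore"))
print(encode_decimal_model(seven + euro + seven, "ignore"))
```

In this model "ignore" does not truncate the digit stream: the euro
sign is dropped and the second seven still reaches the output, while
'?' is copied through in either mode.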
Handling of bases is not done in this function at all. Instead, the
callers of PyUnicode_EncodeDecimal deal with number formats
(base, prefix, exponent syntax, etc.). They assume ASCII
bytes.
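
Seen from Python, this division of labour means the base is applied
after the digits have been normalised. The sketch below uses the
builtin int() under Python 3 rather than calling the C function
directly, so it shows the observable result only; the variable names
are mine.

```python
SEVEN = "\N{DEVANAGARI DIGIT SEVEN}"

# The same two digits, interpreted by the caller in different bases.
as_decimal = int(SEVEN + SEVEN)     # parsed like "77"
as_octal = int(SEVEN + SEVEN, 8)    # parsed like "77" in base 8
as_hex = int(SEVEN + "f", 16)       # parsed like "7f" in base 16

print(as_decimal, as_octal, as_hex)
```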
Regards,
Martin