Paul McGuire wrote:
On Apr 6, 8:53 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
>>>I know I could use:-
if lower(string1) in lower(string2):
    <do something>
but it somehow feels there ought to be an easier (tidier?) way.
Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
the same character, as the character is already in lower case.
Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the LONG
is gone - there is no upper-case version of a "long s".
Its .upper().lower() is U+0073, LATIN SMALL LETTER S.
So should case-insensitive matching match the small s with the small
long s, as they have the same upper-case letter?
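
A quick way to see the long-s behaviour described above is the sketch below. It assumes Python 3, where ordinary strings are Unicode (the thread itself used Python 2), and the variable names are only illustrative.

```python
# Python 3 sketch; the thread itself used Python 2 (unichr, unicode strings).
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

lowered = LONG_S.lower()             # already lower case, so unchanged
uppered = LONG_S.upper()             # LATIN CAPITAL LETTER S
round_trip = LONG_S.upper().lower()  # LATIN SMALL LETTER S

print(lowered == LONG_S)
print(uppered == "S")
print(round_trip == "s")

# The two candidate comparisons disagree about "s" versus long s.
lower_match = LONG_S.lower() == "s".lower()  # compare lower-case forms
upper_match = LONG_S.upper() == "s".upper()  # compare upper-case forms
print(lower_match, upper_match)
```

So whether the two letters "match" depends on which direction the comparison converts in, which is exactly the question being raised.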
[ ... ]
>>> [i for i in range(65536) if unichr(i).lower().upper() !=
... unichr(i).upper()]
[304, 1012, 8486, 8490, 8491]
Instead of 15 exceptions to the rule, conversion to upper has only 5
exceptions. So perhaps comparison of uppers is, while not foolproof,
less likely to encounter these exceptions? Or at least, it is simpler
to code explicit tests for them.
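
For anyone wanting to rerun the count, here is a rough Python 3 version of the same test, with chr standing in for unichr. The helper name round_trip_exceptions is made up for this sketch. Python 3 applies fuller case mappings and newer Unicode data than the interpreter used above, so the totals it prints may not be 5 and 15.

```python
def round_trip_exceptions(first, second, limit=0x10000):
    """Code points where applying `first` then `second` to the character
    gives a different result from applying `second` alone."""
    return [
        i for i in range(limit)
        if second(first(chr(i))) != second(chr(i))
    ]

# Exceptions when comparing by upper case (the expression quoted above).
upper_exceptions = round_trip_exceptions(str.lower, str.upper)
# Exceptions when comparing by lower case (the mirror-image test).
lower_exceptions = round_trip_exceptions(str.upper, str.lower)

print(len(upper_exceptions), len(lower_exceptions))
```

The five code points listed above should still show up in upper_exceptions, and the long s (U+017F) in lower_exceptions.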
I don't know what meaning is carried by all those differences in
lower-case glyphs. Converting to upper seems to fold together a lot
of variant pis and rhos, which I think would be roughly a good thing.
I seem to recall that the tiny iota (ypogegrammeni) has or had
grammatical significance. The other effect would be conflating
physics' Angstrom unit and Kelvin unit signs with ring-a and K.
Application programmers beware.
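
A small check of these cases, again assuming Python 3 (results under the Python 2 session quoted above may differ). In this sketch it is the lower-casing step that turns the unit signs into ordinary letters; upper() on its own leaves the Kelvin sign alone.

```python
KELVIN, ANGSTROM = "\u212a", "\u212b"       # KELVIN SIGN, ANGSTROM SIGN
PI_SYMBOL, RHO_SYMBOL = "\u03d6", "\u03f1"  # GREEK PI SYMBOL, GREEK RHO SYMBOL

# Variant Greek letters share an upper-case form with the ordinary ones.
pi_folds = PI_SYMBOL.upper() == "\u03c0".upper()    # pi symbol vs. small pi
rho_folds = RHO_SYMBOL.upper() == "\u03c1".upper()  # rho symbol vs. small rho
print(pi_folds, rho_folds)

# The unit signs become ordinary letters once lower-cased.
print(KELVIN.lower() == "k")
print(ANGSTROM.lower() == "\u00e5")  # small a with ring above

# Upper-casing that result gives the ordinary capital, not the sign,
# whereas upper() applied directly keeps the Kelvin sign as it is.
print(KELVIN.lower().upper() == "K")
print(KELVIN.upper() == KELVIN)
```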
Mel.