convert Unicode to lower/uppercase?

Has someone got a Python routine or module which converts Unicode
strings to lowercase (or uppercase)?

What I actually need to do is to compare a number of strings in a
case-insensitive manner, so I assume it's simplest to convert to
lower/upper first.

Possibly all strings will be from the latin-1 character set, so I could
convert to 8-bit latin-1, map to lowercase, and convert back, but that
seems rather cumbersome.

--
Hallvard
Jul 18 '05 #1
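For readers on Python 3, where every str is Unicode, the case-insensitive comparison asked about here can be sketched without a detour through Latin-1. This is a minimal sketch, not a definitive implementation: the helper names `caseless_key` and `equal_ignoring_case` are made up for illustration, and normalizing to NFC after case folding is a simplifying assumption rather than the full caseless-matching procedure of the Unicode Standard.

```python
import unicodedata


def caseless_key(text: str) -> str:
    """Return a comparison key that ignores case (hypothetical helper)."""
    # casefold() is a more aggressive variant of lower() meant for
    # caseless matching; NFC makes precomposed and decomposed forms agree.
    return unicodedata.normalize("NFC", text.casefold())


def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings without regard to case (hypothetical helper)."""
    return caseless_key(a) == caseless_key(b)


print(equal_ignoring_case("Hallvard", "HALLVARD"))
print(equal_ignoring_case("Stra\u00dfe", "STRASSE"))
```

Comparing folded keys rather than uppercased or lowercased strings avoids having to decide which direction of conversion is the right one for a given character.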
nospam wrote:
Has someone got a Python routine or module which converts Unicode
strings to lowercase (or uppercase)?


Toiled and came up with:

>>> print u"abcäöü".upper()
ABCÄÖÜ
>>> u"ABCÄÖÜ".lower()
u'abc\xe4\xf6\xfc'

Peter
Jul 18 '05 #2
Thanks!

--
Hallvard
Jul 18 '05 #3
Peter Otten <__*******@web.de> wrote in message news:<bk*************@news.t-online.com>...
nospam wrote:
Has someone got a Python routine or module which converts Unicode
strings to lowercase (or uppercase)?


Toiled and came up with:

>>> print u"abcäöü".upper()
ABCÄÖÜ
>>> u"ABCÄÖÜ".lower()
u'abc\xe4\xf6\xfc'

Peter


But that really doesn't work properly. According to Unicode specs and
German usage the uppercase of "ß" is actually "SS", that is the single
character "ß" should uppercase to two characters.

Jim Allan
Jul 18 '05 #4
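For comparison with the Python 2 session quoted in this post: Python 3 (from 3.3 onward, as far as is known) applies the full Unicode case mappings in str.upper(), so the sharp s does expand to two characters there. The snippet below is a small sketch of that behaviour; the variable names are illustrative only.

```python
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

upper = sharp_s.upper()
# Uppercasing may change the length of the string.
print(repr(upper), len(upper))

# The sharp s is already lowercase, so lower() leaves it unchanged.
print(sharp_s.lower() == sharp_s)

# Whole words are affected in the same way.
print("Stra\u00dfe".upper())
```

Code that assumes len(s.upper()) == len(s), for example when indexing into both strings in parallel, therefore needs care.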
jallan wrote:
But that really doesn't work properly. According to Unicode specs and
German usage the uppercase of "ß" is actually "SS", that is the single
character "ß" should uppercase to two characters.


Can you cite exact chapter and verse of the Unicode specs that say so?
According to the Unicode database,

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

ß has neither an uppercase mapping, nor a lowercase mapping.

Also, in German, the uppercase mapping of ß is of ongoing debate.
For example, the Duden from 1919 says

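| Für ß wird in großer Schrift SZ angewandt [...]. Die Verwendung
| _zweier_ Buchstaben für _einen_ Laut ist nur ein Notbehelf, der
| aufhören muß, sobald ein geeigneter Druckbuchstabe für das
| große ß geschaffen ist.
|
| (In English: "For ß, SZ is used in capital lettering [...]. Using
| _two_ letters for _one_ sound is only a stopgap, which must end
| as soon as a suitable printing type for the capital ß has been
| created.")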

The usage of SZ has only been eliminated in the recent change of
the amtliche Rechtschreibung.

Regards,
Martin

Jul 18 '05 #5
"Martin v. Löwis" <ma****@v.loewis.de> wrote in message news:<bk*************@news.t-online.com>...
The usage of SZ has only been eliminated in the recent change of
the amtliche Rechtschreibung.


And replaced with what? ie. is there now a single capital for SZ?
Jul 18 '05 #6
Asun Friere wrote:
"Martin v. Löwis" <ma****@v.loewis.de> wrote in message news:<bk*************@news.t-online.com>...
The usage of SZ has only been eliminated in the recent change of
the amtliche Rechtschreibung.


And replaced with what? ie. is there now a single capital for SZ?


ß (sz) has not been completely eliminated. After *short* vowels it has
been replaced with ss (Kuß => Kuss, Fluß => Fluss). But after *long*
vowels, it is still used (Maß, Gruß, ...).

-- Gerhard

PS: I was quite disappointed with the reform of German orthography. I'd
have favoured much more radical steps, like elimination of
capitalization of nouns.

Jul 18 '05 #7
"Martin v. Löwis" wrote:
jallan wrote:
But that really doesn't work properly. According to Unicode specs and
German usage the uppercase of "ß" is actually "SS", that is the single
character "ß" should uppercase to two characters.

Can you cite exact chapter and verse of the Unicode specs that say so?
According to the Unicode database,

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

ß has neither an uppercase mapping, nor a lowercase mapping.


It seems like UnicodeData.txt does not give the full story. Quoting from
http://www.unicode.org/Public/UNIDAT...ialCasing.txt:

[...]
# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and does not have locale-specific mappings.)
[...]
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
[...]
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
[...]

Thus, to comply with the standard, "ß".upper() --> "SS" is required.

Also, in German, the uppercase mapping of ß is of ongoing debate.


My personal impression is that, even before the orthography reform in 1998,
the SZ variant was seldom used.
For the "official" rule see http://www.ids-mannheim.de/reform/a2-3.html.

Peter
Jul 18 '05 #8
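The record format quoted from SpecialCasing.txt in this post is simple enough to read by hand. The following is a rough sketch of parsing one such line, assuming only the field layout shown in the quoted header comment; `SpecialCasingEntry` and `parse_special_casing_line` are hypothetical names, not part of Python or of any Unicode tooling.

```python
from typing import NamedTuple, Optional, Tuple


class SpecialCasingEntry(NamedTuple):
    """One record of SpecialCasing.txt (hypothetical container)."""
    char: str
    lower: str
    title: str
    upper: str
    conditions: Tuple[str, ...]


def _decode(field: str) -> str:
    # A field holds zero or more space-separated hexadecimal code points.
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_special_casing_line(line: str) -> Optional[SpecialCasingEntry]:
    """Parse '<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>'."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None  # blank line or comment-only line
    fields = [part.strip() for part in data.split(";")]
    if len(fields) < 4:
        raise ValueError("expected at least four ';'-separated fields")
    # The optional fifth field lists conditions such as locale identifiers.
    conditions = tuple(fields[4].split()) if len(fields) > 4 else ()
    return SpecialCasingEntry(
        _decode(fields[0]), _decode(fields[1]),
        _decode(fields[2]), _decode(fields[3]), conditions,
    )


entry = parse_special_casing_line(
    "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S"
)
print(entry)
```

For the quoted es-zed record this yields a lowercase of "ß", a titlecase of "Ss" and an uppercase of "SS", with no conditions attached.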
Peter Otten <__*******@web.de> wrote in message news:<bk*************@news.t-online.com>...
"Martin v. Löwis" wrote:
jallan wrote:
But that really doesn't work properly. According to Unicode specs and
German usage the uppercase of "ß" is actually "SS", that is the single
character "ß" should uppercase to two characters.

Can you cite exact chapter and verse of the Unicode specs that say so?
According to the Unicode database,

http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

ß has neither an uppercase mapping, nor a lowercase mapping.


It seems like UnicodeData.txt does not give the full story. Quoting from
http://www.unicode.org/Public/UNIDAT...ialCasing.txt:

[...]

# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and does not have locale-specific mappings.)
[...]
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
[...]
# The German es-zed is special--the normal mapping is to SS.
# Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>))

00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
[...]

Thus, to comply with the standard, "ß".upper() --> "SS" is required.


Yes.

Also the Unicode main charts in the annotation for 00DF state:

uppercase is "SS"

See http://www.unicode.org/charts/PDF/U0080.pdf

This note on the character first appeared in Unicode 1.0 (published in
1991) and has been in every revision.

Unicode 1.0, Volume One also lists this in the lower case to upper
case casing tables on page 453.

There is nothing new about this casing requirement.

A further mention occurs in the Unicode 4.0 specifications in Table
4-1 in section 4.2 Case--Normative. See
http://www.unicode.org/versions/Unicode4.0.0/ch04.pdf

This contains the warning:

<< Only legacy implementations that cannot handle case mappings that
increase string lengths should use UnicodeData case mappings alone. The
single-character mappings are insufficient for languages such as
German. >>

So is Python just another shit legacy implementation?

Jim Allan
Jul 18 '05 #9
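The warning quoted in this post concerns case mappings that change string length. One way to see which characters are affected in a given Python 3 build is to scan a code point range and keep those whose uppercase form is not exactly one character long. This is only a sketch: `length_changing_uppercase` is a hypothetical helper, and the result depends on the Unicode version bundled with the interpreter.

```python
def length_changing_uppercase(start: int, stop: int) -> dict:
    """Map characters in [start, stop) to uppercase forms that are not one character long."""
    result = {}
    for code_point in range(start, stop):
        char = chr(code_point)
        upper = char.upper()
        if len(upper) != 1:
            result[char] = upper
    return result


# Scan the Latin-1 range mentioned in the original question.
latin1 = length_changing_uppercase(0x00, 0x100)
for char, upper in latin1.items():
    print(f"U+{ord(char):04X} -> {upper!r}")

# Such mappings do not round-trip: lowercasing "SS" gives "ss", not the sharp s.
print("\u00df".upper().lower())
```

The lack of a round trip is one reason to compare case-folded keys instead of converting stored strings in place.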
af*****@yahoo.co.uk (Asun Friere) writes:
The usage of SZ has only been eliminated in the recent change of
the amtliche Rechtschreibung.


And replaced with what? ie. is there now a single capital for SZ?


Unfortunately, I don't have a current Duden here, but I *think* you
now have to write double-S. There is, of course, the old MASSE vs
MASZE issue - I don't know whether this is considered relevant, as
capitalization is rare, anyway, and ambiguities can be clarified from
the context.

Regards,
Martin

Jul 18 '05 #10
