Hi!
I know that this topic has been discussed in the past, but I could not
find a working solution for my problem: sorting (lists of) strings
containing special characters like "ä", "ü", ... (German umlauts).
Consider the following list:
l = ["Aber", "Beere", "Ärger"]
For sorting, the letter "Ä" is supposed to be treated like "Ae";
therefore sorting this list should yield
l = ["Aber", "Ärger", "Beere"]
I know about the module locale and its method strcoll(string1,
string2), but currently this does not work correctly for me. Consider
>>> locale.strcoll("Ärger", "Beere")
1
Therefore "Ärger" is sorted after "Beere", which is not correct IMO.
Can someone help?
Btw: I'm using WinXP (German) and
>>> locale.getdefaultlocale()
prints
('de_DE', 'cp1252')
TIA.
Dierk
Di**********@mail.com wrote:
For sorting the letter "Ä" is supposed to be treated like "Ae",
therefore sorting this list should yield
l = ["Aber", "Ärger", "Beere"]
We tried this in a JavaScript version and it seems to work. Sorry for the long line
and the possibly bad translation to Python:
# coding: cp1252
def _deSpell(a):
    # decode the cp1252 byte string, then spell out the special letters
    u = a.decode('cp1252')
    return (u.replace(u'\u00C4', u'Ae').replace(u'\u00e4', u'ae')
             .replace(u'\u00D6', u'Oe').replace(u'\u00f6', u'oe')
             .replace(u'\u00DC', u'Ue').replace(u'\u00fc', u'ue')
             .replace(u'\u00C5', u'Ao').replace(u'\u00e5', u'ao'))

def deSort(a, b):
    # Python 2 cmp-style comparison on the spelled-out forms
    return cmp(_deSpell(a), _deSpell(b))

l = ["Aber", "Ärger", "Beere"]
l.sort(deSort)
print l
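Robin's snippet is Python 2 (`cmp`, `l.sort(deSort)`, `print l`). Python 3's `list.sort()` and `sorted()` no longer accept a cmp function, so a rough equivalent is to turn the spelling step into a key function. This is a minimal sketch assuming the strings are already `str` (no cp1252 decoding step); `de_spell` and `_DE_SPELL` are illustrative names, and Ö is spelled "Oe" here.

```python
# Python 3 sketch: "spell out the umlauts" as a sort key instead of a
# cmp function (Python 3's sort has no cmp argument).
# Illustrative replacement table; like Robin's, it maps å to "ao".
_DE_SPELL = str.maketrans({
    "\u00C4": "Ae", "\u00E4": "ae",  # Ä, ä
    "\u00D6": "Oe", "\u00F6": "oe",  # Ö, ö
    "\u00DC": "Ue", "\u00FC": "ue",  # Ü, ü
    "\u00C5": "Ao", "\u00E5": "ao",  # Å, å
})

def de_spell(s: str) -> str:
    """Return the sort spelling of s, e.g. 'Ärger' -> 'Aerger'."""
    return s.translate(_DE_SPELL)

words = ["Aber", "Beere", "Ärger"]
print(sorted(words, key=de_spell))  # ['Aber', 'Ärger', 'Beere']
```

Comparison after spelling is still by code point, so lowercase words sort after all capitalized ones; `key=lambda s: de_spell(s).casefold()` would be one way to ignore case.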
--
Robin Becker
Di**********@mail.com wrote:
I know that this topic has been discussed in the past, but I could not
find a working solution for my problem: sorting (lists of) strings
containing special characters like "ä", "ü",... (german umlaute).
Consider the following list:
l = ["Aber", "Beere", "Ärger"]
For sorting the letter "Ä" is supposed to be treated like "Ae",
I don't think so:
>>> sorted(["Ast", "Ärger", "Ara"], locale.strcoll)
['Ara', '\xc3\x84rger', 'Ast']
>>> sorted(["Ast", "Aerger", "Ara"])
['Aerger', 'Ara', 'Ast']
therefore sorting this list should yield
l = ["Aber", "Ärger", "Beere"]
I know about the module locale and its method strcoll(string1,
string2), but currently this does not work correctly for me. Consider
>>> locale.strcoll("Ärger", "Beere")
1
Therefore "Ärger" is sorted after "Beere", which is not correct IMO.
Can someone help?
Btw: I'm using WinXP (German) and
>>> locale.getdefaultlocale()
prints
('de_DE', 'cp1252')
The default locale is not used by default; you have to set it explicitly:
>>> import locale
>>> locale.strcoll("Ärger", "Beere")
1
>>> locale.setlocale(locale.LC_ALL, "")
'de_DE.UTF-8'
>>> locale.strcoll("Ärger", "Beere")
-1
By the way, you will avoid a lot of "Ärger"* if you use unicode right from
the start.
Finally, for efficient sorting, a key function is preferable to a cmp
function:
>>> sorted(["Ast", "Ärger", "Ara"], key=locale.strxfrm)
['Ara', '\xc3\x84rger', 'Ast']
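To illustrate the key-versus-cmp point in Python 3 terms: `locale.strxfrm` is called once per element, whereas a cmp-style `locale.strcoll` (wrapped with `functools.cmp_to_key`, since Python 3 has no cmp argument) is called for every comparison. A sketch, assuming the process may change its collation locale; if the user's locale cannot be set, it falls back to "C".

```python
import locale
from functools import cmp_to_key

# Try the user's default collation locale (as Peter does with LC_ALL);
# fall back to the always-available "C" locale if it cannot be set.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

words = ["Ast", "Ärger", "Ara"]

# cmp style: locale.strcoll runs once per comparison (O(n log n) calls).
by_cmp = sorted(words, key=cmp_to_key(locale.strcoll))

# key style: locale.strxfrm runs once per element; the transformed
# strings are then compared with ordinary string comparison.
by_key = sorted(words, key=locale.strxfrm)

print(by_key)
```

Both calls should agree; which order is printed depends on the active locale. Under "C", plain code-point order puts "Ärger" last, while a German locale should give Peter's order. Note that `setlocale()` changes process-wide state.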
Peter
(*) German for "trouble"
Di**********@mail.com writes:
For sorting the letter "Ä" is supposed to be treated like "Ae",
therefore sorting this list should yield
l = ["Aber", "Ärger", "Beere"]
Are you sure? Maybe I'm thinking of another language; I thought Ä should
be sorted together with A, but after A if the words are otherwise equal.
E.g. Antwort, Ärger, Beere. A proper strcoll handles that by
translating "Ärger" to e.g. ["Arger", <something like "E\0\0\0\0">],
then it can sort first by the un-accentified name and then by the rest.
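Hallvard's two-level idea can be approximated with the standard `unicodedata` module: decompose each string (NFD), compare first on the base letters with the combining marks stripped, and only on a tie fall back to the full decomposed string. This is a simplified sketch (`two_level_key` is a hypothetical name); real collation as done by `strcoll` or the Unicode Collation Algorithm has more levels and per-language rules.

```python
import unicodedata

def two_level_key(s: str):
    """Hypothetical two-level sort key: base letters first, marks second."""
    # Decompose: 'Ä' -> 'A' + U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", s)
    # Level 1: base letters only, so 'Ärger' compares like 'Arger'.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Level 2: the full decomposed string, consulted only on a level-1 tie;
    # a plain letter then sorts before the same letter plus a diaeresis.
    return (base.casefold(), decomposed)

words = ["Beere", "Bär", "Ärger", "Bar", "Antwort"]
print(sorted(words, key=two_level_key))
# ['Antwort', 'Ärger', 'Bar', 'Bär', 'Beere']
```

"Ärger" lands among the A-words, and "Bär" follows "Bar" only because the two tie on their base letters.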
--
Hallvard
Hallvard B Furuseth wrote:
Di**********@mail.com writes:
For sorting the letter "Ä" is supposed to be treated like "Ae",
therefore sorting this list should yield
l = ["Aber", "Ärger", "Beere"]
Are you sure? Maybe I'm thinking of another language; I thought Ä
should be sorted together with A, but after A if the words are
otherwise equal.
In German, there are some different forms:
- the classic sorting, e.g. for word lists: umlauts and plain vowels
  have the same value (as you mentioned): ä = a
- name-list sorting, e.g. for phone books: umlauts have the same
  value as their substitutes (as Dierk described): ä = ae
There are others, too, but those are the most widely used.
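A minimal sketch of the two forms listed above as Python 3 key functions. They are named after DIN 5007 variant 1 and variant 2, which is how these two forms are usually labelled; the function names themselves are made up for illustration, and both keys ignore case via `casefold()`.

```python
import unicodedata

# Variant 2 ("phone book"): umlauts spelled out, ä = ae.
_UMLAUT_SPELLING = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                                  "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})

def din5007_1(s: str) -> str:
    """Variant 1 ("word list"): umlauts compare like plain vowels, ä = a."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

def din5007_2(s: str) -> str:
    """Variant 2 ("phone book"): umlauts compare like their substitutes."""
    return s.translate(_UMLAUT_SPELLING).casefold()

names = ["Mulder", "Müller", "Mueller"]
print(sorted(names, key=din5007_1))  # ['Mueller', 'Mulder', 'Müller']
print(sorted(names, key=din5007_2))  # ['Müller', 'Mueller', 'Mulder']
```

With variant 1, "Müller" files after "Mulder"; with variant 2, it files next to "Mueller". Equal keys keep their input order because Python's sort is stable.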
Regards,
Björn
--
BOFH excuse #277:
Your Flux Capacitor has gone bad.
On 2 Mar., 15:25, Peter Otten <__pete...@web.de> wrote:
DierkErdm...@mail.com wrote:
For sorting the letter "Ä" is supposed to be treated like "Ae",
There are several ways of defining the sorting order. The variant "ä
equals ae" follows DIN 5007-2 (according to Wikipedia); defining "ä
equals a" complies with DIN 5007-1. Therefore both options are
possible.
The default locale is not used by default; you have to set it explicitly:
>>> import locale
>>> locale.strcoll("Ärger", "Beere")
1
>>> locale.setlocale(locale.LC_ALL, "")
'de_DE.UTF-8'
>>> locale.strcoll("Ärger", "Beere")
-1
On my machine
>>> locale.setlocale(locale.LC_ALL, "")
gives
'German_Germany.1252'
but this does not affect the sorting order as it does on your
computer:
>>> locale.strcoll("Ärger", "Beere")
yields 1 in both cases.
Thank you for your hint about using unicode from the beginning; see the
difference:
>>> s1 = unicode("Ärger", "latin-1")
>>> s2 = unicode("Beere", "latin-1")
>>> locale.strcoll(s1, s2)
1
>>> locale.setlocale(locale.LC_ALL, "")
'German_Germany.1252'
>>> locale.strcoll(s1, s2)
-1
compared to
>>> s1 = "Ärger"
>>> s2 = "Beere"
>>> locale.strcoll(s1, s2)
1
>>> locale.setlocale(locale.LC_ALL, "")
'German_Germany.1252'
>>> locale.strcoll(s1, s2)
1
Thanks for your help.
Dierk
Bjoern Schliessmann wrote:
Björn, in one of our projects we sort in JavaScript in several languages
(English, German, Scandinavian languages, Japanese). From somewhere (I can't
actually remember where) we got this sort-spelling function for Scandinavian languages:
a
.replace(/\u00C4/g,'A~') //A umlaut
.replace(/\u00e4/g,'a~') //a umlaut
.replace(/\u00D6/g,'O~') //O umlaut
.replace(/\u00f6/g,'o~') //o umlaut
.replace(/\u00DC/g,'U~') //U umlaut
.replace(/\u00fc/g,'u~') //u umlaut
.replace(/\u00C5/g,'A~~') //A ring
.replace(/\u00e5/g,'a~~'); //a ring
does this actually make sense?
--
Robin Becker
Robin Becker wrote:
does this actually make sense?
If I'm not mistaken, this would sort all umlauts after the "pure"
vowels. This is, according to <http://de.wikipedia.org/wiki/Alphabetische_Sortierung>,
used in Austria.
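This reading can be checked by transcribing Robin's JavaScript replacements into Python (an illustrative transcription; `tilde_spell` is a made-up name). The trick relies on "~" (U+007E) having a higher code point than every ASCII letter, so "A~rger" sorts after every word that starts with "A" plus a letter, but still before "B".

```python
# Illustrative Python transcription of the JavaScript sort spelling.
# '~' (U+007E) is above 'z' (U+007A), so "A~..." follows all "A<letter>..."
_TILDE_SPELLING = str.maketrans({
    "\u00C4": "A~", "\u00E4": "a~",    # Ä, ä
    "\u00D6": "O~", "\u00F6": "o~",    # Ö, ö
    "\u00DC": "U~", "\u00FC": "u~",    # Ü, ü
    "\u00C5": "A~~", "\u00E5": "a~~",  # Å, å
})

def tilde_spell(s: str) -> str:
    return s.translate(_TILDE_SPELLING)

words = ["Beere", "Ärger", "Azur", "Aber"]
print(sorted(words, key=tilde_spell))  # ['Aber', 'Azur', 'Ärger', 'Beere']
```

As in the JavaScript, the comparison is plain code-point order, so case still matters.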
If you can't understand German, the rules given there in
section "Einsortierungsregeln" (roughly: ordering rules) translate
as follows:
"X und Y sind gleich": "X equals Y"
"X kommt nach Y": "X comes after Y"
Regards&HTH,
Björn
--
BOFH excuse #146:
Communications satellite used by the military for star wars.
Robin Becker wrote:
does this actually make sense?
I think this order is not correct for Finnish, which is one of the
Scandinavian languages. The Finnish alphabet in alphabetical order is:
a-z, å, ä, ö
If I understand correctly, your replacements cause the order of the last
three characters to be
ä, å, ö
which is wrong.
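The order given above can be expressed directly by ranking each character in an explicit alphabet. A sketch assuming the alphabet exactly as stated (a-z, then å, ä, ö) and normalizing input to NFC first; `FINNISH_ALPHABET` and `finnish_key` are hypothetical names, and actual Finnish collation rules have further details this ignores.

```python
import unicodedata

# Hypothetical explicit alphabet, in the order given in this post.
FINNISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_RANK = {ch: i for i, ch in enumerate(FINNISH_ALPHABET)}

def finnish_key(s: str):
    # NFC so that 'ä' is one precomposed character; casefold ignores case.
    s = unicodedata.normalize("NFC", s).casefold()
    # Characters outside the alphabet rank after it, by code point.
    return [_RANK.get(ch, len(_RANK) + ord(ch)) for ch in s]

words = ["öljy", "äiti", "Åland", "zeppeliini", "aamu"]
print(sorted(words, key=finnish_key))
# ['aamu', 'zeppeliini', 'Åland', 'äiti', 'öljy']
```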
HTH,
Jussi