true alphabetic sort...

At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.

I'm not sure it's supposed to be.
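
To illustrate why this happens, here is a minimal sketch. The country names are sample data, and the built-in sort stands in for the quicksort on the assumption that both compare raw character codes. Å has code 197 and Z has code 90, so anything starting with Å lands after everything starting with Z.

```javascript
// Sample data, not the real list.
const countries = ['Zimbabwe', 'Åland Islands', 'Albania'];

// With no comparison function, sort() orders strings by UTF-16 code unit.
const sorted = countries.slice().sort();

console.log('Å'.charCodeAt(0), 'Z'.charCodeAt(0)); // 197 90
console.log(sorted); // [ 'Albania', 'Zimbabwe', 'Åland Islands' ]
```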

Now I could just alter my comparison so it ignores the top bit, but this
would then put it at the top of the list, even before Albania...
Alternatively, should I put Å after A?

In short, is there a preferred way of ordering these?

Thanks,

Ian
Jul 23 '05 #1
Ian Richardson <za*****@chaos.org.uk> skrev :
At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.


Yes, and it's correct.

In Swedish, Danish and Norwegian, "Å" is the last letter in the
alphabet.
--
Knud
Jul 23 '05 #2
Knud Gert Ellentoft wrote on 24 apr 2004 in comp.lang.javascript:
In Swedish, Danish and Norwegian, "Å" is the last letter in the
alphabet.


Just curious:

This will write "å" over here:

document.write('Å'.toLowerCase())

Does this work for all European alphabets?

=============================

When should I use:

document.write('Å'.toLocaleLowerCase())

?

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 23 '05 #3
"Evertjan." <ex**************@interxnl.net> writes:
Just curious:

This will write "å" over here:

document.write('Å'.toLowerCase())

Does this work for all European alphabets?

It works for any Unicode letter, using the Unicode character database
for the translation.

=============================

When should I use:

document.write('Å'.toLocaleLowerCase())

Never, for the letter "Å".
In ECMA 262, section 15.5.4.17, the reason given for using
toLocaleLowerCase is for languages where the language rules conflict
with the regular Unicode mapping. Turkish is given as an example.
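
A quick way to see the difference, as a sketch only. The 'tr' locale argument is an assumption: it belongs to the later ECMAScript Internationalization API rather than to the edition cited here, and an engine without that support may simply behave like toLowerCase.

```javascript
// Locale-independent Unicode case mapping.
const plainRing = 'Å'.toLowerCase(); // 'å'
const plainI = 'I'.toLowerCase();    // 'i'

// Locale-sensitive mapping for Turkish. On engines with Turkish support this
// is expected to give the dotless 'ı' (U+0131) instead of 'i'.
const turkishI = 'I'.toLocaleLowerCase('tr');

console.log(plainRing, plainI, turkishI);
```

For "Å" the two methods agree, which is why the locale variant buys nothing there.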

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 23 '05 #4
Ivo
"Knud Gert Ellentoft" wrote
Ian Richardson skrev :
At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.


Yes, and it's correct.
In Swedish, Danish and Norwegian, "Å" is the last letter in the
alphabet.


This is interesting. It may be that the Å follows Z in those languages, but
this is new to me and probably to the rest of the world. In a long
alphabetical list, the OP and I would look for Å after A, and so I think in
a web environment it should probably be put there. Where do the French put
the ç character in the French alphabet? Where do the Germans put the ß? I
would look for it after the B.

As for a JavaScript solution, the easiest would probably be replacing all
occurrences of Å and perhaps Æ with an A prior to sorting the list. This
would result in a mix of accented and normal A's, which is not perfect. Åland
must come after Aruba but before Bermuda. We must write our own comparison.
It involves

var abc = 'AÅÆBCÇDEFGHIJ' +
'KLMNOPQRSßTUVWXYZ';

and abc.toLowerCase() and testing with indexOf, but I'm not quite sure how.
The following covers first letters only:

function compare(a, b) {
  // Position of each name's first letter in the custom alphabet
  var ia = abc.indexOf(a.charAt(0));
  var ib = abc.indexOf(b.charAt(0));
  if (ia < ib) {
    return -1;
  }
  if (ia > ib) {
    return 1;
  }
  return 0;
}
var islands = ['Curaçao', 'Bonaire', 'Åland', 'Aruba'];
alert(islands.sort(compare));
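
Extending the first-letter idea to whole strings might look like the sketch below. The alphabet string and the names ALPHABET, rank and compareByAlphabet are made up for illustration; the alphabet only lists the accented letters mentioned in this thread and would need extending for real data. It is kept in lower case because upper-casing "ß" yields two characters.

```javascript
// Hypothetical alphabet: each extra letter sits right after its base letter.
const ALPHABET = 'aåæbcçdefghijklmnopqrsßtuvwxyz';

// Rank of a single character; unknown characters sort after all known ones.
function rank(ch) {
  const i = ALPHABET.indexOf(ch.toLowerCase());
  return i === -1 ? ALPHABET.length : i;
}

// Compare two strings character by character using the custom ranks.
function compareByAlphabet(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const diff = rank(a.charAt(i)) - rank(b.charAt(i));
    if (diff !== 0) return diff;
  }
  // All shared positions are equal: the shorter string sorts first.
  return a.length - b.length;
}

const names = ['Curaçao', 'Bonaire', 'Åland', 'Aruba', 'Bermuda'];
const ordered = names.slice().sort(compareByAlphabet);
console.log(ordered); // Aruba, Åland, Bermuda, Bonaire, Curaçao
```

Because Å is treated as its own letter directly after A, Åland ends up after Aruba and before Bermuda.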

HTH
Ivo
Jul 23 '05 #5
"Ivo" <no@thank.you> skrev :
This is interesting. It may be that the Å follows Z in those languages, but
this is new for me and probably the rest of the world. In a long
alphabetical list, I and the OP would look for Å after A, and so I think in
a web-environment it probably should be put there. Where do the French put
the ç character in the French alphabet? Where do the Germans put the ß? I
would look for it after the B.


I know only the Scandinavian languages, and a Scandinavian would
look for "Å" (and the other extra letters) at the end of the alphabet, so
I would therefore leave it as the last letter.
--
Knud
Jul 23 '05 #6
"Ivo" <no@thank.you> writes:
This is interesting. It may be that the Å follows Z in those languages,

That would be all languages that actually have "Å" as a letter.

but this is new for me and probably the rest of the world.

Hard to say. Microsoft seems to know it. When they alphabetize Danish
words, the double-A, the original form which was turned into the new
letter "Å", comes last (with predictably incorrect results for the
foreign word Aardvark).

In a long alphabetical list, I and the OP would look for Å after A,
and so I think in a web-environment it probably should be put
there.

That entirely depends on the language. If you are sorting words from
different languages, I can see the problem, but would probably prefer
to have it last anyway. It is a letter in its own right, not just a letter
with an accent.

Where do the French put the ç character in the French alphabet?

It's a c-cedilla, that is, a "c" with an accent. It is not a separate
letter.

Where do the Germans put the ß? I would look for it after the B.

That would be a weird place to look for a sharp S. It is *not* a beta
(it is an s-z-ligature).

As for a javascript solution, the easiest would probably be replacing all
occurrences of Å and perhaps Æ with an A prior to sorting the list.

That's one choice. Since you cannot fix one language to work with, I
don't think there is an official way to alphabetize.
I would probably expand Æ (the a-e-ligature) to AE.

This would result in a mix of accented and normal A's which is not
perfect.


Alas, perfect does not exist.
The closest to perfect for my tastes is to alphabetize letters according
to the language they come from, so Aalborg (Danish city using old spelling)
would be after Zaire, but Aardvark would be under "A".

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 23 '05 #7
Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:
In ECMA 262, section 15.5.4.17, the reason given for using
toLocaleLowerCase is for languages where the language rules conflict
with the regular Unicode mapping. Turkish is given as an example.


Not in
<http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
from 1997, which stops at 15.5.4.12

There should be a 3rd edition, but I cannot find it on the web.

Do you have an URL?
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 23 '05 #8
"Evertjan." <ex**************@interxnl.net> writes:
Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:
In ECMA 262, section 15.5.4.17, the reason given for using
toLocaleLowerCase is for languages where the language rules conflict
with the regular Unicode mapping. Turkish is given as an example.
Not in
<http://developer.netscape.com/docs/javascript/e262-pdf.pdf>
from 1997, which stops at 15.5.4.12

There should be a 3rd edition, but I cannot find it on the web.

Do you have an URL?


I use this one:
<URL:http://www.mozilla.org/js/language/E262-3.pdf>
It seems to be more recent, and better formatted, than the official
version from ECMA itself. I fail to imagine an explanation for that :)
<URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 23 '05 #9
Lasse Reichstein Nielsen wrote on 25 apr 2004 in comp.lang.javascript:
I use this one:
<URL:http://www.mozilla.org/js/language/E262-3.pdf>


tnx,

Interesting reading.
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Jul 23 '05 #10
JRS: In article <c6************@ID-99375.news.uni-berlin.de>, seen in
news:comp.lang.javascript, Ian Richardson <za*****@chaos.org.uk> posted
at Sat, 24 Apr 2004 20:17:30 :
At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.

I'm not sure it's supposed to be.

Now I could just alter my comparison so it ignores the top bit, but this
would then put it at the top of the list, even before Albania...
Alternatively, should I put Å after A?

In short, is there a preferred way of ordering these?

I don't think those Islands *are* a country, but ICBW; are they not
loose bits of Finland - or are they a country in the same sense as Wales
& Scotland are? I have enough difficulty in determining which parts of
the globe are in the EU, or associated, or whatever, for
<URL:http://www.merlyn.demon.co.uk/european.htm>.
However, while Å may well sort to the end of the alphabet in all
languages that use it, that does not necessarily mean that all letters
of the extended Roman Alphabet sort to identical positions in all
countries that use them. It is possible that Potaniland sorts Æ
between A & B, while Erewhon puts it at the end.

I think all likely extended-roman letters can be mapped in an obvious
manner to one or two English letters; it is probably best to use that,
then sort. After all, even foreigners will probably not know the proper
sort order for languages other than their own; but they will be used to
what the Anglos do with their names. My fair-sized atlas indexes those
Islands as "Aland", in the middle of the "A" section.
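
As a rough sketch of that mapping approach: foldToEnglish, compareFolded and the EXPANSIONS table are hypothetical names, the table is illustrative rather than complete, and String.prototype.normalize is assumed to be available (it arrived with ES2015).

```javascript
// Explicit replacements for letters that do not decompose into base + accent.
const EXPANSIONS = { 'Æ': 'AE', 'æ': 'ae', 'Ø': 'O', 'ø': 'o', 'ß': 'ss' };

// Map a name to plain English letters for use as a sort key.
function foldToEnglish(s) {
  return s
    .replace(/[ÆæØøß]/g, (ch) => EXPANSIONS[ch])
    .normalize('NFD')                  // split 'Å' into 'A' + combining ring
    .replace(/[\u0300-\u036f]/g, '');  // drop the combining marks
}

// Compare two names by their folded, lower-cased keys.
function compareFolded(a, b) {
  const ka = foldToEnglish(a).toLowerCase();
  const kb = foldToEnglish(b).toLowerCase();
  return ka < kb ? -1 : ka > kb ? 1 : 0;
}

const places = ['Bermuda', 'Åland', 'Aruba', 'Albania'];
const byFoldedKey = places.slice().sort(compareFolded);
console.log(byFoldedKey); // Åland, Albania, Aruba, Bermuda
```

Only the sort key is folded; the names themselves keep their original spelling for display.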

Remember that the proper names of Asian and North African countries need
transliteration to be readable by the average Anglo - and may be quite
different too : one does not necessarily seek Bharat or Nippon among the
B or N sections.

<URL:http://www.merlyn.demon.co.uk/quotes.htm#FredHoyle> :-)

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://jibbering.com/faq/> Jim Ley's FAQ for news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Jul 23 '05 #11
Dr John Stockton wrote:
JRS: In article <c6************@ID-99375.news.uni-berlin.de>, seen in
news:comp.lang.javascript, Ian Richardson <za*****@chaos.org.uk> posted
at Sat, 24 Apr 2004 20:17:30 :
At the moment I'm using a quicksort algorithm to sort a list of
countries in alphabetic order. This worked wonderfully until someone
came up with the Åland Islands... and this is at the end of the list.

I'm not sure it's supposed to be.

<snip>
I don't think those Islands *are* a country, but ICBW


<snip>

According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.

<snip>

I guess what I'm looking for is a language-specific dictionary sort, if
such a thing exists, defaulting to a Unicode or some other default order
if not.
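
Something along these lines could be sketched with Intl.Collator, assuming an engine that implements the ECMAScript Internationalization API and ships collation data for the languages requested. makeComparer and the sample list are hypothetical. When the requested language is not supported, the collator falls back to the engine's default locale, not to raw Unicode order.

```javascript
// Build a comparison function for the requested language tag.
function makeComparer(locale) {
  return new Intl.Collator(locale).compare;
}

const list = ['Zambia', 'Åland Islands', 'Albania'];

// English rules treat Å as an accented A.
const english = list.slice().sort(makeComparer('en'));

// Swedish rules treat Å as a separate letter after Z (if 'sv' data is present).
const swedish = list.slice().sort(makeComparer('sv'));

console.log(english);
console.log(swedish);
```

Intl.Collator.supportedLocalesOf(['sv']) reports whether the Swedish rules are actually available before relying on them.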

Ian
Jul 23 '05 #12
Ian Richardson wrote:

According to ftp://ftp.ripe.net/iso3166-countrycodes.txt, it's a country.

<snip>

I guess what I'm looking for is a language-specific dictionary sort, if
such a thing exists, defaulting to a Unicode or some other default order
if not.

Ian


Åland is part of Finland, and Finland is an independent country. Member
of UN.

Jul 23 '05 #13
Lasse Reichstein Nielsen wrote:
"Evertjan." <ex**************@interxnl.net> writes:
There should be a 3rd edition, but I cannot find it on the web.

Do you have an URL?


I use this one:
<URL:http://www.mozilla.org/js/language/E262-3.pdf>
It seems to be more recent, and better formatted, than the official
version from ECMA itself. I fail to imagine an explanation for that :)
<URL:http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>


Well, Netscape is (was?) developing the next version of JavaScript (v2.0)
which should (have?) become the next edition of ECMAScript (ed. 4). Since
AOLTW (apparently only temporarily) closed the Netscape browser division[1]
and consequently Netscape is (currently) no longer a member of ECMA and
AOLTW is neither, that might be a reason.
PointedEars
___________
[1] <http://www.holgermetzger.de/Netscape_History.html>
Jul 23 '05 #14
