
Alphabetical sorts


I have several applications where I want to sort lists in alphabetical order.
Most examples of sorting usually sort on the ord() order of the character set as
an approximation. But that is not always what you want.

The solution of converting everything to lowercase or uppercase is closer, but
it would be nice if capitalized words come before lowercase words of the same
spellings. And I suspect ord() order may not be correct for some character sets.

So I'm wondering what others have done and whether there is something in the
standard library I haven't found for doing this.

Below is my current way of doing it, but I think it can probably be improved
quite a bit.

This partial solution also allows ignoring leading characters such as spaces,
tabs, and underscores by specifying what not to ignore. So '__ABC__' will be
next to 'ABC'. But this aspect isn't my main concern.

Maybe some sort of alphabetical order string could be easily referenced for
various alphabets instead of having to create them manually?

Also it would be nice if strings with multiple words were ordered correctly.
Cheers,
_Ron

def stripto(s, goodchrs):
    """ Removes leading and trailing characters from string s
        which are not in the string goodchrs.
    """
    badchrs = set(s)
    for c in goodchrs:
        if c in badchrs:
            badchrs.remove(c)
    badchrs = ''.join(badchrs)
    return s.strip(badchrs)

def alpha_sorted(seq):
    """ Sort a list of strings in 123AaBbCc... order.
    """
    order = ( '0123456789AaBbCcDdEeFfGgHhIiJjKkLlMmNn'
              'OoPpQqRrSsTtUuVvWwXxYyZz' )
    def chr_index(value, sortorder):
        """ Make a sortable numeric list
        """
        result = []
        for c in stripto(value, order):
            cindex = sortorder.find(c)
            if cindex == -1:
                cindex = len(sortorder)+ord(c)
            result.append(cindex)
        return result

    deco = [(chr_index(a, order), a) for a in seq]
    deco.sort()
    return list(x[1] for x in deco)
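
For comparison, here is a minimal Python 3 sketch of the two requirements described above, assuming a two-level key is acceptable: compare case-insensitively first, then let the original spelling break ties so that a capitalized word comes before the lowercase word of the same spelling. The names `alpha_key` and `ignore` are hypothetical, and the sketch is not locale-aware.

```python
def alpha_key(s, ignore=" \t_"):
    """Return a sort key: case-insensitive text first, original text as tie-breaker.

    `ignore` (a hypothetical parameter) lists characters stripped from both
    ends before comparing, so '__ABC__' sorts next to 'ABC'.
    """
    core = s.strip(ignore)
    # casefold() provides the case-insensitive primary key. On ties, the
    # stripped original decides; ASCII uppercase has lower code points,
    # so 'ABC' precedes 'abc'. The full string is the final tie-breaker.
    return (core.casefold(), core, s)


words = ["neil", "Cerutti", "__ABC__", "Neil", "cerutti", "ABC", "abc"]
result = sorted(words, key=alpha_key)
print(result)
```

Unlike the character-by-character `AaBbCc` table, this key compares whole words before looking at case, so 'aA' sorts before 'Ab'.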
Oct 16 '06 #1
On 2006-10-16, Ron Adam <rr*@ronadam.com> wrote:
> I have several applications where I want to sort lists in
> alphabetical order. Most examples of sorting usually sort on
> the ord() order of the character set as an approximation. But
> that is not always what you want.

Check out strxfrm in the locale module.

>>> a = ["Neil", "Cerutti", "neil", "cerutti"]
>>> a.sort()
>>> a
['Cerutti', 'Neil', 'cerutti', 'neil']
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> a.sort(key=locale.strxfrm)
>>> a
['cerutti', 'Cerutti', 'neil', 'Neil']
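
The same idea can be wrapped in a small function. This is a rough Python 3 sketch; the helper name `locale_sorted` is hypothetical, and the ordering you get depends on which locales are installed on the system. The example uses the "C" locale only because it is always available and gives a predictable result.

```python
import locale


def locale_sorted(items, locale_name=""):
    """Sort strings using the collation rules of `locale_name`.

    An empty name selects the user's default locale. Note that
    setlocale() changes process-wide state, and this sketch leaves
    the new setting in place.
    """
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return sorted(items, key=locale.strxfrm)


names = ["Neil", "Cerutti", "neil", "cerutti"]
# In the "C" locale strings compare by code point, so this matches
# a plain sort: uppercase letters come before lowercase ones.
print(locale_sorted(names, "C"))
```

Passing `""` instead of `"C"` uses the user's default locale, as in the session above.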

--
Neil Cerutti
Oct 16 '06 #2

My application needs to handle different language sorts. Do you know a
way to apply strxfrm dynamically i.e. without setting the locale?

Tuomas

Oct 16 '06 #3


On Oct 16, 2:39 pm, Tuomas <tuomas.vesteri...@pp.inet.fi> wrote:
> My application needs to handle different language sorts. Do you know a
> way to apply strxfrm dynamically i.e. without setting the locale?

Collation is almost always locale dependent, so you have to set the locale.
One day I needed collation that worked on Windows and Linux. It's not
that polished and not that tested, but it worked for me:

import locale, os, codecs

current_encoding = 'ascii'
current_locale = ''

def get_collate_encoding(s):
    '''Grab character encoding from locale name'''
    split_name = s.split('.')
    if len(split_name) != 2:
        return 'ascii'
    encoding = split_name[1]
    if os.name == "nt":
        encoding = 'cp' + encoding
    try:
        codecs.lookup(encoding)
        return encoding
    except LookupError:
        return 'ascii'

def setup_locale(locale_name):
    '''Switch to new collation locale or do nothing if locale
    is the same'''
    global current_locale, current_encoding
    if current_locale == locale_name:
        return
    current_encoding = get_collate_encoding(
        locale.setlocale(locale.LC_COLLATE, locale_name))
    current_locale = locale_name

def collate_key(s):
    '''Return collation weight of a string'''
    return locale.strxfrm(s.encode(current_encoding, 'ignore'))

def collate(lst, locale_name):
    '''Sort a list of unicode strings according to locale rules.
    Locale is specified as 2 letter code'''
    setup_locale(locale_name)
    return sorted(lst, key = collate_key)

words = u'c ch f'.split()
print ' '.join(collate(words, 'en'))
print ' '.join(collate(words, 'cz'))

Prints:

c ch f
c f ch

Oct 16 '06 #4
Neil Cerutti wrote:
> On 2006-10-16, Ron Adam <rr*@ronadam.com> wrote:
>> I have several applications where I want to sort lists in
>> alphabetical order. Most examples of sorting usually sort on
>> the ord() order of the character set as an approximation. But
>> that is not always what you want.
>
> Check out strxfrm in the locale module.
>
> >>> a = ["Neil", "Cerutti", "neil", "cerutti"]
> >>> a.sort()
> >>> a
> ['Cerutti', 'Neil', 'cerutti', 'neil']
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, '')
> 'English_United States.1252'
> >>> a.sort(key=locale.strxfrm)
> >>> a
> ['cerutti', 'Cerutti', 'neil', 'Neil']

Thanks, that helps.

The documentation for locale.strxfrm() certainly could be more complete. And the
name isn't intuitive at all. It also corresponds to the C function for
translating strings, which isn't the same thing.

For that matter locale.strcoll() isn't documented any better.

I see this is actually a very complex subject. A little searching found the
following link on Wikipedia.

http://en.wikipedia.org/wiki/Alphabe...ial_characters

And from there a very informative report:

http://www.unicode.org/unicode/reports/tr10/

It looks to me this would be a good candidate for a configurable class.
Something preferably in the string module where it could be found easier.

Is there any way to change the behavior of strxfrm or strcoll? For example have
caps before lowercase, instead of after?

Cheers,
Ron
Oct 17 '06 #5
On 2006-10-17, Ron Adam <rr*@ronadam.com> wrote:
> Neil Cerutti wrote:
>> On 2006-10-16, Ron Adam <rr*@ronadam.com> wrote:
>>> I have several applications where I want to sort lists in
>>> alphabetical order. Most examples of sorting usually sort on
>>> the ord() order of the character set as an approximation.
>>> But that is not always what you want.
>>
>> Check out strxfrm in the locale module.
>
> It looks to me this would be a good candidate for a
> configurable class. Something preferably in the string module
> where it could be found easier.
>
> Is there any way to change the behavior of strxfrm or strcoll?
> For example have caps before lowercase, instead of after?

You can probably get away with writing a strxfrm function that
spits out numbers that fit your definition of sorting.
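
One way to read that suggestion, as a minimal Python 3 sketch: build a key function from an alphabet string, rank letters by their position regardless of case, and use case only to break ties. The names `make_collation_key` and `upper_first` are hypothetical, and the sketch handles single characters only, so it cannot express digraphs such as the Czech "ch".

```python
import string


def make_collation_key(alphabet, upper_first=True):
    """Build a sort-key function from `alphabet`, a string of lowercase
    characters in the desired order.

    Letters are ranked by position in `alphabet` regardless of case;
    case only breaks ties between otherwise equal strings. Characters
    not in `alphabet` sort after all listed ones, by code point.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(s):
        primary = []
        case_flags = []
        for ch in s:
            low = ch.lower()
            # low[0] guards against characters whose lowercase form
            # is more than one code point long.
            primary.append(rank.get(low, len(alphabet) + ord(low[0])))
            is_upper = ch != low
            # 0 sorts first: uppercase gets it when upper_first is set.
            case_flags.append(0 if is_upper == upper_first else 1)
        return (primary, case_flags)

    return key


english = string.digits + string.ascii_lowercase
swedish = string.ascii_lowercase + "åäö"  # å, ä, ö follow z in Swedish

names = ["neil", "Cerutti", "Neil", "cerutti"]
print(sorted(names, key=make_collation_key(english)))
print(sorted(names, key=make_collation_key(english, upper_first=False)))
print(sorted(["öl", "zebra", "apa"], key=make_collation_key(swedish)))
```

Keeping the case flags in a separate list, rather than interleaving upper and lower case in one table, means 'aA' and 'Ab' are compared as words first and by case second.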

--
Neil Cerutti
Whenever I see a homeless guy, I always run back and give him
money, because I think: Oh my God, what if that was Jesus?
--Pamela Anderson
Oct 17 '06 #6
On Mon, 16 Oct 2006 22:22:47 -0500, Ron Adam <rr*@ronadam.com> wrote:
...
> I see this is actually a very complex subject.
...
> It looks to me this would be a good candidate for a configurable class.
> Something preferably in the string module where it could be found easier.

/And/ choosing a locale shouldn't mean changing a process-global state.
Sometimes you want to perform something locale-dependent in locale A,
followed by doing it in locale B. Switching locales today takes time and has
the same problems as global variables (unless there is another interface I
am not aware of).

But I suspect that is already a well-known problem.
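
A common workaround, sketched here in Python 3 under the assumption that only one thread touches the locale at a time, is to save the current collation setting, switch, and restore it afterwards. The names `collation` and `sorted_in` are hypothetical. This limits how long the global state stays changed, but it does not make the state any less global.

```python
import locale
from contextlib import contextmanager


@contextmanager
def collation(locale_name):
    """Temporarily switch LC_COLLATE, restoring the previous value on exit.

    The locale is still process-global while the block runs, so this
    is not safe if other threads depend on locale settings.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def sorted_in(items, locale_name):
    """Return `items` sorted under `locale_name` without leaving it set."""
    with collation(locale_name):
        return sorted(items, key=locale.strxfrm)


before = locale.setlocale(locale.LC_COLLATE)
print(sorted_in(["b", "a", "C"], "C"))
after = locale.setlocale(locale.LC_COLLATE)
```

Calling `sorted_in` twice with different locale names gives the "locale A, then locale B" pattern, at the cost of one setlocale() round trip per call.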

/Jorgen

--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.dyndns.org R'lyeh wgah'nagl fhtagn!
Oct 17 '06 #7
Neil Cerutti wrote:
> On 2006-10-17, Ron Adam <rr*@ronadam.com> wrote:
>> Neil Cerutti wrote:
>>> On 2006-10-16, Ron Adam <rr*@ronadam.com> wrote:
>>>> I have several applications where I want to sort lists in
>>>> alphabetical order. Most examples of sorting usually sort on
>>>> the ord() order of the character set as an approximation.
>>>> But that is not always what you want.
>>> Check out strxfrm in the locale module.
>> It looks to me this would be a good candidate for a
>> configurable class. Something preferably in the string module
>> where it could be found easier.
>>
>> Is there any way to change the behavior of strxfrm or strcoll?
>> For example have caps before lowercase, instead of after?
>
> You can probably get away with writing a strxfrm function that
> spits out numbers that fit your definition of sorting.

Since that function is 'C' coded in the builtin _locale, it can't be modified by
Python code.

Looking around some more I found the documentation for the corresponding C
functions and data structures. It looks like Python may just wrap these.

http://opengroup.org/onlinepubs/0079...bd/locale.html

Here's one example of how to rewrite the Unicode collate in Python.

http://jtauber.com/blog/2006/01

I haven't tried changing its behavior, but I did notice it treats words with
hyphens in them differently than strxfrm.

Here's one way to change caps order.

import locale

a = ["Neil", "Cerutti", "neil", "cerutti"]

locale.setlocale(locale.LC_ALL, '')
tmp = [x.swapcase() for x in a]
tmp.sort(key=locale.strxfrm)
tmp = [x.swapcase() for x in tmp]
print(tmp)
['Cerutti', 'cerutti', 'Neil', 'neil']

Cheers,
Ron
Oct 17 '06 #8
