pythonic way to sort

Hi,
I have a file with columns delimited by '~' like this:

1SOME STRING ~ABC~12311232432D~20060401~00000000
2SOME STRING ~DEF~13534534543C~20060401~00000000
3SOME STRING ~ACD~14353453554G~20060401~00000000

......

What is the pythonic way to sort this type of structured text file?
Say I want to sort by the 2nd column (ABC, ACD, DEF) so that it becomes:

1SOME STRING ~ABC~12311232432D~20060401~00000000
3SOME STRING ~ACD~14353453554G~20060401~00000000
2SOME STRING ~DEF~13534534543C~20060401~00000000
I know that, to start with, I have to split on '~', append all the
second columns to a list, and then sort that list with sort(), but I am
stuck on how to get the rest of the corresponding columns after the
sort.

Thanks.
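The approach described in the question (split on '~', sort on the second column) can be completed with the decorate-sort-undecorate idiom, which was the usual technique before sort() gained key=: sort tuples of (key, index, line) so that each key carries its whole row with it. A minimal sketch, assuming the lines are already in a list; the names `decorated` and `sorted_lines` are illustrative only:

```python
# Sample data: the three lines from the question, in file order.
lines = [
    "1SOME STRING ~ABC~12311232432D~20060401~00000000",
    "2SOME STRING ~DEF~13534534543C~20060401~00000000",
    "3SOME STRING ~ACD~14353453554G~20060401~00000000",
]

# Decorate: pair each line with its second '~'-delimited column,
# so the rest of the row travels along with the sort key.
decorated = [(line.split("~")[1], i, line) for i, line in enumerate(lines)]

# Sort: tuples compare by the key first; the index i breaks ties,
# so lines with equal keys keep their original order.
decorated.sort()

# Undecorate: drop the key and index, keeping the full lines.
sorted_lines = [line for key, i, line in decorated]

for line in sorted_lines:
    print(line)
```

The key= argument discussed in the replies does this decoration internally, which is why it is both shorter and faster.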

May 4 '06 #1
mi*******@hotmail.com wrote:
[...]

In Python 2.4 and up, you can pass a key= argument to list.sort(). For example:

In [2]: text = """1SOME STRING ~ABC~12311232432D~20060401~00000000
...: 2SOME STRING ~DEF~13534534543C~20060401~00000000
...: 3SOME STRING ~ACD~14353453554G~20060401~00000000"""

In [3]: lines = text.split('\n')

In [4]: lines
Out[4]:
['1SOME STRING ~ABC~12311232432D~20060401~00000000',
'2SOME STRING ~DEF~13534534543C~20060401~00000000',
'3SOME STRING ~ACD~14353453554G~20060401~00000000']

In [5]: lines.sort(key=lambda x: x.split('~')[1])

In [6]: lines
Out[6]:
['1SOME STRING ~ABC~12311232432D~20060401~00000000',
'3SOME STRING ~ACD~14353453554G~20060401~00000000',
'2SOME STRING ~DEF~13534534543C~20060401~00000000']
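If the lines come from a file, the standard `csv` module can do the splitting on '~' and write the sorted rows back with the same delimiter. A minimal sketch in Python 3, assuming no field contains a double quote (csv's default quote character); `io.StringIO` stands in for the real file, and the file name in the comment is hypothetical:

```python
import csv
import io
from operator import itemgetter

# Stand-in for the real file; with a real file you would use
# open("data.txt", newline="") instead (hypothetical file name).
data = io.StringIO(
    "1SOME STRING ~ABC~12311232432D~20060401~00000000\n"
    "2SOME STRING ~DEF~13534534543C~20060401~00000000\n"
    "3SOME STRING ~ACD~14353453554G~20060401~00000000\n"
)

# Parse each line into a list of fields split on '~'.
rows = list(csv.reader(data, delimiter="~"))

# Sort the parsed rows by the second column (index 1).
rows.sort(key=itemgetter(1))

# Write the rows back out, joined with the same '~' delimiter.
out = io.StringIO()
writer = csv.writer(out, delimiter="~", lineterminator="\n")
writer.writerows(rows)
print(out.getvalue(), end="")
```

`itemgetter(1)` plays the same role as `lambda x: x.split('~')[1]`, except that the split has already been done by the reader.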

--
Robert Kern


May 4 '06 #2

On May 4, 2006, at 12:12 AM, mi*******@hotmail.com wrote:
[...]


There are a couple of ways. Assume that you have the lines in a list called 'lines',
as follows:

lines = [
    "1SOME STRING ~ABC~12311232432D~20060401~00000000",
    "2SOME STRING ~DEF~13534534543C~20060401~00000000",
    "3SOME STRING ~ACD~14353453554G~20060401~00000000"]
The more traditional way would be to define your own comparison
function:

def my_cmp(x, y):
    # Python 2 only: cmp() and sort(cmp=...) were removed in Python 3.
    return cmp(x.split("~")[1], y.split("~")[1])

lines.sort(cmp=my_cmp)
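On Python 3 there is no cmp() builtin and no cmp= argument to sort(). If you already have a comparison function like my_cmp, `functools.cmp_to_key` (available since Python 2.7 and 3.2) adapts it for use as key=. A minimal sketch, with `(a > b) - (a < b)` standing in for the old cmp():

```python
import functools


def my_cmp(x, y):
    # Compare on the second '~'-delimited column.
    # (a > b) - (a < b) returns -1, 0 or 1, like Python 2's cmp().
    a, b = x.split("~")[1], y.split("~")[1]
    return (a > b) - (a < b)


lines = [
    "1SOME STRING ~ABC~12311232432D~20060401~00000000",
    "2SOME STRING ~DEF~13534534543C~20060401~00000000",
    "3SOME STRING ~ACD~14353453554G~20060401~00000000",
]

# cmp_to_key wraps the old-style comparison function in a key class.
lines.sort(key=functools.cmp_to_key(my_cmp))
```

This is mainly useful for porting existing code; for new code a plain key function is simpler.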
The newer, faster way would be to define your own key function:

def my_key(x):
    return x.split("~")[1]

lines.sort(key=my_key)
The key function is faster because split("~")[1] runs only once per
line, whereas a comparison function redoes it for both lines on every
comparison, and sorting n lines takes on the order of n log n comparisons.
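The difference can be measured by counting how many times each approach splits a line. A small sketch in Python 3 (so the comparison function goes through `functools.cmp_to_key`); the randomly generated test data and the `splits` counter are made up for illustration:

```python
import functools
import random

# Made-up test data: 1000 lines in the question's format, with
# random three-letter codes in the second column.
rng = random.Random(0)
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lines = [
    "%dSOME STRING ~%s~12311232432D~20060401~00000000"
    % (i, "".join(rng.choice(letters) for _ in range(3)))
    for i in range(1000)
]

splits = {"key": 0, "cmp": 0}


def my_key(x):
    splits["key"] += 1  # called once per line
    return x.split("~")[1]


def my_cmp(x, y):
    splits["cmp"] += 2  # two lines are split on every comparison
    a, b = x.split("~")[1], y.split("~")[1]
    return (a > b) - (a < b)


by_key = sorted(lines, key=my_key)
by_cmp = sorted(lines, key=functools.cmp_to_key(my_cmp))

print(splits)
```

Both calls produce the same order (Python's sort is stable), but the key function splits each of the 1000 lines exactly once, while the comparison function's count grows with the number of comparisons.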

Jay P.

May 4 '06 #3
Jay Parlar wrote:

On May 4, 2006, at 12:12 AM, mi*******@hotmail.com wrote:
[...]
The newer, faster way would be to define your own key function:

def my_key(x):
    return x.split("~")[1]

lines.sort(key=my_key)


And if the data is in a file rather than a list, you can write, e.g.

lines = sorted(open("/path/tofile"), key=my_key)

to read it in already sorted.
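Iterating over a file yields lines that still end in '\n', and the last line may not have one, so writing the sorted result straight back out can glue two lines together. A slightly more defensive sketch in Python 3; `sort_file` and the file names are hypothetical, and blank lines are skipped because they have no second column:

```python
import os
import tempfile


def my_key(line):
    # Sort key: the second '~'-delimited column.
    return line.split("~")[1]


def sort_file(src, dst):
    # Hypothetical helper: write the lines of src, sorted by my_key, to dst.
    with open(src) as f:
        # splitlines() drops the line endings, so a final line without
        # '\n' is treated like the others; blank lines are skipped.
        lines = [line for line in f.read().splitlines() if line.strip()]
    lines.sort(key=my_key)
    with open(dst, "w") as f:
        f.write("\n".join(lines) + "\n")


# Usage with a throwaway file; note the missing final newline.
sample = (
    "1SOME STRING ~ABC~12311232432D~20060401~00000000\n"
    "2SOME STRING ~DEF~13534534543C~20060401~00000000\n"
    "3SOME STRING ~ACD~14353453554G~20060401~00000000"
)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "w") as f:
        f.write(sample)
    sort_file(src, dst)
    with open(dst) as f:
        result = f.read()

print(result, end="")
```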
May 4 '06 #4
