
Sorting a list

Hi everyone. If I have a list of tuples, and each tuple is in the form:

(year, text) as in ('1995', 'This is a citation.')

How can I sort the list so that they are in chronological order based on
the year? Is there a better way to do this than making a list of tuples?

(So far I have a text file and on each line is a citation. I use an RE
to search for the year, then put this year and the entire citation in a
tuple, and add this tuple to a list. Perhaps this step can be changed to
be more efficient when I then need to sort them by date and write a new
file with the citations in order.)

Thanks.
Feb 1 '07 #1
17 Replies


John Salerno wrote:
> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>
> (year, text) as in ('1995', 'This is a citation.')
>
> How can I sort the list so that they are in chronological order

L.sort()

--
"When you're 7 years old you're dumb as hell, everything is just great: toy
cars and diggers in the neighbourhood.. that's life"
Drito Konj
Feb 1 '07 #2

John Salerno wrote:
> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>
> (year, text) as in ('1995', 'This is a citation.')
>
> How can I sort the list so that they are in chronological order based on
> the year?

Calling sort() on the list should just work.

> Is there a better way to do this than making a list of tuples?

Depends...

> (So far I have a text file and on each line is a citation. I use an RE
> to search for the year, then put this year and the entire citation in a
> tuple, and add this tuple to a list.

You don't tell how these lines are formatted, but it's possible that you
don't even need a regexp here. But wrt/ sorting, the list of tuples with
the sort key as first element is one of the best solutions.
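
A minimal sketch of what "should just work" looks like, using made-up citations (the `refs` data is hypothetical, not taken from the original file). Because the year is the first element of each tuple, a plain in-place sort orders the list by year:

```python
# Hypothetical sample data: (year, citation) tuples in arbitrary order.
refs = [
    ('2001', 'Smith, J. (2001). A later paper.'),
    ('1995', 'Jones, A. (1995). This is a citation.'),
    ('1998', 'Brown, C. (1998). Something in between.'),
]

refs.sort()  # sorts in place; the year, being first, is compared first

years = [year for year, _ in refs]
print(years)
```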
Feb 1 '07 #3

Bruno Desthuilliers wrote:
> You don't tell how these lines are formatted, but it's possible that you
> don't even need a regexp here. But wrt/ sorting, the list of tuples with
> the sort key as first element is one of the best solutions.

Ah, so simply using sort() will default to the first element of each tuple?

The citations are like this:

lastname, firstname. (year). title. other stuff.
Feb 1 '07 #4

Bruno Desthuilliers wrote:
> John Salerno wrote:
>> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>>
>> (year, text) as in ('1995', 'This is a citation.')
>>
>> How can I sort the list so that they are in chronological order based
>> on the year?
>
> Calling sort() on the list should just work.

Amazing, it was that easy. :)
Feb 1 '07 #5

John Salerno wrote:
> Bruno Desthuilliers wrote:
>> John Salerno wrote:
>>> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>>>
>>> (year, text) as in ('1995', 'This is a citation.')
>>>
>>> How can I sort the list so that they are in chronological order based
>>> on the year?
>>
>> Calling sort() on the list should just work.
>
> Amazing, it was that easy. :)

Here's what I did:

import re

file = open('newrefs.txt')
text = file.readlines()
file.close()

newfile = open('sortedrefs.txt', 'w')
refs = []

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r'\(\d{4}\)')

for line in text:
    year = pattern.search(line).group()
    refs.append((year, line))

refs.sort()

for ref in refs:
    newfile.write(ref[1])

newfile.close()
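
A variation on the script above, sketched under a few assumptions rather than offered as the definitive version. The names `YEAR` and `sort_citations` are made up for illustration, the year is converted to an integer, and lines with no "(yyyy)" are kept at the end instead of raising an error (in the script above, `pattern.search(line)` returns None for such a line and `.group()` then fails):

```python
import re

YEAR = re.compile(r'\((\d{4})\)')  # capture only the digits inside "(1995)"

def sort_citations(lines):
    """Return the lines ordered by the first four-digit year in parentheses."""
    dated, undated = [], []
    for line in lines:
        match = YEAR.search(line)
        if match:
            dated.append((int(match.group(1)), line))
        else:
            undated.append(line)  # assumption: lines without a year go last
    dated.sort()
    return [line for _, line in dated] + undated

# Hypothetical input lines, as readlines() would return them.
sample = [
    'Smith, J. (2001). A later paper.\n',
    'No year on this line.\n',
    'Jones, A. (1995). This is a citation.\n',
]
result = sort_citations(sample)
```

Reading the input with `with open('newrefs.txt') as f:` and writing `result` with `writelines()` would plug this into the same two files.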
Feb 1 '07 #6

John Salerno wrote:
> Bruno Desthuilliers wrote:
>> John Salerno wrote:
>>> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>>>
>>> (year, text) as in ('1995', 'This is a citation.')
>>>
>>> How can I sort the list so that they are in chronological order based
>>> on the year?
>>
>> Calling sort() on the list should just work.
>
> Amazing, it was that easy. :)

One more thing. What if I want them in reverse chronological order? I
tried reverse() but that seemed to put them in reverse alphabetical
order based on the second element of the tuple (not the year).
Feb 1 '07 #7

John Salerno wrote:
> Bruno Desthuilliers wrote:
>> You don't tell how these lines are formatted, but it's possible that
>> you don't even need a regexp here. But wrt/ sorting, the list of
>> tuples with the sort key as first element is one of the best solutions.
>
> Ah, so simply using sort() will default to the first element of each tuple?

Yes. Then on the second value if the first compares equal, etc...

> The citations are like this:
>
> lastname, firstname. (year). title. other stuff.

Then you theoretically don't even need regexps:

    >>> line = "lastname, firstname. (year). title. other stuff."
    >>> line.split('.')[1].strip().strip('()')
    'year'

But since you may have a dot in the "lastname, firstname" part, I'd
still go for a regexp here just to make sure.
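
To illustrate the concern about dots in the name, here is a small sketch with a hypothetical citation whose author has initials. The split-based approach picks up the wrong field, while a regexp looking for four digits in parentheses still finds the year:

```python
import re

# Hypothetical citation whose author part contains extra dots.
line = "Salerno, J. R. (1995). A title. Other stuff."

# Splitting on '.' lands on the second initial instead of the year.
by_split = line.split('.')[1].strip().strip('()')

# A regexp for "(four digits)" is not thrown off by the initials.
match = re.search(r'\((\d{4})\)', line)
by_regex = match.group(1) if match else None

print(repr(by_split))
print(repr(by_regex))
```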
Feb 1 '07 #8

John Salerno wrote:
> Bruno Desthuilliers wrote:
>> John Salerno wrote:
>>> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>>>
>>> (year, text) as in ('1995', 'This is a citation.')
>>>
>>> How can I sort the list so that they are in chronological order based
>>> on the year?
>>
>> Calling sort() on the list should just work.
>
> Amazing, it was that easy. :)

A very common Python idiom is "decorate/sort/undecorate", which is just
what you've done here. It's usually faster than passing a custom
comparison callback function (cf. a recent thread named "Sorting a List
of Lists", where Paddy posted a link to a benchmark).
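
A rough sketch of the three decorate/sort/undecorate steps, next to the `key=` form that does the same job in one call. The citation lines and the `YEAR` pattern are hypothetical, and every line is assumed to contain a year:

```python
import re

YEAR = re.compile(r'\((\d{4})\)')

# Hypothetical citation lines, each assumed to contain "(yyyy)".
lines = [
    'Smith, J. (2001). A later paper.',
    'Jones, A. (1995). This is a citation.',
    'Brown, C. (1998). Something in between.',
]

# 1. Decorate: pair each line with its sort key.
decorated = [(YEAR.search(line).group(1), line) for line in lines]
# 2. Sort: tuples compare on the key first.
decorated.sort()
# 3. Undecorate: drop the key again.
dsu_result = [line for _, line in decorated]

# The key= form computes the key once per element internally.
key_result = sorted(lines, key=lambda line: YEAR.search(line).group(1))
```

For these lines the two results are the same. They can differ when keys tie: the decorated tuples then fall back to comparing the line text, whereas `key=` leaves tied elements in their original order.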
Feb 1 '07 #9

John Salerno wrote:
> John Salerno wrote:
>> Bruno Desthuilliers wrote:
>>> John Salerno wrote:
>>>> Hi everyone. If I have a list of tuples, and each tuple is in the form:
>>>>
>>>> (year, text) as in ('1995', 'This is a citation.')
>>>>
>>>> How can I sort the list so that they are in chronological order
>>>> based on the year?
>>>
>>> Calling sort() on the list should just work.
>>
>> Amazing, it was that easy. :)
>
> One more thing. What if I want them in reverse chronological order? I
> tried reverse() but that seemed to put them in reverse alphabetical
> order based on the second element of the tuple (not the year).

Really ?

    >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
    ... ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'), ('1996', 'aaa')]
    >>> lines.sort()
    >>> lines
    [('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc'), ('1996', 'aaa'),
    ('1996', 'ccc'), ('1997', 'aaa'), ('1997', 'bbb')]
    >>> lines.reverse()
    >>> lines
    [('1997', 'bbb'), ('1997', 'aaa'), ('1996', 'ccc'), ('1996', 'aaa'),
    ('1995', 'ccc'), ('1995', 'bbb'), ('1995', 'aaa')]

As you see, the list is being sorted on *both* items - year first, then
sentence. And then of course reversed, since we asked for it !-)

If you want to prevent this from happening and don't mind creating a
copy of the list, you can use the sorted() function with the key and
reverse arguments and operator.itemgetter:

    >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
    ... ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'), ('1996', 'aaa')]
    >>> from operator import itemgetter
    >>> sorted(lines, key=itemgetter(0), reverse=True)
    [('1997', 'bbb'), ('1997', 'aaa'), ('1996', 'ccc'), ('1996', 'aaa'),
    ('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc')]

HTH.
Feb 1 '07 #10

Bruno Desthuilliers wrote:
> If you want to prevent this from happening and don't mind creating a
> copy of the list, you can use the sorted() function with the key and
> reverse arguments and operator.itemgetter:
>
>     >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
>     ... ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'), ('1996', 'aaa')]
>     >>> from operator import itemgetter
>     >>> sorted(lines, key=itemgetter(0), reverse=True)
>     [('1997', 'bbb'), ('1997', 'aaa'), ('1996', 'ccc'), ('1996', 'aaa'),
>     ('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc')]

You don't need to use sorted() -- sort() also takes the key= and
reverse= arguments::

    >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
    ...          ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'),
    ...          ('1996', 'aaa')]
    >>> from operator import itemgetter
    >>> lines.sort(key=itemgetter(0), reverse=True)
    >>> lines
    [('1997', 'bbb'), ('1997', 'aaa'), ('1996', 'ccc'), ('1996', 'aaa'),
    ('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc')]

STeVe
Feb 1 '07 #11

Steven Bethard wrote:
> Bruno Desthuilliers wrote:
>> If you want to prevent this from happening and don't mind creating a
>> copy of the list, you can use the sorted() function with the key and
>> reverse arguments and operator.itemgetter:

(snip)

> You don't need to use sorted() -- sort() also takes the key= and
> reverse= arguments::

Yeps - thanks for the reminder.
Feb 1 '07 #12

Bruno Desthuilliers wrote:
>> One more thing. What if I want them in reverse chronological order? I
>> tried reverse() but that seemed to put them in reverse alphabetical
>> order based on the second element of the tuple (not the year).
>
> Really ?
>
>     >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
>     ... ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'), ('1996', 'aaa')]
>     >>> lines.sort()
>     >>> lines
>     [('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc'), ('1996', 'aaa'),
>     ('1996', 'ccc'), ('1997', 'aaa'), ('1997', 'bbb')]
>     >>> lines.reverse()
>     >>> lines
>     [('1997', 'bbb'), ('1997', 'aaa'), ('1996', 'ccc'), ('1996', 'aaa'),
>     ('1995', 'ccc'), ('1995', 'bbb'), ('1995', 'aaa')]

Oh I didn't sort then reverse, I just replaced sort with reverse. Maybe
that's why!
Feb 1 '07 #13

John Salerno wrote:
(snip)
> Oh I didn't sort then reverse, I just replaced sort with reverse. Maybe
> that's why!

Hmmm... Probably, yes...

!-)
Feb 1 '07 #14

John Salerno <jo******@NOSPAMgmail.com> writes:
> Ah, so simply using sort() [on a list of tuples] will default to the
> first element of each tuple?

More precisely, list.sort will ask the elements of the list to compare
themselves. Those elements are tuples; two tuples will compare based
on comparison of their corresponding elements.

--
\ "The cost of a thing is the amount of what I call life which is |
`\ required to be exchanged for it, immediately or in the long |
_o__) run." -- Henry David Thoreau |
Ben Finney

Feb 1 '07 #15

On Thu, 01 Feb 2007 14:52:03 -0500, John Salerno wrote:
> Bruno Desthuilliers wrote:
>> You don't tell how these lines are formatted, but it's possible that you
>> don't even need a regexp here. But wrt/ sorting, the list of tuples with
>> the sort key as first element is one of the best solutions.
>
> Ah, so simply using sort() will default to the first element of each tuple?

No. It isn't that sort() knows about tuples. sort() knows how to sort a list
by asking the list items to compare themselves, whatever the items are.
Tuples compare themselves by looking at the first element (if any), and
in the event of a tie going on to the second element, then the third, etc.
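
A quick sketch of that comparison rule, with made-up values. It also shows one caveat that goes beyond what was said above: the years in this thread are strings, and strings compare character by character, so the ordering only matches numeric order while every year has the same number of digits.

```python
# Tuples compare element by element, left to right.
year_decides = ('1995', 'zzz') < ('1996', 'aaa')   # first elements differ
text_decides = ('1995', 'aaa') < ('1995', 'bbb')   # years tie, so the text is compared
print(year_decides, text_decides)

# Caveat: string years sort as text, integer years sort numerically.
string_order = sorted(['1995', '800', '2001'])
int_order = sorted([1995, 800, 2001])
print(string_order)
print(int_order)
```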

--
Steven D'Aprano

Feb 2 '07 #16

Steven Bethard <st************@gmail.com> wrote:
> You don't need to use sorted() -- sort() also takes the key= and
> reverse= arguments::
>
>     >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
>     ...          ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'),
>     ...          ('1996', 'aaa')]
>     >>> from operator import itemgetter
>     >>> lines.sort(key=itemgetter(0), reverse=True)
>     >>> lines
>     [('1997', 'bbb'), ('1997', 'aaa'), ('1996', 'ccc'), ('1996', 'aaa'),
>     ('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc')]

I suspect you want another line in there to give the OP what they actually
want: sort the list alphabetically first and then reverse sort on the year.

The important thing to note is that the reverse flag on the sort method
doesn't reverse elements which compare equal. This makes it possible to
sort on multiple keys comparatively easily.

    >>> lines = [('1995', 'aaa'), ('1997', 'bbb'), ('1995', 'bbb'),
    ...          ('1997', 'aaa'), ('1995', 'ccc'), ('1996', 'ccc'),
    ...          ('1996', 'aaa')]
    >>> from operator import itemgetter
    >>> lines.sort(key=itemgetter(1))
    >>> lines.sort(key=itemgetter(0), reverse=True)
    >>> lines
    [('1997', 'aaa'), ('1997', 'bbb'), ('1996', 'aaa'), ('1996', 'ccc'),
    ('1995', 'aaa'), ('1995', 'bbb'), ('1995', 'ccc')]
Feb 2 '07 #17

Bruno Desthuilliers wrote:
> John Salerno wrote:
> (snip)
>> Oh I didn't sort then reverse, I just replaced sort with reverse.
>> Maybe that's why!
>
> Hmmm... Probably, yes...
>
> !-)

lol, this is what a couple months away from python does to me!
Feb 2 '07 #18
