sorting tuples...

Hello guys,

I made a script that extracts strings from a binary file. It works.

My next problem is sorting those strings.

Output is like:

---- snip ----
200501221530
John
*** long string here ***

200504151625
Clyde
*** clyde's long string here ***

200503130935
Jeremy
*** jeremy string here ****
---- snip ----

How can I go about sorting this list based on the date string that
marks the start of each message?

Should I be using lists, dictionaries or tuples?

What should I look into?

Is there a way to generate variables in a loop? Like:

x=0
while (x<10):
    # assign variable-x = [...list...]
    x = x+1

Thanks.

Sep 17 '05 #1
Uhm, if the file is clean you can use something like this:

data = """\
200501221530
John
*** long string here ***

200504151625
Clyde
*** clyde's long string here ***

200503130935
Jeremy
*** jeremy string here ****"""

records = [rec.split("\n") for rec in data.split("\n\n")]
records.sort()
print records

If it's not clean, you'll have to add some more checks/cleaning.
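
A minimal sketch of what such checks might look like, written for Python 3. The `parse_records` name and the rule that a valid record starts with a 12-digit timestamp are assumptions made for illustration, based only on the sample data in this thread:

```python
import re

def parse_records(text):
    """Split text into records on blank lines, keeping only well-formed ones."""
    records = []
    # Split on blank lines, tolerating stray whitespace inside them.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        # Assumed rule: a valid record starts with a 12-digit timestamp.
        if lines and re.fullmatch(r"\d{12}", lines[0]):
            records.append(lines)
    return records

messy = "\n\n200504151625\nClyde\nmsg c\n \n\ngarbage\n\n200501221530\n  John\nmsg j\n"
for rec in sorted(parse_records(messy)):
    print(rec)
```

Records that fail the check are dropped silently here; logging them instead may be preferable when the input is untrusted.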

Bye,
bearophile

Sep 17 '05 #2
On 17 Sep 2005 06:41:08 -0700, ni****@gmail.com wrote:
How can I go about sorting this list based on the date string that
marks the start of each message?

Assuming your groups of strings are all non-blank lines delimited by blank lines,
and using StringIO as a line iterable playing the role of your source of lines,
(not tested beyond what you see ;-)

>>> from StringIO import StringIO
>>> lines = StringIO("""\
... 200501221530
... John
... *** long string here ***
...
... 200504151625
... Clyde
... *** clyde's long string here ***
...
... 200503130935
... Jeremy
... *** jeremy string here ****
... """)
>>> from itertools import groupby
>>> for t in sorted(tuple(g) for k, g in groupby(lines,
...         lambda line:line.strip()!='') if k):
...     print t
...
('200501221530\n', 'John\n', '*** long string here ***\n')
('200503130935\n', 'Jeremy\n', '*** jeremy string here ****\n')
('200504151625\n', 'Clyde\n', "*** clyde's long string here ***\n")

The lambda computes a grouping key that groupby uses to collect group members
as long as the value doesn't change, so this groups non-blank vs blank lines,
and the "if k" throws out the blank-line groups.
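
The same grouping idea can be sketched for Python 3, where `StringIO` lives in `io` and `print` is a function. The `is_content` helper name is an assumption introduced here; the data is the sample from this thread:

```python
from io import StringIO
from itertools import groupby

source = StringIO(
    "200501221530\nJohn\n*** long string here ***\n"
    "\n"
    "200504151625\nClyde\n*** clyde's long string here ***\n"
    "\n"
    "200503130935\nJeremy\n*** jeremy string here ****\n"
)

def is_content(line):
    """Grouping key: True for non-blank lines, False for blank ones."""
    return line.strip() != ""

# groupby starts a new group whenever the key changes, so each run of
# non-blank lines becomes one group; "if keep" drops the blank-line runs.
groups = [
    tuple(line.rstrip("\n") for line in run)
    for keep, run in groupby(source, key=is_content)
    if keep
]

for group in sorted(groups):
    print(group)
```

Because the timestamp is the first element of each tuple, sorting the tuples sorts the messages by date.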

Obviously you could do something else with the sorted line tuples t, e.g.,

>>> lines.seek(0) # (just needed that to rewind the StringIO data here)
>>> for t in sorted(tuple(g) for k, g in groupby(lines,
...         lambda line:line.strip()!='') if k):
...     width = max(map(lambda x:len(x.rstrip()), t))
...     topbot = '+-%s-+'%('-'*width)
...     print topbot
...     for line in t: print '| %s |' % line.rstrip().ljust(width)
...     print topbot
...     print
...
+--------------------------+
| 200501221530             |
| John                     |
| *** long string here *** |
+--------------------------+

+-----------------------------+
| 200503130935                |
| Jeremy                      |
| *** jeremy string here **** |
+-----------------------------+

+----------------------------------+
| 200504151625                     |
| Clyde                            |
| *** clyde's long string here *** |
+----------------------------------+

Or of course you can just print the sorted groups bare:

>>> lines.seek(0)
>>> for t in sorted(tuple(g) for k, g in groupby(lines,
...         lambda line:line.strip()!='') if k):
...     print ''.join(t)
...
200501221530
John
*** long string here ***

200503130935
Jeremy
*** jeremy string here ****

200504151625
Clyde
*** clyde's long string here ***


If your source of line groups is not delimited by blank lines,
or has other non-blank lines, you will have to change the source
or change the lambda to some other key function that produces one
value for the lines to include (True if you want to use if k as above)
and another (False) for the ones to exclude.
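
As a rough Python 3 sketch of swapping in a different key function: suppose, purely as an assumption for illustration, that the groups are separated by `---- message ----` marker lines rather than blank lines. The `keep` function below returns True for lines to include and False for separators:

```python
from itertools import groupby

lines = [
    "---- message ----",
    "200504151625",
    "Clyde",
    "---- message ----",
    "200501221530",
    "John",
]

def keep(line):
    """Return True for lines to include, False for separators to exclude."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("----")

# Same pattern as before: group by the key, keep only the True groups.
records = sorted(tuple(g) for k, g in groupby(lines, keep) if k)
print(records)
```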

HTH

Regards,
Bengt Richter
Sep 17 '05 #3

Thank you very much.

I'll look into this immediately.

I edited my code earlier and came up with stringing the groups
(200501202010, sender, message_string) into one string delimited by
'%%%'.

I could then sort the messages with the date string at the beginning as
the one being sorted with the big string in its "tail" being sorted
too.

200501202010%%%sender%%%message_string
200502160821%%%sender%%%message_string
....

After sorting this list of long strings, I could then split them up
using the '%%%' delimiter and arrange them properly for output.
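
For reference, the join, sort, and split round trip described above might look roughly like this in Python 3. The sample messages and the `DELIM` name are placeholders, not the real data:

```python
DELIM = "%%%"

messages = [
    ("200504151625", "Clyde", "clyde's message"),
    ("200501221530", "John", "john's message"),
    ("200503130935", "Jeremy", "jeremy's message"),
]

# Join each group into one string; the date leads, so it dominates the sort.
joined = [DELIM.join(fields) for fields in messages]
joined.sort()

# Split back into (date, sender, message) for output.
for line in joined:
    date, sender, text = line.split(DELIM)
    print(date, sender, text)
```

This breaks if any field ever contains `%%%` itself, which is one reason it is crude.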

It's crude but at least I achieve what I wanted done.

Both posters gave good advice, though it's a bit too advanced for me.
I'll play with them and keep tweaking my code.

Thanks so much!

--
/nh

Sep 21 '05 #4
id***@gmail.com wrote:
I edited my code earlier and came up with stringing the groups
(200501202010, sender, message_string) into one string delimited by
'%%%'.
Why? It seems you are trying to use a string as some kind of container,
and Python has those in the box. Just use a list of tuples, rather than
a list of strings. That will work fine for .sort(), and it's much more
convenient to access your data. Using the typical tool for extracting
binary data from files/strings will give you tuples by default.

>>> import struct # Check this out in library ref.
>>> # I'm inventing a simple binary format with everything
>>> # as strings in fixed positions. There's just one string
>>> # below, adjacent string literals are concatenated by
>>> # Python. I split it over three lines for readability.
>>> bin = ( "200501221530John    *** long string here ***        "
...         "200504151625Clyde   *** clyde's long string here ***"
...         "200503130935Jeremy  *** jeremy string here ****     ")
>>> fmt="@12s8s32s" # imagined binary format.
>>> l=52 # 12+8+32, from previous line
>>> msgs = []
>>> for i in range(3):
...     # struct.unpack will return a tuple. It works well
...     # with numeric data too.
...     msgs.append(struct.unpack(fmt, bin[i*l:(i+1)*l]))
...
>>> msgs.sort()
>>> for msg in msgs:
...     print msg
...
('200501221530', 'John    ', '*** long string here ***        ')
('200503130935', 'Jeremy  ', '*** jeremy string here ****     ')
('200504151625', 'Clyde   ', "*** clyde's long string here ***")

I could then sort the messages with the date string at the beginning as
the one being sorted with the big string in its "tail" being sorted
too.


This works equally well with a list of tuples. Another benefit of
the list of tuples approach is that you don't need to cast everything
to strings. If part of your data is numeric, for example, just let it
be an int, a long or a float in your struct, and sorting will work
correctly without any need to format the number so that string
sorting behaves exactly like numeric sorting.

Here's an example with numeric data:

>>> b = ( '\x00\x00\x07\xd5\x00\x00\x00\x01\x00\x00\x00\x16\x00\x00\x00'
...       '\x0f\x00\x00\x00\x1eJohn\x00\x00\x00\x00*** long string here'
...       ' ***\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x07\xd5\x00\x00'
...       '\x00\x03\x00\x00\x00\r\x00\x00\x00\t\x00\x00\x00#Jeremy\x00'
...       '\x00*** jeremy string here ****\x00\x00\x00\x00\x00\x00\x00'
...       '\x07\xd5\x00\x00\x00\x04\x00\x00\x00\x0f\x00\x00\x00\x10\x00'
...       '\x00\x00\x19Clyde\x00\x00\x00*** clyde\'s long string here ***')
>>> fmt="!iiiii8s32s"
>>> l = 60 # five ints (5*4) + 8 + 32
>>> bin_msgs=[]
>>> for i in range(3):
...     bin_msgs.append(struct.unpack(fmt, b[i*l:(i+1)*l]))
...
>>> bin_msgs.reverse() # unsort...
>>> bin_msgs.sort()
>>> for msg in bin_msgs:
...     print msg
...
(2005, 1, 22, 15, 30, 'John\x00\x00\x00\x00', '*** long string here ***\x00\x00\x00\x00\x00\x00\x00\x00')
(2005, 3, 13, 9, 35, 'Jeremy\x00\x00', '*** jeremy string here ****\x00\x00\x00\x00\x00')
(2005, 4, 15, 16, 25, 'Clyde\x00\x00\x00', "*** clyde's long string here ***")
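
For anyone trying this on Python 3, where `struct` works on `bytes` rather than `str`, a rough equivalent might look like the sketch below. The `pack` helper and the step that strips the NUL padding are additions for illustration; the record layout is the same imagined `!iiiii8s32s` format:

```python
import struct

FMT = "!iiiii8s32s"          # five big-endian ints, then two fixed-width strings
SIZE = struct.calcsize(FMT)  # 60 bytes per record

def pack(year, month, day, hour, minute, who, text):
    # struct pads "s" fields with NUL bytes up to the declared width.
    return struct.pack(FMT, year, month, day, hour, minute, who, text)

blob = (
    pack(2005, 4, 15, 16, 25, b"Clyde", b"*** clyde's long string here ***")
    + pack(2005, 1, 22, 15, 30, b"John", b"*** long string here ***")
    + pack(2005, 3, 13, 9, 35, b"Jeremy", b"*** jeremy string here ****")
)

records = []
for offset in range(0, len(blob), SIZE):
    *stamp, who, text = struct.unpack_from(FMT, blob, offset)
    # Strip the NUL padding and decode so the fields print cleanly.
    records.append((*stamp,
                    who.rstrip(b"\x00").decode("ascii"),
                    text.rstrip(b"\x00").decode("ascii")))

records.sort()  # the ints compare numerically, so no zero-padding tricks
for rec in records:
    print(rec)
```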
Sep 26 '05 #5

Magnus Lycka wrote:
Why? It seems you are trying to use a string as some kind of container,
and Python has those in the box. Just use a list of tuples, rather than
a list of strings. That will work fine for .sort(), and it's much more
convenient to access your data. Using the typical tool for extracting
binary data from files/strings will give you tuples by default.


my problem with tuples & lists is that i don't know how to assign data
to them properly. i'm quite new in python ;)

with the binary stuff out of the way, what i have is this string data:

20050922 # date line
mike
mike's message...
20040825 # date line
jeremy
jeremy's message...
....

what i want to do is to use the date line as the first data in a tuple
and have the succeeding lines go into the tuple, like:

(20050922, mike, mike's message)

then when it matches another date line it makes another new tuple with
that date line as the header data and the succeeding data, etc..

(20050922, mike, mike's message)
(20040825, jeremy, jeremy's message)
....

then i would sort the tuples according to the date.

is there an easier/proper way of doing this without generating a lot of
tuples?

thanks for the help! :)

Sep 28 '05 #6
On 27 Sep 2005 19:01:38 -0700,
ni****@gmail.com wrote:
is there an easier/proper way of doing this without generating a lot of
tuples?


You want a dictionary. Python dictionaries map keys to values (in other
languages, these data structures are known as hashes, maps, or
associative arrays). The keys will be the dates; the values will depend
on whether or not you have multiple messages for one date.

If the dates are unique (which, looking at your data, is probably not
true), then each item in the dictionary can be just one (who, message)
tuple.

If the dates are not unique, then you'll have to manage each item of the
dictionary as a list of (who, message) tuples.

And before you ask: no, dictionaries are *not* sorted; you'll have to
sort a separate list of the keys or the items at the appropriate time.
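
A minimal Python 3 sketch of the non-unique case, using `collections.defaultdict` so that each date maps to a list of (who, message) tuples. The third message, from "anna", is invented here only to show a repeated date:

```python
from collections import defaultdict

messages = [
    ("20050922", "mike", "mike's message..."),
    ("20040825", "jeremy", "jeremy's message..."),
    ("20050922", "anna", "a second message on the same date"),
]

# Map each date to a list of (who, message) tuples so duplicate dates are kept.
by_date = defaultdict(list)
for date, who, text in messages:
    by_date[date].append((who, text))

# The dictionary itself is not ordered by key, so sort the keys at output time.
for date in sorted(by_date):
    for who, text in by_date[date]:
        print(date, who, text)
```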

Regards,
Dan

--
Dan Sommers
<http://www.tombstonezero.net/dan/>
Sep 28 '05 #7
Dan Sommers wrote:
You want a dictionary. Python dictionaries map keys to values (in other
languages, these data structures are known as hashes, maps, or
associative arrays). The keys will be the dates; the values will depend
on whether or not you have multiple messages for one date.


I'm not sure this advice is entirely helpful, since it introduces
complexities not really required by the simplistic tuple notation the OP
seems to be struggling for.

Following the old adage "First, make it work; then (if it doesn't work
fast enough) make it faster", and making the *dangerous* assumption
that each message genuinely is exactly three lines, we might write:

msglist = []
f = open("theDataFile.txt", "r")
for date in f:
    who = f.next() # pulls a line from the file
    msg = f.next() # pulls a line from the file
    msglist.append((date, who, msg))
# now have list of messages as tuples
msglist.sort()

After this, msglist should be a date-sorted list of messages. Though who
knows what needs to happen to them next ...
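
If the three-line assumption turns out not to hold, one alternative is to start a new tuple whenever a date line appears, as the OP described. The following is only a Python 3 sketch: the `read_messages` name and the rule that a date line is exactly eight digits are assumptions based on the sample data:

```python
import re

DATE_LINE = re.compile(r"\d{8}")  # assumption: a date line is exactly 8 digits

def read_messages(lines):
    """Start a new tuple at every date line; attach following lines to it."""
    messages = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if DATE_LINE.fullmatch(line):
            if current is not None:
                messages.append(tuple(current))
            current = [line]
        elif current is not None:
            current.append(line)
        # Lines seen before the first date line are silently dropped.
    if current is not None:
        messages.append(tuple(current))
    return messages

sample = ["20050922\n", "mike\n", "mike's message...\n",
          "20040825\n", "jeremy\n", "jeremy's message...\n", "second line\n"]
for msg in sorted(read_messages(sample)):
    print(msg)
```

A message line that happens to consist of exactly eight digits would be mistaken for a date line, so a stricter test may be needed for real data.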

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.pycon.org

Sep 28 '05 #8

Steve Holden wrote:
After this, msglist should be date-sorted list of messages. Though who
knows what needs to happen to them next ...


just to spit it all out to stdout in a nice formatted form so I can
save it to a file.

I'm still confused though, but I'm working on it. struct is nice.



Oct 3 '05 #9
