
# Efficiently Split A List of Tuples

I have a large list of two element tuples. I want two separate lists: One list with the first element of every tuple, and the second list with the second element of every tuple. Each tuple contains a datetime object followed by an integer. Here is a small sample of the original list:

    ((datetime.datetime(2005, 7, 13, 16, 0, 54), 315),
     (datetime.datetime(2005, 7, 13, 16, 6, 12), 313),
     (datetime.datetime(2005, 7, 13, 16, 16, 45), 312),
     (datetime.datetime(2005, 7, 13, 16, 22), 315),
     (datetime.datetime(2005, 7, 13, 16, 27, 18), 312),
     (datetime.datetime(2005, 7, 13, 16, 32, 35), 307),
     (datetime.datetime(2005, 7, 13, 16, 37, 51), 304),
     (datetime.datetime(2005, 7, 13, 16, 43, 8), 307))

I know I can use a 'for' loop and create two new lists using 'newList1.append(x)', etc. Is there an efficient way to create these two new lists without using a slow for loop?

Jul 21 '05 #1
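For reference, the plain for-loop approach mentioned in the question can be sketched as below. This is only a baseline under stated assumptions: it targets Python 3, the helper name `split_pairs` and the variable names `samples`, `timestamps`, and `readings` are made up for illustration, and the data is the first three pairs from the sample above.

```python
import datetime

# First three pairs from the sample in the question.
samples = (
    (datetime.datetime(2005, 7, 13, 16, 0, 54), 315),
    (datetime.datetime(2005, 7, 13, 16, 6, 12), 313),
    (datetime.datetime(2005, 7, 13, 16, 16, 45), 312),
)


def split_pairs(pairs):
    """Hypothetical helper: split an iterable of 2-tuples into two lists."""
    firsts, seconds = [], []
    for first, second in pairs:  # tuple unpacking in the loop header
        firsts.append(first)
        seconds.append(second)
    return firsts, seconds


timestamps, readings = split_pairs(samples)
```

The loop makes a single pass over the data and works on any iterable of pairs, including one that is empty.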
14 Replies

Richard writes:

> I have a large list of two element tuples. I want two separate lists: One list with the first element of every tuple, and the second list with the second element of every tuple. I know I can use a 'for' loop and create two new lists using 'newList1.append(x)', etc. Is there an efficient way to create these two new lists without using a slow for loop?

Not really. You could get a little cutesy with list comprehensions to keep the code concise, but the underlying process would be about the same:

    a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
    x,y = [[z[i] for z in a] for i in (0,1)]
    # x is now [1,3,5,7,9] and y is [2,4,6,8,10]

Jul 21 '05 #2

Richard wrote:

> I have a large list of two element tuples. I want two separate lists: One list with the first element of every tuple, and the second list with the second element of every tuple.

Variant of Paul's example:

    a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
    zip(*a)

or

    [list(t) for t in zip(*a)]

if you need lists instead of tuples.

(I believe this is something Guido considers an "abuse of *args", but I just consider it an elegant use of zip() considering how the language defines *args. YMMV.)

-Peter

Jul 21 '05 #3
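A small runnable sketch of the `zip(*a)` idiom from this reply, assuming Python 3 (the thread itself predates it). In Python 3, `zip()` returns a lazy iterator rather than a list, so the result has to be unpacked or materialized explicitly. The names `x_list` and `y_list` are illustrative only.

```python
a = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

# zip(*a) is the same call as zip((1, 2), (3, 4), ...): one argument per pair.
# Unpacking the iterator into two names yields two tuples.
x, y = zip(*a)

# If lists are needed instead of tuples, convert each column.
x_list, y_list = [list(column) for column in zip(*a)]
```

One caveat with this form: if `a` is empty, `zip(*a)` produces nothing and the two-name unpacking raises `ValueError`, so an empty input needs separate handling.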

Peter Hansen wrote:

> (I believe this is something Guido considers an "abuse of *args", but I just consider it an elegant use of zip() considering how the language defines *args. YMMV.)
>
> -Peter

An abuse?! That's one of the most useful things to do with it. It's transpose.

Jul 21 '05 #4

Joseph Garvin wrote:

> Peter Hansen wrote:
>
> > (I believe this is something Guido considers an "abuse of *args", but I just consider it an elegant use of zip() considering how the language defines *args. YMMV.)
> >
> > -Peter
>
> An abuse?! That's one of the most useful things to do with it. It's transpose.

Note that it's considered (as I understand) an abuse of "*args", not an abuse of "zip". I can see a difference...

-Peter

Jul 21 '05 #5

On Wed, 13 Jul 2005 20:53:58 -0400, Peter Hansen wrote:

>     a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
>     zip(*a)

This seems to work. Thanks. Where do I find documentation on "*args"?

Jul 21 '05 #6

Richard wrote:

> On Wed, 13 Jul 2005 20:53:58 -0400, Peter Hansen wrote:
>
> >     a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
> >     zip(*a)
>
> This seems to work. Thanks. Where do I find documentation on "*args"?

In the language reference: http://docs.python.org/ref/calls.html#calls

-Peter

Jul 21 '05 #7

> Variant of Paul's example:
>
>     a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
>     zip(*a)
>
> or
>
>     [list(t) for t in zip(*a)]
>
> if you need lists instead of tuples.

[Peter Hansen]

> (I believe this is something Guido considers an "abuse of *args", but I just consider it an elegant use of zip() considering how the language defines *args. YMMV.)

It is somewhat elegant in terms of expressiveness; however, it is also a bit disconcerting in light of the underlying implementation. All of the tuples are loaded one-by-one onto the argument stack. For a few elements, this is no big deal. For large datasets, it is a less than ideal way of transposing data.

Guido's reaction makes sense when you consider that most programmers would cringe at a function definition with thousands of parameters. There is a sense that this doesn't scale-up very well (with each Python implementation having its own limits on how far you can push this idiom).

Raymond

Jul 21 '05 #8
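To make the scaling concern concrete, here is a minimal sketch, assuming Python 3. It shows only the observable effect (every element becomes its own positional argument), not how any particular interpreter implements the call. The helper `count_positional` and the variable names are hypothetical.

```python
def count_positional(*args):
    """Hypothetical helper: report how many positional arguments arrived."""
    return len(args)


pairs = [(i, i * i) for i in range(10000)]

# Star-unpacking turns every element of `pairs` into a separate argument,
# so this call receives ten thousand positional arguments.
n_args = count_positional(*pairs)

# An alternative that never expands the data into an argument list:
# two passes over the sequence, one per column.
firsts = [pair[0] for pair in pairs]
seconds = [pair[1] for pair in pairs]
```

The comprehension version trades one extra pass over the data for not building a huge argument list, which is one way to sidestep the concern raised above.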

[Richard]

> I know I can use a 'for' loop and create two new lists using 'newList1.append(x)', etc. Is there an efficient way to create these two new lists without using a slow for loop?

If trying to optimize before writing and timing code, then at least validate your assumptions. In Python, for-loops are blazingly fast. They are almost never the bottleneck. Python is not Matlab -- "vectorizing" for-loops only pays off when a high-speed functional happens to exactly match your needs (in this case, zip() happens to be a good fit). Even when a functional offers a speed-up, much of the gain is likely due to implementation-specific optimizations which allocate result lists all at once rather than building them one at a time. Also, for all but the most simple inner-loop operations, the for-loop overhead is almost always dominated by the time to execute the operation itself.

Executive summary: Python's for-loops are both elegant and fast. It is a mistake to habitually avoid them.

Raymond

Jul 21 '05 #9
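The advice to time code before optimizing can be followed with the standard `timeit` module. The sketch below assumes Python 3; the three function names and the test data are invented for illustration, and no timings are quoted here because they depend on the interpreter, version, and machine.

```python
import timeit

pairs = [(i, -i) for i in range(10000)]


def with_loop(data):
    """Explicit for-loop with append()."""
    firsts, seconds = [], []
    for first, second in data:
        firsts.append(first)
        seconds.append(second)
    return firsts, seconds


def with_comprehensions(data):
    """Two list comprehensions, one pass per column."""
    return [d[0] for d in data], [d[1] for d in data]


def with_zip(data):
    """zip(*data), converted to lists for a like-for-like comparison."""
    firsts, seconds = zip(*data)
    return list(firsts), list(seconds)


expected = with_loop(pairs)
for func in (with_loop, with_comprehensions, with_zip):
    # Check correctness first, then measure.
    assert func(pairs) == expected
    elapsed = timeit.timeit(lambda: func(pairs), number=10)
    print(f"{func.__name__:>20}: {elapsed:.4f} s for 10 runs")
```

Running this on the real data, at its real size, is what settles whether the loop is worth replacing.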

Raymond Hettinger wrote:

> > Variant of Paul's example:
> >
> >     a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
> >     zip(*a)
> >
> > or
> >
> >     [list(t) for t in zip(*a)]
> >
> > if you need lists instead of tuples.
>
> [Peter Hansen]
>
> > (I believe this is something Guido considers an "abuse of *args", but I just consider it an elegant use of zip() considering how the language defines *args. YMMV.)
>
> It is somewhat elegant in terms of expressiveness; however, it is also a bit disconcerting in light of the underlying implementation. All of the tuples are loaded one-by-one onto the argument stack. For a few elements, this is no big deal. For large datasets, it is a less than ideal way of transposing data.
>
> Guido's reaction makes sense when you consider that most programmers would cringe at a function definition with thousands of parameters. There is a sense that this doesn't scale-up very well (with each Python implementation having its own limits on how far you can push this idiom).
>
> Raymond

Currently we can implicitly unpack a tuple or list by using an assignment. How is that any different than passing arguments to a function? Does it use a different mechanism?

(Warning, going into what-if land.) There's a question relating to the above also, so it's not completely in outer space. :-)

We can't use the * syntax anywhere but in function definitions and calls. I was thinking the other day that using * in function calls is kind of inconsistent, as it's not used anywhere else to unpack tuples. And it does the opposite of what it means in function definitions.

So I was thinking: in order to have explicit packing and unpacking outside of function calls and function definitions, we would need different symbols, because using * in other places would conflict with the multiply and exponent operators. Also, pack and unpack should not be the same symbol, for obvious reasons. Using different symbols doesn't conflict with * and ** in function calls either.

So for the following examples, I'll use '~' as pack and '^' as unpack.

~ looks like a small 'N', for put stuff 'in'.
^ looks like an up arrow, as in take stuff out.

(Yes, I know they are already used elsewhere. Currently those are binary operators. The '^' is used with sets also. I did say this is a "what-if" scenario. Personally I think the binary operators could be made methods of a bit type; then they, including the '>>' '<<' pair, could be freed up and put to better use. The '<<' would make a nice symbol for getting values from an iterator. The '>>' is already used in print as redirect.)

Simple explicit unpacking would be:

(This is a silly example, I know it's not needed here, but it's just to show the basic pattern.)

    x = (1,2,3)
    a,b,c = ^x     # explicit unpack, take stuff out of x

So, then you could do the following.

    zip(^a)    # unpack 'a' and give its items to zip.

Would that use the same underlying mechanism as using "*a" does? Is it also the same implicit unpacking method used in an assignment using '='? Would it be any less "a bit disconcerting in light of the underlying implementation"?

Other possible ways to use them outside of function calls:

Sequential unpacking..

    x = [(1,2,3)]
    a,b,c = ^^x    ->  a=1, b=2, c=3

Or..

    x = [(1,2,3),4]
    a,b,c,d = ^x[0],x[1]   ->  a=1, b=2, c=3, d=4

I'm not sure what it should do if you try to unpack an item not in a container. I expect it should give an error because a tuple or list was expected.

    a = 1
    x = ^a     # error!

Explicit packing would not be as useful, as we can put ()'s or []'s around things. One example that comes to mind at the moment is using it to create single item tuples.

    x = ~1    ->   (1,)

Possibly converting strings to tuples?

    a = 'abcd'
    b = ~^a   ->   ('a','b','c','d')   # explicit unpack and repack

and:

    b = ~a    ->   ('abcd',)    # explicit pack whole string

for:

    b = a,    ->   ('abcd',)    # trailing comma is needed here.
                                # This is an error opportunity IMO

Choice of symbols aside, packing and unpacking are a very big part of Python; it just seems (to me) like having an explicit way to express it might be a good thing.

It doesn't do anything that can't already be done, of course. I think it might make some code easier to read, and possibly avoid some errors.

Would there be any (other) advantages to it besides the syntax sugar?

Is it a horrible idea for some unknown reason I'm not seeing? (Other than the symbol choices breaking current code. Maybe other symbols would work just as well?)

Regards,
Ron

Jul 21 '05 #10

Oooh.. you make my eyes bleed. IMO that proposal is butt ugly (and looks like the C++.NET perversions.)

Jul 21 '05 #11

[Ron Adam]

> Currently we can implicitly unpack a tuple or list by using an assignment. How is that any different than passing arguments to a function? Does it use a different mechanism?

It is the same mechanism, so it is also only appropriate for low volumes of data:

    a, b, c = args                 # three elements, no problem
    f(*xrange(1000000))            # too much data, not scalable, bad idea

Whenever you get the urge to write something like the latter, take it as a cue to pass iterators instead of unpacking giant tuples.

Raymond

Jul 21 '05 #12
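The recommendation to pass iterators rather than unpack large sequences can be sketched as follows, assuming Python 3, where `range()` plays the role of the thread's `xrange()`. The functions `total_star` and `total_iter` are hypothetical stand-ins for the `f` in the post.

```python
def total_star(*values):
    """Receives every value as a separate positional argument."""
    return sum(values)


def total_iter(values):
    """Receives one iterable and consumes it lazily."""
    return sum(values)


small = (1, 2, 3)
a, b, c = small                    # unpacking a handful of items is fine
star_result = total_star(*small)   # likewise for a short argument list

# For large inputs, hand over the iterable itself. range() is lazy, so no
# million-element argument tuple is ever built for this call.
big_result = total_iter(range(1000000))
```

The design choice is in the function signature: a parameter that accepts one iterable scales with the data, while a `*values` signature forces the caller to expand it.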

On Sun, 17 Jul 2005 19:38:29 -0700, Raymond Hettinger wrote:

> Executive summary: Python's for-loops are both elegant and fast. It is a mistake to habitually avoid them.

And frequently much more readable and maintainable than the alternatives. I cringe when I see well-meaning people trying to turn Python into Perl, by changing perfectly good, fast, readable pieces of code into obfuscated one-liners simply out of some perverse desire to optimize for the sake of optimization.

-- Steven.

Jul 21 '05 #13

Raymond Hettinger wrote:

> [Ron Adam]
>
> > Currently we can implicitly unpack a tuple or list by using an assignment. How is that any different than passing arguments to a function? Does it use a different mechanism?
>
> It is the same mechanism, so it is also only appropriate for low volumes of data:
>
>     a, b, c = args                 # three elements, no problem
>     f(*xrange(1000000))            # too much data, not scalable, bad idea
>
> Whenever you get the urge to write something like the latter, take it as a cue to pass iterators instead of unpacking giant tuples.
>
> Raymond

Ah... that's what I expected. So it's better to transfer a single reference or object than a huge list of separated items. I suppose that would be easy to see in byte code.

In examples like the above, the receiving function would probably be defined with *args also, and not individual arguments. So is it unpacked, transferred to the function, and then repacked, or unpacked, repacked, and then transferred to the function?

And if the * is used on both sides, couldn't it be optimized to skip the unpacking and repacking? But then it would need to make a copy, wouldn't it? That should still be better than passing individual references.

Cheers,
Ron

Jul 21 '05 #14
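Part of this question can be explored from the outside. The sketch below, assuming Python 3, shows only what a `*args` function observably receives when called with a star-unpacked list; it says nothing about the order of internal steps or about what optimizations a given interpreter applies. The function `receive` is hypothetical.

```python
def receive(*args):
    """Hypothetical function: return whatever arrived as *args."""
    return args


items = [1, 2, 3]
received = receive(*items)

# The callee gets a tuple, regardless of the caller's container type.
received_type = type(received)

# Mutating the caller's list afterwards does not change what was received,
# so the callee holds its own sequence of references, not the list itself.
items.append(4)
```

For the bytecode side of the question, the standard `dis` module can disassemble a small function containing such a call; the output varies between Python versions.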

Simon Dahlbacka wrote:

> Oooh.. you make my eyes bleed. IMO that proposal is butt ugly (and looks like the C++.NET perversions.)

I haven't had the displeasure of using C++.NET, fortunately.

    point = [5,(10,20,5)]

    size,t = point
    x,y,z = t

    size,x,y,z = point[0], point[1][0], point[1][1], point[1][2]

    size,x,y,z = point[0], ^point[1]    # Not uglier than the above.

    size,(x,y,z) = point                # Not as nice as this.

I forget sometimes that this last one is allowed, so ()'s on the left of the assignment is an explicit unpack. Seems I've tried to reinvent the wheel yet again.

Cheers,
Ron

Jul 21 '05 #15
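The nested-target form from this post runs as written. For comparison, the sketch below also shows two star forms that were added to the language after this thread: extended iterable unpacking in assignments (Python 3.0) and star-unpacking inside tuple and list displays (Python 3.5). Those two are not part of the discussion above; they are included only because they cover some of what the proposed '^' operator was meant to do. The names `first`, `rest`, and `flat` are illustrative.

```python
point = [5, (10, 20, 5)]

# Nested targets on the left-hand side, as in the post.
size, (x, y, z) = point

# Extended iterable unpacking (Python 3.0+): the starred name collects
# the remaining items into a list.
first, *rest = (1, 2, 3, 4)

# Star-unpacking inside a tuple display (Python 3.5+): flattens the
# nested tuple into the surrounding one.
flat = (point[0], *point[1])
size2, x2, y2, z2 = flat
```

The last two lines achieve the effect of the post's `size,x,y,z = point[0], ^point[1]` using the existing `*` symbol.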
