467,161 Members | 1,010 Online

# Tuple slices

 Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that tuples are immutable ? I ended up writing a custom ImmutableSequence class that does this, but I wonder why it is not implemented for tuples. George Jul 18 '05 #1
• viewed: 2914
Share:
29 Replies
 George Sakkis wrote: Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that tuples are immutable ? really? a = 1, 2, 3 b = a[:] a is b True Jul 18 '05 #2
 Fredrik Lundh wrote: George Sakkis wrote:Why does slicing a tuple returns a new tuple instead of a view of the existing one, given thattuples are immutable ? really?a = 1, 2, 3b = a[:]a is b True My impression was that full tuple copies didn't actually copy, but that slicing a subset of a tuple might. Not exactly sure how to test this, but: py> a = 1, 2, 3 py> a[:2] is a[:2] False So _something_ at least is different between the two slices... Steve Jul 18 '05 #3
 On Mon, 24 Jan 2005 18:45:46 +0100 "Fredrik Lundh" wrote: George Sakkis wrote: Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that tuples are immutable ? really? Well... seems like this case is optimized to return the original tuple just incrementing its reference count and returning tupleobject.c, 330-335 if (ilow == 0 && ihigh == a->ob_size && PyTuple_CheckExact(a)) { Py_INCREF(a); return (PyObject *)a; } a = 1, 2, 3 b = a[:] a is b True -- http://mail.python.org/mailman/listinfo/python-list Jul 18 '05 #4
 On Mon, 24 Jan 2005 18:45:46 +0100 "Fredrik Lundh" wrote: George Sakkis wrote: Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that tuples are immutable ? really? Well... seems like this case (slicing the whole tuple) is optimized to return the original tuple just incrementing its reference count and returning tupleobject.c, 330-335 if (ilow == 0 && ihigh == a->ob_size && PyTuple_CheckExact(a)) { Py_INCREF(a); return (PyObject *)a; } a = 1, 2, 3 b = a[:] a is b True -- http://mail.python.org/mailman/listinfo/python-list Jul 18 '05 #5
 Steven Bethard wrote: >a = 1, 2, 3>b = a[:]>a is b True My impression was that full tuple copies didn't actually copy, but that slicing a subset of a tuple might. Not exactly sure how to test this, but: py> a = 1, 2, 3 py> a[:2] is a[:2] False yup. and to figure out why things are done this way, consider this case: a = give_me_a_huge_tuple() len(a) (a rather large number) b = a[:2] del a (IIRC, I proposed to add "substrings" when I implemented the Unicode string type, but that idea was rejected, for the very same "and how do you get rid of the original object" reason) Jul 18 '05 #6
 Fredrik Lundh wrote: Steven Bethard wrote:My impression was that full tuple copies didn't actually copy, but that slicing a subset of atuple might. Not exactly sure how to test this, but:py> a = 1, 2, 3py> a[:2] is a[:2]False yup. and to figure out why things are done this way, consider this case: >>> a = give_me_a_huge_tuple() >>> len(a) (a rather large number) >>> b = a[:2] >>> del a (IIRC, I proposed to add "substrings" when I implemented the Unicode string type, but that idea was rejected, for the very same "and how do you get rid of the original object" reason) Ahh. Yeah, that seems sensible. I don't think I've ever written code like that, but if I did, I'd almost certainly want it to work as it does now... Steve Jul 18 '05 #7
 The cpython implementation stores tuples in memory like this: [common fields for all Python objects] [common fields for all variable-size python objects, including tuple size] [fields specific to tuple objects, if any] [array of PyObject*, one for each item in the tuple] This way of storing variable-size Python objects was chosen in part because it reuqires only one allocation for an object, not two. However, there is no way for one tuple to point to a slice of another tuple. there's no reason that some other python implementation couldn't make a different choice. Jeff -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (GNU/Linux) iD8DBQFB9V1/Jd01MZaTXX0RAnDMAJ0f2v26tba9j46KsYV3SkylB51KlQCfeT tK YYuTzz1nulvDc8Ad9p78AGo= =+SO0 -----END PGP SIGNATURE----- Jul 18 '05 #8
 "Fredrik Lundh" wrote in message news:ma***************************************@pyt hon.org... Steven Bethard wrote:>>a = 1, 2, 3>>b = a[:]>>a is b True My impression was that full tuple copies didn't actually copy, but that slicing a subset of a tuple might. Not exactly sure how to test this, but: py> a = 1, 2, 3 py> a[:2] is a[:2] False yup. and to figure out why things are done this way, consider this case: >>> a = give_me_a_huge_tuple() >>> len(a) (a rather large number) >>> b = a[:2] >>> del a (IIRC, I proposed to add "substrings" when I implemented the Unicode string type, but that idea was rejected, for the very same "and how do you get rid of the original object" reason) Fair enough. So perhaps the question is whether such cases are more regular than something like: a = give_me_a_huge_tuple() slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)] George Jul 18 '05 #9
 George Sakkis wrote: Fair enough. So perhaps the question is whether such cases are more regular than something like: a = give_me_a_huge_tuple() slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)] I believe the general usage of tuples tends to mean that "give_me_a_huge_tuple()" just doesn't happen except with those who insist on using tuples as though they were nothing other than read-only lists. If you use a tuple the way it was apparently intended, you are extraordinarily unlikely to find yourself with a huge one requiring slicing in such a way that you care whether it is a "view" or a new object. -Peter Jul 18 '05 #10
 "George Sakkis" wrote in message news:35*************@individual.net... Why does slicing a tuple returns a new tuple instead of a view of the existing one, given that tuples are immutable ? I ended up writing a custom ImmutableSequence class that does this, but I wonder why it is not implemented for tuples. Numpy and Numarray both do this -- generate views into arrays -- because they are specifically aimed at large datasets for which the memory savings overrides the complications. Aside from the problem of not being able to delete the underlying object, the view object for a tuple would have to be a new type of object with a new set of methods. So there would be one more thing to learn. And so 'type(o) is tuple' would generally have to be replaced by 'isinstance(type(o), (tuple,tupleview)). For slices that are used within expressions and then discarded, a= (1,2,3) b = a(1:) + a(:1) an internal tuple view might be an interesting optimization. Terry J. Reedy Jul 18 '05 #11
 "Peter Hansen" wrote in message news:A9********************@powergate.ca... George Sakkis wrote: Fair enough. So perhaps the question is whether such cases are more regular than something like: a = give_me_a_huge_tuple() slices = [a[i:j] for i in xrange(len(a)) for j in xrange(i+1, len(a)+1)] I believe the general usage of tuples tends to mean that "give_me_a_huge_tuple()" just doesn't happen except with those who insist on using tuples as though they were nothing other than read-only lists. If you use a tuple the way it was apparently intended, you are extraordinarily unlikely to find yourself with a huge one requiring slicing in such a way that you care whether it is a "view" or a new object. -Peter Actually my initial motivation was not a huge tuple I had to slice many times. It was something much less extraordinarily unlikely, a recursive function with a sequence parameter: def foo(sequence): # base_case # do_stuff() combine(foo(sequence[:n]), foo(sequence[n:])) Having each slice be a view of the original sequence instead of a fresh copy would be a Good Thing (tm). George Jul 18 '05 #12
 "Terry Reedy" wrote in message news:ma***************************************@pyt hon.org... Aside from the problem of not being able to delete the underlying object, the view object for a tuple would have to be a new type of object with a new set of methods. It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the triple being (0, len(sequence), 1). George Jul 18 '05 #13
 [My newsreader crapped out on sending this; apologies if it appears twice.] George Sakkis wrote: "Terry Reedy" wrote in message news:ma***************************************@pyt hon.org...Aside from the problem of not being able to delete the underlying object,the view object for a tuple would have to be a new type of object with anew set of methods. It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the triple being (0, len(sequence), 1). Except that that's not how Python tuples *are* constructed, and it'd be a pretty big deal to redo the architecture of all tuples to support this relatively special case. Even if this re-architecting were done, you're still constructing a new object -- the difference is that you're creating this (start,stop,step) triple instead of duplicating a set of PyObject* pointers, and then doing math based on those values instead of straightforward pointer access. I'm not at all convinced that this really saves you a significant amount for tuple slices (really, you're still constructing a new tuple anyhow, aren't you?), and it's going to cost a bit in both execution time and complexity in the common case (accessing tuples without slicing). If there's a big cost in object construction, it's probably going to be the memory allocation, and for a reasonable tuple the size of the memory required is not going to significantly affect the allocation time. Jeff Shannon Technician/Programmer Credit International Jul 18 '05 #14
 "Jeff Shannon" wrote in message news:10************@corp.supernews.com... [My newsreader crapped out on sending this; apologies if it appears twice.] George Sakkis wrote: "Terry Reedy" wrote in message news:ma***************************************@pyt hon.org...Aside from the problem of not being able to delete the underlying object,the view object for a tuple would have to be a new type of object with anew set of methods. It *could*, but it doesn't have to. One can represent a view as essentially an object with a pointer to a memory buffer and a (start,stop,step) triple. Then a "real tuple" is just a "view" with the triple being (0, len(sequence), 1). Except that that's not how Python tuples *are* constructed, and it'd be a pretty big deal to redo the architecture of all tuples to support this relatively special case. Even if this re-architecting were done, you're still constructing a new object -- the difference is that you're creating this (start,stop,step) triple instead of duplicating a set of PyObject* pointers, and then doing math based on those values instead of straightforward pointer access. I'm not at all convinced that this really saves you a significant amount for tuple slices (really, you're still constructing a new tuple anyhow, aren't you?), and it's going to cost a bit in both execution time and complexity in the common case (accessing tuples without slicing). If there's a big cost in object construction, it's probably going to be the memory allocation, and for a reasonable tuple the size of the memory required is not going to significantly affect the allocation time. Jeff Shannon Technician/Programmer Credit International You're probably right about the allocation time, but my main concern is the memory required for each slice, which can be O(n) wrt the length of the whole tuple. I would like this to be constant (at least if there was a way around to the problem of deleting the underlying sequence). George Jul 18 '05 #15
 "George Sakkis" wrote in message news:35*************@individual.net... Actually my initial motivation was not a huge tuple I had to slice many times. It was something much less extraordinarily unlikely, a recursive function with a sequence parameter: def foo(sequence): # base_case # do_stuff() combine(foo(sequence[:n]), foo(sequence[n:])) Having each slice be a view of the original sequence instead of a fresh copy would be a Good Thing Why? To save time? memory? Either would require more that a few bytes per slice. If they are, you can probably use virtual slices as follows. def foo(sequence): def _foo(seq, start, stop) # base_case # do_stuff() combine(_foo(seq, start, n), _foo(seq, n, stop)) _foo(sequence, 0, len(sequence) In other words, if you don't really want slices copied out of the sequence, then don't slice! Just use 2 ints to indicate the working region or view. Both this and using a nested function with additional params are standard techniques. This also works when the seq is mutable and you want changes to the 'slice' to change the original, as in quicksort. Terry J. Reedy Jul 18 '05 #16
 George Sakkis wrote: You're probably right about the allocation time, but my main concern is the memory required for each slice, which can be O(n) wrt the length of the whole tuple. I would like this to be constant (at least if there was a way around to the problem of deleting the underlying sequence). If you really want a view into a tuple (or any sequence for that matter): from itertools import islice islice(iter(seq), start, end, step) Then use the various functions in itertools to work with the resulting iterator (since the syntactic support for working with iterators is currently quite limited - slicing, concatenation and repetition are spelt with itertools.islice(), itertools.chain() and itertools.repeat(), rather than with their standard syntactic equivalents "[x:y:z]", "+" and "*"). The above approach does suffer from the problem of holding a reference to the original sequence, though. Cheers, Nick. -- Nick Coghlan | nc******@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net Jul 18 '05 #17
 Terry Reedy wrote: "George Sakkis" wrote in message news:35*************@individual.net... Actually my initial motivation was not a huge tuple I had to slice many times. It was something much less extraordinarily unlikely, a recursive function with a sequence parameter: def foo(sequence): # base_case # do_stuff() combine(foo(sequence[:n]), foo(sequence[n:])) Having each slice be a view of the original sequence instead of a fresh copy would be a Good Thing Why? To save time? memory? Either would require more that a few bytes per slice. If they are, you can probably use virtual slices as follows. def foo(sequence): def _foo(seq, start, stop) # base_case # do_stuff() combine(_foo(seq, start, n), _foo(seq, n, stop)) _foo(sequence, 0, len(sequence) In other words, if you don't really want slices copied out of the sequence, then don't slice! Just use 2 ints to indicate the working region or view. Both this and using a nested function with additional params are standard techniques. This also works when the seq is mutable and you want changes to the 'slice' to change the original, as in quicksort. Terry J. Reedy I would say these are standard *C/C++* techniques; slicing simplifies things and thus seems more 'pythonic' to me. George Jul 18 '05 #18
 An iterator is perfectly ok if all you want is to iterate over the elements of a view, but as you noted, iterators are less flexible than the underlying sequence. The view should be (or at least appear) identical in functionality (i.e. public methods) with its underlying sequence. George Jul 18 '05 #19
 George Sakkis wrote: An iterator is perfectly ok if all you want is to iterate over the elements of a view, but as you noted, iterators are less flexible than the underlying sequence. The view should be (or at least appear) identical in functionality (i.e. public methods) with its underlying sequence. So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy? As I see it, you get the *possibility* of saving a few bytes (which may go in the other direction) at a cost of complexity and speed. You have greater dependence of internal objects on each other, you can't get rid of the original tuple while any slice views of it exist, you gain nothing in the way of syntax/usage simplicity... so what's the point? To my mind, one of the best things about Python is that (for the most part) I don't have to worry about actual memory layout of objects. I don't *care* how tuples are implemented, they just work. It seems to me that you're saying that they work perfectly fine as-is, but that you have a problem with the implementation that the language tries its best to let you not care about. Is this purely abstract philosophy? Jeff Shannon Technician/Programmer Credit International Jul 18 '05 #21
 "Jeff Shannon" wrote in message news:10*************@corp.supernews.com... George Sakkis wrote: An iterator is perfectly ok if all you want is to iterate over the elements of a view, but as you noted, iterators are less flexible than the underlying sequence. The view should be (or at least appear) identical in functionality (i.e. public methods) with its underlying sequence. So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy? As I see it, you get the *possibility* of saving a few bytes (which It all comes down on what you mean by "a few bytes". Since many (most?) slices are linear wrt to the original sequence's length, it is not hard to think of algorithms that involve the creation of *many* slices (e.g. most recursive divide-and-conquer algorithms). Implementing these using slices simply does not scale as the input sequence gets larger. Of course, you can always use the standard C/C++ approach and pass the original sequence along with the (start,stop,step) indices of the slice, as Terry Reedy mentioned, but then you lose in expressiveness. may go in the other direction) at a cost of complexity and speed. You have greater dependence of internal objects on each other, you can't get rid of the original tuple while any slice views of it exist, you gain nothing in the way of syntax/usage simplicity... so what's the point? To my mind, one of the best things about Python is that (for the most part) I don't have to worry about actual memory layout of objects. I don't *care* how tuples are implemented, they just work. It seems to me that you're saying that they work perfectly fine as-is, but that you have a problem with the implementation that the language tries its best to let you not care about. Is this purely abstract philosophy? No, it's called "scalability", and it's not purely abstract philosophy AFAIK. I fully agree that for the most part you don't have to care about it, and I'm grateful that python does all its magic trasparently. However if you do care about it, and at the same time you are unwilling to sacrifice the elegance of slices, the current implementation is not ideal. Jeff Shannon Technician/Programmer Credit International George Jul 18 '05 #23
 Jeff Shannon wrote: So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy? I think views are good for 1) saving memory 2) saving time (as you don't have to copy the elements into the new tuple) And they are worth it. However, (as in other cases with slicing), it is very easy and fast to create a view for a slice with the default step '1', while it's a PITA and totally not worth it to create a view for a slice with non default step. I think it would be good to: if slice_step == 1 create_view else create_new_tuple Actually, i think that slices with step, is a bad feature in general and i think I will write a PEP to suggest their removal in python3k. Gerald Jul 18 '05 #24
 On Tue, 25 Jan 2005 19:25:55 -0500, "George Sakkis" wrote: "Jeff Shannon" wrote in message news:10*************@corp.supernews.com... George Sakkis wrote: > An iterator is perfectly ok if all you want is to iterate over the > elements of a view, but as you noted, iterators are less flexible than > the underlying sequence. The view should be (or at least appear) > identical in functionality (i.e. public methods) with its underlying > sequence. So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy? As I see it, you get the *possibility* of saving a few bytes (whichIt all comes down on what you mean by "a few bytes". Since many (most?) slices are linear wrt to theoriginal sequence's length, it is not hard to think of algorithms that involve the creation of*many* slices (e.g. most recursive divide-and-conquer algorithms). Implementing these using slicessimply does not scale as the input sequence gets larger. Of course, you can always use the standardC/C++ approach and pass the original sequence along with the (start,stop,step) indices of the slice,as Terry Reedy mentioned, but then you lose in expressiveness. I didn't see the leadup to this, but what is the problem with just subclassing tuple to give you the views you want? Regards, Bengt Richter Jul 18 '05 #25
 On Wed, 26 Jan 2005 11:55:59 -0800, jfj wrote: Jeff Shannon wrote: So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy?I think views are good for 1) saving memory 2) saving time (as you don't have to copy the elements into the new tuple)And they are worth it. However, (as in other cases with slicing), it isvery easy and fast to create a view for a slice with the default step'1', while it's a PITA and totally not worth it to create a view for aslice with non default step. I think it would be good to: if slice_step == 1 create_view else create_new_tupleActually, i think that slices with step, is a bad feature in generaland i think I will write a PEP to suggest their removal in python3k. What's the big deal with other than 1 steps? It is just adjusting a few numbers so that you can either index the new virtual slice with an integer and return the element, in which case the index into the original tuple will be someoffset+i*somefactor once you get past the limit checks for the virtual slice. By the same token, transforming a few numbers of one virtual slice into similar numbers for a a new virtual slice of that shouldn't be rocket science. And it wouldn't have to be done more than once. Don't have time to do it now, but there are plenty around here that could, I'm sure. Regards, Bengt Richter Jul 18 '05 #26
 jfj wrote: Jeff Shannon wrote: So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy? I think views are good for 1) saving memory 2) saving time (as you don't have to copy the elements into the new tuple) 1. Applies only if you are making large slices, or a lot of slices with each containing at least 3 elements. A view can also *cost* memory, when it looks at a small piece of a large item. The view will keep the entire item alive, even though it needs only a small piece. 2. Hell no. The *elements* aren't copied, pointers to the elements are. If you *don't* copy the pointers, then every item access through the view involves an indirection as the index into the original sequence gets calculated. So views *may* save memory in some applications, but are unlikely to save time in any application (except any saving resulting from the memory saving). Cheers, Nick. -- Nick Coghlan | nc******@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net Jul 18 '05 #27
 jfj wrote: Actually, i think that slices with step, is a bad feature in general and i think I will write a PEP to suggest their removal in python3k. I wouldn't bother. Extended slicing was added to support those doing serious numerical work in Python, and it won't get removed for all the reasons it was added in the first place. Cheers, Nick. -- Nick Coghlan | nc******@email.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.skystorm.net Jul 18 '05 #28
 "Bengt Richter" wrote in message news:41*****************@news.oz.net... On Wed, 26 Jan 2005 11:55:59 -0800, jfj wrote:Jeff Shannon wrote: So, what problem is it, exactly, that you think you'd solve by making tuple slices a view rather than a copy?I think views are good for 1) saving memory 2) saving time (as you don't have to copy the elements into the new tuple)And they are worth it. However, (as in other cases with slicing), it isvery easy and fast to create a view for a slice with the default step'1', while it's a PITA and totally not worth it to create a view for aslice with non default step. I think it would be good to: if slice_step == 1 create_view else create_new_tupleActually, i think that slices with step, is a bad feature in generaland i think I will write a PEP to suggest their removal in python3k. What's the big deal with other than 1 steps? It is just adjusting a few numbers so that you can either index the new virtual slice with an integer and return the element, in which case the index into the original tuple will be someoffset+i*somefactor once you get past the limit checks for the virtual slice. By the same token, transforming a few numbers of one virtual slice into similar numbers for a a new virtual slice of that shouldn't be rocket science. And it wouldn't have to be done more than once. Don't have time to do it now, but there are plenty around here that could, I'm sure. Regards, Bengt Richter Here's my (undocumented) version of it: http://rafb.net/paste/results/HkxmHp37.html and its unit test: http://rafb.net/paste/results/2LIInT68.html And some useless timing comparisons (I know it's a stupid example, don't flame me for this): \$ python /usr/lib/python2.3/timeit.py \ -s "x=tuple(xrange(10000))" \ "[x[1:-1] for n in xrange(100)]" 10 loops, best of 3: 3.84e+04 usec per loop \$ python /usr/lib/python2.3/timeit.py \ -s "from immutableseq import ImmutableSequence" \ -s "x=ImmutableSequence(xrange(10000))" \ "[x[1:-1] for n in xrange(100)]" 100 loops, best of 3: 5.85e+03 usec per loop Feel free to comment or suggest improvements. George Jul 18 '05 #29
 Nick Coghlan wrote: 1. Applies only if you are making large slices, or a lot of slices with each containing at least 3 elements. A view can also *cost* memory, when it looks at a small piece of a large item. The view will keep the entire item alive, even though it needs only a small piece. That is correct. 2. Hell no. The *elements* aren't copied, pointers to the elements are. If you *don't* copy the pointers, then every item access through the view involves an indirection as the index into the original sequence gets calculated. If you have x=(1,2,...100001) y=x[:-1] then you copy 100000 pointers AND you INCREF them AND you DECREF them when y dies. The unfortunate case by (1) would be: x=(1,2,...100001) x=x[:1] So views *may* save memory in some applications, but are unlikely to save time in any application (except any saving resulting from the memory saving). They do. If tp_dealloc of a tuple view doesn't decref the pointers. We should look for what is the most common case. Gerald -PS: the PEP for the removal ought to have a ":)" at the end. Jul 18 '05 #30

### This discussion thread is closed

Replies have been disabled for this discussion.