Bytes | Software Development & Data Engineering Community
Needless copying in iterations?

Hello all,

I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really made a copy of the slice from
stuff as the code seems to suggest, or does it find some way to produce
an iterator without the need to make a copy (if stuff is a built-in
sequence type)? Or would it be more efficient to do the more clumsy (in
my opinion):

for i in xrange(x, y):
    whatever(stuff[i])

James
Sep 15 '07 #1
On Sep 15, 11:58 pm, James Stroud <jstr...@mbi.ucla.edu> wrote:
itertools.islice does what you want

import itertools
for something in itertools.islice(stuff, x, y):
    whatever(something)
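A self-contained sketch of the difference (illustrative data, not from the thread; written for modern Python, where `range` plays the role of `xrange`). The slice builds a new list; `islice` walks the original lazily:

```python
from itertools import islice

stuff = list(range(10))  # example sequence
x, y = 2, 5

# stuff[x:y] builds a brand-new list holding the selected items.
copied = stuff[x:y]
assert copied == [2, 3, 4]
assert copied is not stuff   # a distinct object ...
copied[0] = 99
assert stuff[2] == 2         # ... so mutating it leaves stuff alone

# islice yields items from the original one at a time; no intermediate list.
lazy = islice(stuff, x, y)
assert list(lazy) == [2, 3, 4]
```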

Sep 15 '07 #2
buffi wrote:

itertools.islice does what you want

import itertools
for something in itertools.islice(stuff, x, y):
    whatever(something)
Thanks buffi!

So I guess the interpreter does no optimization in the latter?

James
Sep 15 '07 #3
On Sep 16, 12:20 am, James Stroud <jstr...@mbi.ucla.edu> wrote:
So I guess the interpreter does no optimization in the latter?

James
No, as far as I know it makes a new list out of the slice when you do
it like
for something in stuff[x:y]
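One way to check this claim empirically (an editor's sketch, not from the thread; sizes are arbitrary, and on the Python 2 of the thread's era `xrange` would replace `range`) is to time both loops with `timeit`:

```python
import timeit
from itertools import islice

stuff = list(range(200_000))

def with_slice():
    # Materializes a ~150k-element list before the loop even starts.
    total = 0
    for item in stuff[100:150_000]:
        total += item
    return total

def with_islice():
    # Iterates the original list directly; no intermediate copy.
    total = 0
    for item in islice(stuff, 100, 150_000):
        total += item
    return total

assert with_slice() == with_islice()  # same result either way
print("slice: ", timeit.timeit(with_slice, number=5))
print("islice:", timeit.timeit(with_islice, number=5))
```

The copying cost only dominates when the slice is large and the per-item work is cheap, so it is worth measuring for your actual workload rather than assuming.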

- Björn Kempén

Sep 15 '07 #4
This is a case where it's up to the type involved. For example,
xrange() slices the way you want but range() does not. Maybe a type
would return a proxy object for slices that fetches each value by
index, or maybe it knows that it makes more sense to give you a copy
because changes during the iteration should not be reflected in the
iteration.
It would be really easy to make a generic slicer.
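A generic slicer is indeed short to write. Here is one possible sketch (the name `lazy_slice` is made up for illustration; it handles only the simple non-negative start/stop case, not negative indices or steps):

```python
def lazy_slice(seq, start, stop):
    """Yield seq[start:stop] one item at a time, without copying.

    Assumes 0 <= start <= stop and no step; a real implementation would
    normalize indices the way list.__getitem__ does (e.g. via
    slice(start, stop).indices(len(seq))).
    """
    for i in range(start, min(stop, len(seq))):
        yield seq[i]

stuff = ['a', 'b', 'c', 'd', 'e']
assert list(lazy_slice(stuff, 1, 3)) == ['b', 'c']
assert list(lazy_slice(stuff, 3, 99)) == ['d', 'e']  # stop past the end is clamped
```

Note this reads each item at iteration time, so mutations to `stuff` during the loop are visible, which is exactly the semantic difference from a real slice.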


--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://ironfroggy-code.blogspot.com/
Sep 15 '07 #5
On Sep 16, 12:25 am, "Calvin Spealman" <ironfro...@gmail.com> wrote:
This is a case where its up to the type involved. For example,
xrange() slices the way you want but range() does not.
Could you explain this?
As far as I know you can't slice an xrange

- Björn Kempén

Sep 15 '07 #6
On Sat, 15 Sep 2007 14:58:15 -0700, James Stroud wrote:
I was staring at a segment of code that looked like this today:

for something in stuff[x:y]:
    whatever(something)

and was wondering if the compiler really made a copy of the slice from
stuff as the code seems to suggest, or does it find some way to produce
an iterator without the need to make a copy (if stuff is a built-in
sequence type)?
The compiler can't "optimize" this as it would change the semantics.
There's no way for the compiler to tell if this copy really is "needless".
`whatever()` may change `stuff[i]` where `i` is in `x:y`, and this may lead
to different results whether it iterates over a copy or the original.

Ciao,
Marc 'BlackJack' Rintsch
Sep 16 '07 #7
On Sun, 16 Sep 2007 00:05:58 +0000, Marc 'BlackJack' Rintsch wrote:
The compiler can't "optimize" this as it would change the semantics.
There's no way for the compiler to tell if this copy really is
"needless". `whatever()` may change `stuff[i]` where `i` is in `x:y` and
this may lead to different results whether it iterates over a copy or the
original.
In *general* the compiler can't tell, but in specific cases it could. A
(hypothetical) optimizing compiler would tell the difference between:
for item in alist[1:5]:
    print item # no possible side-effects

for item in alist[1:5]:
    function(alist, item) # there might be side-effects

for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter

for i, item in enumerate(alist[1:5]):
    alist[i+1] = function(item) # side-effects DO matter
Of course this is all besides the point, since no such optimizing
compiler exists today, at least not for CPython. Any volunteers?
--
Steven.
Sep 16 '07 #8
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
In *general* the compiler can't tell, but in specific cases it
could. A (hypothetical) optimizing compiler would tell the
difference between:

for item in alist[1:5]:
    print item # no possible side-effects
The 'print' statement converts the 'item' to a str. That conversion
could, in a pathological case, have a side-effect (say, if the class
of 'item' had an overridden '__str__' method with side effects), and
the compiler can't know this isn't a pathological case.
for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter
The compiler doesn't know that, at the time of invocation,
'alist.append' doesn't have side effects that matter.

The compiler for a dynamic language like Python has very little in the
way of absolute "no significant side effects in these specific cases"
guarantees of the kind you're assuming, even in the cases you chose to
contrast with the ones that *do* have significant side effects.

--
\ "When we call others dogmatic, what we really object to is |
`\ their holding dogmas that are different from our own." -- |
_o__) Charles Issawi |
Ben Finney
Sep 16 '07 #9
On Sun, 16 Sep 2007 00:40:13 +0000, Steven D'Aprano wrote:
On Sun, 16 Sep 2007 00:05:58 +0000, Marc 'BlackJack' Rintsch wrote:

In *general* the compiler can't tell, but in specific cases it could. A
(hypothetical) optimizing compiler would tell the difference between:
for item in alist[1:5]:
    print item # no possible side-effects
To expand on what Ben said: After conversion to `str` the result is then
given to `sys.stdout.write()` and of course `sys.stdout` could be another
object that changes `alist`.
for item in alist[1:5]:
    alist.append(item) # side-effects DON'T matter
Side effects do matter here, even in not-so-pathological cases. There are
cases where you get a different result depending on whether you copy or
not, even with plain vanilla `list`\s:

In [153]: alist = [1, 2, 3]

In [154]: for item in alist[1:5]:
.....: alist.append(item)
.....:

In [155]: alist
Out[155]: [1, 2, 3, 2, 3]

In [156]: alist = [1, 2, 3]

In [157]: for item in islice(alist, 1, 5):
.....: alist.append(item)
.....:

In [158]: alist
Out[158]: [1, 2, 3, 2, 3, 2, 3]

Ciao,
Marc 'BlackJack' Rintsch
Sep 16 '07 #10
On Sun, 16 Sep 2007 09:50:39 +0000, Steven D'Aprano wrote:
The point is rather moot, since CPython (and probably other Pythons) do
almost no optimizations. But just because Python is a dynamic language
doesn't mean there are no optimizations possible: Haskell is a dynamic
language, and there are optimizing compilers for it. Of course, it is
much simpler for Haskell, because of the type system it uses.
What do you mean by Haskell is a dynamic language? It is statically and
strictly typed and the compiler usually knows all the functions. No
"surprises", no side effects, no duck typing.

Ciao,
Marc 'BlackJack' Rintsch
Sep 16 '07 #11
Haskell is a dynamic language, and there are optimizing compilers for it.

Last time I used Haskell it was statically typed, but that was almost a
year ago and times are changing fast.

About optimizing compilers for Python: these won't work without
restrictions, a la Shedskin, which uses C++ as its backend. But unlike
Shedskin, and the design mistake being made in that project, we can
learn the lesson from Psyco that there must be a fallback to bytecode
interpretation when some argument does not fit the signature of a
specialized function. I'm personally a bit skeptical about JIT
compilation, since I feel it's a waste of electricity to record all
that information at runtime and perform native compilations again and
again. That doesn't mean JIT compilation can't be effective; I have a
conceptual bias against it.

Instead I expect rather stable behaviour and rather fixed nominal
types, despite the fact that Python supports duck typing. Unfortunately
I'm unable to recover an investigation about the effectiveness of
adding tests to an already well-equipped test base. The result was one
of diminishing returns: stable behaviour emerged early with good
coverage properties, and then one had to work hard to improve and find
more bugs. A similar effect might also explain the effectiveness of
dynamic typing for catching errors, which is not easy to justify by
purely theoretical arguments.

-----

Note that I worked out some pieces of a type-feedback mechanism that
records types and displays the results in a slightly modified Python
dialect called TPython. A TPython program can be regarded as a "typed
hypothesis" about an otherwise untyped (or dynamically typed) Python
program. You can keep the TPython program and translate it, completely
or in fragments, to other languages like C++, D, Java, OCaml etc.
This approach is meant for incremental progress and smooth integration,
and it might work even when only small portions can be translated
(typically, functions that take and return only integers/floats bound
in range are a good start). So far everything is implemented in pure
Python and the amount of source is < 1000 LOC.

I have not published anything of it yet, and I'm a rather slow
publisher anyway. If anyone is interested in this kind of work and
willing to contribute, I might set it apart from the usual EasyExtend
stuff (which is the foundation nevertheless) I'm working on and
reframe the project context, e.g. on SourceForge or elsewhere. Email
address and project page are still

ka*@fiber-space.de
www.fiber-space.de

If no one contributes, the fun is exclusively on my side and everything
moves more slowly. That's all.

Sep 16 '07 #12
On Sun, 16 Sep 2007 10:58:07 +0000, Marc 'BlackJack' Rintsch wrote:
On Sun, 16 Sep 2007 09:50:39 +0000, Steven D'Aprano wrote:
>The point is rather moot, since CPython (and probably other Pythons) do
almost no optimizations. But just because Python is a dynamic language
doesn't mean there are no optimizations possible: Haskell is a dynamic
language, and there are optimizing compilers for it. Of course, it is
much simpler for Haskell, because of the type system it uses.

What do you mean by Haskell is a dynamic language? It is statically and
strictly typed and the compiler usually knows all the functions. No
"surprises", no side effects, no duck typing.
Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!

See also http://homepages.cwi.nl/~ralf/OOHaskell/
showing that Haskell can do object-oriented programming, complete with
mutable objects and side-effects. Although "duck typing" is listed as a
keyword, I couldn't see any reference to it in the paper.

Haskell uses type inference, and has a "maybe" type for those cases where
it can't tell what the type will be.

If you still don't accept that Haskell is a dynamic language, for
whatever definition of dynamic language you use, I'll withdraw the claim
for the sake of not getting bogged down in long arguments over something
that was really just a minor point.
--
Steven.
Sep 16 '07 #13
On Sun, 16 Sep 2007 13:31:57 +0000, Steven D'Aprano wrote:
On Sun, 16 Sep 2007 10:58:07 +0000, Marc 'BlackJack' Rintsch wrote:
>On Sun, 16 Sep 2007 09:50:39 +0000, Steven D'Aprano wrote:
>>The point is rather moot, since CPython (and probably other Pythons) do
almost no optimizations. But just because Python is a dynamic language
doesn't mean there are no optimizations possible: Haskell is a dynamic
language, and there are optimizing compilers for it. Of course, it is
much simpler for Haskell, because of the type system it uses.

What do you mean by Haskell is a dynamic language? It is statically and
strictly typed and the compiler usually knows all the functions. No
"surprises", no side effects, no duck typing.

Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!
Those side effects stay in the IO monad and don't "leak" into other areas
so the compiler still knows much more about the possible data flow than
the compilers of most other languages.
Haskell uses type inference, and has a "maybe" type for those cases where
it can't tell what the type will be.
Haskell is statically typed, the compiler must know the type of every name
otherwise it stops with an error message. The `Maybe` type is for
functions that may return something (typed) or `Nothing`. For example a
`lookup` function can have the signature:

lookup :: Eq a => a -> [(a,b)] -> Maybe b
lookup key assocList = ...

The compiler infers the concrete types of `a` and `b` at compile time.
If you still don't accept that Haskell is a dynamic language, for
whatever definition of dynamic language you use, I'll withdraw the claim
for the sake of not getting bogged down in long arguments over something
that was really just a minor point.
You said it must be possible to optimize a dynamic language because there
are optimizing compilers for a dynamic language like Haskell. That's a
minor point now?

The main problem with optimizing Python code is that the language is
dynamically typed -- the types of objects bound to names are only checked
at runtime and can change at nearly every point in time while executing.
Haskell is statically typed, the types are all inferred at compile time so
it's possible to optimize the code for those types. The compiler knows
the exact type of every name. By what definition is that a dynamic
language!?

Ciao,
Marc 'BlackJack' Rintsch
Sep 16 '07 #14
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
What do you mean by Haskell is a dynamic language? It is statically and
strictly typed and the compiler usually knows all the functions. No
"surprises", no side effects, no duck typing.

Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!
In a certain theoretical sense, input and output in the IO monad are
not side effects. Side effects means you can somehow distinguish
between calling f(x) twice with the same value of x, and calling it
just once. In Haskell, the print function does an output operation
(called an "action") but if I understand it right, it turns out you
cannot call the print function twice with the same argument. Of
course you can say
main = do { print "hello"; print "hello" }
and that prints "hello" twice, but the do statement is syntactic sugar
that turns the above into something approximately like this (the exact
mechanism is a bit hairier):
x = (special token representing the state of the external world)
y = print ("hello", x) # y is the new external world state
z = print ("hello", y) # z is another new state
so print receives a different arg each time. (Note that every Haskell
function has exactly one argument, so f(x,y) means call f, giving it
the tuple (x,y) as an argument).
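The state-threading described above can be mimicked in Python by passing the "world" around explicitly (an editor's toy model of the idea, not how Haskell is actually implemented; the name `print_io` is made up):

```python
def print_io(text, world):
    # A pure function: the same (text, world) pair always yields the
    # same result. "Printing" just produces a new world value that
    # records the output, instead of mutating anything.
    return world + [text]

w0 = []                        # token for the initial external-world state
w1 = print_io("hello", w0)     # "print" once -> a new world
w2 = print_io("hello", w1)     # "print" again -> yet another new world

assert w1 != w0 and w2 != w1   # each call receives a *different* world argument
assert w2 == ["hello", "hello"]
```

This shows why the two prints are not "the same call with the same argument": the world argument differs each time, which is the sense in which the IO monad avoids side effects.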
See also http://homepages.cwi.nl/~ralf/OOHaskell/
showing that Haskell can do object-oriented programming, complete with
mutable objects and side-effects. Although "duck typing" is listed as a
keyword, I couldn't see any reference to it in the paper.
That describes an OO extension to Haskell, which I suppose is
interesting, but it's not Haskell. You could also bolt a Python
interpreter into Haskell and that would not be Haskell either.
Haskell uses type inference, and has a "maybe" type for those cases where
it can't tell what the type will be.
Type inference just means the static type system doesn't rely on
declarations. Like when you say 3+4 in C++ or Java, the compiler
figures out that is an integer expression, but if you say
x = 3 + 4
without an "int x" declaration, the compiler complains. Haskell goes
a little further and lets you say
x = 3 + 4
and the compiler says "ok, now I know that x is an integer". If you
later try to concatenate x with a string:
y = x ++ "hello"
you get a compile time error message. In a dynamic language like
Python you would not get the error message til runtime. That is the
difference between a static and dynamic language.

The Maybe monad is still a static type. You can think of it as a
statically typed list with a maximum of one element, i.e. a Maybe Int
can have values like [3] (denoted "Just 3") or like [] (the empty
list, denoted Nothing). It cannot have a value like "foo" or
[3.14159] or [[7]] or [2,3,4]. You get compile time error messages
if you try to assign those values to it. Of course you can have a
Maybe Float (like [3.14159]) or a Maybe (Maybe Int) (like [[7]]).
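A rough Python analogue of that `Maybe`, mirroring the `lookup` signature quoted earlier, is `Optional` from the `typing` module (purely illustrative; unlike Haskell, Python enforces none of these annotations at compile time, which is exactly the static/dynamic difference being discussed, though an external checker such as mypy can):

```python
from typing import List, Optional, Tuple

def lookup(key: str, assoc_list: List[Tuple[str, int]]) -> Optional[int]:
    """Return the value for key, or None -- 'Just v' or 'Nothing'."""
    for k, v in assoc_list:
        if k == key:
            return v   # Just v
    return None        # Nothing

pairs = [("a", 1), ("b", 2)]
assert lookup("b", pairs) == 2
assert lookup("z", pairs) is None
```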
If you still don't accept that Haskell is a dynamic language, for
whatever definition of dynamic language you use, I'll withdraw the claim
for the sake of not getting bogged down in long arguments over something
that was really just a minor point.
Haskell is certainly not a dynamic language in the usual sense of that
term.
Sep 16 '07 #15
On Sun, 16 Sep 2007 09:12:56 -0700, Paul Rubin wrote:
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
What do you mean by Haskell is a dynamic language? It is statically
and strictly typed and the compiler usually knows all the functions.
No "surprises", no side effects, no duck typing.

Haskell's IO monad (and possibly the do monad?) allows side effects. It
would be a pretty poor programming language that didn't allow input or
output!

In a certain theoretical sense, input and output in the IO monad are not
side effects.
Nevertheless, the authors of the paper listed below say they use the IO
monad to get mutable objects and therefore side effects.

>See also http://homepages.cwi.nl/~ralf/OOHaskell/ showing that Haskell
can do object-oriented programming, complete with mutable objects and
side-effects. Although "duck typing" is listed as a keyword, I couldn't
see any reference to it in the paper.

That describes an OO extension to Haskell, which I suppose is
interesting, but it's not Haskell.
Built entirely from standard Haskell98. Saying that it isn't Haskell is a
little like saying that Python plus the Sets module isn't Python anymore.

--
Steven.
Sep 16 '07 #16
Steven D'Aprano <st***@REMOVE-THIS-cybersource.com.au> writes:
On Sun, 16 Sep 2007 15:05:55 +1000, Ben Finney wrote:
The compiler for a dynamic language like Python has very little
absolute "no significant side-effect in these specific cases"
guarantee of the kind you're assuming even in the cases you choose
for contrast with the ones that *do* have significant
side-effects.

Yes, in general one might not be able to make those guarantees, but
still there are common cases where you can.
Perhaps. All my message was intended to show was that your message
didn't contain any such cases.

--
\ "If life deals you lemons, why not go kill someone with the |
`\ lemons (maybe by shoving them down his throat)." -- Jack Handey |
_o__) |
Ben Finney
Sep 17 '07 #17

This discussion thread is closed

Replies have been disabled for this discussion.
