
beginning index for generators

I was wondering if there is, or could be, some way to pass a generator an
optional starting index so that, if it supported that, slicing could be made
more efficient. Right now if you use a generator and take a [100:110] or any
other kind of slice, it reads all the values up to 100 as well.

I know a lot of generators have to do all the previous parts before they can
do the next part, but for generators that can start at any index it would be a
nice optimization to take an optional argument telling them what index they
are currently working from. This could make strides more efficient too, since
the generator would not have to be run just to skip the parts that are not
being used.

The reason I would like this to be part of the generator, rather than a
special function, is that it would make it easier to refactor a piece of code
from some other sequence type into a much more efficient generator without
having to change how it is called.

I have no idea how feasible this is or what it would take to implement, but I
did not get any matches searching for this on Google and wanted to throw it
out here to see if others have run into situations in which this would be
very useful and would further simplify code and make it more readable.

I would also love to know of other ways this problem could be solved, but I
would like it to be as transparent as possible to existing calling code.
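To make the idea concrete, here is a small sketch (the squares() generator and
its optional 'start' argument are made up for illustration, not an existing
API): itertools.islice() still has to consume and discard the first 100 values,
whereas a generator that accepts a starting index could jump straight there.

import itertools

def squares(start=0):
    # Each value depends only on its index, so this generator
    # could begin at an arbitrary position.
    n = start
    while True:
        yield n * n
        n += 1

# Today: the slice silently runs the generator 100 times first.
slow = list(itertools.islice(squares(), 100, 110))

# Wished for: tell the generator where to begin instead.
fast = list(itertools.islice(squares(100), 0, 10))

assert slow == fast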
Jul 18 '05 #1
kosh wrote:
I was wondering if there is, or could be, some way to pass a generator
an optional starting index so that, if it supported that, slicing could be
made more efficient. Right now if you use a generator and take a
[100:110] or any other kind of slice, it reads all the values up to 100
as well.
<snip> I have no idea how feasible this is or what it would take to implement,
but I did not get any matches searching for this on Google and wanted to
throw it out here to see if others have run into situations in which this
would be very useful and would further simplify code and make it more
readable.

There is no way of doing this implicitly - only explicitly, with deep knowledge
of what actually gets iterated over by the generator. The slicing
is a second, independent operation on the generator that only knows about
the generator implementing the iterator protocol, so it can't jump to
arbitrary positions in that generator. And as you said yourself, the
generator needs to be invoked those 100 times so that it has a consistent
internal state.

I would also love to know of other ways this problem could be solved,
but I would like it to be as transparent as possible to existing calling
code.


Maybe implementing __getslice__ on your container object helps - that way,
the calling code does not have to be changed, but the implementation can take
advantage of the underlying collection being efficiently sliceable.
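Roughly, the idea could look like this (a sketch only, reusing the hypothetical
squares() generator from the sketch above; the names are illustrative, and
negative indices and steps are ignored): the wrapper exposes ordinary slicing
to calling code, but internally starts the generator at the requested index.

def squares(start=0):
    n = start
    while True:
        yield n * n
        n += 1

class SquareSeq(object):
    # Sliceable front end; with a slice object the work is delegated
    # to the generator, started directly at the requested index.
    def __getitem__(self, s):
        if not isinstance(s, slice):
            return s * s
        start = s.start or 0
        step = s.step or 1
        result = []
        for i, value in enumerate(squares(start)):
            if s.stop is not None and start + i >= s.stop:
                break
            if i % step == 0:
                result.append(value)
        return result

print SquareSeq()[100:110:2]    # calling code just slices as before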

--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
kosh wrote:
I was wondering if there is, or could be, some way to pass a generator
an optional starting index so that, if it supported that, slicing could be
made more efficient. Right now if you use a generator and take a
[100:110] or any other kind of slice, it reads all the values up to 100
as well.


Here is an 'iterate' class that decides between iterating over the items and
accessing them by index, based on the existence of a __getitem__() method.
It's mostly a proof of concept and not tested beyond what you see below.

Peter

import itertools, sys

def irange(start, stop, step):
    # Like xrange(), but without the sys.maxint limit and with
    # stop=None meaning "open-ended".
    if start is None:
        start = 0
    if step is None:
        step = 1
    index = start
    if stop is None:
        while True:
            yield index
            index += step
    else:
        if step < 0:
            while index > stop:
                yield index
                index += step
        else:
            while index < stop:
                yield index
                index += step

def islice(iterable, start, step, stop):
    # Accept slice-style None defaults before delegating to itertools.islice().
    if start is None: start = 0
    if step is None: step = 1
    return itertools.islice(iterable, start, stop, step)

class iterate(object):
    def __init__(self, iterable):
        self.iterable = iterable
    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise ValueError("supports only slices")
        start, stop, step = s.start, s.stop, s.step
        if hasattr(self.iterable, "__getitem__"):
            # Indexable: fetch only the requested items.
            indexable = self.iterable
            def iterate():
                for index in irange(start, stop, step):
                    yield indexable[index]
            return iterate()
        # Otherwise fall back to stepping through the iterable.
        return islice(self.iterable, start, step, stop)

class Sequential(object):
    def __init__(self):
        self.index = 0
    def __iter__(self):
        return self
    def next(self):
        print "sequential[%s]" % self.index
        try:
            return self.index
        finally:
            self.index += 1

class Random(Sequential):
    def __getitem__(self, index):
        print "random[%s]" % index
        return index

def printThem(iterable):
    for i in iterable:
        print i
    print "---"

printThem(iterate(Random())[3:10:2])
printThem(iterate(Random())[3:10])
printThem(iterate(Sequential())[3:10:2])
printThem(iterate(Sequential())[3:10])
# I'd rather not:
# printThem(iterate(Sequential())[sys.maxint-10::3])
printThem(itertools.islice(iterate(Random())[sys.maxint-5::3], 10))

Jul 18 '05 #3
kosh wrote:
I was wondering if there is, or could be, some way to pass a generator an
optional starting index so that, if it supported that, slicing could be made
more efficient. Right now if you use a generator and take a [100:110] or any
other kind of slice, it reads all the values up to 100 as well.

I know a lot of generators have to do all the previous parts before they can
do the next part, but for generators that can start at any index it would be a
nice optimization to take an optional argument telling them what index they
are currently working from. This could make strides more efficient too, since
the generator would not have to be run just to skip the parts that are not
being used.


I think adding an optional .skip(n) to the generator protocol could be
useful; it could be used for strides too, and I can't think of a much
more generic way to do things.

However, the caller willing to use this optimization would always need
to check if .skip() is defined, although this could naturally be
automated to some degree.

It could be specified as:

If an iterator object has a method skip([n=1]), it can be used to
advance the iterator n steps without it having to yield the intermediate
values. The end state of the iterator should be the same as with:

for i in range(n):
    iterable.next()
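A rough sketch of what such a protocol could look like (the Squares iterator
and the fastslice() helper are made up for illustration; skip() here is the
proposed method, not an existing one):

import itertools

class Squares(object):
    # Iterator whose values depend only on the index, so skipping is cheap.
    def __init__(self):
        self.index = 0
    def __iter__(self):
        return self
    def next(self):
        value = self.index * self.index
        self.index += 1
        return value
    def skip(self, n=1):
        # Advance n steps without yielding the intermediate values.
        self.index += n

def fastslice(iterable, start, stop):
    # Use skip() when the iterator offers it, otherwise fall back to
    # consuming and discarding the first 'start' items.
    it = iter(iterable)
    if hasattr(it, "skip"):
        it.skip(start)
        return itertools.islice(it, stop - start)
    return itertools.islice(it, start, stop)

print list(fastslice(Squares(), 100, 110))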
Jul 18 '05 #4
Peter Otten wrote:
def irange(start, stop, step):

Isn't that xrange?

def islice(iterable, start, step, stop):

And that's itertools.islice?

class iterate(object):
    def __init__(self, iterable):
        self.iterable = iterable
    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise ValueError("supports only slices")
        start, stop, step = s.start, s.stop, s.step
        if hasattr(self.iterable, "__getitem__"):

And doesn't that test fail for dictionaries?

Andrew
da***@dalkescientific.com
Jul 18 '05 #5
Andrew Dalke wrote:
Peter Otten wrote:
def irange(start, stop, step):


Isn't that xrange?


No. That's what I would like xrange() to be. It has no artificial (if
generous) sys.maxint limit and can be open-ended, i.e. it comprises the
itertools.count() functionality, extended for step != 1.
def islice(iterable, start, step, stop):


And that's itertools.islice?


No, but it _calls_ itertools.islice() as you may have noted :-)
Its raison d'être is that it allows passing the start, stop, and step
attributes of a slice object without looking into the special cases.
itertools.islice() works with different signatures instead of None default
values: when called with three parameters, only 'stop' may be None. Slice
literals cannot be that strict because they may have a negative step, in
which case start values of None and 0 have different meanings.

I had all checks for special cases in the iterate class first and factored
them out as an afterthought because they made it harder to see how simple
'iterate' really was.
class iterate(object):
    def __init__(self, iterable):
        self.iterable = iterable
    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise ValueError("supports only slices")
        start, stop, step = s.start, s.stop, s.step
        if hasattr(self.iterable, "__getitem__"):


And doesn't that test fail for dictionaries?


Yes, unfortunately. Discriminating between dictionaries and lists has been
deliberately left out. Is there a recommended way to do it? Looking for a
'keys' attribute maybe? Anyway, as long as something like iterate is not
built into the iterator protocol, it is up to the programmer to decide
whether it can help or do harm. Dictionaries may be wrapped somewhere up
the callstack:
>>> hasattr(iter({}), "__getitem__")
False

Peter
Jul 18 '05 #6
Peter Otten wrote:
No. That's what I would like xrange() to be.
Ahh, I had forgotten about that restriction.

BTW, this is kinda cute with your code:

>>> list(irange("A", "AAAAAAAAA", "A"))
['A', 'AA', 'AAA', 'AAAA', 'AAAAA', 'AAAAAA', 'AAAAAAA', 'AAAAAAAA']
And that's itertools.islice?

No, but it _calls_ itertools.islice() as you may have noted :-)
Its raison d'être is that it allows passing the start, stop, and step
attributes of a slice object without looking into the special cases.


Actually I didn't notice. I was on my way out the door and didn't
look that closely. My apologies.

Given the worry you had above about the sys.maxint limit, aren't
you also concerned about the same limits with itertools.islice?
Yes, unfortunately. Discriminating between dictionaries and lists has been
deliberately left out. Is there a recommended way to do it? Looking for a
'keys' attribute maybe?


No, because there are some data structures which act like both
a list and a dictionary. E.g., one where the dictionary keys are
strings and the items are stored and accessible via integer indices
into the sort order.
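A minimal sketch of such a hybrid (made up here just to illustrate why the
__getitem__ test alone can't tell the two kinds apart):

class SortedDict(object):
    # Maps string keys to values, but also answers integer indices
    # into the sorted key order - so it has __getitem__ without
    # being exclusively list-like or dict-like.
    def __init__(self, **items):
        self._data = dict(items)
    def __getitem__(self, key):
        if isinstance(key, int):
            return self._data[sorted(self._data)[key]]
        return self._data[key]

d = SortedDict(a=1, b=2)
print d["a"], d[1]    # prints: 1 2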

Andrew
da***@dalkescientific.com
Jul 18 '05 #7
Andrew Dalke wrote:
BTW, this is kinda cute with your code
>>> list(irange("A", "AAAAAAAAA", "A"))
['A', 'AA', 'AAA', 'AAAA', 'AAAAA', 'AAAAAA', 'AAAAAAA', 'AAAAAAAA']
This is of course completely accidental and strongly discouraged :-)
On the other hand I don't like arbitrary limitations like that of the sum()
built-in.
And that's itertools.islice?

No, but it _calls_ itertools.islice() as you may have noted :-)
Its raison d'être is that it allows passing the start, stop, and step
attributes of a slice object without looking into the special cases.


Actually I didn't notice. I was on my way out the door and didn't
look that closely. My apologies.


De nada. It _was_ named suggestively to trigger your perception.
Given the worry you had above about the sys.maxint limit, aren't
you also concerned about the same limits with itertools.islice?


I'm not especially worried about these limits, I just didn't like the look
of xrange(start, sys.maxint-step+1, step) (?) as the substitute for an
open-ended sequence and decided to do it right - after all, large indices
are a corner case where the 'iterate' class shines if the iterable is
"indexable", too. As to itertools.islice(), anyone who throws away, say,
the first 2**31 values of an iteration probably has other things than
Python's limitations to worry about.

Peter

Jul 18 '05 #8
kosh <ko**@aesaeion.com> wrote:
I was wondering if there is, or could be, some way to pass a generator an
optional starting index so that, if it supported that, slicing could be made
more efficient. Right now if you use a generator and take a [100:110] or any
other kind of slice, it reads all the values up to 100 as well.
You can pass any starting arguments you want when you call a generator,
just like you can when you call any other function. You can, if you
wish, make that argument optional, and by implied agreement between the
call site and the generator you can give that argument any meaning that
the generator is willing and able to supply. "This is the starting
index" is in no way different from any other meaning whatsoever.

This is all so obvious that I'm afraid you mean something very different
from what you're actually saying, but, if so, then you'll just have to
try to express yourself differently, because I can't read minds!-)

I know a lot of generators have to do all previous parts before they can do
Right, of course, so presumably such generators would never let you pass
that optional "this is the starting index" argument -- with them you
have to start from the beginning, period.
the next part but it would be a nice optimization that can start at any index
to take an optional argument to tell them what index they are currently
working on. This could make strides more efficient also since it would not
have to run the generator to skip the parts it was not using.
I suspect you don't mean by "stride" the same thing I do. To me, being
an old Fortran/Linpack/etc programmer, a "stride" in a loop is the
amount by which the loop-control variable (an integer) is incremented
after each leg of the loop. In Python, for example, the third argument
to range, xrange or slice is what I think of as "a stride", though
Python tends to call it "a step" instead (but then, in English, a stride
IS nothing more than a long step!-). I don't see what passing an
optional starting index has to do with strides, or what it means to
"make strides more efficient". You could of course design your
generators, if what they're computing is suitable for that, to take an
optional stride argument (presumably defaulting to 1) instead of, or in
addition to, an optional start argument (presumably defaulting to 0).
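For instance (a sketch with a made-up evens() generator, just to show the
calling convention):

import itertools

def evens(start=0, step=1):
    # By agreement between the call site and the generator, 'start' and
    # 'step' are positions in the conceptual sequence of even numbers.
    n = start
    while True:
        yield 2 * n
        n += step

# Roughly the slice [100:110:2], without computing the first 100
# values or the skipped ones in between.
print list(itertools.islice(evens(100, 2), 5))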
Again, this is so obvious that I suspect we're just talking at
cross-purposes...

The reason I would like this to be part of the generator, rather than a
special function, is that it would make it easier to refactor a piece of code
from some other sequence type into a much more efficient generator without
having to change how it is called.
You can always design and code a generator instead of "some other"
_function_ returning an iterator, or vice versa. The generator is often
more convenient, but whether it's "much more efficient" is of course
dubious. For example, all the stuff in itertools _could_ have been
coded as generators, but in fact they're coded as callables returning
iterators of other kinds (instances of C-coded iterator types), as
_that_ turns out to be the "much more efficient" implementation choice
in this case.
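As a sketch of the two forms (a trivial counter written both ways; for the
caller they are interchangeable):

# As a generator function:
def countup(start=0):
    n = start
    while True:
        yield n
        n += 1

# As a callable returning an iterator object of its own kind, which is
# roughly how the itertools functions are organized (theirs are C-coded
# iterator types):
class countup2(object):
    def __init__(self, start=0):
        self.n = start
    def __iter__(self):
        return self
    def next(self):
        value = self.n
        self.n += 1
        return value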

I have no idea how feasible this is or what it would take to implement, but I
did not get any matches searching for this on Google and wanted to throw it
out here to see if others have run into situations in which this would be
very useful and would further simplify code and make it more readable.

I would also love to know of other ways this problem could be solved, but
I would like it to be as transparent as possible to existing calling code.


Perhaps if you can give some strawman examples of how you would like to
be able to code something, with exactly what inputs and what results,
somebody can understand exactly what you're wishing for, because, as of
now, I'm totally incapable of such understanding, beyond the obvious points
expounded above (and I suspect they ARE obvious enough that, if that was
all you were after, you wouldn't have posted -- but this doesn't allow
me to divine what it IS you're after, because the obvious points I expounded
DO match 100% of what you have literally been writing in this post).
Alex
Jul 18 '05 #9
Peter Otten wrote:
As to itertools.islice(), anyone who throws away, say,
the first 2**31 values of an iteration probably has other things than
Python's limitations to worry about.


True enough.

Personally I'm surprised to see that my two-year-old laptop
can do

for i in xrange(0, sys.maxint):
    pass

in just over 7 minutes. It's like in the 1980s when I
realized that 16-bit numbers were small.
Andrew
da***@dalkescientific.com
Jul 18 '05 #10
