
itertools.flatten()? and copying generators/iterators.

Below is an implementation of a 'flattening' recursive generator (take a nested
iterable and remove all its nesting). Is this possibly general and useful
enough to be included in itertools? (I know *I* wanted something like it...)

Very basic examples:

>>> rl = [1, [2, 3, [4, 5], '678', 9]]
>>> list(flatten(rl))
[1, 2, 3, 4, 5, '6', '7', '8', 9]
>>> notstring = lambda obj: not isinstance(obj, type(''))
>>> list(flatten(rl, notstring))
[1, 2, 3, 4, 5, '678', 9]
>>> isstring = lambda obj: not notstring(obj)
>>> list(flatten(rl, isstring))
[1, [2, 3, [4, 5], '678', 9]]  # The string is within a list, so we never descend that far.
>>> car_is_2 = lambda obj: isinstance(obj, type([])) and obj[0] == 2
>>> list(flatten(rl, car_is_2))
[1, 2, 3, [4, 5], '678', 9]
>>> rls = ['Here', 'are', ['some', ['nested'], 'strings']]
>>> list(flatten(rls))
['H', 'e', 'r', 'e', 'a', 'r', 'e', 's', 'o', 'm', 'e', 'n', 'e', 's', 't',
'e', 'd', 's', 't', 'r', 'i', 'n', 'g', 's']
>>> list(flatten(rls, notstring))
['Here', 'are', 'some', 'nested', 'strings']
>>> rli = iter([1, 2, iter(['abc', iter('ABC')]), 4])
>>> list(flatten(rli))
[1, 2, 'a', 'b', 'c', 'A', 'B', 'C', 4]
>>> list(flatten(rli, notstring))
[]  # rli is an iterator, remember!
>>> rli = iter([1, 2, iter(['abc', iter('ABC')]), 4])
>>> list(flatten(rli, notstring))
[1, 2, 'abc', 'A', 'B', 'C', 4]
>>> # The following I'm not sure what to do about...
>>> empty = [1, [], 3]
>>> emptyiter = [1, iter([]), 3]
>>> list(flatten(empty))
[1, [], 3]
>>> list(flatten(emptyiter))
[1, 3]


I tried hard to get it to work with iterator and generator objects, too, and
it mostly does. However, I'm having some problems determining whether a
given object will iterate infinitely, if that object is already a
generator/iterator. Basically, I'm unable to copy an iterator (why?). See
isemptyorself() below for details. Aside from that, I'm just generally
unsure what the proper behavior should be when an iterator/generator is
encountered.

Also, why is the iterator type not included in the types module or described
in the language reference (Python 2.2)?

--- Code ---
def isiterable(obj):
    try: iter(obj)
    except TypeError: return False
    else: return True

def isemptyorself(iterable):
    """True if iterable yields nothing or itself."""
    it = iter(iterable)

    # KLUDGE! This test must modify the object in order to test
    # it. This means that a value of iterable will be discarded!
    # Most likely, iterable is itself an iterator or generator,
    # because id(iter(GENR or ITER)) == id(GENR or ITER).
    # Unfortunately, we can't copy generators and iterators using
    # the copy module, so we must just assume that this iterator
    # doesn't yield itself or nothing....

    if it is iterable:
        return False

    try: res = it.next()
    except StopIteration: # Yields nothing
        return True
    else:
        if res == iterable: # Yields itself
            return True
    return False

def flatten(iterable, isnested=None):
    """Iterate items in iterable, descending into nested items.

    isnested is a function that returns true if the element of
    iterable should be descended into. The default is to
    consider iterable anything that iter() thinks is iterable (unless
    doing so would cause an infinite recursion).

    """
    if isnested is None:
        isnested = lambda obj: True # Always descend

    for item in iterable:
        if isiterable(item) and not isemptyorself(item) \
           and isnested(item):
            for subitem in flatten(item, isnested):
                yield subitem
        else:
            yield item
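For modern readers: in Python 3 the same generator can be written without the isemptyorself kludge by treating strings as atoms up front, since the kludge mainly guards against single characters that iterate to themselves. A sketch (flatten3 is my name for it, not the original poster's):

```python
def flatten3(iterable, isnested=None):
    """Python 3 sketch of the generator above: descend into anything
    iterable unless isnested says otherwise.  Strings and bytes are
    atoms by default, which removes the need for isemptyorself."""
    if isnested is None:
        isnested = lambda obj: not isinstance(obj, (str, bytes))
    for item in iterable:
        try:
            it = iter(item)
        except TypeError:
            yield item          # not iterable: a leaf
            continue
        if isnested(item):
            yield from flatten3(it, isnested)
        else:
            yield item

print(list(flatten3([1, [2, 3, [4, 5], '678', 9]])))
```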
--
Francis Avila

Jul 18 '05 #1


[Francis Avila]
Below is an implementation of a 'flattening' recursive generator (take a nested
iterator and remove all its nesting). Is this possibly general and useful
enough to be included in itertools? (I know *I* wanted something like it...)
Interesting post!

I'll add your idea to the list of itertool suggestions received so far. My
first take is that it is more likely to be added to the "recipes" in the
examples section.

Core itertools should be primitive building blocks that combine with one
another to make other tools. Also, I'm trying to keep the toolset as small
as possible by excluding new tools that can be easily and efficiently coded
in pure python.

I would code it like this:

def flatten(s):
    try:
        iter(s)
    except TypeError:
        yield s
    else:
        for elem in s:
            for subelem in flatten(elem):
                yield subelem

As your examples show, it does have some surprising behavior in that
strings get split apart instead of staying intact. That is easily taken care of
by an AtomicString subclass:

class AtomicString(str):
    def __iter__(self):
        raise TypeError

a = [1, 2, AtomicString('abc'), 4]
>>> # The following I'm not sure what to do about...
>>> empty = [1, [], 3]
>>> emptyiter = [1, iter([]), 3]


The above code definition handles this without a problem.
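The subclass trick still works in Python 3. A self-contained sketch (note that with this recipe a plain str still recurses into characters, and a lone character would recurse forever; that is exactly what AtomicString prevents):

```python
class AtomicString(str):
    """A str that refuses iteration, so flatten() treats it as a leaf."""
    def __iter__(self):
        raise TypeError

def flatten(s):
    # Raymond's recipe, unchanged except for Python 3 'yield from'.
    try:
        it = iter(s)
    except TypeError:
        yield s
    else:
        for elem in it:
            yield from flatten(elem)

print(list(flatten([1, 2, AtomicString('abc'), 4])))
```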
I'm having some problems determining whether a
given object will iterate infinitely, if that object is already a
generator/iterator.
In general, that is not knowable.
Basically, I'm unable to copy an iterator (why?).
Alex Martelli just wrote a PEP about making many iterators copyable.
See www.python.org/peps/pep-0323.html

Until that is adopted, your best bet is to use tee():

from itertools import count

def tee(iterable):
    "Return two independent iterators from a single iterable"
    def gen(next, data={}, cnt=[0]):
        dpop = data.pop
        for i in count():
            if i == cnt[0]:
                item = data[i] = next()
                cnt[0] += 1
            else:
                item = dpop(i)
            yield item
    next = iter(iterable).next
    return (gen(next), gen(next))
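Raymond's recipe was in fact added to the standard library as itertools.tee in Python 2.4, so in modern Python one simply writes:

```python
from itertools import tee

# tee() splits one iterator into two independent ones; consuming one
# does not disturb the other (tee buffers the difference internally).
a, b = tee(iter([1, 2, 3]))
assert next(a) == 1
assert list(b) == [1, 2, 3]
assert list(a) == [2, 3]
```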

Also, why is the iterator type not included in the types module or described
in the language reference (Python 2.2)?


There is no iterator type. Iterators can be any object that supports the
iterator protocol: __iter__() and next(). Generators, container iterators,
and each of the itertools define their own type.
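To make the protocol concrete, here is a minimal hand-rolled iterator (a sketch; in Python 3 the next() method is spelled __next__()):

```python
class Countdown:
    """Hand-rolled iterator: __iter__ returns self, __next__ produces
    values until it raises StopIteration."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print(list(Countdown(3)))
```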
This was a long but highly instructive pair of posts. I hope it becomes
widely read. Your work ventured into previously uncharted
territory and touched on a number of frontier issues like copyability,
determining whether something is iterable, the various iterator types,
and the deep mathematical subject of determining whether a given
process is finite.
Raymond Hettinger
Jul 18 '05 #2

Raymond Hettinger wrote:
Core itertools should be primitive building blocks that combine with one
another to make other tools. Also, I'm trying to keep the toolset as
small as possible by excluding new tools that can be easily and
efficiently coded in pure python.

I would code it like this:

def flatten(s):
    try:
        iter(s)
    except TypeError:
        yield s
    else:
        for elem in s:
            for subelem in flatten(elem):
                yield subelem

As your examples show, it does have some surprising behavior in that
strings get split apart instead of staying intact. That is easily taken
care of by an AtomicString subclass:

class AtomicString(str):
    def __iter__(self):
        raise TypeError

a = [1, 2, AtomicString('abc'), 4]
>>> # The following I'm not sure what to do about...
>>> empty = [1, [], 3]
>>> emptyiter = [1, iter([]), 3]


I suggest a minor modification:

def flatten(s, toiter=iter):
    try:
        it = toiter(s)
    except TypeError:
        yield s
    else:
        for elem in it:
            for subelem in flatten(elem, toiter):
                yield subelem

def keepstrings(seq):
    if isinstance(seq, basestring):
        raise TypeError
    return iter(seq)

sample = [1, 2, [3, "abc def".split()], 4]
print sample
print list(flatten(sample, keepstrings))
print list(flatten([1, [], 3]))
print list(flatten([1, iter([]), 3]))

The above handles strings in a way that is nonintrusive on client code.
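Peter's toiter variant ports directly to Python 3 (basestring becomes str); a runnable sketch:

```python
def flatten(s, toiter=iter):
    # toiter decides what counts as iterable; raising TypeError
    # makes flatten treat s as a leaf.
    try:
        it = toiter(s)
    except TypeError:
        yield s
    else:
        for elem in it:
            yield from flatten(elem, toiter)

def keepstrings(seq):
    if isinstance(seq, str):
        raise TypeError
    return iter(seq)

sample = [1, 2, [3, "abc def".split()], 4]
print(list(flatten(sample, keepstrings)))
```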

Peter
Jul 18 '05 #3

Peter Otten wrote:
...
def keepstrings(seq):
    if isinstance(seq, basestring):
        raise TypeError
    return iter(seq)
...
The above handles strings in a way that is nonintrusive on client code.


Yes, very nice. I'd make keepstrings the default (one RARELY wants
to treat strings as nonatomic) AND replace the typetest with an
attempt to do a seq+'' (if it doesn't raise, seq ain't a string),
but that's just me:-).
Alex

Jul 18 '05 #4


"Alex Martelli" <al***@aleax.it> wrote in message
news:Te********************@news1.tin.it...
Peter Otten wrote:
...
def keepstrings(seq):
    if isinstance(seq, basestring):
        raise TypeError
    return iter(seq)

...
The above handles strings in a way that is nonintrusive on client code.


Yes, very nice. I'd make keepstrings the default (one RARELY wants
to treat strings as nonatomic) AND replace the typetest with an
attempt to do a seq+'' (if it doesn't raise, seq ain't a string),
but that's just me:-).

Alex


try: seq+'' seems more expensive. Is it just to handle the case where
someone makes a string-alike that doesn't derive from str? (Actually, I
guess that's about all pre-2.2 code.)

Is it expensive enough to even matter?
--
Francis Avila

Jul 18 '05 #5

Francis Avila wrote:
try: seq+'' seems more expensive. Is it just to handle the case where
someone makes a string-alike that doesn't derive from str? (Actually, I
guess that's about all pre-2.2 code.)

Is it expensive enough to even matter?
--
Francis Avila


If you think that the concatenation is costly, as I did when I read your
remark - it's not:

>>> smartSnakes = "don't rattle"
>>> smartSnakes is (smartSnakes + "")
True

Peter
Jul 18 '05 #6

"Raymond Hettinger" <vz******@verizon.net> wrote in message
news:ni******************@nwrdny03.gnilink.net...
[Francis Avila]
Also, why is the iterator type not included in the types module or described
in the language reference (Python 2.2)?


There is no iterator type. Iterators can be any object that supports the
iterator protocol: __iter__() and next(). Generators, container iterators,
and each of the itertools define their own type.


Actually (Python 2.2):

>>> IteratorType = type(iter(()))
>>> IteratorType
<type 'iterator'>
>>> from types import GeneratorType
>>> GeneratorType
<type 'generator'>
>>> GeneratorType == IteratorType
0
>>> isinstance(IteratorType, GeneratorType)
0
>>> isinstance(GeneratorType, IteratorType)
0
>>> dir(GeneratorType)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
'__init__', '__iter__', '__new__', '__reduce__', '__repr__', '__setattr__',
'__str__', 'gi_frame', 'gi_running', 'next']
>>> dir(IteratorType)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
'__init__', '__iter__', '__new__', '__reduce__', '__repr__', '__setattr__',
'__str__', 'next']
>>> def g(): yield None
...
>>> G = g()
>>> type(G)
<type 'generator'>
>>> G
<generator object at 0x0156F1C0>
>>> iter(G)
<generator object at 0x0156F1C0>
>>> I = iter(())
>>> type(I)
<type 'iterator'>
>>> I
<iterator object at 0x014BAC80>
>>> iter(I)
<iterator object at 0x014BAC80>


The generator seems to be an actual execution frame, but an iterator only a
type that supports the iterator protocol. I think from this that iterator
needs to store all its elements prior to yielding them, whereas a generator
does not. For this reason, an iterator will always terminate eventually.
For these two reasons, I think it's useful to distinguish iterators and
generators.

I think I've clarified it a bit more to myself. For iterators, it iterates
infinitely if it yields itself, otherwise it doesn't. For generators, it is
unknowable.

Are classes supporting the iterator protocol always generators? Would they
*ever* be an iterator? (According to type(), I mean) Is there any other way
to get a strict iterator other than applying iter() to a sequence? (I
noticed you can't force Python to subclass the iterator type.)

The language reference (or maybe it's the module reference?) states that
generators yield iterator objects. That's not exactly true: they yield
generator objects which support the iterator protocol, but not iterator
objects. And again, iterator types are not mentioned as a separate type in
the language reference, which strikes me as odd.

Perhaps a guru can clarify the relationship between generators and
iterators?
--
Francis Avila

Jul 18 '05 #7


"Peter Otten" <__*******@web.de> wrote in message
news:bn*************@news.t-online.com...
Francis Avila wrote:
try: seq+'' seems more expensive. Is it just to handle the case where
someone makes a string-alike that doesn't derive from str? (Actually, I
guess that's about all pre-2.2 code.)

Is it expensive enough to even matter?
--
Francis Avila


If you think that the concatenation is costly, as I did when I read your
remark - it's not:

>>> smartSnakes = "don't rattle"
>>> smartSnakes is (smartSnakes + "")
True

Peter


Actually, I meant the exception testing. Isn't Alex suggesting something
like?:

try:
    seq+''
except:
    seq is not a string
else:
    seq is a string
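Spelled out as a reusable predicate, the test Alex has in mind looks something like this (a sketch; the point is that any string-alike supporting concatenation with '' passes, not just str subclasses):

```python
def is_stringlike(obj):
    """Duck-typed string test: anything that concatenates with ''
    is treated as a string, whether or not it derives from str."""
    try:
        obj + ''
    except TypeError:
        return False
    return True

print(is_stringlike('abc'), is_stringlike([1, 2]))
```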

--
Francis Avila
Jul 18 '05 #8

Francis Avila wrote:
...
Actually, I meant the exception testing. Isn't Alex suggesting something
like?:

try:
    seq+''
except:
    seq is not a string
else:
    seq is a string


Right. Yes, if you have a lot of nonstrings this IS going to be
substantially slower than isinstance tests. As usual, setting up
a representative benchmark and measuring the effect is best.

Of course, it's not easy to decide what IS representative, but
let's give it a try:

# Peter Otten's suggestion (simplified/flattened out)
def flatten_po(s):
    try:
        if isinstance(s, basestring):
            raise TypeError
        else:
            it = iter(s)
    except TypeError:
        yield s
    else:
        for elem in it:
            for subelem in flatten_po(elem):
                yield subelem

# my slight preference
def flatten_am(s):
    try:
        s+''
    except TypeError:
        try: it = iter(s)
        except TypeError:
            yield s
        else:
            for elem in s:
                for subelem in flatten_am(elem):
                    yield subelem
    else:
        yield s

# an example tree -- a mix of strings, items, some nesting
subtree = ['foo']*18 + [1,2,3]*6
tree = [ subtree*10, [ subtree * 8 ] ]

# functions to benchmark
def process(flatten, tree=tree):
    return list(flatten(tree))

def pro_po(): return process(flatten_po)
def pro_am(): return process(flatten_am)
Now, with this setup, I measure:

[alex@lancelot bo]$ timeit.py -c -s'import fl' 'fl.pro_po()'
100 loops, best of 3: 9.4e+03 usec per loop
[alex@lancelot bo]$ timeit.py -c -s'import fl' 'fl.pro_am()'
100 loops, best of 3: 8.3e+03 usec per loop

...which is a reminder of WHY it's best to always measure --
I fully expected pro_am to be slower, as it's more general
(handles UserString instances etc), and I was just trying
to check _how_ much slower it was...!
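The command-line runs above can also be reproduced in-process with the timeit module; a sketch comparing the two dispatch styles (absolute numbers are machine-dependent, so none are claimed here):

```python
import timeit

def by_isinstance(x):
    return isinstance(x, str)

def by_concat(x):
    # duck-typed check via concatenation, as in flatten_am
    try:
        x + ''
    except TypeError:
        return False
    return True

# same mix of strings and numbers as Alex's subtree
data = ['foo'] * 18 + [1, 2, 3] * 6
t_iso = timeit.timeit(lambda: [by_isinstance(x) for x in data], number=200)
t_cat = timeit.timeit(lambda: [by_concat(x) for x in data], number=200)
print('isinstance: %.6fs  concat: %.6fs' % (t_iso, t_cat))
```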

Upon reflection, the problem is with flatten_po's elegant
'raise TypeError', which merges strings with noniterables
BUT only at the cost of an exception. Changing the
preamble of flatten_po to:

    try:
        if isinstance(s, basestring):
            yield s
            return
    ...

improves its timeit.py-measured performance to 5.9e+03 usec.
So, on this sample of data, the _po version is either 13%
slower or 29% faster than the _am .

Now, fortunately, it's an EXTREMELY unlikely event that
performance differences of this order of magnitude should
deeply influence our choice of algorithms! Twice as slow,
well, one wants to keep an eye on that; 30% or less, it's
unlikely to matter except at the very heart of some key
bottleneck -- and, honestly, how often will flattening
overhead be THE bottleneck of your application...? Note
that in this case we're basically doing no processing with
the items in the flattened sequence, just stuffing them into
flat lists. Do any processing whatsoever, and that will
probably dwarf the infrastructural overhead of the flattening
procedure itself.

The key thing is watching for difference in big-O behavior --
and, here, we have none... just somewhat different (NOT
by whole orders of magnitude!) multiplicative constants.
THAT kind of performance issue is one we can live with
more often than not! Which, in turn, is what makes the
ordinary use of exceptions generally tolerable from the
point of view of performance, except in the very hottest
bottlenecks...
Alex


Jul 18 '05 #9

Francis Avila wrote:
...
Perhaps a guru can clarify the relationship between generators and
iterators?


Sure: the use of "iterators" or "iterator objects" in the Python docs
does NOT mean "objects that are instance of <type 'iterator'>", it
means "objects that satisfy the iterator protocol".

The situation may be a bit less confusing in today's Python, just
because the typenames involved have been changed a bit in a few
important special cases:
>>> iter(())
<tupleiterator object at 0x402deecc>
>>> iter([])
<listiterator object at 0x402dedec>
>>> iter('')
<iterator object at 0x402dee8c>

These types are unrelated -- each only inherits from object, as
you can check by viewing their __bases__ -- and besides the small
performance improvement that results from each of these iterator
objects 'knowing' the exact type of sequence it's iterating on,
the name diversification may help avoid the understandable kind
of confusion you may have been led into by older Python versions.
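In today's Python, the protocol-versus-type distinction Alex describes is captured by the ABCs in collections.abc, which test for the protocol rather than for any concrete type:

```python
from collections.abc import Iterator, Generator

def g():
    yield None

gen = g()
assert isinstance(gen, Generator)
assert isinstance(gen, Iterator)            # every generator is an iterator...
assert isinstance(iter(()), Iterator)
assert not isinstance(iter(()), Generator)  # ...but not every iterator is a generator
```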
Alex

Jul 18 '05 #10

Francis Avila wrote:
...
try: seq+'' seems more expensive. Is it just to handle the case where
someone makes a string-alike that doesn't derive from str?

As per my other post, it sure SEEMS more expensive - whether it IS or
not depends on details:-). Yes, it's basically to try and treat as a
string any class that looks like one.

(Actually, I guess that's about all pre-2.2 code.)
Nope, UserString is still alive and well in 2.2 and 2.3 -- and doesn't
subclass basestring (I don't know why). Yes, if you write stringlike
classes in 2.2 and later, subclassing basestring, just as a "flag" since
quite a few spots may specialcase instances of basestring, IS becoming
advisable, I suspect.
Is it expensive enough to even matter?


Nope (in the sample case I measured), again see my other post for
details. But perhaps my old-style insistence against isinstance
abuse is becoming misplaced in this one case -- basestring exists
ONLY for the purpose of allowing easy isinstance checks, after all,
it offers no useful functionality per se. So, I should maybe
resign myself to the inevitable...

...and hope a similar type 'basenumber', to become the "just flagging"
baseclass of int, long, float and complex, doesn't emerge...:-)
Alex

Jul 18 '05 #11

On Tue, 28 Oct 2003 14:19:40 GMT, Alex Martelli <al***@aleax.it> wrote:
Francis Avila wrote:
...
Actually, I meant the exception testing. Isn't Alex suggesting something
like?:

try:
    seq+''
except:
    seq is not a string
else:
    seq is a string


Right. Yes, if you have a lot of nonstrings this IS going to be
substantially slower than isinstance tests. As usual, setting up
a representative benchmark and measuring the effect is best.

Of course, it's not easy to decide what IS representative, but
let's give it a try:

I am a worrywart, I guess, but my reaction is 'Nicht finger-poken
in der blinken-machinen' etc ;-) IOW, without knowing exactly what kind of
blinken-machine seq is, I would fear triggering an unknown side effect
with seq+'', so I would not like such a test as part of library code without
at least having it documented with shout-caps in the doc strings.

Regards,
Bengt Richter
Jul 18 '05 #12

Alex Martelli wrote:
The key thing is watching for difference in big-O behavior --
and, here, we have none... just somewhat different (NOT
by whole orders of magnitude!) multiplicative constants.
THAT kind of performance issue is one we can live with
more often than not! Which, in turn, is what makes the
ordinary use of exceptions generally tolerable from the
point of view of performance, except in the very hottest
bottlenecks...


You are right, but let me play a little with the code anyway:

def flatten_am(s):
    try:
        s+''
    except TypeError:
        try: it = iter(s)
        except TypeError:
            yield s
        else:
            for elem in s:
                for subelem in flatten_am(elem):
                    yield subelem
    else:
        yield s

def flatten_dict(s, isScalar):
    try:
        scalar = isScalar[s.__class__]
    except KeyError:
        t = s.__class__
        try:
            s = iter(s)
        except TypeError:
            scalar = isScalar[t] = True
        else:
            scalar = isScalar[t] = False

    if scalar:
        yield s
    else:
        for elem in s:
            for subelem in flatten_dict(elem, isScalar):
                yield subelem

# an example tree -- a mix of strings, items, some nesting
subtree = ['foo']*18 + [1,2,3]*6
tree = [ subtree*10, [ subtree * 8 ] ]

def pro_am():
    for f in flatten_am(tree): pass

def pro_dict():
    for f in flatten_dict(tree, {"".__class__: True}): pass

assert list(flatten_am(tree)) == list(flatten_dict(tree, {"".__class__: True}))
This gives for flatten_am()

...> timeit.py -c -s"import fl" "fl.pro_am()"
100 loops, best of 3: 5.7e+03 usec per loop

versus

...> timeit.py -c -s"import fl" "fl.pro_dict()"
1000 loops, best of 3: 1.37e+03 usec per loop

for flatten_dict(). So you get some speed up at the cost of uglier though
still quite readable code and under the assumption that either all or no
instances of a class are iterable.
Of course the idea works with both isinstance() and a + "" but seems
somewhat more in the spirit of isinstance().
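For modern readers, Peter's class-level iterability cache translates to Python 3 almost verbatim (a sketch; the snake_case name is mine):

```python
def flatten_dict(s, is_scalar):
    """Flatten s, caching per-class 'is this a scalar?' verdicts in
    is_scalar (a class -> bool dict), so iter() is attempted at most
    once per class."""
    try:
        scalar = is_scalar[s.__class__]
    except KeyError:
        t = s.__class__
        try:
            s = iter(s)
        except TypeError:
            scalar = is_scalar[t] = True
        else:
            scalar = is_scalar[t] = False
    if scalar:
        yield s
    else:
        for elem in s:
            yield from flatten_dict(elem, is_scalar)

cache = {str: True}  # seed the cache: treat strings as atoms
print(list(flatten_dict([['foo', 1], [2, ['bar']]], cache)))
```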

Peter
Jul 18 '05 #13

Peter Otten wrote:
...
def flatten_dict(s, isScalar):
    try:
        scalar = isScalar[s.__class__]
    except KeyError:
        t = s.__class__
        try:
            s = iter(s)
        except TypeError:
            scalar = isScalar[t] = True
        else:
            scalar = isScalar[t] = False

    if scalar:
        yield s
    else:
        for elem in s:
            for subelem in flatten_dict(elem, isScalar):
                yield subelem


Neat: "caching" (aka memoizing) IS a good general technique
for optimization, and this is a cool and un-obvious application.
Alex

Jul 18 '05 #14

"Peter Otten" <__*******@web.de> wrote in message
news:bn*************@news.t-online.com...

def flatten_dict(s, isScalar):
    try:
        scalar = isScalar[s.__class__]
    except KeyError:
        t = s.__class__

Why doesn't the above fail (AttributeError) for classic classes?

        ...
        scalar = isScalar[t] = True
    else:
        scalar = isScalar[t] = False

This part I never would have thought to do. Clever!

...
for flatten_dict(). So you get some speed up at the cost of uglier though
still quite readable code and under the assumption that either all or no
instances of a class are iterable.
That's not always a good assumption to make, if we want to decide whether to
iterate based upon some property of the object (which is actually how I
originally discovered I needed a "flatten").

Perhaps we can combine both conditions and caching optimizations?
def flatten_fastcond(s, isScalar, itercond=lambda o: None):
    try:
        scalar = isScalar[s.__class__]
    except KeyError:

        # If itercond has something to say about s, use *it*, and don't
        # add the lookup to isScalar.

        iterate = itercond(s)
        if not iterate is None:
            scalar = iterate
            if not scalar:
                s = iter(s)
        else:
            t = s.__class__
            try:
                s = iter(s)
            except TypeError:
                scalar = isScalar[t] = True
            else:
                scalar = isScalar[t] = False

    if scalar:
        yield s
    else:
        for elem in s:
            for subelem in flatten_dict(elem, isScalar):
                yield subelem
It's usually true that all instances of a given class are iterable or not,
and it's less inconvenient to add an entry to a dictionary than to write a
conditional test (also faster). If we move the itercond block above 'try',
we'll slow down this more common case, but speed things up if s is a
homogenous structure where itercond does all the work of deciding. There
are probably other things I could say against it if I thought about it, and
we're fast enough anyway.

As expected, given the same test tree (which doesn't exercise itercond), the
speed is about the same.
...> python timeit.py -c -s"import flatten" "flatten.pro_fastcond()"
100 loops, best of 3: 6.29e+003 usec per loop

...> python timeit.py -c -s"import flatten" "flatten.pro_dict()"
100 loops, best of 3: 6.29e+003 usec per loop

I'll make up some fancy tree later to test out itercond.
--
Francis Avila

Jul 18 '05 #15

"Francis Avila" <fr***********@yahoo.com> wrote in message
news:vq************@corp.supernews.com...
As expected, given the same test tree (which doesn't exercise itercond),
the speed is about the same.

*Ahem*:

def flatten_fastcond(s, isScalar, itercond=lambda o: None):
    ...
    for subelem in flatten_dict(elem, isScalar):
        yield subelem

I thought it was a little strange that it was *exactly* the same, given the
function call...

Corrected (still not much difference, though):

...> python timeit.py -c -s"import flatten" "flatten.pro_dict()"
100 loops, best of 3: 7.22e+003 usec per loop

...> python timeit.py -c -s"import flatten" "flatten.pro_fastcond()"
100 loops, best of 3: 7.33e+003 usec per loop

I'll make up some fancy tree later to test out itercond.


Well, should have made it sooner. I would have noticed an empty list...
--
Francis Avila

Jul 18 '05 #16

i've embedded python into our product and i'm not sure how to handle this situation. the following code is a
simplification of a function that calls a python function. i've removed all error checking to further simplify the
problem. the following code works correctly:

void foo(context_t ctx) {
    PyObject *py_interp = NULL;
    PyObject *py_mod = NULL;
    PyObject *py_call_foo = NULL;
    const char *mod;

    PyEval_AcquireLock();
    py_interp = get_interpreter(ctx);
    PyThreadState_Swap(py_interp);

    mod = get_module(ctx);
    py_mod = PyImport_ImportModule(mod);
    py_call_foo = PyObject_GetAttrString(py_mod, "py_foo");
    PyObject_CallFunction(py_call_foo, NULL);

    Py_XDECREF(py_interp);
    Py_XDECREF(py_mod);
    Py_XDECREF(py_call_foo);

    PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();
}

now i have a situation where foo may be called at the same time in different c threads. furthermore, the call py_foo
might take forever to execute and foo is then required to return immediately. so i modified the code to something like
this where foo starts a threaded function foo_threaded in a new thread so foo can return immediately.

void foo_threaded(void *args) {
    PyObject *py_interp = NULL;
    PyObject *py_mod = NULL;
    PyObject *py_call_foo = NULL;
    const char *mod;

    PyEval_AcquireLock();
    py_interp = get_interpreter(ctx);
    PyThreadState_Swap(py_interp);

    mod = get_module(ctx);
    py_mod = PyImport_ImportModule(mod);
    py_call_foo = PyObject_GetAttrString(py_mod, "py_foo");
    PyObject_CallFunction(py_call_foo, NULL);

    Py_XDECREF(py_interp);
    Py_XDECREF(py_mod);
    Py_XDECREF(py_call_foo);

    PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();
}

void foo(context_t ctx) {
    Thread thread;
    thread = new_thread(foo_threaded);
    thread.start();
}

this seems like the wrong approach to me. if foo gets called a second time but the call to py_foo hasn't returned yet,
the interpreter is still locked. so how can another py_foo be called? the 2nd time foo is called the call to
get_interpreter may or may not return the same interpreter. what am i missing?

thanks for any help.

bryan

Jul 18 '05 #17

sorry, posted to the wrong thread

bryan
Jul 18 '05 #18

Francis Avila wrote:
for flatten_dict(). So you get some speed up at the cost of uglier though
still quite readable code and under the assumption that either all or no
instances of a class are iterable.


That's not always a good assumption to make, if we want to decide whether
to iterate based upon some property of the object (which is actually how I
originally discovered I needed a "flatten").

Perhaps we can combine both conditions and caching optimizations?


I didn't work this out, but I suppose the way to go is to allow three states
(ALWAYS, NEVER, NOCACHE) in the dictionary and only invoke itercond() on
cache misses and NOCACHE values.
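Peter's three-state idea might be sketched as follows (hypothetical names; enum.Enum is a modern convenience this 2003 thread didn't have). itercond is consulted only on cache misses and for classes marked NOCACHE:

```python
from enum import Enum

class Iterability(Enum):
    ALWAYS = 'always'    # every instance of this class is iterable
    NEVER = 'never'      # no instance is iterable: always a leaf
    NOCACHE = 'nocache'  # per-instance decision: ask itercond each time

def flatten_tristate(s, cache, itercond=lambda obj: None):
    """Flatten s; cache maps class -> Iterability."""
    state = cache.get(s.__class__)
    if state is None or state is Iterability.NOCACHE:
        verdict = itercond(s)           # True / False / None
        if verdict is None:
            # itercond abstains: probe with iter() and cache the result.
            try:
                it = iter(s)
            except TypeError:
                cache[s.__class__] = Iterability.NEVER
                yield s
                return
            cache[s.__class__] = Iterability.ALWAYS
            for elem in it:
                yield from flatten_tristate(elem, cache, itercond)
            return
        # itercond decided: mark the class as per-instance from now on.
        cache[s.__class__] = Iterability.NOCACHE
        if verdict:
            for elem in s:
                yield from flatten_tristate(elem, cache, itercond)
        else:
            yield s
        return
    if state is Iterability.ALWAYS:
        for elem in s:
            yield from flatten_tristate(elem, cache, itercond)
    else:
        yield s

cache = {str: Iterability.NEVER}  # seed: strings are leaves
print(list(flatten_tristate([1, ['ab', 2]], cache)))
```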

Peter
Jul 18 '05 #19


"Peter Otten" <__*******@web.de> wrote in message
news:bn*************@news.t-online.com...
Francis Avila wrote:
for flatten_dict(). So you get some speed up at the cost of uglier though still quite readable code and under the assumption that either all or no instances of a class are iterable.
That's not always a good assumption to make, if we want to decide whether to iterate based upon some property of the object (which is actually how I originally discovered I needed a "flatten").

Perhaps we can combine both conditions and caching optimizations?


I didn't work this out, but I suppose the way to go is to allow three

states (ALWAYS, NEVER, NOCACHE) in the dictionary and only invoke itercond() on
cache misses and NOCACHE values.

Peter


Hah, just finished writing the post the moment you

Jul 18 '05 #20

Here is my final attempt at flatten. It's very powerful, but also a bit
more unwieldy...

It combines the sequence-altering ability of Peter's 'toiter' version, with
his class-iterability caching. It can flatten arbitrary data types, and
manipulate them as it goes. See the giant doc string for examples.

Idea for enhancement: instead of itercond being a single function, allow it
to be a sequence of functions, which act as filters, each called in turn.
(Exercise for the reader.)

(For a mint, allow the sequence of functions to be nested...)

Ok, that's enough. Here we go:

--- Code ---

# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila.
def flatten_itercond(iterable, do_descend=None,
                     itercond=lambda seq: (None, seq)):
    """Yield leaves of nested iterable.

    Caveat: assumes that all instances of a class are iterable or not.
    To get around this, use an itercond function.

    do_descend is a dictionary of class:bool pairs.  The class of
    iterable is looked up in do_descend.  If its value is true,
    flatten iterates the item, else not.  Use this dictionary to set a
    lower bound to iteration.  Default is {''.__class__: False}, which
    ensures that strings are not broken up into characters.
    do_descend is added to as flatten goes along, to bypass checks
    for recurring classes' iterability.

    itercond is a function called when a lookup in do_descend fails.
    It must accept iterable, and return (isiterable, seq), where
    isiterable is one of (True, False, None) and seq is the (possibly
    modified) sequence which replaces iterable.  "True" means "seq is
    iterable."  "None" means "seq.__class__ has consistent
    iterability: let do_descend optimize it."  "True or False" means
    the opposite, and will bypass do_descend completely.  Default is
    (None, iterable), i.e., let do_descend handle everything.

    Be careful with using 'None': it'll gain you some speed, but
    sometimes you return a seq that *you* may only use as
    *uniterable,* but which is *intrinsically* iterable.  'None' does
    not let you control what do_descend thinks about seq's
    iterability, and this may lead to not reaching leaves you want to
    reach or recursing infinitely (if there's a string-like object
    around).  For example, if you return a tuple pair on leaves, the
    sequence is atomic to *you,* but not to do_descend.  See the
    leafdata function at the end of this docstring.
    Examples:
    >>> flatten = flatten_itercond  # for doctest
    >>> tree = [1, 2, ['This', 4, 'is', ['a']], 7, 'funky tree']
    >>> list(flatten(tree))
    [1, 2, 'This', 4, 'is', 'a', 7, 'funky tree']
    >>> # Now let's break up strings into characters.
    >>> def tochars(s):
    ...     # This one works by modifying s.
    ...     if isinstance(s, str):
    ...         # if not a char, split it into chars.
    ...         if len(s) != 1:
    ...             return True, tuple(s)
    ...         else:
    ...             # Return False
    ...             # to prevent adding strings to do_descend.
    ...             return False, s
    ...     else:
    ...         # flatten will sort this one out itself.
    ...         return None, s
    ...
    >>> def stopchars(s):
    ...     # This one works by stopping descent when we reach a char.
    ...     if isinstance(s, str):
    ...         if len(s) == 1:
    ...             return False, s
    ...         else:
    ...             return True, s
    ...     else:
    ...         return None, s
    ...
    >>> # Don't forget to turn off the default do_descend!
    >>> mres = list(flatten(tree, do_descend={}, itercond=tochars))
    >>> sres = list(flatten(tree, {}, stopchars))
    >>> if mres == sres:
    ...     print 'True'
    ...
    True
    >>> mres
    [1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
    ' ', 't', 'r', 'e', 'e']
    >>> # You can filter out things if you're obsessed with speed,
    >>> # but better to do it *after* you get a flat list from
    >>> # flatten()...
    >>> def dropnum(s):
    ...     if isinstance(s, type(0)):
    ...         # Can't return nothing,
    ...         # so return something that *iterates* to nothing!
    ...         return True, ()
    ...     else:
    ...         return None, s
    ...
    >>> list(flatten(tree, itercond=dropnum))
    ['This', 'is', 'a', 'funky tree']
    Itercond's real power is with trees of arbitrary objects...

    >>> flatten = flatten_itercond  # for doctest
    >>> class Node(object):
    ...     def __init__(self, label=None, data=None, children=()):
    ...         self.children = children
    ...         self.label = label
    ...         self.data = data
    ...     def __iter__(self):
    ...         return iter(self.children)
    ...
    >>> leaves = [Node(chr(i+65), i) for i in range(5)]
    >>> branches = [Node(chr(i+65), i, leaves) for i in range(5, 10)]
    >>> root = [Node(chr(i+65), i, branches) for i in range(10, 15)]
    >>> list(flatten(root))
    []
    >>> # We just listed the children of the leaves--of course it's empty!
    >>> def leafdata(n):
    ...     if isinstance(n, Node):
    ...         if not n.children:
    ...             return False, (n.label, n.data)
    ...         else:
    ...             # Strictly speaking, Node doesn't need an __iter__!
    ...             # We could return iter(n.children) here if we want.
    ...             return True, n
    ...     else:
    ...         return None, n  # n is a list of Nodes.
    ...
    >>> list(flatten(root, itercond=leafdata))

[('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]

Use your imagination...

"""
    if do_descend is None:
        do_descend = {''.__class__: False, u''.__class__: False}
    try:
        descend = do_descend[iterable.__class__]
    except KeyError:
        # If itercond has something to say about iterable, use it and
        # don't add the lookup to do_descend.
        descend, iterable = itercond(iterable)
        if descend is not None:
            # If descend *is* None, class has consistent iterability.
            if descend:
                iterable = iter(iterable)
            # Else, descend is False.
        else:
            t = iterable.__class__
            try:
                iterable = iter(iterable)
            except TypeError:
                descend = do_descend[t] = False
            else:
                descend = do_descend[t] = True

    if not descend:
        yield iterable
    else:
        for elem in iterable:
            for subelem in flatten_itercond(elem, do_descend, itercond):
                yield subelem
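For reference, the caching idea above condenses to a short standalone sketch (modern Python 3 spelling, so str/bytes stand in for ''/u''; the function and variable names here are illustrative, not from the original post):

```python
# Per-class iterability is probed once with iter() and memoized in
# do_descend; strings are preloaded as atomic so they never recurse.
def flatten_simple(iterable, do_descend=None):
    if do_descend is None:
        do_descend = {str: False, bytes: False}
    t = iterable.__class__
    try:
        descend = do_descend[t]
    except KeyError:
        # Cache miss: probe the class once and remember the answer.
        try:
            iter(iterable)
        except TypeError:
            descend = do_descend[t] = False
        else:
            descend = do_descend[t] = True
    if not descend:
        yield iterable
    else:
        for elem in iterable:
            for sub in flatten_simple(elem, do_descend):
                yield sub
```

With the string preload in place, `list(flatten_simple([1, [2, 3, [4, 5], '678', 9]]))` gives `[1, 2, 3, 4, 5, '678', 9]`.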

--
Francis Avila

Jul 18 '05 #21


"Peter Otten" <__*******@web.de> wrote in message
news:bn*************@news.t-online.com...
> Francis Avila wrote:
> for flatten_dict(). So you get some speed up at the cost of uglier
> though still quite readable code and under the assumption that either
> all or no instances of a class are iterable.

That's not always a good assumption to make, if we want to decide whether
to iterate based upon some property of the object (which is actually how I
originally discovered I needed a "flatten").

> > Perhaps we can combine both conditions and caching optimizations?
>
> I didn't work this out, but I suppose the way to go is to allow three
> states (ALWAYS, NEVER, NOCACHE) in the dictionary and only invoke
> itercond() on cache misses and NOCACHE values.
>
> Peter


Actually, now that I think of it, just put functions into the dictionary.
Do a type lookup: if the value is False, don't iterate; if True, iterate; if
a callable, call it with the seq and have it return True/False and the
replacement seq. This corresponds to the four possible states:
InstanceIterable and InstanceNotIterable (dict keys with function values),
and ClassIterable and ClassNotIterable (dict keys with True/False values).

This would eliminate the "isinstance()" that would end up being at the
beginning of every conditional function, and there'd be no need to fret over
when to return None. But if the issue of type were more complex than
looking at the class, the dict lookup wouldn't be enough. Perhaps if there
were a default function in the cache dict, called on misses? That would
eliminate the final argument to flatten_itercond. Although, what would be
the key for the cache dict, such that it never gets overwritten?

I really need to stop thinking about this silly flatten() function, but
here's a quick hack which passes the hastily-modified doctests:

--- Code ---

# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila, with function-items in dict.
def flatten_dictcond(iterable, do_descend=None,
                     itercond=lambda seq: (None, seq)):
"""Yield leaves of iterable.

itercond is the default handler, used when there's no type match.
Add types and function handlers to do_descend.

Tests:

>>> flatten = flatten_dictcond  # for doctest
>>> tree = [1, 2, ['This', 4, 'is', ['a']], 7, 'funky tree']
>>> list(flatten(tree))
[1, 2, 'This', 4, 'is', 'a', 7, 'funky tree']

# Now lets break up strings into characters.
>>> def tochars(s):
...     # This one works by modifying s.
...     # if not a char, split it into chars.
...     if len(s) != 1:
...         return True, tuple(s)
...     else:
...         return False, s
...
>>> def stopchars(s):
...     # This one works by stopping descent when we reach a char.
...     if len(s) == 1:
...         return False, s
...     else:
...         return True, s
...
>>> mres = list(flatten(tree, do_descend={type(''): tochars}))
>>> sres = list(flatten(tree, {type(''): stopchars}))
>>> if mres == sres:
...     print 'True'
...
True
>>> mres
[1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
' ', 't', 'r', 'e', 'e']
>>> def dropnum(s):
...     return True, ()
...
>>> list(flatten(tree, {type(''): False, type(0): dropnum}))
['This', 'is', 'a', 'funky tree']

>>> flatten = flatten_dictcond  # for doctest
>>> class Node(object):
...     def __init__(self, label=None, data=None, children=()):
...         self.children = children
...         self.label = label
...         self.data = data
...     def __iter__(self):
...         return iter(self.children)
...
>>> leaves = [Node(chr(i+65), i) for i in range(5)]
>>> branches = [Node(chr(i+65), i, leaves) for i in range(5, 10)]
>>> root = [Node(chr(i+65), i, branches) for i in range(10, 15)]
>>> list(flatten(root))
[]
>>> def leafdata(n):
...     if not n.children:
...         return False, (n.label, n.data)
...     else:
...         return True, n
...
>>> list(flatten(root, {Node: leafdata}))

[('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]
"""
    if do_descend is None:
        do_descend = {type(''): False, type(u''): False}
    try:
        descend = do_descend[iterable.__class__]
    except KeyError:
        # Default handling remains the same until I think about it more.
        descend, iterable = itercond(iterable)
        if descend is not None:
            # If descend *is* None, class has consistent iterability.
            # But we don't know what it is, so we find out in the else
            # block.
            if descend:  # InstanceIterable
                iterable = iter(iterable)
            # Else, descend is InstanceNotIterable.
        else:
            t = iterable.__class__
            try:
                iterable = iter(iterable)
            except TypeError:
                # ClassNotIterable
                descend = do_descend[t] = False
            else:
                # ClassIterable
                descend = do_descend[t] = True
    else:
        # Ok, cache hit.
        if callable(descend):
            descend, iterable = descend(iterable)
            # XXX I didn't modify the dict item here, did I?

    if not descend:
        yield iterable
    else:
        for elem in iterable:
            for subelem in flatten_dictcond(elem, do_descend, itercond):
                yield subelem

--
Francis Avila

Jul 18 '05 #22

Francis Avila wrote:
> eliminate the final argument to flatten_itercond. Although, what would be
> the key for cache dict, such that it never gets overwritten?
Using your identifier names, instead of try ... except KeyError:

    descend = do_descend.get(iterable.__class__, defaultFunction)

It's not there, so it won't get overwritten. You can think of the except
clause as a hard-coded defaultFunction(), so this approach would spare you
the exception that I expected to be rare in my original flatten_dict()
design.
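That trick can be shown in isolation in a few lines (illustrative names, with the handler returning a (descend, seq) pair as elsewhere in the thread): since the fallback passed to .get() is the handler function object itself, a cache miss is detected by identity, and no class key can ever collide with it.

```python
# The default handler doubles as the cache-miss sentinel for dict.get().
def default_handler(seq):
    try:
        iter(seq)
    except TypeError:
        return False, seq
    return True, seq

do_descend = {str: False}                 # preloaded cache
hit = do_descend.get(str, default_handler)
miss = do_descend.get(list, default_handler)
assert hit is False                       # cached answer, no probe needed
assert miss is default_handler            # miss falls through to the probe
assert miss([1, 2]) == (True, [1, 2])     # lists probe as iterable
```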
I think your flatten() is getting more complex fast, and you're reaching the
point where you should either use a version designed for the special
purpose or pay the performance price and go with a simple test function
like my keepstrings() somewhere above.
> I really need to stop thinking about this silly flatten() function, but
> here's a quick hack which passes the hastily-modified doctests:


Me too :-)

Peter

PS: The Martelli virus spreading fast: have you timed it recently?
Jul 18 '05 #23

"Peter Otten" <__*******@web.de> wrote in message
news:bn*************@news.t-online.com...
> I think your flatten() is getting more complex fast, and you're reaching
> the point where you should either use a version designed for the special
> purpose or pay the performance price and go with a simple test function
> like my keepstrings() somewhere above.

Actually, with your suggestion, it's gotten much simpler. See below.

> PS: The Martelli virus spreading fast: have you timed it recently?


I don't have time now, but I'll collect all the functions and time them all.
There are two groups though: those that can modify as they go (i.e. contain
custom type handlers), and those that don't, so I need two sets of test
cases, and to ensure that they all can at least pass the static tests. And
there are a lot of functions. But now I'm really curious to see their times
and the power/time tradeoff.

Anyway, final revision. Much nicer. (I took out the doctest cases for the
post; they're the same anyway, with one minor addition, which I also added
to flatten_dictcond.)

--- Code ---

def defaulthandler(seq):
"""Return iterability of seq and new sequence.

Iterability is either a bool (which asserts class-instance
iterability), or a function which accepts a sequence and returns
(instance-iterability, newseq). The function asserts
instance-iterability.

Iterability as returned from default handler is *always* cached
into do_descend, so for every call to this function, seq will have
a unique type.

I can't imagine that your own handler would *ever* need to modify
seq before returning, unless you're doing some serious voodoo. You
probably just need to add a type:function pair to do_descend.

"""
    # It is impossible to avoid an exception test, as far as I can
    # tell. Oh well.
    try:
        iter(seq)
    except TypeError:
        return False, seq
    else:
        return True, seq

def flatten_dictdef(iterable, do_descend=None,
                    defaulthandler=defaulthandler):
"""Yield leaves of iterable.

Pretty much same semantics as flatten_dictcond.

Function items in do_descend determine if iterable is instance-iterable.
They return (iterability, newsequence), the iterability of the
*NEW*SEQUENCE* (like normal).

Boolean items in do_descend determine if iterable is
class-iterable.

Finally, defaulthandler determines any-iterability.
defaulthandler returns (iterability, newsequence), the
iterability of the *ORIGINAL*SEQUENCE*. (VERY important!)

"""
    ## Preload cache with atomic strings per default.
    if do_descend is None:
        do_descend = {type(''): False, type(u''): False}

    descend = do_descend.get(iterable.__class__, defaulthandler)

    # Of course, we end up *testing* in the reverse order of what the
    # docstring outlines.

    if descend is defaulthandler:
        # Oy, I was going to write this:
        #   do_descend[iterable.__class__], iterable = defaulthandler(descend)
        # But I don't remember the order of evaluation, and I'm too
        # tired to look it up...
        isit, s = descend(iterable)
        descend = do_descend[iterable.__class__] = isit
        iterable = s

    elif callable(descend):
        descend, iterable = descend(iterable)
    # Else, bool constant.

    if not descend:
        yield iterable
    else:
        for elem in iterable:
            for subelem in flatten_dictdef(elem, do_descend,
                                           defaulthandler):
                yield subelem
# Ah, nice! This one I'm keeping.
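Summing up the thread, the final design reduces to this self-contained sketch (illustrative names, not the exact code above): do_descend maps a class to a bool for whole-class decisions, or to a callable returning (descend, replacement) for per-instance decisions, and unknown classes are probed once and cached.

```python
def flatten_sketch(obj, do_descend):
    descend = do_descend.get(obj.__class__)
    if descend is None:
        # Unknown class: probe it once and cache the verdict.
        try:
            iter(obj)
        except TypeError:
            descend = do_descend[obj.__class__] = False
        else:
            descend = do_descend[obj.__class__] = True
    elif callable(descend):
        # Per-instance handler: may also swap in a replacement object.
        descend, obj = descend(obj)
    if not descend:
        yield obj
    else:
        for elem in obj:
            for sub in flatten_sketch(elem, do_descend):
                yield sub

class Node:
    def __init__(self, label=None, data=None, children=()):
        self.label, self.data, self.children = label, data, children
    def __iter__(self):
        return iter(self.children)

def leafdata(n):
    # Leaf Nodes become (label, data) pairs; interior Nodes descend.
    if not n.children:
        return False, (n.label, n.data)
    return True, n
```

For a one-level tree, `list(flatten_sketch([Node('R', 0, [Node('A', 0), Node('B', 1)])], {str: False, Node: leafdata}))` gives `[('A', 0), ('B', 1)]`.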
--
Francis Avila

Jul 18 '05 #24
