Recursive list comprehension

Timothy Babytch

Hi all.

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

--
Timothy Babytch

Jul 18 '05 #1

Subscribe Post Reply

2347

Peter Otten

Timothy Babytch wrote:

Hi all.

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

The order of the for expressions is as it would be for nested loops:

items = [['N', 'F'], ['E'], ['D']]
[y for x in items for y in x] ['N', 'F', 'E', 'D']

I would still prefer a for loop because it spares you from iterating over
the sublist items in python:
data = []
for sub in [['N', 'F'], ['E'], ['D']]: .... data.extend(sub)
.... data

['N', 'F', 'E', 'D']

Peter

Jul 18 '05 #2

Timothy Babytch

Peter Otten wrote:

The order of the for expressions is as it would be for nested loops:
items = [['N', 'F'], ['E'], ['D']]
[y for x in items for y in x]
I would still prefer a for loop because it spares you from iterating over
the sublist items in python:

data = []
for sub in [['N', 'F'], ['E'], ['D']]:

... data.extend(sub)
...

Thanks. Both tips were helpful.

--
Timothy Babytch

Jul 18 '05 #3

Peter Nuttall

On Monday 06 Dec 2004 09:26, Timothy Babytch wrote:

Hi all.

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

--
Timothy Babytch

Hi,

I think you do it with a generator like this:

def flatten(nested):
for sublist in nested:
for element in sublist:
yield element

n=[['N', 'F'], ['E'], ['D']]
output=[]

for value in flatten(n):
output.append(value)

print output

Have a merry Christmas

Peter Nuttall

Jul 18 '05 #4

Nick Coghlan

Peter Nuttall wrote:

I think you do it with a generator like this:

def flatten(nested):
for sublist in nested:
for element in sublist:
yield element

n=[['N', 'F'], ['E'], ['D']]
output=[]

for value in flatten(n):
output.append(value)

print output

I highly recommend learning about the stdlib module "itertools" (I only really
looked into it recently). The above can be done in 3 lines (counting imports):

from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)]

Documentation:
http://www.python.org/doc/2.3.4/lib/...functions.html

Cheers,
Nick.

Jul 18 '05 #5

Peter Otten

Nick Coghlan wrote:

from*itertools*import*chain
n*=*[['N',*'F'],*['E'],*['D']]
print*[chain(*n)]

However, [generator] is not the same as list(generator):

from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)] [<itertools.chain object at 0x402ac60c>] print list(chain(*n)) ['N', 'F', 'E', 'D']

And with the star operator you are foregoing some laziness, usually an
important selling point for the iterator approach. Therefore:
n = [['N', 'F'], ['E'], ['D']]
lazyItems = (x for y in n for x in y)
lazyItems.next() 'N' list(lazyItems) ['F', 'E', 'D']

Of course this makes most sense when you want to keep the original n anyway
_and_ can be sure it will not be mutated while you are still drawing items
from the lazyItems generator.

Peter

Jul 18 '05 #6

Nick Coghlan

Peter Otten wrote:

Nick Coghlan wrote:

from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)]

However, [generator] is not the same as list(generator):

Heh - good point. As you say, replacing with list() gives the intended answer.

With regards to laziness, my main point was that itertools is handy for
manipulating sequences, even if you aren't exploiting its capacity for lazy
evaluation.

Cheers,
Nick.

--
Nick Coghlan | nc******@email.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.skystorm.net

Jul 18 '05 #7

Serhiy Storchaka

Timothy Babytch wrote:

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

sum(c_vars, [])

--
Serhiy Storchaka

Jul 18 '05 #8

Timothy Babytch

Serhiy Storchaka wrote:

sum([['N', 'F'], ['E'], ['D']], [])

['N', 'F', 'E', 'D']

THE BEST!

--
Timothy Babytch, PHP-dev Teamleader

Jul 18 '05 #9

Peter Hansen

Timothy Babytch wrote:

Serhiy Storchaka wrote:
>>>sum([['N', 'F'], ['E'], ['D']], [])

['N', 'F', 'E', 'D']

THE BEST!

Hmmm. Maybe, unless readability as in "self-documenting code"
is important to you...

Preceding the above with "flatten = sum" would perhaps be
an adequate improvement.

-Peter

Jul 18 '05 #10

Adam DePrince

On Mon, 2004-12-06 at 10:01, Timothy Babytch wrote:

Serhiy Storchaka wrote:
>>>sum([['N', 'F'], ['E'], ['D']], [])

['N', 'F', 'E', 'D']

THE BEST!

--
Timothy Babytch, PHP-dev Teamleader

Sum certainly takes the cake for hackish elegance, and would be my
choice if I was absolutely certain that my data structure was exactly a
list of lists. A more general solution with applicability to arbitrary
levels of nesting is below.

a = [['N','F'],['E'],['D']]
b = [[['A','B',['C','D']],'N','F'],'E', ['F'] ]
c = iter([[iter(['A','B',('C','D')]),'N','F'],'E', iter(['F']) ])

def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

if __name__ == "__main__":
print list( flatten( a ) )
print list( flatten( b ) )
print list( flatten( c ) )

Which when run gives you ...

['N', 'F', 'E', 'D']
['A', 'B', 'C', 'D', 'N', 'F', 'E', 'F']
['A', 'B', 'C', 'D', 'N', 'F', 'E', 'F']
Adam DePrince

Jul 18 '05 #11

Steven Bethard

Adam DePrince wrote:

def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:

class C: .... def __getitem__(self, index):
.... if index > 3:
.... raise IndexError(index)
.... return index
.... list(iter(C())) [0, 1, 2, 3]

I would write your code as something like:
def flatten(i): .... try:
.... if isinstance(i, basestring):
.... raise TypeError('strings are atomic')
.... iterable = iter(i)
.... except TypeError:
.... yield i
.... else:
.... for item in iterable:
.... for sub_item in flatten(item):
.... yield sub_item
.... list(flatten([['N','F'],['E'],['D']])) ['N', 'F', 'E', 'D'] list(flatten([C(), 'A', 'B', ['C', C()]]))

[0, 1, 2, 3, 'A', 'B', 'C', 0, 1, 2, 3]

Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.

Steve

Jul 18 '05 #12

Peter Otten

Adam DePrince wrote:

def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

While trying to break your code with a len() > 1 string I noted that strings
don't feature an __iter__ attribute. Therefore obj.__iter__() is not
equivalent to iter(obj) for strings. Do you (plural) know whether this is a
CPython implementation accident or can be relied upon?

Peter

Jul 18 '05 #13

Nick Craig-Wood

Adam DePrince <ad**@cognitcorp.com> wrote:

def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

Hmm, there is more to that than meets the eye! I was expecting

print list(flatten("hello"))

to print

['h', 'e', 'l', 'l', 'o']

But it didn't, it printed

['hello']

With a little more investigation I see that str has no __iter__
method. However you can call iter() on a str

for c in iter("hello"): print c ....
h
e
l
l
o

Or even
for c in "hello": print c

....
h
e
l
l
o

....and this works because str supports __getitem__ according to the
docs.

So there is some magic going on here! Is str defined to never have an
__iter__ method? I see no reason why that it couldn't one day have an
__iter__ method though.

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick

Jul 18 '05 #14

Steven Bethard

Peter Otten wrote:

I noted that strings
don't feature an __iter__ attribute. Therefore obj.__iter__() is not
equivalent to iter(obj) for strings. Do you (plural) know whether this is a
CPython implementation accident or can be relied upon?
Nick Craig-Wood wrote: With a little more investigation I see that str has no __iter__
method. However you can call iter() on a str [snip] ...and this works because str supports __getitem__ according to the
docs.

So there is some magic going on here! Is str defined to never have an
__iter__ method? I see no reason why that it couldn't one day have an
__iter__ method though.

The magic is the old-style iteration protocol (also called the 'sequence
protocol') which calls __getitem__ starting at 0 until an IndexError is
raised. From the docs:

http://www.python.org/doc/lib/built-in-funcs.html

iter( o[, sentinel])
Return an iterator object. The first argument is interpreted very
differently depending on the presence of the second argument. Without a
second argument, o must be a collection object which supports the
iteration protocol (the __iter__() method), or it must support the
sequence protocol (the __getitem__() method with integer arguments
starting at 0). If it does not support either of those protocols,
TypeError is raised...

I looked around to see if there was any talk specifically of str and
__iter__/__getitem__ and couldn't find any. (Though I wouldn't claim
that this means it's not out there.) ;) My guess is that there isn't
any guarantee that str objects might not one day grow an __iter__
method, so I wouldn't rely on it.

See my other post that uses iter() and TypeError instead of .__iter__()
and AttributeError -- it's relatively simple to avoid relying on
..__iter__, and doing so also allows you to support other objects that
support the sequence protocol but have no __iter__ method.

Steve

Jul 18 '05 #15

Adam DePrince

On Wed, 2004-12-08 at 15:02, Steven Bethard wrote:

Adam DePrince wrote:
def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:
>>> class C: ... def __getitem__(self, index):
... if index > 3:
... raise IndexError(index)
... return index
... >>> list(iter(C())) [0, 1, 2, 3]

I would write your code as something like:
>>> def flatten(i): ... try:
... if isinstance(i, basestring):
... raise TypeError('strings are atomic')
... iterable = iter(i)
... except TypeError:
... yield i
... else:
... for item in iterable:
... for sub_item in flatten(item):
... yield sub_item
... >>> list(flatten([['N','F'],['E'],['D']])) ['N', 'F', 'E', 'D'] >>> list(flatten([C(), 'A', 'B', ['C', C()]]))

[0, 1, 2, 3, 'A', 'B', 'C', 0, 1, 2, 3]

Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.

Of course, iter( "a" ).next() is "a" -- if you don't look for the
special case of a string you will spin until you blow your stack. The
problem with a special case is it misses objects that have string like
behavior but are not members of basestring. This change deals with that
case, albeit at the expense of some performance (it adds a small
O(depth) factor)).

def flatten(i, history=[]):
try:
if reduce( lambda x,y:x or y, map( lambda x:i is x, history ),\
False ):
raise TypeError('Dej' )
iterable = iter(i)
except TypeError:
yiel>>> list(flatten([C(), 'A', 'B', ['C', C()]])) [0, 1, 2, 3, 'A', 'B', 'C', 0, 1, 2, 3]

Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.
Of course, iter( "a" ).next() is "a" -- if you don't look for the
special case of a string you will spin until you blow your stack. The
problem with a special case is it misses objects that have string like
behavior but are not members of basestring. This change deals with that
case, albeit at the expense of some performance (it adds a small
O(depth) factor)).

def flatten(i, history=[]):
try:
if isinstance( i,basestring) or reduce( lambda x,y:x or y, map(
lambda x:i is x, history ),\ False ):
raise TypeError('strings are atomic' )
iterable = iter(i)
except TypeError:
yield i
else:
history = history + [i]
for item in iterable:
for sub_item in flatten(item, history ):
yield sub_item

if __name__ == "__main__":
print list( flatten( a ) )
print list( flatten( b ) )
print list( flatten( c ) )

Steve

Adam DePrince

Jul 18 '05 #16

Steven Bethard

Adam DePrince wrote:

On Wed, 2004-12-08 at 15:02, Steven Bethard wrote:
Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.

Of course, iter( "a" ).next() is "a" -- if you don't look for the
special case of a string you will spin until you blow your stack. The
problem with a special case is it misses objects that have string like
behavior but are not members of basestring.

Yup. Unfortunately, there's no "string protocol" like there's an
"iterator protocol" or we could check this kind of thing easier. Seems
like your history check might be the best option if you need to support
this kind of thing.

Steve

Jul 18 '05 #17

Terry Reedy

"Steven Bethard" <st************@gmail.com> wrote in message
news:o5Jtd.224974$R05.24953@attbi_s53...

Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:

No, having an __iter__ method that returns an iterator is an essential half
of the current iterator protocol just so that iter(iterator) (==
iterator.__iter__()) always works. This requirement conveniently makes
'iterator' a subcategory of 'iterable'. (I am ignoing the old and obsolete
getnext protocol, as does the itertools library module.)

Terry J. Reedy

Jul 18 '05 #18

Steven Bethard

Terry Reedy wrote:

"Steven Bethard" <st************@gmail.com> wrote in message
news:o5Jtd.224974$R05.24953@attbi_s53...
Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:

No, having an __iter__ method that returns an iterator is an essential half
of the current iterator protocol just so that iter(iterator) (==
iterator.__iter__()) always works. This requirement conveniently makes
'iterator' a subcategory of 'iterable'.

Yeah, you're right, I probably should have referred to it as the
'iterable protocol' instead of the 'iterator protocol'.
(I am ignoing the old and obsolete
getnext protocol, as does the itertools library module.)

What is the getnext protocol? Is that the same thing that the iter()
docs call the sequence protocol? Because this definitely still works
with itertools:

class C: .... def __getitem__(self, index):
.... if index > 3:
.... raise IndexError(index)
.... return index
.... import itertools
list(itertools.imap(str, C()))

['0', '1', '2', '3']

Steve

Jul 18 '05 #19

Terry Reedy

"Steven Bethard" <st************@gmail.com> wrote in message
news:LoNtd.465310$wV.295778@attbi_s54...

What is the getnext protocol? Is that the same thing that the iter()
docs call the sequence protocol?
Yes. (I meant to write getitem rather than getnext.)
Because this definitely still works with itertools:

Yes, not because itertools are cognizant of sequence objects but because
itertools apply iter() to inputs and because iter() currently accomodates
sequence-protocol objects as well as iterable-protocol objects by wrapping
the former with builtin <iterator> objects. I expect that may change if
and when the builtin C-coded types are updated to have __init__ methods.
This is a ways off, if ever, but I think the general advice for user code
is to use the newer protocol. So, for the purpose of writing new code, I
think it justified to forget about or at least ignore the older iteration
protocol.

Terry J. Reedy

Jul 18 '05 #20

Steven Bethard

Terry Reedy wrote:

This is a ways off, if ever, but I think the general advice for user code
is to use the newer protocol.
Yes, definitely. I hope no one misconstrued me to be suggesting that
you should use the 'sequence protocol' for iterators (e.g. using
__getitem__ and raising an IndexError). This is deprecated and is only
supported for backwards compatiblity. All new code should define
__iter__ instead.
So, for the purpose of writing new code, I
think it justified to forget about or at least ignore the older iteration
protocol.

This is probably valid as long as you don't need your code to work with
any objects defined before the __iter__ protocol.

Steve

Jul 18 '05 #21

Nick Craig-Wood

Peter Otten <__*******@web.de> wrote:

Adam DePrince wrote:
def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

While trying to break your code with a len() > 1 string I noted that strings
don't feature an __iter__ attribute. Therefore obj.__iter__() is not
equivalent to iter(obj) for strings. Do you (plural) know whether this is a
CPython implementation accident or can be relied upon?

I'd like to know this too!

You can write the above as the shorter (and safer IMHO - it doesn't
catch any exceptions it shouldn't)

def flatten( i ):
if hasattr(i, "__iter__"):
for j in i:
for k in flatten(j):
yield k
else:
yield i
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick

Jul 18 '05 #22

Recursive list comprehension

Similar topics