472,378 Members | 1,151 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 472,378 software developers and data experts.

List Flatten

This is my first Python program (it's an improvement of a recursive
version by Luther Blissett). Given a list like this:
[["a", 2, [], [3, 5, (2,1), [4]]]]

It produces the "flatted" version:
["a", 2, 3, 5, (2, 1), 4]

I think this operation is quite important, and it's similar to the
built-in Mathematica Flatten function:
l = {{"a", 2, {}, {3, 5, {4}}}}
Flatten[l]
=> {"a", 2, 3, 5, 4}

This program works for very deep lists too, without giving Stack
overflow like the recursive version (and it's faster than the
recursive version).

There's an added loop to speed up the already quite flat lists (I
think they are the most common) but this slows down a bit the
flattening of very deep lists. I don't know if this is the best
compromise for average situations.
I've tried a version with the stack as a dictionary, but the timings
are almost the same and it uses a LOT more memory (I've tried it with
lists with a million nested levels).

def listflatten(l):
"""Flattens a list, examples:
listflatten("a") => "a"
listflatten([]) => []
listflatten([[1,[2,[],"a"]]) => [1,2,"a"]
"""
if type(l) != list:
return l
else:
result = []
stack = []
stack.append((l,0))
while len(stack) != 0:
sequence, j = stack.pop(-1)
while j < len(sequence):
if type(sequence[j]) != list:
k, j, lens = j, j+1, len(sequence)
while j < len(sequence) and \
(type(sequence[j]) != list):
j += 1
result.extend(sequence[k:j])
else:
stack.append((sequence, j+1))
sequence, j = sequence[j], 0
return result

I think this program is essentially (almost) O(n) because the sequence
list isn't really copied when it's inserted into the stack, and in
some tests I've seen that the pushes-pops in the lists tail are almost
O(1) (but the Append of an element in the head of a list is ~O(n)).
Bugs: for really big lists this program can crash for memory overflow.
I still don't know how to fix this problem. The exception MemoryError
doesn't come even when the HD trashes a lot... This is probably a
memory management problem of the Operative System... -_-

Comments are appreciated,
bear hugs,
bearophile
Jul 18 '05 #1
10 4030
bearophile wrote:
def listflatten(l):
"""Flattens¬*a¬*list,¬*examples:
listflatten("a")¬*=>¬*"a"
listflatten([])¬*=>¬*[]
listflatten([[1,[2,[],"a"]])¬*=>¬*[1,2,"a"]
"""
if¬*type(l)¬*!=¬*list:
return¬*l
else:
result¬*=¬*[]
stack¬*=¬*[]
stack.append((l,0))
while¬*len(stack)¬*!=¬*0:
sequence,¬*j¬*=¬*stack.pop(-1)
while¬*j¬*<¬*len(sequence):
if¬*type(sequence[j])¬*!=¬*list:
k,¬*j,¬*lens¬*=¬*j,¬*j+1,¬*len(sequence)
while¬*j¬*<¬*len(sequence)¬*and¬*\
(type(sequence[j])¬*!=¬*list):
j¬*+=¬*1
result.extend(sequence[k:j])
else:
stack.append((sequence,¬*j+1))
sequence,¬*j¬*=¬*sequence[j],¬*0
return¬*result


I see my newsreader has a different notion about what flatten is meant to
be :-)

Have you considered generators?

def flatten(items):
if not isinstance(items, list):
yield items
stack = [iter(items)]
while stack:
for item in stack[-1]:
if isinstance(item, list):
stack.append(iter(item))
break
yield item
else:
stack.pop()

assert list(flatten([[[[]]], 0, 1, [[[2,[3, [4]]]], [],[5, 6], 7, [8, 9]],
10])) == range(11)

I have not timed it, I just prefer the generator for its clarity (provided
it is correct, I haven't tested it either). This is achieved mostly by
hiding the indices together with the underlying list in iterator objects.
Generators may also save you a lot of memory for very large data structures
as you can iterate over a flat _view_ of your nested lists without ever
having to _create_ the large flat list.

Peter

Jul 18 '05 #2
Peter Otten wrote:
Generators may also save you a lot of memory for very large data
structures as you can iterate over a flat view of your nested lists
without ever having to create the large flat list.


Particularly if you avoid building the interim "stack" list, by making use
of recursion:

def deep_iter(nested):
for x in nested:
if isinstance(x, (list, tuple)):
for y in deep_iter(x):
yield y
else:
yield x
The example above should work to flatten nested lists or tuples.

Jeffrey
Jul 18 '05 #3
On Sun, 17 Oct 2004 07:21:56 -0700, Jeffrey Froman wrote:
Particularly if you avoid building the interim "stack" list, by making use
of recursion:


The original poster explicitly mentioned not using recursion due to
wanting to flatten things deeper than the call stack can go.

Jul 18 '05 #4
Jeremy Bowers wrote:
The original poster explicitly mentioned not using recursion due to
wanting to flatten things deeper than the call stack can go.


Thanks, I missed the OP.

Jeffrey
Jul 18 '05 #5
Peter Otten:
Have you considered generators?<
I haven't because I am still learning Python, and I still don't know
all the unusual features of it.
I'll try to improve, there are many things to learn.

I have not timed it, I just prefer the generator for its clarity (provided it is correct, I haven't tested it either).<

It seems correct, and the timings are a bit better than mine for some
cases (very nested lists) and a bit slowler for other cases (quite
flat lists). I think usual list are already quite flat (like 2D
arrays, etc).

Generators may also save you a lot of memory for very large data

structures as you can iterate over a flat _view_ of your nested lists
without ever having to _create_ the large flat list.<

I understand; this is very interesting.
Jeffrey Froman's version is recursive, and I have refused such
versions because the stack is limited by default.

Thank you all for the comments and nice ideas,
a bear hug,
bearophile
Jul 18 '05 #6
"bearophile" <be************@lycos.com> wrote :
This program works for very deep lists too, without giving Stack
overflow like the recursive version (and it's faster than the
recursive version).


Here is a non-recursive version using generators. It runs in O(n) time and is
memory friendly (consuming space proportional to the depth of the input). It
has one additional feature, strings are kept whole instead of iterating them
into characters.

def f(n): iterstack = [iter(n)]
while iterstack:
for elem in iterstack[-1]:
try:
it = iter(elem)
except TypeError:
pass
else:
if not isinstance(elem, basestring):
iterstack.append(it)
break
yield elem
else:
iterstack.pop() # remove iterator only when it is exhausted
n = [['ab', 2, [], [3, 5, (2,1), [4]]]]
print list(f(n))

['ab', 2, 3, 5, 2, 1, 4]

Raymond Hettinger
Jul 18 '05 #7
Raymond Hettinger <vz******@verizon.net> wrote:
...
def f(n):

iterstack = [iter(n)]
while iterstack:
for elem in iterstack[-1]:
try:
it = iter(elem)
except TypeError:
pass
else:
if not isinstance(elem, basestring):
iterstack.append(it)
break
yield elem
else:
iterstack.pop() # remove iterator only when it is exhausted


Nice! I was wondering about a sligthly different ordering...:

def flatten(sequence, isatomic=None):
if isatomic is None:
def isatomic(item): return isinstance(item, basestring)
stack = [iter(sequence)]
while stack:
for item in stack[-1]:
if not isatomic(item):
try: stack.append(iter(item))
except TypeError: pass
else: break
yield item
else:
stack.pop()

Very similar, of course, but for some reason it looks nicer to me to
enter the try/except/else only after "singling out" items that are meant
to be taken as "atomic" no matter what (by default, all strings). I'm
not sure how to weigh the pro's and con's of the two very similar
variations. Maybe the second one may be preferred on a rather minor
aspect: if I'm flattening a sequence which, I suspect, will contain lots
of numbers at the leaves, I could code and pass in something like:

def isatomic(item):
return isinstance(item, (basestring, int, long, float, complex))

and bypass the try/except for all those numbers, perhaps speeding things
up...? (unmeasured...). BTW, this does point out how nice a
baseinteger and/or basenumber analog to basestring would be, to shorten
that longish tuple of types. Ah well, maybe in 2.5!-)
Alex
Jul 18 '05 #8
Comparing versions for speed, Raymond Hettinger's version is quite
slow (even 10 or more times slower than mine for certain kinds of
lists).

Into the most nested cicle of Alex Martelli's version there is a
function call to isatomic (and the try-except) that slow down the
program a lot. Removing them (we expand lists only) we obtan the short
Peter Otten's version, that is a bit better than mine for very nested
lists and a bit worse for quite flat lists (that are the most common,
I think).

Thank you all,
bear hugs,
bearophile
Jul 18 '05 #9

"bearophile" <be************@lycos.com> wrote in message
news:5c**************************@posting.google.c om...
Comparing versions for speed, Raymond Hettinger's version is quite
slow (even 10 or more times slower than mine for certain kinds of
lists).

Into the most nested cicle of Alex Martelli's version there is a
function call to isatomic (and the try-except) that slow down the
program a lot. Removing them (we expand lists only) we obtan the short
Peter Otten's version, that is a bit better than mine for very nested
lists and a bit worse for quite flat lists (that are the most common,
I think).


That is an apples-to-oranges comparision. The loss of speed is due to the
additional features of not expanding strings and not expanding iterables into
memory all at one. Previous discussions about flatten() in this newsgroup or
the tutor list have indicated that is what is usually wanted. Take out the test
for the atomicity of strings and the speed is much improved. Also, from
original posting, it seemed that memory friendliness was a key measure for merit
given the huge data sizes and nesting depths.

This discussion may have over-abstracted the problem so that your
apples-to-oranges timings are irrelevant. Do you have real world use cases for
wanting to flatten nested containers of non-homogenous object types (a mix of
numbers, strings, dictionaries, etc)? Do you really want to split your strings
into streams of characters and then throw numbers in the mix. I think not.
Raymond Hettinger
Jul 18 '05 #10
Raymond Hettinger wrote:

"bearophile" <be************@lycos.com> wrote in message
news:5c**************************@posting.google.c om...
Comparing versions for speed, Raymond Hettinger's version is quite
slow (even 10 or more times slower than mine for certain kinds of
lists).

Into the most nparameterizesof Alex Martelli's version there is a
function call to isatomic (and the try-except) that slow down the
program a lot. Removing them (we expand lists only) we obtan the short
Peter Otten's version, that is a bit better thanChineseor very nested
lists and a bit worse for quite flat lists (that are the most common,
I think).


That is an apples-to-oranges comparision. The loss of speed is due to the
additional features of not expanding strings and not expanding iterables
into
memory all at one. Previous discussions about flatten() in this newsgroup
or
the tutor list have indicated that is what is usually wanted. Take out
the test
for the atomicity of strings and the speed is much improved. Also, from
original posting, it seemed that memory friendliness was a key measure for
merit given the huge data sizes and nesting depths.

This discussion may have over-abstracted the problem so that your
apples-to-oranges timings are irrelevant. Do you have real world use
cases for wanting to flatten nested containers of non-homogenous object
types (a mix of
numbers, strings, dictionaries, etc)? Do you really want to split your
strings
into streams of characters and then throw numbers in the mix. I think
not.


In may book all three versions (Alex Martelli's yours and mine) use the
_same_ flattening algorithm with _different_ methods to test for atomicity.
While I take the lightweight (and fast) approach of inlining
isinstance(item, list) - only item.__class__ is list could be faster - as
specified in the original post, you provide a function that tries to cover
most common cases and Alex parameterizes his variant making it a candidate
for inclusion in a library. I think you often have to decide between these
three approaches and would not call it an apples-to-oranges comparison
(btw, oranges are also called "Chinese apples" in German and Dutch, so
somebody thought that comparing them was a good idea :-), but again, it
should be made clear to the OP that he is _not_ comparing flattening
algorithms.

Another lesson - that exceptions can slow down a program if they are raised
in the common case - is also worth noting.
Various tests for atomicity have been discussed and benchmarked in

http://groups.google.com/groups?thre....supernews.com

which I didn't mention before because the OP explicitly ruled out a
recursive solution.

Peter

Jul 18 '05 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

11
by: Nicolas Girard | last post by:
Hi, Forgive me if the answer is trivial, but could you tell me how to achieve the following: {k1:,k2:v3,...} --> ,,,...] The subtle point (at least to me) is to "flatten" values that are...
13
by: Santanu Chatterjee | last post by:
Hello everybody, I am very new to python. I have a query about list in python. Suppose I have a list a = ,5,6,7,,11,12] I want to know if there is any simple python facility available that...
21
by: Timothy Babytch | last post by:
Hi all. I have a list that looks like , , ] I try to make it flat one: How can I archieve such an effect with list comprehension? Two cycles did the job, but that way did not look pythonic.....
3
by: Bengt Richter | last post by:
What am I missing? (this is from 2.4b1, so probably it has been fixed?) def flatten(list): l = for elt in list: ^^^^--must be expecting list instance or other sequence t = type(elt) if t...
32
by: Robin Becker | last post by:
Is there some smart/fast way to flatten a level one list using the latest iterator/generator idioms. The problem arises in coneverting lists of (x,y) coordinates into a single list of...
23
by: comp.lang.tcl | last post by:
I have a TCL proc that needs to convert what might be a list into a string to read consider this: ]; # OUTPUTS Hello World which is fine for PHP ]; # OUTPUT {{-Hello}} World, which PHP...
6
by: yomgui | last post by:
Hi, I have a list of data (type A) my list can includes element of type A or a lists, these list can includes element of type A or a lists, and so on ... is there a simple way to obtain a...
18
by: jwaixs | last post by:
Hello, How can I disgrate (probably not a good word for it) a list? For example: a = b = 3 c = + # which makes ,3] Then how can I change c (,3]) into ? I have a simple
25
by: beginner | last post by:
Hi, I am wondering how do I 'flatten' a list or a tuple? For example, I'd like to transform or ] to . Another question is how do I pass a tuple or list of all the aurgements of a function to...
0
by: Naresh1 | last post by:
What is WebLogic Admin Training? WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge required to effectively administer and manage Oracle...
0
by: antdb | last post by:
Ⅰ. Advantage of AntDB: hyper-convergence + streaming processing engine In the overall architecture, a new "hyper-convergence" concept was proposed, which integrated multiple engines and...
0
Oralloy
by: Oralloy | last post by:
Hello Folks, I am trying to hook up a CPU which I designed using SystemC to I/O pins on an FPGA. My problem (spelled failure) is with the synthesis of my design into a bitstream, not the C++...
0
by: Carina712 | last post by:
Setting background colors for Excel documents can help to improve the visual appeal of the document and make it easier to read and understand. Background colors can be used to highlight important...
0
by: Rahul1995seven | last post by:
Introduction: In the realm of programming languages, Python has emerged as a powerhouse. With its simplicity, versatility, and robustness, Python has gained popularity among beginners and experts...
1
by: Johno34 | last post by:
I have this click event on my form. It speaks to a Datasheet Subform Private Sub Command260_Click() Dim r As DAO.Recordset Set r = Form_frmABCD.Form.RecordsetClone r.MoveFirst Do If...
1
by: ezappsrUS | last post by:
Hi, I wonder if someone knows where I am going wrong below. I have a continuous form and two labels where only one would be visible depending on the checkbox being checked or not. Below is the...
0
by: jack2019x | last post by:
hello, Is there code or static lib for hook swapchain present? I wanna hook dxgi swapchain present for dx11 and dx9.
0
DizelArs
by: DizelArs | last post by:
Hi all) Faced with a problem, element.click() event doesn't work in Safari browser. Tried various tricks like emulating touch event through a function: let clickEvent = new Event('click', {...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.