can list comprehensions replace map?

David Isaac

Newbie question:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot it seems use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

Thanks,
Alan Isaac

Jul 27 '05 #1

Michael Hoffman

David Isaac wrote:

Newbie question:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot it seems use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

It ain't broke so I'd stick with what you're doing. Even if map() is
removed as a builtin, it will surely stick around in a module.
--
Michael Hoffman

Jul 27 '05 #2

Michael Hoffman

Michael Hoffman wrote:

It ain't broke so I'd stick with what you're doing. Even if map() is
removed as a builtin, it will surely stick around in a module.

Addendum: I know this doesn't answer your question, so if you were
asking out of purely academic interest, then someone else will probably
post another answer.
--
Michael Hoffman

Jul 27 '05 #3

Larry Bates

This isn't really a question about list
comprehensions as you are using a "feature"
of map by passing None as the function to be
executed over each list element:

This works when len(x) > len(y):

zip(x,y+(len(x)-len(y))*[None])

This works when len(y) >= len(x):

zip(x+(len(y)-len(x))*[None],y)

I would probably wrap it in a function:

def foo(x,y):
    if len(x) > len(y):
        return zip(x,y+(len(x)-len(y))*[None])

    return zip(x+(len(y)-len(x))*[None],y)
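Since the question asked for a list comprehension, here is a minimal sketch of one for the two-sequence case. It assumes both inputs support len() and indexing (lists, tuples, strings) and pads whichever input is shorter, so it also covers len(y) > len(x). The name padded_pairs is made up for illustration.

```python
def padded_pairs(x, y):
    """Pair up x and y like map(None, x, y), padding the shorter with None."""
    n = max(len(x), len(y))
    # Positions past the end of either sequence become None.
    return [(x[i] if i < len(x) else None,
             y[i] if i < len(y) else None)
            for i in range(n)]


print(padded_pairs([1, 2, 3, 4], "ab"))
# [(1, 'a'), (2, 'b'), (3, None), (4, None)]
```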

Larry Bates

Jul 27 '05 #4

Paolino

David Isaac wrote:

Newbie question:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot it seems use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

Thanks,
Alan Isaac

Probably zip should change behaviour and cover that case, or at least
have another one like 'tzip' in __builtins__. Dunno, I always thought
zip should not cut to the shortest list.

Jul 27 '05 #5

Andrew Dalke

David Isaac wrote:

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot it seems use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

If you know that len(x)>=len(y) and you want the same behavior as
map(), you can use itertools to synthesize a longer iterator:

>>> x = [1,2,3,4,5,6]
>>> y = "Hi!"
>>> from itertools import repeat, chain
>>> zip(x, chain(y, repeat(None)))
[(1, 'H'), (2, 'i'), (3, '!'), (4, None), (5, None), (6, None)]

This doesn't work if you want the result to be max(len(x), len(y))
in length - the result has length len(x).
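Later Python versions cover the max(len(x), len(y)) case directly: itertools.izip_longest was added in Python 2.6, and in Python 3 it is called itertools.zip_longest. It pads every shorter input with fillvalue (None by default). A minimal Python 3 sketch using Andrew's data:

```python
from itertools import zip_longest

x = [1, 2, 3, 4, 5, 6]
y = "Hi!"

# Iteration continues until the longest input is exhausted; missing
# items are replaced by fillvalue, which defaults to None.
print(list(zip_longest(x, y)))
# [(1, 'H'), (2, 'i'), (3, '!'), (4, None), (5, None), (6, None)]

# Unlike chain(y, repeat(None)), it does not care which input is longer.
print(list(zip_longest(y, x, fillvalue="-")))
# [('H', 1), ('i', 2), ('!', 3), ('-', 4), ('-', 5), ('-', 6)]
```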

As others suggested, if you want to use map, go ahead. It won't
disappear for a long time and even if it does it's easy to
retrofit if needed.

Andrew
da***@dalkescientific.com

Jul 27 '05 #6

Raymond Hettinger

[David Isaac]

I have been generally open to the proposal that list comprehensions
should replace 'map', but I ran into a need for something like
map(None,x,y)
when len(x)>len(y). I cannot it seems use 'zip' because I'll lose
info from x. How do I do this as a list comprehension? (Or,
more generally, what is the best way to do this without 'map'?)

[Paolino] Probably zip should change behaviour and cover that case, or at least
have another one like 'tzip' in __builtins__. Dunno, I always thought
zip should not cut to the shortest list.

Heck no! For the core use case of lockstep iteration, it is almost
always a mistake to continue iterating beyond the length of the
shortest input sequence. Even for map(), the use cases are thin. How
many functions do something meaningful when one or more of their inputs
changes type and becomes a stream of Nones? Consider for example,
map(pow, seqa, seqb) -- what good can come of one sequence or the other
suddenly switching to a None mode?

As Andrew pointed out, if you really need that behavior, it can be
provided explicitly. See the padNone() recipe in the itertools
documentation for an easy one-liner.
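The padNone() recipe boils down to chaining the input onto an endless repeat(None). The following is a minimal Python 3 sketch written from that description, not copied from the documentation; the name pad_none and the data are illustrative. Only the shorter input should be padded, otherwise zip() would never stop:

```python
from itertools import chain, islice, repeat


def pad_none(iterable):
    """Yield the items of iterable, then None indefinitely."""
    return chain(iterable, repeat(None))


x = [1, 2, 3, 4]
y = "ab"

# Pad only the shorter input; zip stops when the unpadded x runs out.
pairs = list(zip(x, pad_none(y)))
print(pairs)
# [(1, 'a'), (2, 'b'), (3, None), (4, None)]

# pad_none() never ends on its own, so bound it when consuming it directly.
print(list(islice(pad_none(y), 4)))
# ['a', 'b', None, None]
```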

IMO, reliance on map's None fill-in feature should be taken as a code
smell indicating a design flaw (not always, but usually). There is a
reason that feature is missing from map() implementations in some other
languages.

In contrast, the existing behavior of zip() is quite useful. It allows
some of the input sequences to be infinite:

zip(itertools.count(1), open('myfile.txt'))
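A small Python 3 illustration of that point, assuming an in-memory io.StringIO in place of the real file (the contents are made up):

```python
import io
from itertools import count

# Stand-in for open('myfile.txt'): any iterable of lines behaves the same.
lines = io.StringIO("alpha\nbeta\ngamma\n")

# count(1) is infinite, but zip() stops as soon as the lines run out.
numbered = list(zip(count(1), lines))
print(numbered)
# [(1, 'alpha\n'), (2, 'beta\n'), (3, 'gamma\n')]
```

For this particular pattern, the built-in enumerate(lines, 1) produces the same pairs.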

Raymond

Jul 28 '05 #7

Steven Bethard

David Isaac wrote:

I ran into a need for something like map(None,x,y)
when len(x)>len(y). I cannot it seems use 'zip' because I'll lose
info from x.

I almost never run into this situation, so I'd be interested to know why
you need this. Here's one possible solution:

py> import itertools as it
py> def zipfill(*lists):
...     max_len = max(len(lst) for lst in lists)
...     return zip(*[it.chain(lst, it.repeat(None, max_len - len(lst)))
...                  for lst in lists])
...
py> zipfill(range(4), range(5), range(3))
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, None), (None, 4, None)]

If you prefer, you can replace the call to zip with it.izip and get an
iterator back instead of a list.
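In Python 3 there is no izip: the built-in zip() itself returns an iterator. A direct port of Steven's function is therefore lazy already. This is a sketch; it still needs len() on every argument, and the default= argument to max() requires Python 3.4 or later:

```python
import itertools as it


def zipfill(*lists):
    """Zip sequences lazily, padding the shorter ones with None."""
    # default=0 avoids a ValueError from max() when called with no arguments.
    max_len = max((len(lst) for lst in lists), default=0)
    return zip(*[it.chain(lst, it.repeat(None, max_len - len(lst)))
                 for lst in lists])


print(list(zipfill(range(4), range(5), range(3))))
# [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, None), (None, 4, None)]
```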

STeVe

Jul 28 '05 #8

Paolino

Raymond Hettinger wrote:

In contrast, the existing behavior of zip() is quite useful. It allows
some of the input sequences to be infinite:

zip(itertools.count(1), open('myfile.txt'))

Right point.
Well, in my limited experience, use cases in which the lists have different
lengths are rare, but in those cases I don't see the reason for not being able
to zip to the longest one. What is really strange is that I have to use
map(None, ...) for that, instead of another zip-like function, which at
least would be intuitive for the average user. Also, map(None, ...) looks like
a super-hack, and it's not elegant or readable or logical (IMO).

I think zip comes to substitute for the intolerant tuple.__new__
implementation. A dummy like me would expect map(tuple, [1,2,3], [2,3,4]) to
work, so expecting map(None, ...) to do it is like saying that None
and tuple are near concepts, which is obviously an absurdity.

Thanks anyway, for explanations.

Paolino

Jul 28 '05 #9

Andrew Dalke

Steven Bethard wrote:

Here's one possible solution:

py> import itertools as it
py> def zipfill(*lists):
...     max_len = max(len(lst) for lst in lists)

A limitation to this is the need to iterate over the
lists twice, which might not be possible if one of them
is a file iterator.
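The problem is easy to see with a generator standing in for a file iterator (numbers() below is a made-up example, Python 3): it has no len(), and measuring it by converting it to a list uses it up. itertools.zip_longest (Python 3's name for 2.6's izip_longest) needs only one pass:

```python
from itertools import zip_longest


def numbers():
    """A generator: it can be consumed only once and has no len()."""
    yield 1
    yield 2
    yield 3


try:
    len(numbers())
except TypeError as exc:
    print("len() is not available:", type(exc).__name__)
# len() is not available: TypeError

# Materializing it to get a length consumes it, leaving nothing to zip.
gen = numbers()
length = len(list(gen))
print(length, list(gen))
# 3 []

# zip_longest makes a single pass over each input, so generators are fine.
print(list(zip_longest(numbers(), "ab")))
# [(1, 'a'), (2, 'b'), (3, None)]
```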

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

for x in zipfill("This", "is", "only", "a", "test."):
    print x

This generates

['T', 'i', 'o', 'a', 't']
['h', 's', 'n', None, 'e']
['i', None, 'l', None, 's']
['s', None, 'y', None, 't']
[None, None, None, None, '.']

This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

from itertools import repeat, chain, izip

sentinel = object()
end_of_stream = repeat(sentinel)

def zipfill(*seqs):
    replace = {sentinel: None}.get
    seqs = [chain(seq, end_of_stream) for seq in seqs]
    for term in izip(*seqs):
        for element in term:
            if element is not sentinel:
                break
        else:
            # All sentinels
            break

        yield [replace(element, element) for element in term]
(I originally had a "element == tuple([sentinel]*len(seqs))" check
but didn't like all the == tests incurred.)
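A Python 3 sketch of the same sentinel idea, with all() in place of the for/else loop and a conditional expression in place of the replace dictionary. Both are stylistic substitutions, not Andrew's code; every test on the data still uses 'is', so no == comparisons are made:

```python
from itertools import chain, repeat

_SENTINEL = object()  # unique marker; no real data item can be this object


def zipfill(*seqs):
    """Yield lists padded with None until every input is exhausted."""
    padded = [chain(seq, repeat(_SENTINEL)) for seq in seqs]
    for term in zip(*padded):
        # Every input has run out once the whole row is sentinels.
        if all(element is _SENTINEL for element in term):
            break
        yield [None if element is _SENTINEL else element for element in term]


print(list(zipfill("ab", "xyz")))
# [['a', 'x'], ['b', 'y'], [None, 'z']]
```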

Andrew
da***@dalkescientific.com

Jul 28 '05 #10

Andrew Dalke

Me:

Here's a clever, though not (in my opinion) elegant solution ... This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

Here's the direct approach without using itertools. Each list is
iterated over only once. No test against a sequence element is ever
made (either as == or 'is') and the end of the sequence exception
is raised only once per input iterator.

The use of a list for the flag is a bit of a hack. If the list has
1 element then it's true; if it has no elements then it's false. By doing
it this way I don't need one extra array and one extra indexing/enumeration.

def zipfill(*seqs):
    count = len(seqs)
    seq_info = [(iter(seq), [1]) for seq in seqs]
    while 1:
        fields = []
        for seq, has_data in seq_info:
            if has_data:
                try:
                    fields.append(seq.next())
                except StopIteration:
                    fields.append(None)
                    del has_data[:]
                    count -= 1
            else:
                fields.append(None)
        if count:
            yield fields
        else:
            break
Hmm, it should probably yield tuple(fields)
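Python 2.6 added a two-argument form of the built-in next(), which returns a default value instead of raising StopIteration. A Python 3 sketch of the same direct approach built on it: replacing an exhausted iterator with None stands in for the has_data list flag, _MISSING is an illustrative marker object, and it yields tuples as suggested above.

```python
_MISSING = object()  # private marker meaning "this iterator is exhausted"


def zipfill(*seqs):
    """Yield tuples padded with None; each input is iterated only once."""
    iterators = [iter(seq) for seq in seqs]
    active = len(iterators)
    while active:
        fields = []
        for i, iterator in enumerate(iterators):
            if iterator is None:  # ran out in an earlier round
                fields.append(None)
                continue
            # Returns _MISSING instead of raising StopIteration.
            item = next(iterator, _MISSING)
            if item is _MISSING:  # ran out in this round
                iterators[i] = None
                active -= 1
                item = None
            fields.append(item)
        if active:
            yield tuple(fields)


print(list(zipfill("This", "is")))
# [('T', 'i'), ('h', 's'), ('i', None), ('s', None)]
```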

Andrew
da***@dalkescientific.com

Jul 28 '05 #11

Raymond Hettinger

[Paolino]

Well, in my limited experience, use cases in which the lists have different
lengths are rare, but in those cases I don't see the reason for not being able
to zip to the longest one. What is really strange is that I have to use
map(None, ...) for that, instead of another zip-like function, which at
least would be intuitive for the average user. Also, map(None, ...) looks like
a super-hack, and it's not elegant or readable or logical (IMO).

I think zip comes to substitute for the intolerant tuple.__new__
implementation. A dummy like me would expect map(tuple, [1,2,3], [2,3,4]) to
work, so expecting map(None, ...) to do it is like saying that None
and tuple are near concepts, which is obviously an absurdity.

Yes, map(None, ...) lacks grace and it would be nice if it had never
been done. The more recently implemented zip() does away with these
issues. The original was kept for backwards compatibility. That's
evolution.

My sense for the rest is that your difficulties arise from fighting the
language rather than using it as designed. Most language features are
the result of much deliberation. When design X was chosen over
alternative Y, it is a pretty good cue that X is a more harmonious way
to do things.

Some other languages chose to implement both X and Y. On the plus
side, your intuition likely matches one of the two. On the minus side,
someone else's intuition may not match your own. Also, it leads to
language bloat. More importantly, such a language provides few cues as
to how to select components that work together harmoniously.
Unfortunately, that makes it effortless to mire yourself in deep goo.

My advice is to use the language instead of fighting it. Guido has
marked the trail; don't ignore the signs unless you really know where
you're going.

Raymond
"... and soon you'll feel right as rain." -- from The Matrix

Jul 28 '05 #12

Christopher Subich

Andrew Dalke wrote:

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

I like this solution best (note, it doesn't actually use itertools). My
naive solution:
def lzip(*args):
    ilist = [iter(a) for a in args]
    while 1:
        res = []
        count = 0
        for i in ilist:
            try:
                g = i.next()
                count += 1
            except StopIteration: # End of iter
                g = None
            res.append(g)
        if count > 0: # At least one iter wasn't finished
            yield tuple(res)
        else: # All finished
            raise StopIteration

Jul 29 '05 #13

Andrew Dalke

Christopher Subich wrote:

My naive solution: ...
        for i in ilist:
            try:
                g = i.next()
                count += 1
            except StopIteration: # End of iter
                g = None

...

What I didn't like about this was the extra overhead of all
the StopIteration exceptions. Eg,

lzip("a", range(1000))

will raise 1001 exceptions (1000 for "a" and 1 for the end of the range).

But without doing timing tests I'm not sure which approach is
fastest, and it may depend on the data set.

Since this is code best not widely used, I don't think it's something
anyone should look into either. :)
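Counting is easier than timing. A Python 3 sketch, assuming a hypothetical CountingIter wrapper that records how often the wrapped iterator raises StopIteration, run against a port of Christopher's lzip. The port uses return in place of raise StopIteration, because Python 3.7 and later turn a StopIteration raised inside a generator into a RuntimeError.

```python
class CountingIter:
    """Hypothetical wrapper: counts StopIteration raised by the inner iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.stops = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._it)
        except StopIteration:
            self.stops += 1
            raise


def lzip(*args):
    """Python 3 port of lzip: exhausted iterators are retried every round."""
    ilist = [iter(a) for a in args]
    while True:
        res = []
        count = 0
        for i in ilist:
            try:
                g = next(i)
                count += 1
            except StopIteration:  # end of this iterator
                g = None
            res.append(g)
        if count > 0:  # at least one iterator wasn't finished
            yield tuple(res)
        else:  # all finished
            return


a = CountingIter("a")
r = CountingIter(range(1000))
rows = list(lzip(a, r))
print(len(rows), a.stops, r.stops)
# 1000 1000 1
```

Every remaining round asks the exhausted "a" iterator for another item, which is the overhead under discussion; a version that stops polling exhausted iterators avoids it.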

Andrew
da***@dalkescientific.com

Jul 29 '05 #14

Peter Otten

Andrew Dalke wrote:

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

This seems a bit more elegant, though the "replace" dictionary is
still a bit of a hack

from itertools import repeat, chain, izip

sentinel = object()
end_of_stream = repeat(sentinel)

def zipfill(*seqs):
    replace = {sentinel: None}.get
    seqs = [chain(seq, end_of_stream) for seq in seqs]
    for term in izip(*seqs):
        for element in term:
            if element is not sentinel:
                break
        else:
            # All sentinels
            break

        yield [replace(element, element) for element in term]

Combining your "clever" and your "elegant" approach to something fast
(though I'm not entirely confident it's correct):

from itertools import chain, izip

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Whether we ran out of active sequences is only tested once per sequence.

Fiddling with itertools is always fun, but feels a bit like reinventing the
wheel in this case. The only excuse being that you might need a lazy
map(None, ...) someday...
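A Python 3 port of Peter's fillzip, for anyone wanting to try it today: izip becomes the built-in zip(), which is already lazy. The comments describe when each piece runs; this is a sketch for experimentation, not a replacement for itertools.zip_longest.

```python
from itertools import chain


def fillzip(*seqs):
    # The default list is created each time fillzip() runs the def below,
    # so all done_iter() generators of one fillzip() call share one counter.
    def done_iter(done=[len(seqs)]):
        # This body starts only when chain() reaches it, i.e. when the
        # corresponding input has just been exhausted.
        done[0] -= 1
        if not done[0]:
            return  # last input finished: ends this chain, which stops zip()
        while True:
            yield None  # other inputs remain: pad this one forever
    return zip(*[chain(seq, done_iter()) for seq in seqs])


print(list(fillzip("abc", [1])))
# [('a', 1), ('b', None), ('c', None)]
```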

Peter

Jul 29 '05 #15

Andrew Dalke

Peter Otten wrote:

Combining your "clever" and your "elegant" approach to something fast
(though I'm not entirely confident it's correct):

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Ohh, that's pretty neat passing in 'done' via a mutable default argument.

It took me a bit to even realize why it does work. :)
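The trick relies on default values being evaluated when the def statement executes. Because done_iter is defined inside fillzip, each fillzip() call builds a new counter list, and every generator created during that call shares it. A tiny Python 3 illustration with made-up names:

```python
def make_countdown(n):
    # The default list is evaluated when this inner def statement runs,
    # i.e. once per make_countdown() call, and then shared by every tick().
    def tick(remaining=[n]):
        remaining[0] -= 1
        return remaining[0]
    return tick


first = make_countdown(3)
print(first(), first(), first())  # 2 1 0  (one shared list)

second = make_countdown(3)        # the def runs again: a fresh list
print(second())                   # 2
```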

Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Go too far on that path and the code starts looking like

from itertools import chain, izip, repeat
forever, table = repeat(None), {0: []}.get
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        return table(done[0], forever)
    return izip(*[chain(seq, done_iter()) for seq in seqs])

Now add the performance tweak....

def done_iter(done=[len(seqs)], forever=forever, table=table):

Okay, I'm over it. :)

Andrew
da***@dalkescientific.com

Jul 29 '05 #16

Scott David Daniels

Peter Otten wrote:

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Can I play too? How about:
import itertools

def fillzip(*seqs):
    def Nones(countactive=[len(seqs)]):
        countactive[0] -= 1
        while countactive[0]:
            yield None
    seqs = [itertools.chain(seq, Nones()) for seq in seqs]
    return itertools.izip(*seqs)

--Scott David Daniels
Sc***********@Acm.Org

Jul 29 '05 #17

Peter Otten

Andrew Dalke wrote:

Peter Otten wrote:
Combining your "clever" and your "elegant" approach to something fast
(though I'm not entirely confident it's correct):

def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return
        while 1:
            yield None
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Ohh, that's pretty neat passing in 'done' via a mutable default argument.

It took me a bit to even realize why it does work. :)

Though I would never have come up with it, were it not for the juxtaposition
of your two variants (I initially disliked the first and tried to improve
on the second), it is an unobvious merger :)
It's a bit fragile, too, as

Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

that won't work because done_iter() is now no longer a generator.
In effect you just say

seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seqs[-1], [])]

I tried

class Done(Exception):
    pass

pad = repeat(None)
def fillzip(*seqs):
    def check(active=[len(seqs)]):
        active[0] -= 1
        if not active[0]:
            raise Done
        # just to turn check() into a generator
        if 0: yield None
    seqs = [chain(seq, check(), pad) for seq in seqs]
    try:
        for item in izip(*seqs):
            yield item
    except Done:
        pass

to be able to use the faster repeat() instead of the while loop, and then
stared at it for a while -- in vain -- to eliminate the for item... loop.
If there were a lazy ichain(iter_of_iters) you could tweak check() to decide
whether a repeat(None) should follow it, but I'd rather not ask Raymond for
that particular addition to the itertools.
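Python 2.6 later added itertools.chain.from_iterable, which chains an iterable of iterables lazily, much like the ichain wished for here. A Python 3 sketch of how it could decide, only when an input runs out, whether a repeat(None) should follow. The helper tail() and the shared active counter are illustrative choices, not code from this thread.

```python
from itertools import chain, repeat


def fillzip(*seqs):
    active = [len(seqs)]  # inputs that still have items, shared by all tails

    def tail():
        # Runs only after the input in front of it is exhausted.
        active[0] -= 1
        if active[0]:
            yield repeat(None)  # others remain: pad with the fast repeat()
        # otherwise yield nothing, ending this input and therefore zip()

    # chain([seq], tail()) yields seq first and the padding decision later;
    # chain.from_iterable flattens that lazily.
    return zip(*[chain.from_iterable(chain([seq], tail())) for seq in seqs])


print(list(fillzip("abc", [1])))
# [('a', 1), ('b', None), ('c', None)]
```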
Now add the performance tweak....

def done_iter(done=[len(seqs)], forever=forever, table=table):

Okay, I'm over it. :)

Me too. I think. For now...

Peter

Jul 29 '05 #18

Andrew Dalke

Me:

Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)

Peter Otten: that won't work because done_iter() is now no longer a generator.
In effect you just say

seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seqs[-1], [])]

It does work - I tested it. The trick is that izip takes iter()
of the terms passed into it. iter([]) -> an empty iterator and
iter(repeat(None)) -> the repeat(None) itself.
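This can be checked directly in Python 3: calling iter() on an iterator returns the same object, and chain() (like zip()) calls iter() on each argument it is given, so a plain [] works as an empty tail:

```python
from itertools import chain, repeat

pad = repeat(None)
print(iter(pad) is pad)        # True: iter() of an iterator is the iterator

# A list is not an iterator, but chain() calls iter() on it when needed,
# so [] behaves as an empty tail and repeat(None) as an endless one.
print(list(chain("ab", [])))   # ['a', 'b']
print(next(chain("", pad)))    # None
```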

'Course then the name should be changed.

Andrew
da***@dalkescientific.com

Jul 29 '05 #19

Andrew Dalke

Scott David Daniels wrote:

Can I play too? How about:

Sweet!
Andrew
da***@dalkescientific.com

Jul 29 '05 #20

Peter Otten

Andrew Dalke wrote:

Me:
Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
    def done_iter(done=[len(seqs)]):
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    seqs = [chain(seq, done_iter()) for seq in seqs]
    return izip(*seqs)
Peter Otten:
that won't work because done_iter() is now no longer a generator.
In effect you just say

seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seqs[-1], [])]

It does work - I tested it. The trick is that izip takes iter()
of the terms passed into it. iter([]) -> an empty iterator and
iter(repeat(None)) -> the repeat(None) itself.

Seems my description didn't convince you. So here's an example:

>>> from itertools import chain, izip, repeat
>>> def fillzip(*seqs):
...     def done_iter(done=[len(seqs)]):
...         done[0] -= 1
...         if not done[0]:
...             return []
...         return repeat(None)
...     seqs = [chain(seq, done_iter()) for seq in seqs]
...     return izip(*seqs)
...
>>> list(fillzip(range(6), range(3)))
[(0, 0), (1, 1), (2, 2)]

versus

>>> map(None, range(6), range(3))
[(0, 0), (1, 1), (2, 2), (3, None), (4, None), (5, None)]

Now where's the typo?
'Course then the name should be changed.

My variable names were ill-chosen to begin with.

Peter

Jul 29 '05 #21

Peter Otten

Scott David Daniels wrote:

Can I play too?
Not unless you buy the expensive but good-looking c.l.py gaming license
which is only available through me :)
How about:
import itertools

def fillzip(*seqs):
    def Nones(countactive=[len(seqs)]):
        countactive[0] -= 1
        while countactive[0]:
            yield None
    seqs = [itertools.chain(seq, Nones()) for seq in seqs]
    return itertools.izip(*seqs)

You may be introducing a lot of extra tests in the while loop with the
non-constant condition -- which in practice is fairly cheap, though.
I'm willing to take the performance hit for the introduction of sane
variable names alone...

Peter

Jul 29 '05 #22

Andrew Dalke

Peter Otten wrote:

Seems my description didn't convince you. So here's an example:

Got it. In my test case the longest element happened to be the last
one, which is why it didn't catch the problem.
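The order dependence is easy to reproduce. A Python 3 port of the non-generator variant (renamed fillzip_eager here for illustration) pads every input except the last one passed in, so it only behaves like map(None, ...) when a longest input comes last:

```python
from itertools import chain, repeat


def fillzip_eager(*seqs):
    def done_iter(done=[len(seqs)]):
        # Not a generator: this runs immediately, once per input, while the
        # list below is built, so only the last input gets the empty tail.
        done[0] -= 1
        if not done[0]:
            return []
        return repeat(None)
    return zip(*[chain(seq, done_iter()) for seq in seqs])


print(list(fillzip_eager(range(3), range(6))))
# [(0, 0), (1, 1), (2, 2), (None, 3), (None, 4), (None, 5)]
print(list(fillzip_eager(range(6), range(3))))
# [(0, 0), (1, 1), (2, 2)]
```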

Thanks.

Andrew
da***@dalkescientific.com

Jul 29 '05 #23
