Bytes IT Community

Real-world use cases for map's None fill-in feature?

Proposal
--------
I am gathering data to evaluate a request for an alternate version of
itertools.izip() with a None fill-in feature like that for the built-in
map() function:
>>> map(None, 'abc', '12345')   # demonstrate map's None fill-in feature
[('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
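In today's Python, itertools.zip_longest provides exactly this padded pairing (with a configurable fillvalue); a minimal sketch:

```python
import itertools

# zip_longest pads the shorter inputs with fillvalue (default None),
# mirroring map(None, ...)'s fill-in behaviour for unequal lengths
pairs = list(itertools.zip_longest('abc', '12345'))
print(pairs)
# [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
```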

The motivation is to provide a means for looping over all data elements
when the input lengths are unequal. The question of the day is whether
that is both a common need and a good approach to real-world problems.
The answer can likely be found in results from other programming
languages and from surveying real-world Python code.

Other languages
---------------
I scanned the docs for Haskell, SML, and Perl6's yen operator and found
that the norm for map() and zip() is to truncate to the shortest input
or raise an exception for unequal input lengths. Ruby takes the
opposite approach and fills in nil values -- the reasoning behind the
design choice is somewhat inscrutable:
http://blade.nagaokaut.ac.jp/cgi-bin...ruby-dev/18651

Real-world code
---------------
I scanned the standard library, my own code, and a few third-party
tools. I found no instances where map's fill-in feature was used.

History of zip()
----------------
PEP 201 (lock-step iteration) documents that a fill-in feature was
contemplated and rejected for the zip() built-in introduced in Py2.0.
In the years before and after, SourceForge logs show no requests for a
fill-in feature.

Request for more information
----------------------------
My request for readers of comp.lang.python is to search your own code
to see if map's None fill-in feature was ever used in real-world code
(not toy examples). I'm curious about the context, how it was used,
and what alternatives were rejected (i.e. did the fill-in feature
improve the code). Likewise, I'm curious as to whether anyone has seen
a zip-style fill-in feature employed to good effect in some other
programming language.

Parallel to SQL?
----------------
If an iterator element's ordinal position were considered as a record
key, then the proposal equates to a database-style full outer join
operation (one which includes unmatched keys in the result) where record
order is significant. Does an outer-join have anything to do with
lock-step iteration? Is this a fundamental looping construct or just a
theoretical wish-list item? Does Python need itertools.izip_longest()
or would it just become a distracting piece of cruft?
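For concreteness, the proposed tool can be sketched as a plain generator in today's Python spelling (the name, signature, and fillvalue default are drawn from the proposal, not an existing API at the time):

```python
import itertools

def izip_longest(*iterables, fillvalue=None):
    """Sketch of the proposed tool: like zip(), but exhausted inputs
    are padded with fillvalue until the longest input is done."""
    iterators = [iter(it) for it in iterables]
    active = len(iterators)
    while active:
        row = []
        for i, it in enumerate(iterators):
            try:
                row.append(next(it))
            except StopIteration:
                active -= 1
                if not active:
                    return          # all inputs exhausted
                iterators[i] = itertools.repeat(fillvalue)
                row.append(fillvalue)
        yield tuple(row)
```

list(izip_longest('abc', '12345')) reproduces the map(None, ...) output shown above.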

Raymond Hettinger
FWIW, the OP's use case involved printing files in multiple
columns:

for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
    print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())

The alternative was straightforward but less terse:

while 1:
    f = file1.readline()
    g = file2.readline()
    if not f and not g:
        break
    print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
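Run end to end with a fill-aware zip (today's itertools.zip_longest), the column-printing case looks like this; StringIO objects and their contents are made up to stand in for the files:

```python
import io
import itertools

# hypothetical file contents, for illustration only
file1 = io.StringIO("alpha\nbeta\ngamma\n")
file2 = io.StringIO("one\n")

# the shorter file is padded with '' so every line of the longer
# file still gets printed in its column
rows = list(itertools.zip_longest(file1, file2, fillvalue=''))
for f, g in rows:
    print('%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip()))
```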
Jan 9 '06 #1
30 Replies


Raymond Hettinger <py****@rcn.com> wrote:
...
> Request for more information
> ----------------------------
> My request for readers of comp.lang.python is to search your own code
> to see if map's None fill-in feature was ever used in real-world code
> (not toy examples). I'm curious about the context, how it was used,
> and what alternatives were rejected (i.e. did the fill-in feature


I had (years ago, version was 1.5.2) one real-world case of map(max,
seq1, seq2). The sequences represented alternate scores for various
features, using None to mean "the score for this feature cannot be
computed by the algorithm used to produce this sequence", and it was
common to have one sequence longer (using a later-developed algorithm
that computed more features). This use may have been an abuse of my
observation that max(None, N) and max(N, None) were always N on the
platform I was using at the time. I was relatively new at Python, and
in retrospect I feel I might have been going for "use all the new toys
we've just gotten" -- looping on feature index to compute the scores,
and explicitly testing for None, might have been a better approach than
building those lists (with seq1=map(scorer1, range(N)), btw) and then
running map on them, anyway. At any rate, I later migrated to a lazily
computed version, don't recall the exact details but it was something
like (in today's Python):

class LazyMergedList(object):
    def __init__(self, *fs):
        self.fs = fs
        self.known = {}
    def __getitem__(self, n):
        try:
            return self.known[n]
        except KeyError:
            pass
        result = self.known[n] = max(f(n) for f in self.fs)
        return result

when it turned out that in most cases the downstream code wasn't
actually using all the features (just a small subset in each case), so
computing all of them ahead of time was a waste of cycles.

I don't recall ever relying on map's None-filling feature in other
real-world cases, and, as I mentioned, even here the reliance was rather
doubtful. OTOH, if I had easily been able to specify a different
filler, I _would_ have been able to use it a couple of times.
Alex
Jan 9 '06 #2

In article <ma**************************************@python.org>,
Raymond Hettinger <py****@rcn.com> wrote:
> Request for more information
> ----------------------------
> My request for readers of comp.lang.python is to search your own code
> to see if map's None fill-in feature was ever used in real-world code
> (not toy examples).


I had a quick look through our (Strakt's) codebase and found one example.

The code is used to process user-designed macros, where the user wants
to append data to strings stored in the system. Note that all data is
stored as lists of whatever the relevant data type is.

While I didn't write this bit of code (so I can't say what, if any,
alternatives were considered), it does seem to me the most
straightforward way to do it. Being able to say what the fill-in
value should be would make the code even simpler.

oldAttrVal is the original stored data, and attrValue is what the macro
wants to append.

--->8---
newAttrVal = []
for x, y in map(None, oldAttrVal, attrValue):
    newAttrVal.append(u''.join((x or '', y or '')))
--->8---
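For what it's worth, with a fill-aware zip (today's itertools.zip_longest) the same append collapses to a list comprehension; the data values here are made up for illustration:

```python
import itertools

# hypothetical stored strings and macro additions
oldAttrVal = [u'foo', u'bar']
attrValue = [u'-x', u'-y', u'-z']

# zip_longest pads the shorter list with None, which `x or ''` turns
# into the empty string, exactly as the map(None, ...) version does
newAttrVal = [u''.join((x or u'', y or u''))
              for x, y in itertools.zip_longest(oldAttrVal, attrValue)]
# newAttrVal == [u'foo-x', u'bar-y', u'-z']
```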

/Anders

--
-- Of course I'm crazy, but that doesn't mean I'm wrong.
Anders Hammarquist | ik*@cd.chalmers.se
Physics student, Chalmers University of Technology, | Hem: +46 31 88 48 50
Göteborg, Sweden. RADIO: SM6XMM and N2JGL | Mob: +46 707 27 86 87
Jan 9 '06 #3

[Alex Martelli]
> I had (years ago, version was 1.5.2) one real-world case of map(max,
> seq1, seq2). The sequences represented alternate scores for various
> features, using None to mean "the score for this feature cannot be
> computed by the algorithm used to produce this sequence", and it was
> common to have one sequence longer (using a later-developed algorithm
> that computed more features). This use may have been an abuse of my
> observation that max(None, N) and max(N, None) were always N on the
> platform I was using at the time.
Analysis
--------

That particular dataset has three unique aspects allowing the map(max,
s1, s2, s3) approach to work at all.

1) Fortuitous alignment in various meanings of None:
- the input sequence using it to mean "feature cannot be computed"
- the auto-fillin of None meaning "feature used in later
algorithms, but not earlier ones"
- the implementation quirk where max(None, n) == max(n, None) == n

2) Use of a reduction function like max() which does not care about the
order of inputs (i.e. the output sequence does not indicate which
algorithm produced the best score).

3) Later-developed sequences had to be created with the knowledge of
the features used by all earlier sequences (lest two of the sequences
get extended with different features corresponding to the same ordinal
position).

Getting around the latter limitation suggests using a mapping
(feature->score) rather than tracking scores by ordinal position (with
position corresponding to a particular feature):

bestscore = {}
for d in d1, d2, d3:
    for feature, score in d.iteritems():
        bestscore[feature] = max(bestscore.get(feature, 0), score)

Such an approach also gets around dependence on the other two unique
aspects of the dataset. With dict.get() any object can be specified as
a default value (with zero being a better choice for a null input to
max()). Also, the pattern is not limited to commutative reduction
functions like max(); instead, it would work just as well with a
result.setdefault(feature, []).append(score) style accumulation of all
results or with other combining/analysis functions.

So, while map's None fill-in feature happened to apply to this
dataset's unique features, I wonder if its availability steered you
away from a better data-structure with greater flexibility, less
dependence on quirks, and more generality.

Perhaps the lesson is that outer-join operations are best expressed
with dictionaries rather than sequences with unequal lengths.
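That lesson can be made concrete with a minimal sketch of a dict-based full outer join (function and parameter names are assumed for illustration):

```python
def outer_join(d1, d2, default=None):
    # every key from either mapping appears exactly once; a side with
    # no entry for a key is filled with the default, as in a SQL full
    # outer join on the key column
    return {k: (d1.get(k, default), d2.get(k, default))
            for k in set(d1) | set(d2)}
```

For example, outer_join({'a': 1}, {'a': 2, 'b': 3}) yields {'a': (1, 2), 'b': (None, 3)}; unlike the positional approach, no two inputs can disagree about which feature a slot refers to.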

> I was relatively new at Python, and
> in retrospect I feel I might have been going for "use all the new toys
> we've just gotten"
That suggests that if itertools.izip_longest() doesn't turn out to be
TheRightTool(tm) for many tasks, then it may have ill-effects beyond
just being cruft -- it may steer folks away from better solutions. As
you know, it can take a while for Python newcomers to realize the full
power and generality of dictionary based approaches. I wonder if this
proposed itertool would distract from that realization.

> I don't recall ever relying on map's None-filling feature in other
> real-world cases, and, as I mentioned, even here the reliance was rather
> doubtful. OTOH, if I had easily been able to specify a different
> filler, I _would_ have been able to use it a couple of times.


Did you run across any cookbook code that would have been improved by
the proposed itertools.izip_longest() function?

Raymond

Jan 9 '06 #4


"Raymond Hettinger" <py****@rcn.com> wrote in message
news:ma**************************************@python.org...
> Proposal
> --------
> I am gathering data to evaluate a request for an alternate version of
> itertools.izip() with a None fill-in feature like that for the built-in
> map() function:
>
> map(None, 'abc', '12345') # demonstrate map's None fill-in feature
> [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')]
>
> The motivation is to provide a means for looping over all data elements
> when the input lengths are unequal. The question of the day is whether
> that is both a common need and a good approach to real-world problems.
> The answer can likely be found in results from other programming
> languages and from surveying real-world Python code.
>
> Other languages
> ---------------
> I scanned the docs for Haskell, SML, and Perl6's yen operator and found
> that the norm for map() and zip() is to truncate to the shortest input
> or raise an exception for unequal input lengths. Ruby takes the
> opposite approach and fills in nil values -- the reasoning behind the
> design choice is somewhat inscrutable:
> http://blade.nagaokaut.ac.jp/cgi-bin...ruby-dev/18651

From what I can make out (with help of internet language translation
sites) the relevant part (section [2]) of this presents three options
for handling unequal length arguments:
1. zip to longest (Perl6 does it this way)
2. zip to shortest (Python does it this way)
3. use zip method and choose depending on whether argument list is
   shorter or longer than object's list.
It then solicits opinions on the best way.
It does not state or justify any particular choice.

If "perl6"=="perl6 yen operator" then there
is a contradiction with your earlier statement.
> Real-world code
> ---------------
> I scanned the standard library, my own code, and a few third-party
> tools. I found no instances where map's fill-in feature was used.
>
> History of zip()
> ----------------
> PEP 201 (lock-step iteration) documents that a fill-in feature was
> contemplated and rejected for the zip() built-in introduced in Py2.0.
> In the years before and after, SourceForge logs show no requests for a
> fill-in feature.
My perception is that many people view the process
of advocating for a library addition as
1. Very time consuming due to the large amount of
work involved in presenting and defending a proposal.
2. Having a very small chance of acceptance.
I do not know whether this is really the case or even if my
perception is correct, but if it is, it could account for the
lack of feature requests.
> Request for more information
> ----------------------------
> My request for readers of comp.lang.python is to search your own code
> to see if map's None fill-in feature was ever used in real-world code
> (not toy examples). I'm curious about the context, how it was used,
> and what alternatives were rejected (i.e. did the fill-in feature
> improve the code). Likewise, I'm curious as to whether anyone has seen
> a zip-style fill-in feature employed to good effect in some other
> programming language.
How well correlated is the use of map()-with-fill with the
(need for) use of zip/izip-with-fill?
> Parallel to SQL?
> ----------------
> If an iterator element's ordinal position were considered as a record
> key, then the proposal equates to a database-style full outer join
> operation (one which includes unmatched keys in the result) where record
> order is significant. Does an outer-join have anything to do with
> lock-step iteration? Is this a fundamental looping construct or just a
> theoretical wish-list item? Does Python need itertools.izip_longest()
> or would it just become a distracting piece of cruft?
>
> Raymond Hettinger
>
> FWIW, the OP's use case involved printing files in multiple
> columns:
>
> for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
>     print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
>
> The alternative was straightforward but less terse:
>
> while 1:
>     f = file1.readline()
>     g = file2.readline()
>     if not f and not g:
>         break
>     print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())


Actually my use case did not have quite so much
perlish line noise :-)
Compared to

for f, g in izip2 (file1, file2, fill=''):
    print '%s\t%s' % (f, g)

the above looks like a relatively minor loss
of conciseness, but consider the uses of the
current izip, for example

for i1, i2 in itertools.izip (iterable_1, iterable_2):
    print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

can be replaced by:

while 1:
    i1 = iterable_1.next()
    i2 = iterable_2.next()
    print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

yet that was not justification for rejecting izip()'s
inclusion in itertools.

The other use case I had was a simple file diff.
All I cared about was if the files were the same or
not, and if not, what were the first differing lines.
This was to compare output from a process that
was supposed to match some saved reference
data. Because of error propagation, lines beyond
the first difference were meaningless. The code,
using an "iterate to longest with fill" izip would be
roughly:

# Simple file diff to ident
for ln1, ln2 in izip_long (file1, file2, fill="<EOF>"):
    if ln1 != ln2:
        break
if ln1 == ln2:
    print "files are identical"
else:
    print "files are different"

This same use case occurred again very recently
when writing unit tests to compare output of a parser
with known correct output during refactoring.

With file iterators one can imagine many potential
use cases for izip but not imap, but there are probably
few real uses existent because generally files may be
of different lengths, and there currently is no usable
izip for this case.

[jan09 08:30 utc]

Jan 9 '06 #5

Raymond Hettinger wrote:
> My request for readers of comp.lang.python is to search your own code
> to see if map's None fill-in feature was ever used in real-world code
> (not toy examples). I'm curious about the context, how it was used,
> and what alternatives were rejected (i.e. did the fill-in feature
> improve the code). Likewise, I'm curious as to whether anyone has seen
> a zip-style fill-in feature employed to good effect in some other
> programming language.


One example of padding out iterators (although I didn't use map's fill-in
to implement it) is turning a single column of items into a multi-column
table with the items laid out across the rows first. The last row may have
to be padded with some empty cells.

Here's some code I wrote to do that. Never mind for the moment that the use
of zip isn't actually defined here, it could use izip, but notice that the
input iterator has to be converted to a list first so that I can add a
suitable number of empty strings to the end. If there was an option to izip
to pad the last element with a value of choice (such as a blank string) the
code could work with iterators throughout:

def renderGroups(self, group_size=2, allow_add=True):
    """Iterates over the items rendering one item for each group.
    Each group contains an iterator for group_size elements.
    The last group may be padded out with empty strings.
    """
    elements = list(self.renderIterator(allow_add)) + ['']*(group_size-1)
    eliter = iter(elements)
    return zip(*[eliter]*group_size)

If there was a padding option to izip this could have been something
like:

def renderGroups(self, group_size=2, allow_add=True):
    """Iterates over the items rendering one item for each group.
    Each group contains an iterator for group_size elements.
    The last group may be padded out with empty strings.
    """
    iter = self.renderIterator(allow_add)
    return itertools.izip(*[iter]*group_size, pad='')

The code is then used to build a table using tal like this:

<tal:loop repeat="row python:slot.renderGroups(group_size=4);">
<tr tal:define="isFirst repeat/row/start"
tal:attributes="class python:test(isFirst, 'slot-top','')">
<td class="slotElement" tal:repeat="cell row"
tal:content="structure cell">4X Slot element</td>
</tr>
</tal:loop>
Jan 9 '06 #6

[Anders Hammarquist]:
> I had a quick look through our (Strakt's) codebase and found one example.
Thanks for the research :-)

> The code is used to process user-designed macros, where the user wants
> to append data to strings stored in the system. Note that all data is
> stored as lists of whatever the relevant data type is.
>
> While I didn't write this bit of code (so I can't say what, if any,
> alternatives were considered), it does seem to me the most
> straightforward way to do it. Being able to say what the fill-in
> value should be would make the code even simpler.
>
> oldAttrVal is the original stored data, and attrValue is what the macro
> wants to append.
>
> newAttrVal = []
> for x, y in map(None, oldAttrVal, attrValue):
>     newAttrVal.append(u''.join((x or '', y or '')))


I'm finding this case difficult to analyze and generalize without
knowing the significance of position in the list. It looks like None
fill-in is used because attrValue may be a longer list whenever the
user is specifying new system strings and it may be shorter when
there are no new strings and the system strings aren't being updated
at all. Either way, it looks like the ordinal position has some
meaning that is shared by both oldAttrVal and newAttrVal, perhaps a
message number or somesuch. If that is the case, is there some other
table that assigns meanings to the resulting strings according to their
index? What does the code look like that accesses newAttrVal and how
does it know the significance of various positions in the list? This
is important because it could shed some light on how an app finds
itself looping over two lists which share a common meaning for each
index position, yet they are unequal in length.

Raymond

Jan 9 '06 #7

Duncan Booth wrote:
> One example of padding out iterators (although I didn't use map's fill-in
> to implement it) is turning a single column of items into a multi-column
> table with the items laid out across the rows first. The last row may have
> to be padded with some empty cells.


ANALYSIS
--------

This case relies on the side-effects of zip's implementation details --
the trick of windowing or data grouping with code like: zip(it, it,
it). The remaining challenge is handling missing values when
the reshape operation produces a rectangular matrix with more elements
than provided by the iterable input.

The proposed function directly meets the challenge:

it = iter(iterable)
result = izip_longest(*[it]*group_size, pad='')

Alternately, the need can be met with existing tools by pre-padding the
iterator with enough extra values to fill any holes:

it = chain(iterable, repeat('', group_size-1))
result = izip_longest(*[it]*group_size)
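Spelled out runnably in today's Python, where plain zip suffices once the input is pre-padded (the function name is assumed for illustration):

```python
import itertools

def grouped(iterable, group_size, pad=''):
    # pre-pad with group_size-1 fill values so the final group is
    # always complete, then reshape with the zip(*[it]*n) trick;
    # zip's truncation silently drops any leftover padding
    it = itertools.chain(iterable, itertools.repeat(pad, group_size - 1))
    return list(zip(*[it] * group_size))
```

Note that when the input length is an exact multiple of group_size, zip's truncation discards the unused padding, so no all-blank row is produced.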

Both approaches require a certain measure of inventiveness, rely on
advanced tricks, and forgo readability to gain the raw speed and
conciseness afforded by a clever use of itertools. They are also a
challenge to review, test, modify, read, or explain to others.

In contrast, a simple generator is trivially easy to create and read,
albeit less concise and not as speedy:

it = iter(iterable)
while 1:
    row = tuple(islice(it, group_size))
    if len(row) == group_size:
        yield row
    else:
        yield row + ('',) * (group_size - len(row))
        break

The generator version is plain, simple, boring, and uninspirational.
But it took only seconds to write and did not require a knowledge of
advanced itertool combinations. It is more easily explained than the
versions with zip tricks.
Raymond

Jan 9 '06 #8

"Raymond Hettinger" <py****@rcn.com> writes:
> The generator version is plain, simple, boring, and uninspirational.
> But it took only seconds to write and did not require a knowledge of
> advanced itertool combinations. It is more easily explained than the
> versions with zip tricks.


I had this cute idea of using dropwhile to detect the end of an iterable:

it = chain(iterable, repeat(''))
while True:
    row = tuple(islice(it, group_size))
    # next line raises StopIteration if row is entirely null-strings
    dropwhile(lambda x: x=='', row).next()
    yield row
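In today's Python a StopIteration escaping a generator body is an error (PEP 479), so the trick needs an explicit guard; a sketch, assuming the pad value never makes up an entire genuine row:

```python
import itertools

def grouper(iterable, group_size, pad=''):
    # pad forever, then stop at the first row made entirely of padding,
    # which dropwhile detects by finding nothing left to yield
    it = itertools.chain(iterable, itertools.repeat(pad))
    while True:
        row = tuple(itertools.islice(it, group_size))
        try:
            next(itertools.dropwhile(lambda x: x == pad, row))
        except StopIteration:
            return    # row was all padding: real input is exhausted
        yield row
```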
Jan 9 '06 #9

Raymond Hettinger wrote:
> The generator version is plain, simple, boring, and uninspirational.
> But it took only seconds to write and did not require a knowledge of
> advanced itertool combinations. It is more easily explained than the
> versions with zip tricks.

I can't argue with that.
Jan 9 '06 #10

ru***@yahoo.com wrote:
> The other use case I had was a simple file diff.
> All I cared about was if the files were the same or
> not, and if not, what were the first differing lines.
> This was to compare output from a process that
> was supposed to match some saved reference
> data. Because of error propagation, lines beyond
> the first difference were meaningless. . . . This same use case occurred again very recently
> when writing unit tests to compare output of a parser
> with known correct output during refactoring.


Analysis
--------

Both of these cases compare two data streams and report the first
mismatch, if any. Data beyond the first mismatch is discarded.

The example code seeks to avoid managing two separate iterators and the
attendant code for trapping StopIteration and handling end-cases. The
simplification is accomplished by generating a single fill element so
that the end-of-file condition becomes its own element capable of being
compared or reported back as a difference. The EOF element serves as a
sentinel and allows a single line of comparison to handle all cases.
This is a normal and common use for sentinels.

The OP's code appends the sentinel using a proposed variant of zip()
which pads unequal iterables with a specified fill element:

for x, y in izip_longest(file1, file2, fill='<EOF>'):
    if x != y:
        return 'Mismatch', x, y
return 'Match'

Alternately, the example can be written using existing itertools:

for x, y in izip(chain(file1, ['<EOF>']), chain(file2, ['<EOF>'])):
    if x != y:
        return 'Mismatch', x, y
return 'Match'

This is a typical use of chain() and not at all tricky. The chain()
function was specifically designed for tacking one or more elements
onto the end of another iterable. It is ideal for appending sentinels.
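The sentinel pattern as a self-contained sketch in today's Python (list inputs stand in for the file iterators; the function name is assumed):

```python
import itertools

def compare(lines1, lines2, sentinel='<EOF>'):
    # appending a sentinel turns end-of-input into an ordinary,
    # comparable element, so one test handles all end cases
    s1 = itertools.chain(lines1, [sentinel])
    s2 = itertools.chain(lines2, [sentinel])
    for x, y in zip(s1, s2):
        if x != y:
            return ('Mismatch', x, y)
    return ('Match',)
```

If both streams are equal, the two sentinels also compare equal and the loop falls through to 'Match'; a shorter stream surfaces its sentinel as the mismatching value.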
Raymond

Jan 9 '06 #11

> Alternately, the need can be met with existing tools by pre-padding the
> iterator with enough extra values to fill any holes:
>
> it = chain(iterable, repeat('', group_size-1))
> result = izip_longest(*[it]*group_size)


Typo: That should be izip() instead of izip_longest()

Jan 9 '06 #12


"Raymond Hettinger" <py****@rcn.com> wrote:
> Duncan Booth wrote:
> > One example of padding out iterators (although I didn't use map's fill-in
> > to implement it) is turning a single column of items into a multi-column
> > table with the items laid out across the rows first. The last row may have
> > to be padded with some empty cells.
>
> ANALYSIS
> --------
>
> This case relies on the side-effects of zip's implementation details --
> the trick of windowing or data grouping with code like: zip(it(),
> it(), it()). The remaining challenge is handling missing values when
> the reshape operation produces a rectangular matrix with more elements
> than provided by the iterable input.
>
> The proposed function directly meets the challenge:
>
> it = iter(iterable)
> result = izip_longest(*[it]*group_size, pad='')
>
> Alternately, the need can be met with existing tools by pre-padding the
> iterator with enough extra values to fill any holes:
>
> it = chain(iterable, repeat('', group_size-1))
> result = izip_longest(*[it]*group_size)

I assumed you meant izip() here (and saw your followup)
> Both approaches require a certain measure of inventiveness, rely on
> advanced tricks, and forgo readability to gain the raw speed and
> conciseness afforded by a clever use of itertools. They are also a
> challenge to review, test, modify, read, or explain to others.
The inventiveness is in the "*[it]*group_size" part. The
rest is straightforward (assuming of course that itertools
has good documentation, and it was read first.)
> In contrast, a simple generator is trivially easy to create and read,
> albeit less concise and not as speedy:
>
> it = iter(iterable)
> while 1:
>     row = tuple(islice(it, group_size))
>     if len(row) == group_size:
>         yield row
>     else:
>         yield row + ('',) * (group_size - len(row))
>         break
Yes, with 4 times the amount of code. (Yes, I am
one of those who believes production and maintenance
cost is, under many circumstances, roughly correlated
with LOC.)

And frankly, I don't find the above any more
comprehensible than

result = izip_longest(*[it]*group_size, pad='')

once a little thought is given to the *[it]*group_size
part. I see much more opaque code every time
I look at source code in the standard library.
> The generator version is plain, simple, boring, and uninspirational.
> But it took only seconds to write and did not require a knowledge of
> advanced itertool combinations.
"advanced itertool combinations"?? Even I, newbie
that I am, found the concepts of repeat() and chain()
pretty straightforward. Of course having to
understand/use 3 itertools tools is more difficult
than understanding one (izip_longest). Better
documentation could mitigate that a lot.
But the solution using "advanced itertool combinations"
was yours, avoided altogether with an izip_long().

Also this same argument (uses of x can be easily
coded without x by using a generator) is equally
applicable to itertools.izip() itself, yes?
> It is more easily explained than the versions with zip tricks.


Calling this a "trick" is unfair. The (current pre-2.5)
documentation still mentions no requirement that
izip() arguments be independent (despite the fact
that this issue was discussed here a couple months
ago, as I remember). If I remember correctly, it was not
clear whether that should be a requirement or not, since it
would prevent any use of the same iterable more than
once in izip's arg list; it has not been documented
for 3(?) Python versions, and clearly people are
using the current behavior.

Jan 10 '06 #13


I haven't used itertools yet, so I don't know their capabilities.

I have used map twice recently with None as the first argument. This
was also the first time I've used map, and was disappointed when I
found out about the truncation. The lists map was iterating over in my
case were of unequal lengths, so I had to pad the lists to make sure
nothing was truncated.

The most universal solution would be to provide a mechanism to
truncate, pad, or require equal lengths. However, with the pad
feature, room should be provided for the user to supply the pad item.

Jan 10 '06 #14

Raymond Hettinger wrote:
> Alternately, the need can be met with existing tools by pre-padding the
> iterator with enough extra values to fill any holes:
>
> it = chain(iterable, repeat('', group_size-1))
> result = izip_longest(*[it]*group_size)
>
> Both approaches require a certain measure of inventiveness, rely on
> advanced tricks, and forgo readability to gain the raw speed and
> conciseness afforded by a clever use of itertools. They are also a
> challenge to review, test, modify, read, or explain to others.
Is this the author of itertools becoming its most articulate opponent? What
use is this collection of small functions sharing an underlying concept if
you are not supposed to combine them to your heart's content? You probably
cannot pull off some of those tricks until you have good working knowledge
of the iterator protocol, but that is becoming increasingly important to
understand all Python code.
> In contrast, a simple generator is trivially easy to create and read,
> albeit less concise and not as speedy:
>
> it = iter(iterable)
> while 1:
>     row = tuple(islice(it, group_size))
>     if len(row) == group_size:
>         yield row
>     else:
>         if row:
>             yield row + ('',) * (group_size - len(row))
>         break
>
> The generator version is plain, simple, boring, and uninspirational.


I can't argue with that :-) But nobody spotted the bug within a day; so
dumbing down the code didn't pay off. Furthermore, simple code like above
is often inlined and therefore harder to test and an impediment to
modification. Once you put the logic into a separate function/generator it
doesn't really matter which version you use. You can't get the
chain/repeat/izip variant to meet your (changing) requirements? Throw it
away and just keep the (modified) test suite.

A newbie, by the way, would have /written/ neither. The it = iter(iterable)
voodoo isn't obvious and the barrier to switch from lst[:group_size] to
islice(it, group_size) to /improve/ one's code is high. I expect to see an
inlined list-based solution. The two versions are both part of a learning
experience and both worth the effort.

Regarding the thread's topic, I have no use cases for a map(None, ...)-like
izip_longest(), but occasionally I would prefer izip() to throw a
ValueError if its iterable arguments do not have the same "length".

Peter
Jan 10 '06 #15

[Raymond]
> Both approaches require a certain measure of inventiveness, rely on
> advanced tricks, and forgo readability to gain the raw speed and
> conciseness afforded by a clever use of itertools. They are also a
> challenge to review, test, modify, read, or explain to others.

[Peter Otten]
> Is this the author of itertools becoming its most articulate opponent? What
> use is this collection of small functions sharing an underlying concept if
> you are not supposed to combine them to your heart's content? You probably
> cannot pull off some of those tricks until you have good working knowledge
> of the iterator protocol, but that is becoming increasingly important to
> understand all Python code.
I'm happy with the module -- it has been well received and is in
widespread use. The components were designed to be useful both
individually and in combination.

OTOH, I sometimes cringe at code reminiscent of APL:

it = chain(iterable, repeat('', group_size-1))
result = izip(*[it]*group_size)

The code is understandable IF you're conversant with all the component
idioms; however, if you're the slightest bit rusty, the meaning of the
code is not obvious. Too much of the looping logic is implicit (1D
padded input reshaped and truncated to a 2D iterator of tuples); the
style is not purely functional (relying on side-effects from multiple
calls to the same iterator); there are two distinct meanings for the
star operator; and it is unlikely that most people remember the
precedence rules for whether *[it] expands before the [it]*group_size
repeats. All in all, it cannot be claimed to be a masterpiece of
clarity. That being said, if speed was essential, I would use it every
time (as a separate helper function and never as in-line code).
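In modern Python terms (where the built-in zip() is itself lazy and izip is gone), the idiom above can be wrapped in a helper; the name grouper and the demo string are my own illustration, not code from the thread:

```python
from itertools import chain, repeat

def grouper(iterable, group_size, pad=''):
    # Pad the tail so the final group can be filled, then pull
    # group_size items at a time by handing zip() the *same*
    # iterator group_size times.
    it = chain(iterable, repeat(pad, group_size - 1))
    return zip(*[it] * group_size)

groups = list(grouper('abcdefgh', 3))  # last group padded with ''
```

The key (and easily missed) trick is exactly the one criticized above: all group_size arguments to zip() are the same iterator, so each tuple advances it group_size steps.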

Of course, the main point of the post was that Duncan's use case was
readily solved with existing tools and did not demonstrate a need for
izip_longest(). His original code was almost there -- it just needed
to use chain() instead of list concatenation.
Regarding the thread's topic, I have no use cases for a map(None, ...)-like
izip_longest(), but occasionally I would prefer izip() to throw a
ValueError if its iterable arguments do not have the same "length".


The Standard ML authors agree. Their library offers both alternatives
(with and without an exception for unequal inputs):

http://www.standardml.org/Basis/list...PAIR.zipEq:VAL

Thanks for the input,

Raymond

Jan 10 '06 #16

Raymond Hettinger <py****@rcn.com> wrote:
History of zip()
----------------
PEP 201 (lock-step iteration) documents that a fill-in feature was
contemplated and rejected for the zip() built-in introduced in Py2.0.
In the years before and after, SourceForge logs show no requests for a
fill-in feature.
My perception is that many people view the process
of advocating for a library addition as
1. Very time consuming due to the large amount of
work involved in presenting and defending a proposal.


I would characterize it as time consuming due to the amount of
research, discussion, and analysis it takes to determine whether or not
a proposal is a good idea.
2. Having a very small chance of acceptance.


It is less a matter of chance and more a matter of quality. Great
ideas usually make it. Crummy ideas have no chance unless no one takes
the time to think them through.


Great and crummy are not the problem, since the answer
in those cases is obvious. It is the middle ground, where
the answer is not clear and where different people can hold
different views, that is the problem.
I do not know whether this is really the case or even if my
perception is correct, but if it is, it could account for the
lack of feature requests.


I've been monitoring and adjudicating feature requests for five years.
Pythonistas are not known for a lack of assertiveness. If a core
feature has usability problems, we tend to hear about it quickly.
Also, at PyCon, people are not shy about discussing issues that have
arisen.


Yet these are the people both most familiar with the
library as it exists and the most able to easily work
around any limitations, maybe without even thinking
about it. So I am not surprised that this might not
have come up.

To me, the izip solution for my use case was "obvious".
None of the other solutions posted here were.
Of course that could be fixed with documentation.
The lack of requests is not a definitive answer; however, it does
suggest that there is not a strong unmet need. The lack of examples
in the standard library and other code scans corroborates that notion.
This newsgroup query will further serve to gauge the level of interest
and to ferret out real-world use cases. The jury is still out.
Comments at end re use cases.
How well correlated is the use of map()-with-fill with the
(need for) use of zip/izip-with-fill?


Close to 100%. A non-iterator version of izip_longest() is exactly
equivalent to map(None, it1, it2, ...).
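For reference, the Python 2 behaviour under discussion maps directly onto today's itertools.zip_longest; a minimal sketch in modern Python:

```python
from itertools import zip_longest

# Python 2's map(None, 'abc', '12345') returned
# [('a','1'), ('b','2'), ('c','3'), (None,'4'), (None,'5')];
# the modern equivalent of that list is:
result = list(zip_longest('abc', '12345'))
```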


Isn't the difference between non-iterator and iterator very
significant? If I use map() I can trivially determine the arguments'
lengths and deal with unequal lengths before map(). With iterators
that is more difficult. So I can imagine many cases where izip might
be applicable but map not, and a lack of map use cases is
not representative of izip use cases.
Since "we already got one", the real issue is whether it has been so
darned useful that it warrants a second variant with two new features
(returns an iterator instead of a list and allows a user-specifiable
fill value).
I don't see it as having one and adding a second variant.
I see it as having 1/2 and adding the other 1/2.
FWIW, the OP's use case involved printing files in multiple
columns:

for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
    print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
Actually my use case did not have quite so much
Perlish line noise :-)


The code was not intended to recapitulate your thread; instead, it was
a compact way of summarizing the problem context that first suggested
some value to izip_longest().


I realize that. I just thought that having a
lot of extraneous stuff like the formatting made
it look, at first glance, messier than it should.
for i1, i2 in itertools.izip(iterable_1, iterable_2):
    print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

can be replaced by:

while 1:
    i1 = iterable_1.next()
    i2 = iterable_2.next()
    print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

yet that was not justification for rejecting izip()'s
inclusion in itertools.


Two thoughts:

1) The easily-coded-simple-alternative argument applies less strongly
to common cases (equal sequence lengths and finite sequences mixed with
infinite suppliers) than it does to less common cases (unequal sequence
lengths where order is important and missing data elements have
meaning).

2) The replacement code is not quite accurate -- the StopIteration
exception needs to be trapped.


Yes, but I don't think that negates the point.
The other use case I had was a simple file diff.
All I cared about was if the files were the same or
not, and if not, what were the first differing lines.


Did you look at difflib?


Yes, but it was way overkill for what I needed.
Raymond


~~~
Thanks for your response but I'm curious why you
mailed it rather than posted?

I am still left with a difficult-to-express feeling of
dissatisfaction at this process.

Please try to see it from the point of view of
someone who is not an expert at Python:

Here is izip().
My conception is it takes two sequence generators
and matches up the items from each. (I am talking
overall conceptual models here, not details.)
Here is my problem.
I have two files that produce lines and I want to
compare each line.
Seems like a perfect fit.

So I read that izip() only goes to the shortest iterable,
I think, "why only the shortest? why not the longest?
what's so special about the shortest?"
At this point explanations involving lack of use cases
are not very convincing. I have a use. All the
alternative solutions are more code, less clear, less
obvious, less right. But most importantly, there
seems to be a symmetry between the two cases
(shortest vs longest) that makes the lack of
support for matching-to-longest somehow a
defect.

Now if there is something fundamental about
matching items in parallel lists that makes it a
sensible thing to do only for equal lists (or to the
shortest list) that's fine. You seem to imply that's
the case by referencing Haskell, ML, etc. If so,
that needs to be pointed out in izip's docs.
(Though nothing I have read in this thread has
been convincing.)

If it is the case that a matching-longest izip is easily
handled by adding a line or two of code using izip-shortest,
that should be pointed out in the doc.

But if the answer is to write out an equivalent generator
in basic python, I cannot see izip but as being
excessively specialized, and needing to be fixed.

Re use-cases...

Use cases seem to be sought from readers
of c.l.p. and python-dev. That is a pretty small
percentage of python users, and those that
choose to respond are self-selecting. I would
expect the distribution of responders to be
skewed toward advanced users for example.
The other source seems to be a search of
the standard libraries but isn't that also likely
not representative of all the code out in the
wild?

Also, can anyone really remember their code
well enough to recall when some proposed
enhancement would be beneficial?

What I am suggesting is that use cases are
important, but it should also be realized that
they may not always give an accurate quantitative
picture, and that some things still might be good
ideas even without use cases (and the converse, of
course), not because the use cases don't exist,
but because they may not be seen by the current
use-case solicitation process.

Jan 11 '06 #17

ru***@yahoo.com schrieb:
I am still left with a difficult-to-express feeling of
dissatisfaction at this process.

Please try to see it from the point of view of
someone who is not an expert at Python:

... [explains his POV]


I more or less completely agree with you; IOW, I'd like izip
to change, too. But there are two problems that you haven't
mentioned. First, in the case of izip, it is not clear
how it should be fixed, and if such a change does not naturally
fit an API it is difficult to incorporate. Personally, I think
I like the keyword version ("izip(*args, sentinel=None)") best,
but the .rest-method version is appealing too...

Second (and I think this is the reason for the use-case search)
is that someone has to do it. That means implement it and fix
the docs, add a test case, and such stuff. If there are not many
use cases, the effort to do so might not be worthwhile.

That means if someone (you?) steps forward with a patch that does
this, it would dramatically increase the chance of a change ;).

--
David.
Jan 12 '06 #18

[David Murmann]
I'd like izip
to change, too.
The zip() function, introduced in Py2.0, was popular and broadly
useful. The izip() function is a zip() substitute with better memory
utilization yet almost identical in how it is used. It is bugfree,
successful, fast, and won't change.

The map() function, introduced shortly after the transistor was
invented, incorporates an option that functions like zip() but fills in
missing values and won't truncate. It probably seemed like a good idea
at the time, but AFAICT no one uses it (Alex once as a newbie; Strakt
once; me never; the standard library never; etc).

So, the question is not whether non-truncating fill-in will be
available. After all, we've already got one: map(None, it1, it2).

Instead, the question is whether to introduce another substantially
identical function with improved memory utilization and a specifiable
fill-in value. But, why would you offer a slightly improved variant of
something that doesn't get used?

Put another way: If you don't use map(None, it1, it2), then you're
going to have a hard time explaining why you need
itertools.izip_longest(it1, it2).
Second (and I think this is the reason for the use-case search)
is that someone has to do it. That means implement it and fix
the docs, add a test case, and such stuff. If there are not many
use cases, the effort to do so might not be worthwhile.


In this case, the coding and testing are easy. So that's not the
problem. The real issue is the clutter factor from introducing new
functions if they're not going to be used, if they don't have good use
cases, and if there are better ways to approach most problems.

The reason for the use case search is to determine whether
izip_longest() would end up as unutilized cruft and add dead weight to
the language. The jury is still out but it doesn't look promising.
Raymond

Jan 12 '06 #19

[ru***@yahoo.com]
How well correlated is the use of map()-with-fill with the
(need for) use of zip/izip-with-fill?

[Raymond]
Close to 100%. A non-iterator version of izip_longest() is exactly
equivalent to map(None, it1, it2, ...).

[ru***@yahoo.com]
If I use map()
I can trivially determine the arguments lengths and deal with
unequal length before map(). With iterators that is more
difficult. So I can imagine many cases where izip might
be applicable but map not, and a lack of map use cases is
not representative of izip use cases.


You don't seem to understand what map() does. There is no need to
deal with unequal argument lengths before map(); it does the work for
you. It handles iterator inputs the same way. Meditate on this:

def izip_longest(*args):
    return iter(map(None, *args))

Modulo arbitrary fill values and lazily evaluated inputs, the semantics
are exactly what is being requested. Ergo, lack of use cases for
map(None,it1,it2) means that izip_longest(it1,it2) isn't needed.
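For contrast with the eager map()-based definition, a lazily evaluated version can be sketched as a generator in modern Python 3 spelling; this sketch (including the sentinel trick) is my own illustration, not code from the thread:

```python
def izip_longest(*iterables, fillvalue=None):
    # Lazy fill-in zip: yields one tuple at a time and keeps going
    # until every input is exhausted.
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while True:
        row = [next(it, sentinel) for it in iterators]
        if all(v is sentinel for v in row):
            return
        yield tuple(fillvalue if v is sentinel else v for v in row)
```

Unlike the map()-based version, this never materializes the whole result, which is exactly the distinction the next few posts turn on.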

Raymond

Jan 12 '06 #20

"Raymond Hettinger" <py****@rcn.com> wrote:
[ru***@yahoo.com]
> How well correlated is the use of map()-with-fill with the
> (need for) use of zip/izip-with-fill?
[Raymond]
Close to 100%. A non-iterator version of izip_longest() is exactly
equivalent to map(None, it1, it2, ...).

[ru***@yahoo.com]
If I use map()
I can trivially determine the arguments lengths and deal with
unequal length before map(). With iterators that is more
difficult. So I can imagine many cases where izip might
be applicable but map not, and a lack of map use cases is
not representative of izip use cases.


You don't seem to understand what map() does. There is no need to
deal with unequal argument lengths before map(); it does the work for
you. It handles iterator inputs the same way. Meditate on this:

def izip_longest(*args):
    return iter(map(None, *args))

Modulo arbitrary fill values and lazily evaluated inputs, the semantics
are exactly what is being requested. Ergo, lack of use cases for
map(None,it1,it2) means that izip_longest(it1,it2) isn't needed.


"lazily evaluated inputs" is exactly what I was pointing
out and what make your izip_longest() above not the
same as map(None,...), and hence, your conclusion
invalid. Specifically....

def izip_longest(*args):
    return iter(map(None, *args))

f1 = file("test.dat")
f2 = file("test.dat")
it = izip2(f1, f2)
while 1:
    h1, h2 = it.next()
    print h1.strip(), h2

izip2() in the above code is a "real" izip_longest
based on a version posted in this thread.
$ test.py
3347 3347
-3487 -3487
2011 2011
239 239
....

Replace izip2 in the above code with your izip_longest:

$ test.py
[wait, wait, wait, ... after a few minutes type ^C, nothing
happens, close window].

I don't think your izip_longest is at all equivalent to
the proposed izip, and thus there may well be use
cases for izip that aren't represented by imap(None, ...)
use cases, which is what I said. That is, I might have
a use case for izip which I would never even consider
map() for.
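The laziness distinction is easy to demonstrate in modern Python, where itertools.zip_longest is itself lazy and can therefore be sampled even when one input is infinite (a sketch, not code from the thread):

```python
from itertools import count, islice, zip_longest

# zip_longest is lazy: pairing an infinite counter with a short
# sequence is fine as long as only finitely many items are drawn.
pairs = zip_longest(count(), 'ab')
first_three = list(islice(pairs, 3))  # [(0, 'a'), (1, 'b'), (2, None)]
```

An eager, list-building equivalent of the same call could never return here, which is precisely the failure mode described above.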

Jan 12 '06 #21

In article <ma**************************************@python.org>,
Raymond Hettinger <py****@rcn.com> wrote:

Request for more information
----------------------------
My request for readers of comp.lang.python is to search your own code
to see if map's None fill-in feature was ever used in real-world code
(not toy examples). I'm curious about the context, how it was used,
and what alternatives were rejected (i.e. did the fill-in feature
improve the code). Likewise, I'm curious as to whether anyone has seen
a zip-style fill-in feature employed to good effect in some other
programming language.


I've counted 63 cases of ``map(None, ...`` in my company's code base.
You're probably right that most of them could/should use zip() instead;
I see at least a few cases of

map(None, field_names, values)

but it's not clear what the expectation is for the size of the two lists.
(None of the uses were created by me -- I abhor map(). ;-)
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing." --Alan Perlis
Jan 13 '06 #22

[Aahz]
I've counted 63 cases of ``map(None, ...`` in my company's code base.
You're probably right that most of them could/should use zip() instead;
I see at least a few cases of

map(None, field_names, values)

but it's not clear what the expectation is for the size of the two lists.
(None of the uses were created by me -- I abhor map(). ;-)


Thanks for the additional datapoint. I'm most interested in the code
surrounding the few cases with multiple inputs and whether the code is
designed around equal or unequal length inputs. The existence of the
latter is good news for the proposal. Its absence would be a
contra-indication. If you get a chance, please look at those few
multi-input cases.

Thanks,
Raymond

Jan 13 '06 #23

"Raymond Hettinger" <py****@rcn.com> writes:
I see at least a few cases of
map(None, field_names, values)
but it's not clear what the expectation is for the size of the two lists.

...
Thanks for the additional datapoint. I'm most interested in the code
surrounding the few cases with multiple inputs and whether the code is
designed around equal or unequal length inputs. The existence of the
latter is good news for the proposal. Its absence would be a
contra-indication. If you get a chance, please look at those few
multi-input cases.


ISTR there's also a plan to eliminate map in Python 3.0 in favor of
list comprehensions. That would get rid of the possibility of using
map(None...) instead of izip_longest. This needs to be thought through.
Jan 13 '06 #24

[Paul Rubin]
ISTR there's also a plan to eliminate map in Python 3.0 in favor of
list comprehensions. That would get rid of the possibility of using
map(None...) instead of izip_longest. This needs to be thought through.


Not to fear. If map() eventually loses its built-in status, it will
almost certainly reappear in the functional module. Also, if Py3.0
changes the balance of needs and tools, I will certainly adapt the
itertools module as needed.

Jan 13 '06 #25

ru***@yahoo.com wrote:
I am still left with a difficult-to-express feeling of
dissatisfaction at this process.

Please try to see it from the point of view of
someone who is not an expert at Python:

Here is izip().
My conception is it takes two sequence generators
and matches up the items from each. (I am talking
overall conceptual models here, not details.)
Here is my problem.
I have two files that produce lines and I want to
compare each line.
Seems like a perfect fit.

So I read that izip() only goes to the shortest iterable,
I think, "why only the shortest? why not the longest?
what's so special about the shortest?"
At this point explanations involving lack of use cases
are not very convincing. I have a use. All the
alternative solutions are more code, less clear, less
obvious, less right. But most importantly, there
seems to be a symmetry between the two cases
(shortest vs longest) that makes the lack of
support for matching-to-longest somehow a
defect.

Now if there is something fundamental about
matching items in parallel lists that makes it a
sensible thing to do only for equal lists (or to the
shortest list) that's fine. You seem to imply that's
the case by referencing Haskell, ML, etc. If so,
that needs to be pointed out in izip's docs.
(Though nothing I have read in this thread has
been convincing.)


Because a simple call to chain() is an obvious (it's the very first
itertool in the docs), efficient, and straightforward solution to the
problem of padding a shorter iterable.

izip(chain(shorter, pad), longer)
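In modern spelling (zip for izip), and assuming we already know which input is the shorter one, the pattern looks like this (the concrete strings are my own example):

```python
from itertools import chain, repeat

shorter, longer = 'ab', 'abcd'
# Pad the known-shorter input with an endless supply of '' and let
# the truncating zip() stop when the longer input runs out.
padded = list(zip(chain(shorter, repeat('')), longer))
```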

It is not so straightforward to arrange the truncation of an iterable;
moreover, this is a far more common case, as it is not uncommon to use
infinite iterators in iterator-based code.

izip(count(), iter(file))

which doesn't terminate without truncation. That a common use case
fails to terminate is generally considered 'something fundamental'.

The conversion between them is of course a matter of using takewhile
and an appropriate fence.

Padding in the presence of truncation:
def fence(): pass
takewhile(lambda x: x[0] != fence or x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

Truncation in the presence of padding:
def fence(): pass
takewhile(lambda x: x[0] != fence and x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))
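The padding-from-truncation conversion can be made runnable in modern Python (zip for izip); the helper name pad_zip and the identity-based sentinel comparison are my own:

```python
from itertools import chain, repeat, takewhile

def pad_zip(iter1, iter2, pad=None):
    # Padding built from the truncating zip(): chain each input with
    # an endless supply of a unique sentinel, zip, and keep tuples
    # until *both* slots have turned into sentinels.
    fence = object()
    fenced = zip(chain(iter1, repeat(fence)),
                 chain(iter2, repeat(fence)))
    for a, b in takewhile(lambda x: x[0] is not fence or x[1] is not fence,
                          fenced):
        yield (pad if a is fence else a, pad if b is fence else b)
```

The final substitution step plays the role of the imap wrapper mentioned below: it swaps the sentinel for whatever pad value the caller actually wants.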

Of course you can use any value not in the domain of iter1 or iter2 as
a fence, but a closure is guaranteed to satisfy that requirement and
hence keeps the code generic. In the padding example, if you actually
care what value is used for the pad, then you can either replace
fence or wrap the result in an imap.

Andrae Muys

Jan 23 '06 #26

"Andrae Muys" <an*********@gmail.com> wrote:
ru***@yahoo.com wrote:
I am still left with a difficult-to-express feeling of
dissatisfaction at this process.

Please try to see it from the point of view of
someone who is not an expert at Python:

Here is izip().
My conception is it takes two sequence generators
and matches up the items from each. (I am talking
overall conceptual models here, not details.)
Here is my problem.
I have two files that produce lines and I want to
compare each line.
Seems like a perfect fit.

So I read that izip() only goes to the shortest iterable,
I think, "why only the shortest? why not the longest?
what's so special about the shortest?"
At this point explanations involving lack of use cases
are not very convincing. I have a use. All the
alternative solutions are more code, less clear, less
obvious, less right. But most importantly, there
seems to be a symmetry between the two cases
(shortest vs longest) that makes the lack of
support for matching-to-longest somehow a
defect.

Now if there is something fundamental about
matching items in parallel lists that makes it a
sensible thing to do only for equal lists (or to the
shortest list) that's fine. You seem to imply that's
the case by referencing Haskell, ML, etc. If so,
that needs to be pointed out in izip's docs.
(Though nothing I have read in this thread has
been convincing.)

Because a simple call to chain() is an obvious (it's the very first
itertool in the docs), efficient, and straightforward solution to the
problem of padding a shorter iterable.

izip(chain(shorter, pad), longer)


And how do you tell, a priori, which iterable will turn out to
be the shortest?
It is not so straightforward to arrange the truncation of an iterable;
moreover, this is a far more common case, as it is not uncommon to use
infinite iterators in iterator-based code.
It may be more common (which is arguable), but that
does not mean that "iterate-to-longest" is uncommon,
or is not common enough to be worth bothering about.
izip(count(), iter(file))

which doesn't terminate without truncation. That a common use case
fails to terminate is generally considered 'something fundamental'.
Nobody is suggesting changing the current behavior
of izip() in this case.
The conversion between them is of course a matter of using takewhile
and an appropriate fence.
"of course"?
Padding in the presence of truncation:
def fence(): pass
takewhile(lambda x: x[0] != fence or x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

Truncation in the presence of padding:
def fence(): pass
takewhile(lambda x: x[0] != fence and x[1] != fence,
          izip(chain(iter1, repeat(fence)),
               chain(iter2, repeat(fence))))

Of course you can use any value not in the domain of iter1 or iter2 as
a fence, but a closure is guaranteed to satisfy that requirement and
hence keeps the code generic. In the padding example, if you actually
care what value is used for the pad, then you can either replace
fence or wrap the result in an imap.


Thank you for the posting Andrae, it has increased my
knowledge.
But my original point was there are cases (often involving
file iterators) where the problem's complexity seems to be
on the same order as problems involving iterate-to-shortest
solutions, but, while the latter have simple, one-function-call
solutions, solutions for the former are far more complex
(as your post illustrates). This seems at best unbalanced.
When encountered by someone with less than your level of
expertise, it leads to the feeling, "jeez, why does this simple
problem take hours to figure out and a half dozen function
calls?!?" And please note, I am complaining about a general
problem with Python. The izip() issue was just (at the time)
the most recent trigger of that reaction. (Most recent is
<string>.translate() but that is for a new thread.)

Jan 23 '06 #27


ru***@yahoo.com wrote:
Thank you for the posting Andrae, it has increased my
knowledge.

No problem, happy to help.

But my original point was there are cases (often involving
file iterators) where the problem's complexity seems to be
on the same order as problems involving iterate-to-shortest
solutions, but, while the latter have simple, one-function-call
solutions, solutions for the former are far more complex
(as your post illustrates). This seems at best unbalanced.
When encountered by someone with less than your level of
expertise, it leads to the feeling, "jeez, why does this simple
problem take hours to figure out and a half dozen function
calls?!?"


I agree, having had to think about how to implement padding with a
truncating API for your use case, that padding is a useful feature
to have available. I didn't mean to imply otherwise. You asked why
truncating is a common choice in the design of izip-like functions
(Python, ML, Haskell, Scheme); my post was an attempt to answer that.
The summary of my post is:

1. Either can be implemented in terms of the other.
2. Using a truncating zip instead of a padding zip leads to an
incorrect result.
3. Using a padding zip instead of a truncating zip leads to
non-termination.
4. A terminating bug is preferred to a non-terminating bug.

Hence zip is generally truncating.
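Points 2-4 can be illustrated in modern Python (zip standing in for the truncating izip, zip_longest for the padding variant; the concrete inputs are my own):

```python
from itertools import count, islice, zip_longest

# A truncating zip over an infinite iterator terminates as soon as
# the finite input is exhausted...
numbered = list(zip(count(1), ['a', 'b']))  # [(1, 'a'), (2, 'b')]

# ...while a padding zip over the same inputs would run forever, so
# it has to be cut off explicitly even to be sampled (point 3).
sampled = list(islice(zip_longest(count(1), ['a', 'b']), 4))
```

Mistakenly reaching for the padding form here yields non-termination; mistakenly reaching for the truncating form yields a finite (possibly wrong) result, which is the tradeoff behind point 4.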

Andrae Muys

Jan 24 '06 #28


Andrae Muys wrote:
ru***@yahoo.com wrote:
Thank you for the posting Andrae, it has increased my
knowledge.

No problem, happy to help.

But my original point was there are cases (often involving
file iterators) where the problem's complexity seems to be
on the same order as problems involving iterate-to-shortest
solutions, but, while the latter have simple, one-function-call
solutions, solutions for the former are far more complex
(as your post illustrates). This seems at best unbalanced.
When encountered by someone with less than your level of
expertise, it leads to the feeling, "jeez, why does this simple
problem take hours to figure out and a half dozen function
calls?!?"


I agree, having had to think about how to implement padding with a
truncating API for your use case, that padding is a useful feature
to have available. I didn't mean to imply otherwise. You asked why
truncating is a common choice in the design of izip-like functions
(Python, ML, Haskell, Scheme); my post was an attempt to answer that.
The summary of my post is:

1. Either can be implemented in terms of the other.
2. Using a truncating zip instead of a padding zip leads to an
incorrect result.
3. Using a padding zip instead of a truncating zip leads to
non-termination.


(I assume "erroneously" should be inserted in front
of "Using")
OK.
4. A terminating bug is preferred to a non-terminating bug.
This is not self-evident to me. Is this somehow
related to the design philosophy of functional
languages? I was never aware of such a preference
in conventional procedural languages (though I
could easily be wrong).

It also seems directly counter to Python's "no errors
should pass silently" dogma -- a non termination
seems more noticable than silent erroneous results.
Hence zip is generally truncating.


I see your point in a theoretical sense, but it still
seems to me be a pretty weak reason for making
a practical decision about what should be in Python,
particularly when the justification is being transfered
from a functional programming domain to an
object/procedural one. Is any language feature that
might result in non-termination if erroneously used
to be banned? That doesn't seem very reasonable.
(I realize you were explaining the rationale behind
the FP choice, not necessarily taking a position
on what should be in Python so the above comment
is directed to the discussion at large.)

Jan 24 '06 #29


ru***@yahoo.com wrote:
4. A terminating bug is preferred to a non-terminating bug.
This is not self-evident to me. Is this somehow
related to the design philosophy of functional
languages? I was never aware of such a preference
in conventional procedural languages (though I
could easily be wrong).


This is not a paradigm based preference. It derives from the basic
fact that you can test for an incorrect result, but you can't test for
non-termination. Therefore, if you have to choose between making it
easy to inadvertently introduce either non-termination or a trivial
logic error, you are better off choosing the trivial logic error. That
preference is independent of which
structured/object/functional/logic/constraint/concurrent/reactive
school of programming you adhere to.
It also seems directly counter to Python's "no errors
should pass silently" dogma -- a non termination
seems more noticable than silent erroneous results.


Two problems with that. One, we are dealing with a bug, not an error.
Two, even if we classified the bug as an error, noticing non-termination
requires solving the halting problem, whereas noticing an erroneous
result simply requires a unit test.

The reason why you see this in FP and not in OOP is that
infinite/unbounded/v.large values are trivial to define in FP and
(generally) not in OOP. Consequently the potential for inadvertent
non-termination simply doesn't arise in OOP.
Hence zip is generally truncating.


I see your point in a theoretical sense, but it still
seems to me be a pretty weak reason for making
a practical decision about what should be in Python,
particularly when the justification is being transfered
from a functional programming domain to an
object/procedural one. Is any language feature that
might result in non-termination if erroneously used
to be banned? That doesn't seem very reasonable.


Of course not --- only by making Python non-Turing-complete could such
a thing be achieved. But you have overstated the position. This isn't a
matter of outlawing functions that could be used to write a
non-terminating procedure. It's about a simple API design decision.
A choice between two options. It doesn't require the designer to
decide that one is *right* and one *wrong*. Just that one is at least
slightly better than the other; or even that one was chosen simply
because either was better than neither. API design decisions are not
personal vendettas against your use case :).

Andrae Muys

Jan 25 '06 #30

ru***@yahoo.com wrote:
4. A terminating bug is preferred to a non-terminating bug.

This is not self-evident to me. Is this somehow
related to the design philosophy of functional
languages? I was never aware of such a preference
in conventional procedural languages (though I
could easily be wrong).

It also seems directly counter to Python's "no errors
should pass silently" dogma -- a non termination
seems more noticable than silent erroneous results.


You are assuming that the function in question returns
a result *quickly*, and that any delay obvious to the
user is clearly a problem ("damn program has hung...").

Consider a generic function which may take "a long
time" to return a result. How long do you wait before
you conclude it has hung? A minute? An hour? A day? A
month? A year? If you have some knowledge of the
expected running time you can make a good estimate
("well, there are only a thousand records in the
database, so even if it takes an entire minute to check
each record, if it hasn't returned after 17 hours, it
is probably hung"). But for arbitrary problems, you
might not know enough about the function and data to
make that estimate. Some calculations do have to run
for days or weeks or months to get a correct result,
and some are impossible to predict in advance.

So, in general, it is impossible to tell the difference
between a non-terminating bug and a correct calculation
that would have finished if you had just waited a
little longer. (How much is a little longer?) Hence, in
general, a terminating wrong answer is easier to test
for than a non-terminating bug.
--
Steven.

Jan 25 '06 #31
