Fate of itertools.dropwhile() and itertools.takewhile()

Raymond Hettinger

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

* I'm concerned that use cases for the two functions are uncommon and
can obscure code rather than clarify it.

* I originally added them to itertools because they were found in
other functional languages and because it seemed like they would serve
as basic building blocks that, in combination with other itertools,
allow construction of a variety of powerful, high-speed iterators. The
latter may have been a false hope -- to date, I've not seen good
recipes that depend on either function.

* If an always true or always false predicate is given, it can be hard
to break-out of the function once it is running.

* Both functions seem simple and basic until you try to explain them
to someone else. Likewise, when reading code containing dropwhile(),
I don't think it is self-evident that dropwhile() may have a lengthy
start-up time.

* Since itertools are meant to be combined together, the whole module
becomes easier to use if there are fewer tools to choose from.

These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.

Raymond

Dec 29 '07 #1
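
For readers less familiar with the two functions under discussion, here is a minimal sketch of what they do. It assumes Python 3 (the thread itself predates it), and the sample data and the is_small() helper are invented for illustration. The last line touches on the always-true predicate concern raised above: on an endless input, takewhile() has to be bounded from outside, here with islice().

```python
from itertools import count, dropwhile, islice, takewhile

data = [1, 3, 5, 8, 2, 9]

def is_small(n):
    return n < 6

# takewhile yields items until the predicate first fails, then stops for good.
head = list(takewhile(is_small, data))    # [1, 3, 5]

# dropwhile skips items until the predicate first fails, then yields the rest,
# including later items (such as 2) for which the predicate is true again.
tail = list(dropwhile(is_small, data))    # [8, 2, 9]

# An always-true predicate on an endless input never stops by itself,
# so the caller bounds it explicitly.
bounded = list(islice(takewhile(lambda n: True, count()), 3))    # [0, 1, 2]
```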


Istvan Albert

On Dec 29, 6:10 pm, Raymond Hettinger <pyt...@rcn.com> wrote:

These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.

First off, the itertools module is amazing, thanks for creating it. It
changed the way I think about programming. In fact nowadays I start
all my programs with:

from itertools import *

which may not be the best form, but I got tired of importing every
single function individually or writing out the module name.

Now I never needed the dropwhile() and takewhile() functions, but that
may not mean much. For quite a while I never needed the repeat()
function either. It even looked nonsensical to have an iterator that
simply repeats the same thing over and over. One day I had to solve a
problem that needed repeat(); it made me really understand what it was
for, and I got to marvel at just how neat the solution was.

i.

Dec 30 '07 #2

Steven D'Aprano

On Sat, 29 Dec 2007 15:10:24 -0800, Raymond Hettinger wrote:

* Both functions seem simple and basic until you try to explain them to
someone else.

Oh I don't know about that. The doc strings seem to do an admirable job
to me. Compared to groupby(), the functions are simplicity themselves.

Likewise, when reading code containing dropwhile(), I
don't think it is self-evident that dropwhile() may have a lengthy
start-up time.

*scratches head in confusion*

It isn't? I can understand somebody *under*estimating the start-up time
(perhaps because they overestimate how quickly dropwhile() can iterate
through the items). But surely it is self-evident that a function which
drops items has to drop the items before it can start returning?

* Since itertools are meant to be combined together, the whole module
becomes easier to use if there are fewer tools to choose from.

True, but on the other hand a toolbox with too few tools is harder to use
than one with too many tools.

--
Steven

Dec 30 '07 #3

bearophileHUGS

Almost every day I write code that uses itertools, so I find it very
useful, and its functions fast.
Removing useless things and keeping things tidy is often positive. But
I can't tell you what to remove. Here are my usages (every sub-list is
sorted by decreasing frequency of use):

I use often or very often:
groupby( iterable[, key])
imap( function, *iterables)
izip( *iterables)
ifilter( predicate, iterable)
islice( iterable, [start,] stop [, step])

I use once in a while:
cycle( iterable)
chain( *iterables)
count( [n])
repeat( object[, times])

I have probably used once or a few times:
starmap( function, iterable)
tee( iterable[, n=2])
ifilterfalse( predicate, iterable)

Never used so far:
dropwhile( predicate, iterable)
takewhile( predicate, iterable)

Bye,
bearophile

Dec 30 '07 #4

Michele Simionato

On Dec 30, 12:10 am, Raymond Hettinger <pyt...@rcn.com> wrote:

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

I am with Steven D'Aprano when he says that takewhile and dropwhile
are clear enough. On the other hand, in my code
base I have exactly zero occurrences of takewhile and
dropwhile, even if I tend to use the itertools quite
often. That should be telling. If my situation is
common, that means that takewhile and dropwhile are
useless in practice and should be deprecated.
But I will wait for other respondents. It may just be
that I never needed them. I presume you did scans of
large code bases and you did not find occurrences of
takewhile and dropwhile, right?
Michele Simionato

Dec 30 '07 #5

Marc 'BlackJack' Rintsch

On Sat, 29 Dec 2007 15:10:24 -0800, Raymond Hettinger wrote:

These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.

I seem to be in a minority here as I use both functions from time to time.
One "recipe" is extracting blocks from text files that are delimited by a
special start and end line.

    def iter_block(lines, start_marker, end_marker):
        return takewhile(lambda x: not x.startswith(end_marker),
                         dropwhile(lambda x: not x.startswith(start_marker),
                                   lines))

Maybe these functions usually don't turn up in code that can be called
"recipes" so often but are useful for themselves.

Ciao,
Marc 'BlackJack' Rintsch

Dec 30 '07 #6
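
A small runnable sketch of the recipe above may make its boundary behaviour easier to see. It assumes Python 3; the intermediate name from_start, the BEGIN/END markers, and the sample lines are made up for the example.

```python
from itertools import dropwhile, takewhile

def iter_block(lines, start_marker, end_marker):
    # Skip everything before the first line that starts with start_marker...
    from_start = dropwhile(lambda x: not x.startswith(start_marker), lines)
    # ...then stop just before the first line that starts with end_marker.
    return takewhile(lambda x: not x.startswith(end_marker), from_start)

sample = ["header", "BEGIN", "a", "b", "END", "footer"]
block = list(iter_block(sample, "BEGIN", "END"))
# block == ["BEGIN", "a", "b"]: the start line is kept, the end line is not.
```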

Istvan Albert

On Dec 30, 3:29 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:

One "recipe" is extracting blocks from text files that are delimited by a
special start and end line.

Neat solution!

I actually need such functionality every once in a while.

Takewhile + dropwhile to the rescue!

i.

Dec 30 '07 #7

George Sakkis

On Dec 30, 4:12 pm, Istvan Albert <istvan.alb...@gmail.com> wrote:

On Dec 30, 3:29 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:

One "recipe" is extracting blocks from text files that are delimited by a
special start and end line.

Neat solution!

I actually need such functionality every once in a while.

Takewhile + dropwhile to the rescue!

i.

On at least one thread and a recipe for this task
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877), the proposed
solutions involved groupby() with an appropriate key function. The
takewhile/dropwhile solution seems shorter and (maybe) easier to read
but perhaps not as flexible and general. Regardless, it's a good
example of takewhile/dropwhile.

George

Dec 30 '07 #8

Raymond Hettinger

[bearophile]

Here are my usages (every sub-list is
sorted by decreasing frequency of use):

I use often or very often:
groupby( iterable[, key])
imap( function, *iterables)
izip( *iterables)
ifilter( predicate, iterable)
islice( iterable, [start,] stop [, step])

I use once in a while:
cycle( iterable)
chain( *iterables)
count( [n])
repeat( object[, times])

I have probably used once or a few times:
starmap( function, iterable)
tee( iterable[, n=2])
ifilterfalse( predicate, iterable)

Never used so far:
dropwhile( predicate, iterable)
takewhile( predicate, iterable)

Thank you for the useful and informative response.
Raymond

Dec 31 '07 #9

Raymond Hettinger

[Michele Simionato]

in my code
base I have exactly zero occurrences of takewhile and
dropwhile, even if I tend to use the itertools quite
often. That should be telling.

Thanks for the additional empirical evidence.

I presume you did scans of
large code bases and you did not find occurrences of
takewhile and dropwhile, right?

Yes.
Raymond

Dec 31 '07 #10

Raymond Hettinger

[Marc 'BlackJack' Rintsch]

I use both functions from time to time.
One "recipe" is extracting blocks from text files that are delimited by a
special start and end line.

    def iter_block(lines, start_marker, end_marker):
        return takewhile(lambda x: not x.startswith(end_marker),
                         dropwhile(lambda x: not x.startswith(start_marker),
                                   lines))

Glad to hear this came from real code instead of being contrived for
this discussion. Thanks for the contribution.

Looking at the code fragment, I wondered how that approach compared to
others in terms of being easy to write, self-evidently correct,
absence of awkward constructs, and speed. The lambda expressions are
not as fast as straight C calls or in-lined code, and they also each
require a 'not' to invert the startswith condition. The latter is a
bit problematic in that it is a bit awkward, and it is less self-
evident whether the lines with the markers are included or excluded
from the output (the recipe may in fact be buggy -- the line with the
start marker is included and the line with the end marker is
excluded). Your excellent choice of indentation helps improve the
readability of the nested takewhile/dropwhile calls.

In contrast, the generator version is clearer about whether the start
and end marker lines get included and is easily modified if you want
to change that choice. It is easy to write and more self-evident
about how it handles the end cases. Also, it avoids the expense of
the lambda function calls and the awkwardness of the 'not' to invert
the sense of the test:

    def iter_block(lines, start_marker, end_marker):
        inblock = False
        for line in lines:
            if inblock:
                if line.startswith(end_marker):
                    break
                yield line
            elif line.startswith(start_marker):
                yield line
                inblock = True

And, of course, for this particular application, an approach based on
regular expressions makes short work of the problem and runs very
fast:

    re.search('(^beginmark.*)^endmark', textblock, re.M | re.S).group(1)
Raymond

Dec 31 '07 #11
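
One point about the regular-expression approach is worth checking when the text holds more than one block: with re.S, a greedy '.*' runs on to the last end marker, while a non-greedy '.*?' stops at the first. The sketch below assumes Python 3, and the BEGIN/END markers and sample text are invented for illustration.

```python
import re

text = "header\nBEGIN\na\nEND\nmiddle\nBEGIN\nb\nEND\nfooter\n"

# Greedy: the match extends to the LAST line that starts with END.
greedy = re.search(r'(^BEGIN.*)^END', text, re.M | re.S).group(1)

# Non-greedy: the match stops at the FIRST line that starts with END.
lazy = re.search(r'(^BEGIN.*?)^END', text, re.M | re.S).group(1)
```

Here greedy spans both blocks and the text between them, whereas lazy is just "BEGIN\na\n".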

Raymond Hettinger

FWIW, here is a generator version written without the state flag:

    def iter_block(lines, start_marker, end_marker):
        lines = iter(lines)
        for line in lines:
            if line.startswith(start_marker):
                yield line
                break
        for line in lines:
            if line.startswith(end_marker):
                return
            yield line

Raymond

Dec 31 '07 #12

Paul Hankin

On Dec 31, 1:25 am, Raymond Hettinger <pyt...@rcn.com> wrote:

FWIW, here is a generator version written without the state flag:

    def iter_block(lines, start_marker, end_marker):
        lines = iter(lines)
        for line in lines:
            if line.startswith(start_marker):
                yield line
                break
        for line in lines:
            if line.startswith(end_marker):
                return
            yield line

Here's a (stateful) version that generates all blocks...

import itertools

    def iter_blocks(lines, start_marker, end_marker):
        inblock = [False]
        def line_in_block(line):
            inblock[0] = inblock[0] and not line.startswith(end_marker)
            inblock[0] = inblock[0] or line.startswith(start_marker)
            return inblock[0]
        return (block for is_in_block, block in
                itertools.groupby(lines, line_in_block) if is_in_block)

If you just want the first block (as the original code did), you can
just take it...

    for line in iter_blocks(lines, start_marker, end_marker).next():
        ... process lines of first block.

I'm not happy about the way the inblock state has to be a 1-element
list to avoid the non-local problem. Is there a nicer way to code it?
Otherwise, I quite like this code (if I do say so myself) as it neatly
separates out the logic of whether you're inside a block or not from
the code that yields blocks and lines. I'd say it was quite readable
if you're familiar with groupby.

And back on topic... I use itertools regularly (and have a functional
background), but have never needed takewhile or dropwhile. I'd be
happy to see them deprecated.

--
Paul Hankin

Dec 31 '07 #13
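
On the question of avoiding the one-element list: in Python 3 (not available when this was posted) the nonlocal statement lets the inner function rebind the flag directly. The following is only a sketch of the same groupby() idea under that assumption; the sample lines are invented.

```python
from itertools import groupby

def iter_blocks(lines, start_marker, end_marker):
    inblock = False

    def line_in_block(line):
        nonlocal inblock  # rebind the enclosing variable (Python 3 only)
        inblock = inblock and not line.startswith(end_marker)
        inblock = inblock or line.startswith(start_marker)
        return inblock

    # groupby() splits the lines into runs sharing the same key;
    # only the runs whose key is true are blocks.
    return (block for is_in_block, block in groupby(lines, line_in_block)
            if is_in_block)

sample = ["x", "BEGIN", "a", "END", "y", "BEGIN", "b", "END"]
# Each block must be consumed before moving on to the next group.
blocks = [list(block) for block in iter_blocks(sample, "BEGIN", "END")]
```

As in the original, each block starts with its start-marker line and leaves out the end-marker line.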

Matt Nordhoff

Raymond Hettinger wrote:

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

* I'm concerned that use cases for the two functions are uncommon and
can obscure code rather than clarify it.

* I originally added them to itertools because they were found in
other functional languages and because it seemed like they would serve
as basic building blocks that, in combination with other itertools,
allow construction of a variety of powerful, high-speed iterators. The
latter may have been a false hope -- to date, I've not seen good
recipes that depend on either function.

* If an always true or always false predicate is given, it can be hard
to break-out of the function once it is running.

* Both functions seem simple and basic until you try to explain them
to someone else. Likewise, when reading code containing dropwhile(),
I don't think it is self-evident that dropwhile() may have a lengthy
start-up time.

* Since itertools are meant to be combined together, the whole module
becomes easier to use if there are fewer tools to choose from.

These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.

Raymond

FWIW, Google Code Search shows a few users:

<http://www.google.com/codesearch?q=lang%3Apython+%28drop%7Ctake%29while>

Do any of them make good use of them?
--

Dec 31 '07 #14

winjer

On Dec 29 2007, 11:10 pm, Raymond Hettinger <pyt...@rcn.com> wrote:

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

Well I have just this minute used dropwhile in anger, to find the next
suitable filename when writing database dumps using date.count names:

    filename = "%02d-%02d-%d" % (now.day, now.month, now.year)
    if os.path.exists(filename):
        candidates = ("%s.%d" % (filename, x) for x in count(1))
        filename = dropwhile(os.path.exists, candidates).next()

Much clearer than the alternatives I think, please keep dropwhile and
takewhile in itertools ;)

Cheers,

Doug.

Jan 3 '08 #15

Arnaud Delobelle

On Jan 3, 4:39 pm, "win...@gmail.com" <win...@gmail.com> wrote:

On Dec 29 2007, 11:10 pm, Raymond Hettinger <pyt...@rcn.com> wrote:

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

Well I have just this minute used dropwhile in anger, to find the next
suitable filename when writing database dumps using date.count names:

    filename = "%02d-%02d-%d" % (now.day, now.month, now.year)
    if os.path.exists(filename):
        candidates = ("%s.%d" % (filename, x) for x in count(1))
        filename = dropwhile(os.path.exists, candidates).next()

Much clearer than the alternatives I think, please keep dropwhile and
takewhile in itertools ;)

Wouldn't using ifilterfalse instead of dropwhile produce the same
result?

--
Arnaud

Jan 3 '08 #16
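
The question can be checked with a small sketch. It assumes Python 3, where ifilterfalse is spelled filterfalse, and it replaces os.path.exists with a hypothetical in-memory exists() so the example runs without touching the filesystem. The first item is the same either way; the two only diverge on later items.

```python
from itertools import count, dropwhile, filterfalse, islice

taken = {"dump.1", "dump.2", "dump.4"}  # hypothetical set of existing names

def exists(name):
    # Stand-in for os.path.exists in this sketch.
    return name in taken

def candidates():
    return ("dump.%d" % n for n in count(1))

# The first free name is identical with both tools.
first_dropwhile = next(dropwhile(exists, candidates()))
first_filterfalse = next(filterfalse(exists, candidates()))

# Beyond the first item they differ: dropwhile stops testing once the
# predicate has failed, filterfalse keeps filtering.
two_dropwhile = list(islice(dropwhile(exists, candidates()), 2))
two_filterfalse = list(islice(filterfalse(exists, candidates()), 2))
```

So for "take the first candidate that does not exist", either function would do.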

Paul Rubin

Raymond Hettinger <py****@rcn.com> writes:

I presume you did scans of
large code bases and you did not find occurrences of
takewhile and dropwhile, right?

Yes.

I think I have used them. I don't remember exactly how. Probably
something that could have been done more generally with groupby.

I remember a clpy thread about a takewhile gotcha, that it consumes an
extra element:

>>> from itertools import takewhile as tw
>>> x = range(10)
>>> z = iter(x)
>>> list(tw(lambda i: i<5, z))
[0, 1, 2, 3, 4]
>>> z.next()
6

I.e. I had wanted to use takewhile to split a list into the
initial sublist satisfying some condition, and the rest of the
list.

This all by itself is something to at least warn about. I don't
know if it's enough for deprecation.

I've been cooking up a scheme for iterators with lookahead, that I
want to get around to coding and posting. It's a harder thing
to get right than it at first appears.

Jan 11 '08 #17
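
One way to get the split described above without losing the boundary element is to put the first failing item back in front of the remainder. This is only a sketch, assuming Python 3; split_while() is a hypothetical helper, not part of itertools, and it builds the prefix eagerly as a list rather than lazily.

```python
from itertools import chain

def split_while(predicate, iterable):
    """Return (prefix, rest): prefix is a list, rest is an iterator."""
    iterator = iter(iterable)
    prefix = []
    for item in iterator:
        if not predicate(item):
            # Re-attach the first failing item to the front of the remainder.
            return prefix, chain([item], iterator)
        prefix.append(item)
    # The predicate never failed: the remainder is the exhausted iterator.
    return prefix, iterator

prefix, rest = split_while(lambda i: i < 5, range(10))
remainder = list(rest)
```

With this input, prefix holds 0 through 4 and remainder starts at 5, so nothing is dropped.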

Simon Brunning

On Dec 29, 2007 11:10 PM, Raymond Hettinger <py****@rcn.com> wrote:

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

Personally, I'd rather you kept them around. I have no FP background,
and I found them easy enough to understand.

These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.

FWIW, I used them only today: http://tinyurl.com/22q6cb

Not sure if something that ugly counts as a reason for keeping them
around, though!

--
Cheers,
Simon B.
si***@brunningonline.net
http://www.brunningonline.net/simon/blog/
GTalk: simon.brunning | MSN: small_values | Yahoo: smallvalues

Feb 18 '08 #18
