Fate of itertools.dropwhile() and itertools.takewhile()

I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.

* I'm concerned that use cases for the two functions are uncommon and
can obscure code rather than clarify it.

* I originally added them to itertools because they were found in
other functional languages and because it seemed like they would serve
as basic building blocks that, in combination with other itertools,
allow construction of a variety of powerful, high-speed iterators. The
latter may have been a false hope -- to date, I've not seen good
recipes that depend on either function.

* If an always-true or always-false predicate is given, it can be hard
to break out of the function once it is running.

* Both functions seem simple and basic until you try to explain them
to someone else. Likewise, when reading code containing dropwhile(),
I don't think it is self-evident that dropwhile() may have a lengthy
start-up time.

* Since itertools are meant to be combined together, the whole module
becomes easier to use if there are fewer tools to choose from.
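
For readers weighing the points above, here is a minimal sketch of what the two functions do, written for Python 3. The sample data and the `small` predicate are made up for illustration. It also hints at the start-up cost mentioned in the list: dropwhile() has to consume every leading item that satisfies the predicate before it can yield anything.

```python
from itertools import dropwhile, takewhile

def small(n):
    # Illustrative predicate, not from the thread.
    return n < 3

data = [1, 2, 5, 1, 0]

# takewhile() yields items until the predicate first fails, then stops for good.
head = list(takewhile(small, data))

# dropwhile() skips items until the predicate first fails, then yields
# everything that remains, including later items the predicate would accept.
tail = list(dropwhile(small, data))

print(head)
print(tail)
```

Note that the trailing 1 and 0 show up in `tail` even though `small` accepts them: the predicate is no longer consulted after its first failure.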

These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.

Raymond
Dec 29 '07
[Marc 'BlackJack' Rintsch]
I use both functions from time to time.
One "recipe" is extracting blocks from text files that are delimited by a
special start and end line.

from itertools import dropwhile, takewhile

def iter_block(lines, start_marker, end_marker):
    return takewhile(lambda x: not x.startswith(end_marker),
                     dropwhile(lambda x: not x.startswith(start_marker),
                               lines))
Glad to hear this came from real code instead of being contrived for
this discussion. Thanks for the contribution.

Looking at the code fragment, I wondered how that approach compared to
others in terms of being easy to write, self-evidently correct,
absence of awkward constructs, and speed. The lambda expressions are
not as fast as straight C calls or in-lined code, and they also each
require a 'not' to invert the startswith condition. The latter is
awkward, and it makes it less self-evident whether the lines with the
markers are included or excluded from the output (the recipe may in
fact be buggy -- the line with the start marker is included and the
line with the end marker is excluded). Your excellent choice of
indentation helps improve the readability of the nested
takewhile/dropwhile calls.
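
To make the inclusion question concrete, here is a small sketch that runs the quoted recipe on made-up input (Python 3). The sample lines and the "BEGIN"/"END" marker strings are assumptions for illustration only.

```python
from itertools import dropwhile, takewhile

def iter_block(lines, start_marker, end_marker):
    # Same recipe as quoted above.
    return takewhile(lambda x: not x.startswith(end_marker),
                     dropwhile(lambda x: not x.startswith(start_marker),
                               lines))

# Hypothetical sample input; the marker strings are assumptions.
sample = ["preamble", "BEGIN block", "payload 1", "payload 2",
          "END block", "trailer"]
block = list(iter_block(sample, "BEGIN", "END"))
print(block)
```

The result contains the "BEGIN block" line but not the "END block" line, which is the asymmetry described above.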

In contrast, the generator version is clearer about whether the start
and end marker lines get included and is easily modified if you want
to change that choice. It is easy to write and more self-evident
about how it handles the end cases. Also, it avoids the expense of
the lambda function calls and the awkwardness of the 'not' to invert
the sense of the test:

def iter_block(lines, start_marker, end_marker):
    inblock = False
    for line in lines:
        if inblock:
            if line.startswith(end_marker):
                break
            yield line
        elif line.startswith(start_marker):
            yield line
            inblock = True

And, of course, for this particular application, an approach based on
regular expressions makes short work of the problem and runs very
fast:

re.search('(^beginmark.*)^endmark', textblock, re.M | re.S).group(1)
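
A hedged sketch of the regular-expression approach on made-up text (Python 3). Two things here are assumptions rather than part of the one-liner: the sample `textblock`, and the switch to a non-greedy `.*?`. With a greedy `.*`, a text holding several blocks would match through to the last end marker. The sketch also guards against no match, since calling `.group(1)` on `None` raises AttributeError.

```python
import re

# Hypothetical sample text; the marker names follow the one-liner above.
textblock = (
    "header\n"
    "beginmark A\n"
    "line 1\n"
    "endmark\n"
    "beginmark B\n"
    "line 2\n"
    "endmark\n"
)

# re.M lets ^ match at every line start; re.S lets . match newlines.
# The non-greedy .*? stops at the first end marker; a greedy .* would
# run on to the last one when the text holds several blocks.
match = re.search(r'(^beginmark.*?)^endmark', textblock, re.M | re.S)
first_block = match.group(1) if match else None
print(repr(first_block))
```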
Raymond
Dec 31 '07 #11
FWIW, here is a generator version written without the state flag:

def iter_block(lines, start_marker, end_marker):
    lines = iter(lines)
    for line in lines:
        if line.startswith(start_marker):
            yield line
            break
    for line in lines:
        if line.startswith(end_marker):
            return
        yield line

Raymond
Dec 31 '07 #12
Here's a (stateful) version that generates all blocks...

import itertools

def iter_blocks(lines, start_marker, end_marker):
    inblock = [False]
    def line_in_block(line):
        inblock[0] = inblock[0] and not line.startswith(end_marker)
        inblock[0] = inblock[0] or line.startswith(start_marker)
        return inblock[0]
    return (block for is_in_block, block in
            itertools.groupby(lines, line_in_block) if is_in_block)

If you just want the first block (as the original code did), you can
just take it...

for line in iter_blocks(lines, start_marker, end_marker).next():
    ...  # process lines of first block

I'm not happy about the way the inblock state has to be a 1-element
list to avoid the non-local problem. Is there a nicer way to code it?
Otherwise, I quite like this code (if I do say so myself) as it neatly
separates out the logic of whether you're inside a block or not from
the code that yields blocks and lines. I'd say it was quite readable
if you're familiar with groupby.
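
On the question of a nicer way to hold the state: in Python 3 (which postdates this thread) the `nonlocal` statement lets the inner function rebind the enclosing variable directly, so the one-element list is no longer needed. A minimal sketch, with made-up sample input:

```python
import itertools

def iter_blocks(lines, start_marker, end_marker):
    inblock = False

    def line_in_block(line):
        # nonlocal (Python 3) lets the closure rebind the enclosing variable.
        nonlocal inblock
        inblock = inblock and not line.startswith(end_marker)
        inblock = inblock or line.startswith(start_marker)
        return inblock

    return (block for is_in_block, block in
            itertools.groupby(lines, line_in_block) if is_in_block)

# Hypothetical sample input; the marker strings are assumptions.
sample = ["x", "BEGIN a", "1", "END", "y", "BEGIN b", "2", "END"]
# Each group must be consumed before groupby advances to the next one.
blocks = [list(block) for block in iter_blocks(sample, "BEGIN", "END")]
print(blocks)
```

As with the original, each block includes its start-marker line and excludes its end-marker line.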

And back on topic... I use itertools regularly (and have a functional
background), but have never needed takewhile or dropwhile. I'd be
happy to see them deprecated.

--
Paul Hankin

Dec 31 '07 #13
FWIW, Google Code Search shows a few users:

<http://www.google.com/codesearch?q=lang%3Apython+%28drop%7Ctake%29while>

Do any of them make good use of them?
--
Dec 31 '07 #14
On Dec 29 2007, 11:10 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.
Well I have just this minute used dropwhile in anger, to find the next
suitable filename when writing database dumps using date.count names:

    filename = "%02d-%02d-%d" % (now.day, now.month, now.year)
    if os.path.exists(filename):
        candidates = ("%s.%d" % (filename, x) for x in count(1))
        filename = dropwhile(os.path.exists, candidates).next()

Much clearer than the alternatives I think, please keep dropwhile and
takewhile in itertools ;)
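
A self-contained sketch of the same idea for Python 3, where the `.next()` method becomes the built-in `next()`. To avoid touching the filesystem, `os.path.exists` is replaced here by a stand-in `exists` function over a made-up set of names; both are assumptions for illustration.

```python
from itertools import count, dropwhile

# Stand-in for os.path.exists so the sketch does not touch the filesystem;
# the set of "existing" names is made up.
existing = {"03-01-2008", "03-01-2008.1", "03-01-2008.2"}

def exists(name):
    return name in existing

filename = "03-01-2008"
if exists(filename):
    candidates = ("%s.%d" % (filename, x) for x in count(1))
    # next() is the Python 3 spelling of the .next() method used above.
    filename = next(dropwhile(exists, candidates))

print(filename)
```

Because `count(1)` never ends, this loops forever if `exists` is true for every candidate, which is the always-true predicate hazard raised at the top of the thread.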

Cheers,

Doug.
Jan 3 '08 #15
Wouldn't using ifilterfalse instead of dropwhile produce the same
result?
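
A quick sketch of the comparison (Python 3, where `ifilterfalse` is spelled `itertools.filterfalse`). The `taken` predicate and the numbers are made up. When only the first result is taken, the two agree, since both stop at the first item for which the predicate is false; they differ only if you keep iterating.

```python
from itertools import dropwhile, filterfalse

def taken(n):
    # Illustrative predicate: pretend names 1, 2 and 4 are already in use.
    return n in {1, 2, 4}

numbers = [1, 2, 3, 4, 5]

# Taking only the first item: both give the first "free" number.
first_via_dropwhile = next(dropwhile(taken, numbers))
first_via_filterfalse = next(filterfalse(taken, numbers))

# Iterating further: dropwhile stops testing, filterfalse keeps filtering.
rest_via_dropwhile = list(dropwhile(taken, numbers))
rest_via_filterfalse = list(filterfalse(taken, numbers))

print(first_via_dropwhile, first_via_filterfalse)
print(rest_via_dropwhile, rest_via_filterfalse)
```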

--
Arnaud
Jan 3 '08 #16
Raymond Hettinger <py****@rcn.com> writes:
I presume you did scans of
large code bases and you did not find occurrences of
takewhile and dropwhile, right?

Yes.
I think I have used them. I don't remember exactly how. Probably
something that could have been done more generally with groupby.

I remember a clpy thread about a takewhile gotcha, that it consumes an
extra element:
>>> from itertools import takewhile as tw
>>> x = range(10)
>>> z = iter(x)
>>> list(tw(lambda i: i<5, z))
[0, 1, 2, 3, 4]
>>> z.next()
6

I.e. I had wanted to use takewhile to split a list into the
initial sublist satisfying some condition, and the rest of the
list.

This all by itself is something to at least warn about. I don't
know if it's enough for deprecation.
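
One possible way to get the split described above without losing an element, sketched for Python 3. `split_while` is a hypothetical helper, not an itertools function: it collects the prefix eagerly and uses `itertools.chain` to put the first failing item back in front of the remainder.

```python
from itertools import chain

def split_while(predicate, iterable):
    """Return (prefix_list, rest_iterator); split_while is a hypothetical helper."""
    iterator = iter(iterable)
    prefix = []
    for item in iterator:
        if not predicate(item):
            # Push the first failing item back in front of the remainder.
            return prefix, chain([item], iterator)
        prefix.append(item)
    # Predicate held for every item: nothing is left over.
    return prefix, iterator

head, rest = split_while(lambda i: i < 5, range(10))
rest = list(rest)
print(head)
print(rest)
```

The trade-off is that the prefix is built as a list, so this does not suit an unbounded prefix.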

I've been cooking up a scheme for iterators with lookahead, that I
want to get around to coding and posting. It's a harder thing
to get right than it at first appears.
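
For illustration only, here is a minimal one-item lookahead wrapper in Python 3. It is not the scheme the poster had in mind, which was never shown in this thread; the `Peekable` class and `take_while_peeking` function are hypothetical names. It covers the simple case and ignores harder questions such as multi-item lookahead.

```python
class Peekable:
    """Iterator wrapper with one item of lookahead (hypothetical name)."""

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._buffer = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the buffered item first, if there is one.
        if self._buffer is not self._EMPTY:
            item, self._buffer = self._buffer, self._EMPTY
            return item
        return next(self._iterator)

    def peek(self, default=None):
        # Fetch and remember the next item without consuming it.
        if self._buffer is self._EMPTY:
            try:
                self._buffer = next(self._iterator)
            except StopIteration:
                return default
        return self._buffer

def take_while_peeking(predicate, peekable):
    # Like takewhile(), but the first failing item stays in the iterator.
    done = object()  # private end-of-input marker
    while True:
        item = peekable.peek(done)
        if item is done or not predicate(item):
            return
        yield next(peekable)

z = Peekable(range(10))
head = list(take_while_peeking(lambda i: i < 5, z))
following = next(z)
print(head, following)
```

Here the element after the prefix is 5 rather than 6, because the failing item is only peeked at, never consumed.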
Jan 11 '08 #17
On Dec 29, 2007 11:10 PM, Raymond Hettinger <py****@rcn.com> wrote:
I'm considering deprecating these two functions and would like some
feedback from the community or from people who have a background in
functional programming.
Personally, I'd rather you kept them around. I have no FP background,
and I found them easy enough to understand.
These thoughts reflect my own experience with the itertools module.
It may be that your experience with them has been different. Please
let me know what you think.
FWIW, I used them only today: http://tinyurl.com/22q6cb

Not sure if something that ugly counts as a reason for keeping them
around, though!

--
Cheers,
Simon B.
si***@brunningonline.net
http://www.brunningonline.net/simon/blog/
GTalk: simon.brunning | MSN: small_values | Yahoo: smallvalues
Feb 18 '08 #18
