
itertools, functools, file enhancement ideas

I just had to write some programs that crunched a lot of large files,
both text and binary. As I use iterators more I find myself wishing
for some maybe-obvious enhancements:

1. File iterator for blocks of chars:

    f = open('foo')
    for block in f.iterchars(n=1024): ...

iterates through 1024-character blocks from the file. The default iterator
which loops through lines is not always a good choice since each line can
use an unbounded amount of memory. Default n in the above should be 1 char.

2. Wrapped file openers:
There should be functions (either in itertools, builtins, the sys
module, or wherever) that open a file, expose one of the above
iterators, then close the file, i.e.

    def file_lines(filename):
        with open(filename) as f:
            for line in f:
                yield line

so you can say

    for line in file_lines(filename):
        crunch(line)

The current bogus idiom is to say "for line in open(filename)", but
that does not promise to close the file once the file is exhausted
(part of the motivation for the new "with" statement). There should
similarly be a "file_chars" which uses the n-chars iterator instead of
the line iterator.
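A minimal sketch of the two proposed openers, assuming Python 3. `file_lines` and `file_chars` are the names proposed above, not existing library functions, and `file_chars` here simply calls `f.read(n)` in a loop.

```python
def file_lines(filename):
    """Yield the lines of filename; the with-block closes the file
    once the generator is exhausted (or closed / garbage-collected)."""
    with open(filename) as f:
        for line in f:
            yield line


def file_chars(filename, n=1):
    """Yield n-character blocks of filename (the last one may be shorter)."""
    with open(filename) as f:
        while True:
            block = f.read(n)
            if not block:  # empty string means end of file
                break
            yield block
```

One caveat: if a loop over either generator is abandoned early, the file stays open until the generator is closed (e.g. with `gen.close()`) or garbage-collected, so the close is only prompt when the iterator is run to exhaustion.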

3. itertools.ichain:
yields the contents of each of a sequence of iterators, i.e.:

    def ichain(seq):
        for s in seq:
            for t in s:
                yield t

This is different from itertools.chain because it lazily evaluates its
input sequence. Example application:

    all_filenames = ['file1', 'file2', 'file3']
    # loop through all the files crunching all lines in each one
    for line in (ichain(file_lines(x) for x in all_filenames)):
        crunch(x)
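The laziness distinction can be made visible with a small instrumented generator. In this sketch `demo` is a hypothetical helper that records each inner iterable as it is handed out. For reference, later Python versions (2.6 onward) added `itertools.chain.from_iterable`, which consumes its outer iterable lazily in the same way; by contrast, `chain(*gen)` must exhaust `gen` just to build its argument list.

```python
from itertools import chain


def ichain(seq):
    """Yield every item of every iterable in seq, consuming seq lazily."""
    for s in seq:
        for t in s:
            yield t


def demo(pulled):
    """Hypothetical helper: yields small 'files' and logs each one in pulled."""
    for name in ['file1', 'file2', 'file3']:
        pulled.append(name)
        yield [name + ':line1', name + ':line2']


pulled = []
it = ichain(demo(pulled))
first = next(it)          # only the first inner iterable has been requested
print(first, pulled)      # file1:line1 ['file1']

pulled3 = []
eager = chain(*demo(pulled3))  # argument unpacking runs demo() to the end
print(pulled3)                 # ['file1', 'file2', 'file3']

# chain.from_iterable yields the same items as ichain
assert list(chain.from_iterable(demo([]))) == list(ichain(demo([])))
```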

4. functools enhancements (Haskell-inspired):
Let f be a function with 2 inputs. Then:
a) def flip(f): return lambda x,y: f(y,x)
b) def lsect(x,f): return partial(f,x)
c) def rsect(f,x): return partial(flip(f), x)

lsect and rsect allow making what Haskell calls "sections". Example:

    # sequence of all squares less than 100
    from itertools import count, takewhile
    from operator import lt
    s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
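A runnable version of the proposal, assuming Python 3 and the standard `functools.partial`, `itertools.takewhile` and `itertools.count`. `flip`, `lsect` and `rsect` are the proposed names, not existing library functions.

```python
from functools import partial
from itertools import count, takewhile
from operator import lt, sub


def flip(f):
    """Swap the two arguments of a binary function."""
    return lambda x, y: f(y, x)


def lsect(x, f):
    """Left section, like (x `f`) in Haskell: y -> f(x, y)."""
    return partial(f, x)


def rsect(f, x):
    """Right section, like (`f` x) in Haskell: y -> f(y, x)."""
    return partial(flip(f), x)


# rsect(lt, 100) is the predicate y -> y < 100
s100 = list(takewhile(rsect(lt, 100), (x * x for x in count())))
print(s100)              # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

print(lsect(10, sub)(3))  # sub(10, 3) == 7
print(rsect(sub, 3)(10))  # sub(10, 3) == 7
```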
Apr 7 '07 #1
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
>     # loop through all the files crunching all lines in each one
>     for line in (ichain(file_lines(x) for x in all_filenames)):
>         crunch(x)

Supposed to say crunch(line), of course.
Apr 7 '07 #2
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
> I just had to write some programs that crunched a lot of large files,
> both text and binary. As I use iterators more I find myself wishing
> for some maybe-obvious enhancements:
>
> 1. File iterator for blocks of chars:
>
>     f = open('foo')
>     for block in f.iterchars(n=1024): ...
>
> iterates through 1024-character blocks from the file. The default iterator
> which loops through lines is not always a good choice since each line can
> use an unbounded amount of memory. Default n in the above should be 1 char.

The simple way (letting the file object deal with buffering issues):

    def iterchars(f, n=1):
        while True:
            x = f.read(n)
            if not x: break
            yield x

The fancy way (doing your own buffering) is left as an exercise for the
reader. I do agree it would be nice to have in some module.
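One possible take on the "fancy way", offered only as a sketch: read large chunks and hand out n-sized slices from a local buffer. `iterchars_buffered` and its `bufsize` parameter are hypothetical names, and whether this beats relying on the file object's own buffering is not measured here. `f.read(0)` is used to get an empty `str` or `bytes` matching the file's mode.

```python
import io


def iterchars_buffered(f, n=1, bufsize=64 * 1024):
    """Hypothetical: yield n-sized blocks from f, reading bufsize at a time.

    n and bufsize must be positive; the final block may be shorter than n.
    """
    buf = f.read(0)  # '' for text files, b'' for binary files
    while True:
        chunk = f.read(bufsize)
        if not chunk:
            break
        buf += chunk
        whole = len(buf) - len(buf) % n  # length covered by complete blocks
        for i in range(0, whole, n):
            yield buf[i:i + n]
        buf = buf[whole:]  # keep the incomplete tail (shorter than n)
    if buf:
        yield buf


print(list(iterchars_buffered(io.StringIO('abcdefg'), 3, bufsize=4)))
# ['abc', 'def', 'g']
```

In-memory `io.StringIO`/`io.BytesIO` objects stand in for real files so the example needs nothing on disk.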

> 2. Wrapped file openers:
> There should be functions (either in itertools, builtins, the sys
> module, or wherever) that open a file, expose one of the above
> iterators, then close the file, i.e.
>
>     def file_lines(filename):
>         with open(filename) as f:
>             for line in f:
>                 yield line
>
> so you can say
>
>     for line in file_lines(filename):
>         crunch(line)
>
> The current bogus idiom is to say "for line in open(filename)", but
> that does not promise to close the file once the file is exhausted
> (part of the motivation for the new "with" statement). There should
> similarly be a "file_chars" which uses the n-chars iterator instead of
> the line iterator.

I'm +/-0 on this one vs the idioms:

    with open(filename) as f:
        for line in f: crunch(line)

    with open(filename, 'rb') as f:
        for block in iterchars(f): crunch(block)

Making two lines into one is a weak use case for a stdlib function.

> 3. itertools.ichain:
> yields the contents of each of a sequence of iterators, i.e.:
>
>     def ichain(seq):
>         for s in seq:
>             for t in s:
>                 yield t
>
> This is different from itertools.chain because it lazily evaluates its
> input sequence. Example application:
>
>     all_filenames = ['file1', 'file2', 'file3']
>     # loop through all the files crunching all lines in each one
>     for line in (ichain(file_lines(x) for x in all_filenames)):
>         crunch(x)

Yes, subtle but important distinction.

> 4. functools enhancements (Haskell-inspired):
> Let f be a function with 2 inputs. Then:
> a) def flip(f): return lambda x,y: f(y,x)
> b) def lsect(x,f): return partial(f,x)
> c) def rsect(f,x): return partial(flip(f), x)
>
> lsect and rsect allow making what Haskell calls "sections". Example:
>     # sequence of all squares less than 100
>     from operator import lt
>     s100 = takewhile(rsect(lt, 100), (x*x for x in count()))

Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only.

Alex
Apr 8 '07 #3
al***@mac.com (Alex Martelli) writes:
>> for line in file_lines(filename):
>>     crunch(line)
>
> I'm +/-0 on this one vs the idioms:
>     with open(filename) as f:
>         for line in f: crunch(line)
> Making two lines into one is a weak use case for a stdlib function.

Well, the inspiration is being able to use the iterator in another
genexp:

    for line in (ichain(file_lines(x) for x in all_filenames)):
        crunch(line)

so it's making more than two lines into one, and just flows more
naturally, like the Perl idiom "while(<>) {...}".
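For the specific Perl `while(<>)` comparison, the standard library's `fileinput` module already iterates over the lines of several files in sequence, opening one file at a time. A minimal sketch, assuming Python 3.2+ (where `fileinput.FileInput` works as a context manager); `crunch` and `crunch_all` are placeholders, not part of the thread's proposal.

```python
import fileinput


def crunch(line):
    """Placeholder for the per-line work; here it just measures the line."""
    return len(line)


def crunch_all(all_filenames):
    """Feed every line of every file to crunch(), one file open at a time."""
    if not all_filenames:
        return 0  # fileinput treats an empty file list as "read stdin"
    total = 0
    with fileinput.FileInput(files=all_filenames) as lines:
        for line in lines:
            total += crunch(line)
    return total
```

Unlike the `ichain(file_lines(...))` spelling, this ties the iteration to one module-specific object, but it needs no new library functions.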

>> lsect and rsect allow making what Haskell calls "sections". Example:
>>     # sequence of all squares less than 100
>>     from operator import lt
>>     s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
>
> Looks like they'd be useful, but I'm not sure about limiting them to
> working with 2-argument functions only.

I'm not sure how to generalize them but if there's an obvious correct
way to do it, that sounds great ;).
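One possible generalization, offered only as a sketch: fix any number of leading or trailing positional arguments. The names `lsect_n` and `rsect_n` are hypothetical (not from any library), and the function now comes first so the fixed arguments can be variadic. `functools.partial` already covers the leading case; the trailing case needs a small closure. Assumes Python 3.5+ for the `f(*a, *b)` call syntax.

```python
from functools import partial
from operator import lt


def lsect_n(f, *left):
    """Fix leading positional args: lsect_n(f, a)(b, c) == f(a, b, c)."""
    return partial(f, *left)


def rsect_n(f, *right):
    """Fix trailing positional args: rsect_n(f, c)(a, b) == f(a, b, c)."""
    def wrapper(*args, **kwargs):
        return f(*args, *right, **kwargs)
    return wrapper


def triple(a, b, c):
    return (a, b, c)


print(lsect_n(triple, 1)(2, 3))   # (1, 2, 3)
print(rsect_n(triple, 2, 3)(1))   # (1, 2, 3)
print(rsect_n(lt, 100)(5))        # True, same as rsect(lt, 100)(5)
```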

Also forgot to include the obvious:

    def compose(f, g):
        return lambda *args, **kw: f(g(*args, **kw))
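With the lambda syntax fixed (`lambda *args, **kw:` rather than `lambda(*args, **kw):`, which does not parse), `compose(f, g)` returns a function that passes all of its arguments, keyword ones included, to `g` and feeds the result to `f`. A quick check using only builtins; the definition is repeated so the block runs on its own.

```python
def compose(f, g):
    """Return a function computing f(g(*args, **kw))."""
    return lambda *args, **kw: f(g(*args, **kw))


print(compose(str, int)('ff', base=16))  # '255': keyword args reach int()
print(compose(sorted, set)('banana'))    # ['a', 'b', 'n']
```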

Apr 8 '07 #4
al***@mac.com (Alex Martelli) writes:
>> 4. functools enhancements (Haskell-inspired):
>> Let f be a function with 2 inputs. Then:
>> a) def flip(f): return lambda x,y: f(y,x)
>> b) def lsect(x,f): return partial(f,x)
>> c) def rsect(f,x): return partial(flip(f), x)
>>
>> lsect and rsect allow making what Haskell calls "sections". Example:
>>     # sequence of all squares less than 100
>>     from operator import lt
>>     s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
>
> Looks like they'd be useful, but I'm not sure about limiting them to
> working with 2-argument functions only.

How's

    from mysterymodule import rsect
    from operator import lt
    takewhile(rsect(lt, 100), (x*x for x in count()))

better than

    takewhile(lambda x: x < 100, (x*x for x in count()))

apart from boilerplate creation and code-obfuscation purposes?

'as


Apr 8 '07 #5
[Paul Rubin]
> 1. File iterator for blocks of chars:
>
>     f = open('foo')
>     for block in f.iterchars(n=1024): ...

    for block in iter(partial(f.read, 1024), ''): ...

> iterates through 1024-character blocks from the file. The default iterator

> a) def flip(f): return lambda x,y: f(y,x)

Curious resemblance to:

    itemgetter(1,0)

Raymond
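Raymond's one-liner relies on the two-argument form `iter(callable, sentinel)`, which keeps calling `callable()` until it returns a value equal to `sentinel`. One detail worth adding, assuming Python 3: a file opened in binary mode returns `b''` at end of file, so the sentinel must be `b''`; with `''` the loop would never stop, because bytes never compare equal to str. In-memory files stand in for real ones here.

```python
import io
from functools import partial

text_file = io.StringIO('abcdefg')
binary_file = io.BytesIO(b'abcdefg')

# Text mode: read() returns '' at end of file
text_blocks = list(iter(partial(text_file.read, 3), ''))
# Binary mode: read() returns b'' at end of file, so use b'' as the sentinel
binary_blocks = list(iter(partial(binary_file.read, 3), b''))

print(text_blocks)    # ['abc', 'def', 'g']
print(binary_blocks)  # [b'abc', b'def', b'g']
```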

Apr 8 '07 #6
rd*********@gmail.com writes:
>> for block in f.iterchars(n=1024): ...
> for block in iter(partial(f.read, 1024), ''): ...

Hmm, nice. I keep forgetting about that feature of iter. It also came
up in a response to my queue example from another post.

>> a) def flip(f): return lambda x,y: f(y,x)
> Curious resemblance to:
>     itemgetter(1,0)

Not sure I understand that.
Apr 8 '07 #7
On Apr 8, 9:34 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> rdhettin...@gmail.com writes:
>>> a) def flip(f): return lambda x,y: f(y,x)
>> Curious resemblance to:
>>     itemgetter(1,0)
>
> Not sure I understand that.

I think he read it as lambda (x, y): (y, x)
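The two readings can be compared directly: `operator.itemgetter(1, 0)` reorders the elements of a pair, whereas `flip` reorders the arguments passed to a function. They line up only in the sense that `flip(f)(x, y) == f(*itemgetter(1, 0)((x, y)))`, as this small check shows (`flip` is Paul's proposed helper, repeated here).

```python
from operator import itemgetter, sub


def flip(f):
    """Paul's proposal: swap the two arguments of a binary function."""
    return lambda x, y: f(y, x)


swap = itemgetter(1, 0)
print(swap(('x', 'y')))   # ('y', 'x'): reorders a tuple
print(flip(sub)(2, 10))   # 8: calls sub(10, 2)
assert flip(sub)(2, 10) == sub(*swap((2, 10)))
```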

More interesting would be functools.rshift/lshift, which would rotate
the positional arguments (with wrapping):

    def f(a, b, c, d, e):
        ...
    rshift(f, 3) -> g, where g(c, d, e, a, b) == f(a, b, c, d, e)

Still don't see much advantage over writing a lambda (except perhaps
speed).
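A minimal sketch of the rotation described above. `rshift_args` is a hypothetical helper named to avoid confusion with the existing `operator.rshift`, which is just the `>>` operator and unrelated. It follows the example's convention that shifting `f` by 3 gives `g` with `g(c, d, e, a, b) == f(a, b, c, d, e)`, and wraps `n` around the number of arguments.

```python
def rshift_args(f, n):
    """Hypothetical: return g with g(*args) == f(*(args[k:] + args[:k])),
    where k = n % len(args), so shifts larger than the arity wrap around."""
    def g(*args):
        k = n % len(args) if args else 0
        return f(*(args[k:] + args[:k]))
    return g


def f(a, b, c, d, e):
    return (a, b, c, d, e)


g = rshift_args(f, 3)
print(g('c', 'd', 'e', 'a', 'b'))  # ('a', 'b', 'c', 'd', 'e')
```

As noted in the thread, `lambda c, d, e, a, b: f(a, b, c, d, e)` does the same job explicitly; the helper only saves typing when the arity is large or varies.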

-Mike

Apr 10 '07 #8
"Klaas" <mi********@gmail.com> writes:
> Still don't see much advantage over writing a lambda (except perhaps
> speed).

Well, it's partly a matter of avoiding boilerplate, especially with
the lambdaphobia that many Python users seem to have.
Apr 11 '07 #9
