Bytes IT Community

itertools, functools, file enhancement ideas

I just had to write some programs that crunched a lot of large files,
both text and binary. As I use iterators more I find myself wishing
for some maybe-obvious enhancements:

1. File iterator for blocks of chars:

f = open('foo')
for block in f.iterchars(n=1024): ...

iterates through 1024-character blocks from the file. The default iterator
which loops through lines is not always a good choice since each line can
use an unbounded amount of memory. Default n in the above should be 1 char.

2. wrapped file openers:
There should be functions (either in itertools, builtins, the sys
module, or wherever) that open a file, expose one of the above
iterators, then close the file, i.e.
def file_lines(filename):
    with open(filename) as f:
        for line in f:
            yield line
so you can say

for line in file_lines(filename):
crunch(line)

The current bogus idiom is to say "for line in open(filename)" but
that does not promise to close the file once the file is exhausted
(part of the motivation of the new "with" statement). There should
similarly be "file_chars" which uses the n-chars iterator instead of
the line iterator.

3. itertools.ichain:
yields the contents of each of a sequence of iterators, i.e.:
def ichain(seq):
    for s in seq:
        for t in s:
            yield t
this is different from itertools.chain because it lazy-evaluates its
input sequence. Example application:

all_filenames = ['file1', 'file2', 'file3']
# loop through all the files crunching all lines in each one
for line in (ichain(file_lines(x) for x in all_filenames)):
crunch(x)
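A quick way to see the laziness difference: itertools.chain star-unpacks its argument, so every file would be touched up front, while ichain pulls from the source sequence only as needed. Using stand-in "files" that record when they're opened (the names here are made up for the demo):

```python
from itertools import chain

def ichain(seq):
    # lazily walk an iterable of iterables
    for s in seq:
        for t in s:
            yield t

opened = []

def fake_file(name):
    # stand-in for file_lines: records when the "file" is opened
    opened.append(name)
    return iter(['%s-line' % name])

names = ['file1', 'file2']
it = ichain(fake_file(x) for x in names)
print(next(it))   # 'file1-line'
print(opened)     # ['file1'] -- file2 not opened yet

opened[:] = []
it2 = chain(*(fake_file(x) for x in names))
print(opened)     # ['file1', 'file2'] -- star-unpacking opened everything
```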

4. functools enhancements (Haskell-inspired):
Let f be a function with 2 inputs. Then:
a) def flip(f): return lambda x,y: f(y,x)
b) def lsect(x,f): return partial(f,x)
c) def rsect(f,x): return partial(flip(f), x)

lsect and rsect allow making what Haskell calls "sections". Example:
# sequence of all squares less than 100
from itertools import count, takewhile
from operator import lt
s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
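A sketch of how those three would behave, spelled out with the definitions above (operator.sub is used to make the argument order visible):

```python
from functools import partial
from itertools import count, takewhile
from operator import lt, sub

def flip(f):
    # swap the two arguments of a binary function
    return lambda x, y: f(y, x)

def lsect(x, f):
    # left section: fix the first argument
    return partial(f, x)

def rsect(f, x):
    # right section: fix the second argument
    return partial(flip(f), x)

print(lsect(10, sub)(3))   # sub(10, 3) == 7
print(rsect(sub, 3)(10))   # sub(10, 3) == 7

# rsect(lt, 100) is the section (< 100)
s100 = list(takewhile(rsect(lt, 100), (x*x for x in count())))
print(s100)                # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```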
Apr 7 '07 #1
8 Replies


Paul Rubin <http://ph****@NOSPAM.invalid> writes:
> # loop through all the files crunching all lines in each one
> for line in (ichain(file_lines(x) for x in all_filenames)):
>     crunch(x)
supposed to say crunch(line) of course.
Apr 7 '07 #2

Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
> I just had to write some programs that crunched a lot of large files,
> both text and binary. As I use iterators more I find myself wishing
> for some maybe-obvious enhancements:
>
> 1. File iterator for blocks of chars:
>
> f = open('foo')
> for block in f.iterchars(n=1024): ...
>
> iterates through 1024-character blocks from the file. The default iterator
> which loops through lines is not always a good choice since each line can
> use an unbounded amount of memory. Default n in the above should be 1 char.
the simple way (letting the file object deal w/buffering issues):

def iterchars(f, n=1):
    while True:
        x = f.read(n)
        if not x: break
        yield x

the fancy way (doing your own buffering) is left as an exercise for the
reader. I do agree it would be nice to have in some module.
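A quick check of the simple version, using io.StringIO as a stand-in for a real file:

```python
import io

def iterchars(f, n=1):
    # yield successive n-character blocks until EOF
    while True:
        x = f.read(n)
        if not x:
            break
        yield x

f = io.StringIO('abcdefgh')
print(list(iterchars(f, 3)))   # ['abc', 'def', 'gh']
```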

> 2. wrapped file openers:
> There should be functions (either in itertools, builtins, the sys
> module, or wherever) that open a file, expose one of the above
> iterators, then close the file, i.e.
>
> def file_lines(filename):
>     with open(filename) as f:
>         for line in f:
>             yield line
>
> so you can say
>
> for line in file_lines(filename):
>     crunch(line)
>
> The current bogus idiom is to say "for line in open(filename)" but
> that does not promise to close the file once the file is exhausted
> (part of the motivation of the new "with" statement). There should
> similarly be "file_chars" which uses the n-chars iterator instead of
> the line iterator.
I'm +/-0 on this one vs the idioms:

with open(filename) as f:
    for line in f: crunch(line)

with open(filename, 'rb') as f:
    for block in iterchars(f): crunch(block)

Making two lines into one is a weak use case for a stdlib function.

> 3. itertools.ichain:
> yields the contents of each of a sequence of iterators, i.e.:
>
> def ichain(seq):
>     for s in seq:
>         for t in s:
>             yield t
>
> this is different from itertools.chain because it lazy-evaluates its
> input sequence. Example application:
>
> all_filenames = ['file1', 'file2', 'file3']
> # loop through all the files crunching all lines in each one
> for line in (ichain(file_lines(x) for x in all_filenames)):
>     crunch(x)
Yes, subtle but important distinction.

> 4. functools enhancements (Haskell-inspired):
> Let f be a function with 2 inputs. Then:
>
> a) def flip(f): return lambda x,y: f(y,x)
> b) def lsect(x,f): return partial(f,x)
> c) def rsect(f,x): return partial(flip(f), x)
>
> lsect and rsect allow making what Haskell calls "sections". Example:
>
> # sequence of all squares less than 100
> from operator import lt
> s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only.
Alex
Apr 8 '07 #3

al***@mac.com (Alex Martelli) writes:
> > for line in file_lines(filename):
> >     crunch(line)
>
> I'm +/-0 on this one vs the idioms:
>
> with open(filename) as f:
>     for line in f: crunch(line)
>
> Making two lines into one is a weak use case for a stdlib function.
Well, the inspiration is being able to use the iterator in another
genexp:

for line in (ichain(file_lines(x) for x in all_filenames)):
    crunch(line)

so it's making more than two lines into one, and just flows more
naturally, like the perl idiom "while(<>) {...}".
> > lsect and rsect allow making what Haskell calls "sections". Example:
> > # sequence of all squares less than 100
> > from operator import lt
> > s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
>
> Looks like they'd be useful, but I'm not sure about limiting them to
> working with 2-argument functions only.
I'm not sure how to generalize them but if there's an obvious correct
way to do it, that sounds great ;).

Also forgot to include the obvious:

def compose(f, g):
    return lambda *args, **kw: f(g(*args, **kw))
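A quick check of compose (note a Python lambda's parameter list can't be parenthesized, so it has to be written lambda *args, **kw: ...):

```python
def compose(f, g):
    # f after g: compose(f, g)(x) == f(g(x))
    return lambda *args, **kw: f(g(*args, **kw))

add1 = lambda x: x + 1
double = lambda x: x * 2
print(compose(add1, double)(5))   # add1(double(5)) == 11
```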

Apr 8 '07 #4

al***@mac.com (Alex Martelli) writes:
> > 4. functools enhancements (Haskell-inspired):
> > Let f be a function with 2 inputs. Then:
> > a) def flip(f): return lambda x,y: f(y,x)
> > b) def lsect(x,f): return partial(f,x)
> > c) def rsect(f,x): return partial(flip(f), x)
> >
> > lsect and rsect allow making what Haskell calls "sections". Example:
> > # sequence of all squares less than 100
> > from operator import lt
> > s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
>
> Looks like they'd be useful, but I'm not sure about limiting them to
> working with 2-argument functions only.
How's

from mysterymodule import rsect
from operator import lt
takewhile(rsect(lt, 100), (x*x for x in count()))

better than

takewhile(lambda x:x<100, (x*x for x in count()))

Apart from boiler-plate creation and code-obfuscation purposes?

'as


Apr 8 '07 #5

[Paul Rubin]
> 1. File iterator for blocks of chars:
>
> f = open('foo')
> for block in f.iterchars(n=1024): ...
>
> iterates through 1024-character blocks from the file.

for block in iter(partial(f.read, 1024), ''): ...

> a) def flip(f): return lambda x,y: f(y,x)

Curious resemblance to:

itemgetter(1,0)
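The two-argument form of iter calls the callable repeatedly until it returns the sentinel, which is exactly the read loop from the fancy version; a quick check with a stand-in file:

```python
import io
from functools import partial

f = io.StringIO('abcdefgh')
# call f.read(3) repeatedly until it returns '' (EOF)
blocks = list(iter(partial(f.read, 3), ''))
print(blocks)   # ['abc', 'def', 'gh']
```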
Raymond

Apr 8 '07 #6

rd*********@gmail.com writes:
> > for block in f.iterchars(n=1024): ...
> for block in iter(partial(f.read, 1024), ''): ...
Hmm, nice. I keep forgetting about that feature of iter. It also came
up in a response to my queue example from another post.
> > a) def flip(f): return lambda x,y: f(y,x)
> Curious resemblance to:
> itemgetter(1,0)
Not sure I understand that.
Apr 8 '07 #7

On Apr 8, 9:34 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> rdhettin...@gmail.com writes:
> > > a) def flip(f): return lambda x,y: f(y,x)
> > Curious resemblance to:
> > itemgetter(1,0)
>
> Not sure I understand that.
I think he read it as lambda (x, y): (y, x)

More interesting would be functools.rshift/lshift, that would rotate
the positional arguments (with wrapping)

def f(a, b, c, d, e):
    ...

rshift(f, 3) --> g, where g(c, d, e, a, b) == f(a, b, c, d, e)

Still don't see much advantage over writing a lambda (except perhaps
speed).
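A minimal sketch of that rshift (hypothetical name and semantics as described above, not an actual functools function): rotating the parameter list right by n means g's call must be unrotated by slicing its args back into f's order.

```python
def rshift(f, n):
    # rotate f's positional parameters right by n, with wrapping:
    # rshift(f, 3)(c, d, e, a, b) == f(a, b, c, d, e)
    def g(*args):
        # undo the rotation: the last len(args)-n params of f came first in g
        return f(*(args[n:] + args[:n]))
    return g

def f(a, b, c, d, e):
    return (a, b, c, d, e)

# with a=1, b=2, c=3, d=4, e=5:
print(rshift(f, 3)(3, 4, 5, 1, 2))   # (1, 2, 3, 4, 5)
```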

-Mike

Apr 10 '07 #8

"Klaas" <mi********@gmail.com> writes:
> Still don't see much advantage over writing a lambda (except perhaps
> speed).
Well, it's partly a matter of avoiding boilerplate, especially with
the lambdaphobia that many Python users seem to have.
Apr 11 '07 #9
