Bytes | Software Development & Data Engineering Community
itertools, functools, file enhancement ideas

I just had to write some programs that crunched a lot of large files,
both text and binary. As I use iterators more I find myself wishing
for some maybe-obvious enhancements:

1. File iterator for blocks of chars:

f = open('foo')
for block in f.iterchars(n=1024): ...

iterates through 1024-character blocks from the file. The default iterator
which loops through lines is not always a good choice since each line can
use an unbounded amount of memory. Default n in the above should be 1 char.

2. wrapped file openers:
There should be functions (either in itertools, builtins, the sys
module, or wherever) that open a file, expose one of the above
iterators, then close the file, i.e.
def file_lines(filename):
    with open(filename) as f:
        for line in f:
            yield line
so you can say

for line in file_lines(filename):
crunch(line)

The current bogus idiom is to say "for line in open(filename)" but
that does not promise to close the file once the file is exhausted
(part of the motivation of the new "with" statement). There should
similarly be "file_chars" which uses the n-chars iterator instead of
the line iterator.

3. itertools.ichain:
yields the contents of each of a sequence of iterators, i.e.:
def ichain(seq):
    for s in seq:
        for t in s:
            yield t
this is different from itertools.chain because it lazy-evaluates its
input sequence. Example application:

all_filenames = ['file1', 'file2', 'file3']
# loop through all the files crunching all lines in each one
for line in (ichain(file_lines(x) for x in all_filenames)):
crunch(x)
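[For what it's worth, later Python releases grew exactly this: itertools.chain.from_iterable, added in 2.6 after this thread, consumes its outer iterable lazily, unlike chain(*seq), which unpacks the whole sequence up front. A quick sketch of the equivalence against an infinite outer sequence:]

```python
from itertools import chain, count, islice

# An infinite, lazily produced sequence of iterables -- chain(*nested)
# would hang here, but chain.from_iterable (like ichain above) does not.
nested = ((x, x * x) for x in count())
flat = chain.from_iterable(nested)
first_six = list(islice(flat, 6))
print(first_six)  # [0, 0, 1, 1, 2, 4]
```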

4. functools enhancements (Haskell-inspired):
Let f be a function with 2 inputs. Then:
a) def flip(f): return lambda x,y: f(y,x)
b) def lsect(x,f): return partial(f,x)
c) def rsect(f,x): return partial(flip(f), x)

lsect and rsect allow making what Haskell calls "sections". Example:
# sequence of all squares less than 100
from operator import lt
s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
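[Taken together, the definitions above run as-is once the itertools/functools imports are added; a self-contained sketch, with names exactly as defined in this post:]

```python
from functools import partial
from itertools import count, takewhile
from operator import lt, sub

def flip(f):
    # swap the two arguments of a binary function
    return lambda x, y: f(y, x)

def lsect(x, f):
    # left section: lsect(10, sub)(y) == sub(10, y) == 10 - y
    return partial(f, x)

def rsect(f, x):
    # right section: rsect(lt, 100)(y) == lt(y, 100), i.e. y < 100
    return partial(flip(f), x)

# sequence of all squares less than 100
s100 = list(takewhile(rsect(lt, 100), (x * x for x in count())))
print(s100)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(lsect(10, sub)(3))  # 7
```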
Apr 7 '07 #1
Paul Rubin <http://ph****@NOSPAM.invalid> writes:
# loop through all the files crunching all lines in each one
for line in (ichain(file_lines(x) for x in all_filenames)):
crunch(x)
supposed to say crunch(line) of course.
Apr 7 '07 #2
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
I just had to write some programs that crunched a lot of large files,
both text and binary. As I use iterators more I find myself wishing
for some maybe-obvious enhancements:

1. File iterator for blocks of chars:

f = open('foo')
for block in f.iterchars(n=1024): ...

iterates through 1024-character blocks from the file. The default iterator
which loops through lines is not always a good choice since each line can
use an unbounded amount of memory. Default n in the above should be 1 char.
the simple way (letting the file object deal w/buffering issues):

def iterchars(f, n=1):
    while True:
        x = f.read(n)
        if not x: break
        yield x

the fancy way (doing your own buffering) is left as an exercise for the
reader. I do agree it would be nice to have in some module.

2. wrapped file openers:
There should be functions (either in itertools, builtins, the sys
module, or wherever) that open a file, expose one of the above
iterators, then close the file, i.e.
def file_lines(filename):
    with open(filename) as f:
        for line in f:
            yield line
so you can say

for line in file_lines(filename):
crunch(line)

The current bogus idiom is to say "for line in open(filename)" but
that does not promise to close the file once the file is exhausted
(part of the motivation of the new "with" statement). There should
similarly be "file_chars" which uses the n-chars iterator instead of
the line iterator.
I'm +/-0 on this one vs the idioms:

with open(filename) as f:
    for line in f: crunch(line)

with open(filename, 'rb') as f:
    for block in iterchars(f): crunch(block)

Making two lines into one is a weak use case for a stdlib function.

3. itertools.ichain:
yields the contents of each of a sequence of iterators, i.e.:
def ichain(seq):
    for s in seq:
        for t in s:
            yield t
this is different from itertools.chain because it lazy-evaluates its
input sequence. Example application:

all_filenames = ['file1', 'file2', 'file3']
# loop through all the files crunching all lines in each one
for line in (ichain(file_lines(x) for x in all_filenames)):
crunch(x)
Yes, subtle but important distinction.

4. functools enhancements (Haskell-inspired):
Let f be a function with 2 inputs. Then:
a) def flip(f): return lambda x,y: f(y,x)
b) def lsect(x,f): return partial(f,x)
c) def rsect(f,x): return partial(flip(f), x)

lsect and rsect allow making what Haskell calls "sections". Example:
# sequence of all squares less than 100
from operator import lt
s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only.
Alex
Apr 8 '07 #3
al***@mac.com (Alex Martelli) writes:
for line in file_lines(filename):
crunch(line)

I'm +/-0 on this one vs the idioms:
with open(filename) as f:
    for line in f: crunch(line)
Making two lines into one is a weak use case for a stdlib function.
Well, the inspiration is being able to use the iterator in another
genexp:

for line in (ichain(file_lines(x) for x in all_filenames)):
crunch(line)

so it's making more than two lines into one, and just flows more
naturally, like the perl idiom "while(<>) {...}".
lsect and rsect allow making what Haskell calls "sections". Example:
# sequence of all squares less than 100
from operator import lt
s100 = takewhile(rsect(lt, 100), (x*x for x in count()))

Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only.
I'm not sure how to generalize them but if there's an obvious correct
way to do it, that sounds great ;).

Also forgot to include the obvious:

def compose(f, g):
    return lambda *args, **kw: f(g(*args, **kw))
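[A quick, self-contained check of compose, restated here so the snippet runs standalone:]

```python
def compose(f, g):
    # compose(f, g)(x) == f(g(x))
    return lambda *args, **kw: f(g(*args, **kw))

strip_len = compose(len, str.strip)
print(strip_len("  hello  "))  # 5
```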

Apr 8 '07 #4
al***@mac.com (Alex Martelli) writes:
4. functools enhancements (Haskell-inspired):
Let f be a function with 2 inputs. Then:
a) def flip(f): return lambda x,y: f(y,x)
b) def lsect(x,f): return partial(f,x)
c) def rsect(f,x): return partial(flip(f), x)

lsect and rsect allow making what Haskell calls "sections". Example:
# sequence of all squares less than 100
from operator import lt
s100 = takewhile(rsect(lt, 100), (x*x for x in count()))

Looks like they'd be useful, but I'm not sure about limiting them to
working with 2-argument functions only.
How's

from mysterymodule import rsect
from operator import lt
takewhile(rsect(lt, 100), (x*x for x in count()))

better than

takewhile(lambda x:x<100, (x*x for x in count()))

Apart from boiler-plate creation and code-obfuscation purposes?

'as


Apr 8 '07 #5
[Paul Rubin]
1. File iterator for blocks of chars:

f = open('foo')
for block in f.iterchars(n=1024): ...
for block in iter(partial(f.read, 1024), ''): ...
a) def flip(f): return lambda x,y: f(y,x)
Curious resemblance to:

itemgetter(1,0)
Raymond

Apr 8 '07 #6
rd*********@gmail.com writes:
for block in f.iterchars(n=1024): ...
for block in iter(partial(f.read, 1024), ''): ...
Hmm, nice. I keep forgetting about that feature of iter. It also came
up in a response to my queue example from another post.
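[The two-argument iter(callable, sentinel) form keeps calling the callable until it returns the sentinel; a small demo using io.StringIO in place of a real file:]

```python
from functools import partial
from io import StringIO

f = StringIO("abcdefghij")
# read() returns '' at EOF, which is the sentinel that stops iteration
blocks = list(iter(partial(f.read, 4), ''))
print(blocks)  # ['abcd', 'efgh', 'ij']
```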
a) def flip(f): return lambda x,y: f(y,x)
Curious resemblance to:
itemgetter(1,0)
Not sure I understand that.
Apr 8 '07 #7
On Apr 8, 9:34 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
rdhettin...@gmail.com writes:
a) def flip(f): return lambda x,y: f(y,x)
Curious resemblance to:
itemgetter(1,0)

Not sure I understand that.
I think he read it as lambda (x, y): (y, x)

More interesting would be functools.rshift/lshift, that would rotate
the positional arguments (with wrapping)

def f(a, b, c, d, e):
    ...

rshift(f, 3) --> g, where g(c, d, e, a, b) == f(a, b, c, d, e)

Still don't see much advantage over writing a lambda (except perhaps
speed).

-Mike
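[One plausible reading of the proposed rshift, sketched as a hypothetical helper; the name and rotation convention are inferred from the example above, not from any real functools API:]

```python
def rshift(f, n):
    # Rotate the caller's positional arguments left by n before calling f,
    # so rshift(f, 3)(c, d, e, a, b) == f(a, b, c, d, e) for a 5-arg f.
    return lambda *args: f(*(args[n:] + args[:n]))

def f(a, b, c, d, e):
    return (a, b, c, d, e)

g = rshift(f, 3)
print(g('c', 'd', 'e', 'a', 'b'))  # ('a', 'b', 'c', 'd', 'e')
```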

Apr 10 '07 #8
"Klaas" <mi********@gmail.comwrites:
Still don't see much advantage over writing a lambda (except perhaps
speed).
Well, it's partly a matter of avoiding boilerplate, especially with
the lambdaphobia that many Python users seem to have.
Apr 11 '07 #9
