count items in generator

Still new. I am trying to make a simple word count script.

I found this in the great Python Cookbook, which allows me to process
every word in a file. But how do I use it to count the items generated?

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

for word in words_of_file(thefilepath):
    dosomethingwith(word)

The best I could come up with:

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

len(list(words_of_file(thefilepath)))

But that seems clunky.

May 14 '06 #1

My preference would be (with the original definition for
words_of_file) to code

    numwords = sum(1 for w in words_of_file(thefilepath))

Alex
May 14 '06 #2
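
A minimal sketch of the sum(1 for ...) idiom suggested above, written for Python 3. The sample text and the temporary file are assumptions made only so the example runs on its own; words_of_file follows the Cookbook recipe from the original post, with a `with` block swapped in for the explicit close().

```python
import os
import tempfile


def words_of_file(thefilepath, line_to_words=str.split):
    # Same recipe as in the thread, except that `with` closes the file
    # once the generator finishes or is closed.
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word


# Hypothetical sample data, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over\n\nthe lazy dog\n")
    thefilepath = tmp.name

# Counts one word at a time; no intermediate list is built.
numwords = sum(1 for w in words_of_file(thefilepath))

os.remove(thefilepath)
print(numwords)
```

Compared with len(list(...)), this keeps only one word in memory at a time, which matters mostly for large files.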

As clunky as it seems, I don't think you can beat it in terms of
brevity; if you care about memory efficiency though, here's what I use:

def length(iterable):
    try: return len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i

You can even shadow the builtin len() if you prefer:

import __builtin__

def len(iterable):
    try: return __builtin__.len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i

HTH,
George

May 14 '06 #3
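
The code above targets Python 2; in Python 3 the __builtin__ module is named builtins. A sketch of the same shadowing idea under that assumption follows. It also narrows the bare except to TypeError, which is what the builtin len() raises for objects without a length; that narrowing is a choice made for this sketch, not part of the post above.

```python
import builtins


def len(iterable):  # deliberately shadows the builtin in this module
    try:
        return builtins.len(iterable)
    except TypeError:
        # No __len__: fall back to counting, which consumes the iterable.
        i = 0
        for _ in iterable:
            i += 1
        return i


sized = len([10, 20, 30])            # answered by the builtin
counted = len(x for x in range(5))   # answered by iterating
print(sized, counted)
```

Shadowing a builtin affects every later call to len() in the same module, so a separately named helper such as length() is usually the less surprising option.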
Thanks! And thanks for the Cookbook.

rd

"There is no abstract art. You must always start with something.
Afterward you can remove all traces of reality."--Pablo Picasso

May 14 '06 #4
"George Sakkis" <ge***********@gmail.com> writes:
As clunky as it seems, I don't think you can beat it in terms of
brevity; if you care about memory efficiency though, here's what I use:

def length(iterable):
    try: return len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i


Alex's example amounted to something like that, for the generator
case. Notice that the argument to sum() was a generator
expression. The sum function then iterated through it.
May 14 '06 #5

True. Changing the except clause here to

    except: return sum(1 for x in iterable)

keeps George's optimization (O(1), not O(N), for containers) and is a
bit faster (while still O(N)) for non-container iterables.
Alex
May 14 '06 #6
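
Put together, the variant described above might look like the sketch below (Python 3). The SizedOnly class is a hypothetical stand-in for a container, used only to show that the len() path never iterates; except TypeError is used here in place of the bare except as an assumption of this sketch.

```python
def length(iterable):
    try:
        return len(iterable)              # O(1) for sized containers
    except TypeError:
        return sum(1 for _ in iterable)   # O(N), consumes the iterable


class SizedOnly:
    """Hypothetical container: knows its size, refuses to be iterated."""

    def __len__(self):
        return 1000

    def __iter__(self):
        raise AssertionError("length() should not iterate a sized object")


from_container = length(SizedOnly())                 # answered by __len__
from_generator = length(w for w in "a b c".split())  # answered by counting
print(from_container, from_generator)
```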
In article <1h***************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
May 14 '06 #7
cl****@lairds.us (Cameron Laird) writes:
For that matter, would it be an advantage for len() to operate
on iterables?


print len(itertools.count())

Ouch!!
May 14 '06 #8
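
The worry above is that a len() which falls back to iterating would never return for an infinite iterator such as itertools.count(). The sketch below (Python 3) shows that the builtin currently raises TypeError instead, and how a counting helper can be capped with itertools.islice. The name bounded_count and the limit of 1000 are assumptions for illustration.

```python
import itertools


def bounded_count(iterable, limit):
    """Count items, but stop after `limit` of them (hypothetical helper)."""
    return sum(1 for _ in itertools.islice(iterable, limit))


try:
    len(itertools.count())
    outcome = "returned"
except TypeError:
    outcome = "TypeError"   # the builtin refuses rather than looping forever

capped = bounded_count(itertools.count(), 1000)   # stops at the limit
short = bounded_count(iter("abc"), 1000)          # finite input, exact count
print(outcome, capped, short)
```

A result equal to the limit only means "at least this many", so callers need to treat that value as a lower bound.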
>> True. Changing the except clause here to
>>
>>     except: return sum(1 for x in iterable)
>>
>> keeps George's optimization (O(1), not O(N), for containers) and is a
>> bit faster (while still O(N)) for non-container iterables.


Everything was going just great. Now I have to think again.

Thank you all.

rick

May 14 '06 #9
Paul Rubin wrote:
cl****@lairds.us (Cameron Laird) writes:
For that matter, would it be an advantage for len() to operate
on iterables?


print len(itertools.count())

Ouch!!


How is this worse than list(itertools.count()) ?

May 14 '06 #10
Cameron Laird <cl****@lairds.us> wrote:
In article <1h***************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
.
.
.
My preference would be (with the original definition for
words_of_file) to code

    numwords = sum(1 for w in words_of_file(thefilepath))
.
.
.
There are times when

    numwords = len(list(words_of_file(thefilepath)))

will be advantageous.


Can you please give some examples? None comes readily to mind...

For that matter, would it be an advantage for len() to operate
on iterables? It could be faster and thriftier on memory than
either of the above, and my first impression is that it's
sufficiently natural not to offend those of us suspicious of
language bloat.


I'd be a bit worried about having len(x) change x's state into an
unusable one. Yes, it happens in other cases (if y in x:), but adding
more such problematic cases doesn't seem advisable to me anyway -- I'd
evaluate this proposal as a -0, even taking into account the potential
optimizations to be garnered by having some iterables expose __len__
(e.g., a genexp such as (f(x) for x in foo), without an if-clause, might
be optimized to delegate __len__ to foo -- again, there may be semantic
alterations lurking that make this optimization a bit iffy).
Alex
May 14 '06 #11
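
The state change mentioned above can be seen in a few lines (Python 3). This is only an illustrative sketch; count_items is a hypothetical helper standing in for a len() that iterates.

```python
def count_items(iterable):
    # Hypothetical stand-in for a len() that falls back to iterating.
    return sum(1 for _ in iterable)


squares = (n * n for n in range(4))
first = count_items(squares)     # walks the generator to the end
second = count_items(squares)    # nothing left to walk
leftover = list(squares)         # the generator is now unusable

numbers = [0, 1, 4, 9]
same_each_time = (len(numbers), len(numbers))   # a list is unaffected
print(first, second, leftover, same_each_time)
```

Asking a list for its length twice gives the same answer; asking an exhausted generator gives zero the second time, and its items are gone.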
George Sakkis <ge***********@gmail.com> wrote:
Paul Rubin wrote:
cl****@lairds.us (Cameron Laird) writes:
For that matter, would it be an advantage for len() to operate
on iterables?


print len(itertools.count())

Ouch!!


How is this worse than list(itertools.count()) ?


It's a slightly worse trap because list(x) ALWAYS iterates on x (just
like "for y in x:"), while len(x) MAY OR MAY NOT iterate on x (under
Cameron's proposal; it currently never does).

Yes, there are other subtle traps of this ilk already in Python, such as
"if y in x:" -- this, too, may or may not iterate. But the fact that a
potential problem exists in some corner cases need not be a good reason
to extend the problem to higher frequency;-).
Alex
May 14 '06 #12
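
A small sketch (Python 3) of the "may or may not iterate" behaviour of the in operator described above. The word list is made-up sample data.

```python
words = ["alpha", "beta", "gamma", "delta"]

# On a list, `in` scans the items but leaves the container untouched.
found_in_list = "beta" in words
list_after = list(words)

# On an iterator, `in` consumes items up to and including the match.
it = iter(words)
found_in_iter = "beta" in it
iter_after = list(it)

# On a set, `in` is a hash lookup and does not walk the elements.
word_set = set(words)
found_in_set = "beta" in word_set

print(found_in_list, list_after)
print(found_in_iter, iter_after)
print(found_in_set)
```

The same expression therefore has no side effect on a list or set but silently advances an iterator, which is the kind of trap the post warns about extending to len().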
In article <1h**************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
Cameron Laird <cl****@lairds.us> wrote:
In article <1hfarom.1lfetjc18leddeN%al***@mac.com>,
Alex Martelli <al***@mac.com> wrote:
.
.
.
>My preference would be (with the original definition for
>words_of_file) to code
>
>    numwords = sum(1 for w in words_of_file(thefilepath))

.
.
.
There are times when

    numwords = len(list(words_of_file(thefilepath)))

will be advantageous.


Can you please give some examples? None comes readily to mind...

May 15 '06 #13
In article <1h**************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
May 15 '06 #14
George Sakkis wrote:
(snip)
> def length(iterable):
>     try: return len(iterable)
>     except:

      except TypeError:

>         i = 0
>         for x in iterable: i += 1
>         return i

(snip)
May 15 '06 #15
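
One way to see why the narrower except TypeError matters (Python 3 sketch): a bare except also hides unrelated errors raised inside __len__. BuggySized is a hypothetical class invented for this demonstration, and the sum(1 for ...) fallback is used for brevity.

```python
class BuggySized:
    """Hypothetical class whose __len__ has a bug."""

    def __len__(self):
        raise RuntimeError("bug in __len__")

    def __iter__(self):
        return iter([1, 2, 3])


def length_bare(iterable):
    try:
        return len(iterable)
    except:  # bare except: swallows every exception, not just TypeError
        return sum(1 for _ in iterable)


def length_strict(iterable):
    try:
        return len(iterable)
    except TypeError:  # only "this object has no len()"
        return sum(1 for _ in iterable)


masked = length_bare(BuggySized())   # bug hidden, falls back to counting

try:
    length_strict(BuggySized())
    surfaced = False
except RuntimeError:
    surfaced = True                  # bug reaches the caller

print(masked, surfaced)
```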
