count items in generator

Still new. I am trying to make a simple word count script.

I found this in the great Python Cookbook, which allows me to process
every word in a file. But how do I use it to count the items generated?

    def words_of_file(thefilepath, line_to_words=str.split):
        the_file = open(thefilepath)
        for line in the_file:
            for word in line_to_words(line):
                yield word
        the_file.close()

    for word in words_of_file(thefilepath):
        dosomethingwith(word)

The best I could come up with:

    def words_of_file(thefilepath, line_to_words=str.split):
        the_file = open(thefilepath)
        for line in the_file:
            for word in line_to_words(line):
                yield word
        the_file.close()

    len(list(words_of_file(thefilepath)))

But that seems clunky.

May 14 '06 #1
BartlebyScrivener <rp*******@gmail.com> wrote:
> (snip)
> The best I could come up with:
>
> (snip)
>     len(list(words_of_file(thefilepath)))
>
> But that seems clunky.

My preference would be (with the original definition for
words_of_file) to code

    numwords = sum(1 for w in words_of_file(thefilepath))

Alex
May 14 '06 #2
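
For reference, a minimal sketch of the sum() approach in current Python 3 syntax. The helper words_of_lines and the io.StringIO sample are stand-ins that do not appear in the thread; they are assumed here only so the example runs without a real file on disk.

```python
import io

def words_of_lines(lines, line_to_words=str.split):
    """Yield every word from an iterable of text lines (hypothetical helper)."""
    for line in lines:
        for word in line_to_words(line):
            yield word

# io.StringIO stands in for an open text file in this sketch.
sample = io.StringIO("the quick brown fox\njumps over\n\nthe lazy dog\n")

# sum() pulls one word at a time from the generator; no list is ever built.
numwords = sum(1 for _ in words_of_lines(sample))
print(numwords)
```

Unlike len(list(...)), this keeps only one word in memory at a time, which matters mainly for very large files.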
BartlebyScrivener wrote:
> (snip)
> The best I could come up with:
>
> (snip)
>     len(list(words_of_file(thefilepath)))
>
> But that seems clunky.

As clunky as it seems, I don't think you can beat it in terms of
brevity; if you care about memory efficiency though, here's what I use:

    def length(iterable):
        try: return len(iterable)
        except:
            i = 0
            for x in iterable: i += 1
            return i

You can even shadow the builtin len() if you prefer:

    import __builtin__

    def len(iterable):
        try: return __builtin__.len(iterable)
        except:
            i = 0
            for x in iterable: i += 1
            return i

HTH,
George

May 14 '06 #3
Thanks! And thanks for the Cookbook.

rd

"There is no abstract art. You must always start with something.
Afterward you can remove all traces of reality."--Pablo Picasso

May 14 '06 #4
"George Sakkis" <ge***********@gmail.com> writes:
> As clunky as it seems, I don't think you can beat it in terms of
> brevity; if you care about memory efficiency though, here's what I use:
>
>     def length(iterable):
>         try: return len(iterable)
>         except:
>             i = 0
>             for x in iterable: i += 1
>             return i

Alex's example amounted to something like that, for the generator
case. Notice that the argument to sum() was a generator
expression. The sum function then iterated through it.
May 14 '06 #5
Paul Rubin <http://ph****@NOSPAM.invalid> wrote:
> "George Sakkis" <ge***********@gmail.com> writes:
>> As clunky as it seems, I don't think you can beat it in terms of
>> brevity; if you care about memory efficiency though, here's what I use:
>>
>>     def length(iterable):
>>         try: return len(iterable)
>>         except:
>>             i = 0
>>             for x in iterable: i += 1
>>             return i
>
> Alex's example amounted to something like that, for the generator
> case. Notice that the argument to sum() was a generator
> expression. The sum function then iterated through it.

True. Changing the except clause here to

    except: return sum(1 for x in iterable)

keeps George's optimization (O(1), not O(N), for containers) and is a
bit faster (while still O(N)) for non-container iterables.

Alex
May 14 '06 #6
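
A rough sketch of the combined helper in Python 3 syntax, showing the two paths side by side. Narrowing the handler to TypeError is an assumption added here rather than part of the suggestion as posted, and range(10**9) is just a convenient example of a sized object that would be slow to count by iterating.

```python
def length(iterable):
    """Return the item count, using len() when the object supports it."""
    try:
        # Containers and other sized objects answer without iterating.
        return len(iterable)
    except TypeError:
        # Generators have no len(); fall back to counting item by item.
        return sum(1 for _ in iterable)

big = range(10**9)
print(length(big))                            # answered by len() directly
print(length(w for w in "a b c".split()))     # counted by iterating
```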
In article <1h***************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
May 14 '06 #7
cl****@lairds.us (Cameron Laird) writes:
> For that matter, would it be an advantage for len() to operate
> on iterables?

    print len(itertools.count())

Ouch!!
May 14 '06 #8
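
The hazard is that itertools.count() never ends, so any count-by-iterating loop would never return. One possible guard, sketched below, is to cap the work with itertools.islice. The name bounded_length and the limit parameter are hypothetical and were not proposed in the thread.

```python
import itertools

def bounded_length(iterable, limit):
    """Count items, stopping after at most `limit` of them.

    Hypothetical helper: a result equal to `limit` only means
    "at least this many", because counting stopped early.
    """
    return sum(1 for _ in itertools.islice(iterable, limit))

print(bounded_length(itertools.count(), 1000))   # returns instead of hanging
print(bounded_length(iter("abc"), 1000))         # short input counted fully
```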
>> True. Changing the except clause here to
>>
>>     except: return sum(1 for x in iterable)
>>
>> keeps George's optimization (O(1), not O(N), for containers) and is a
>> bit faster (while still O(N)) for non-container iterables.

Everything was going just great. Now I have to think again.

Thank you all.

rick

May 14 '06 #9
Paul Rubin wrote:
> cl****@lairds.us (Cameron Laird) writes:
>> For that matter, would it be an advantage for len() to operate
>> on iterables?
>
>     print len(itertools.count())
>
> Ouch!!

How is this worse than list(itertools.count()) ?

May 14 '06 #10
Cameron Laird <cl****@lairds.us> wrote:
> In article <1h***************************@mac.com>,
> Alex Martelli <al***@mac.com> wrote:
>  .
>  .
>  .
>> My preference would be (with the original definition for
>> words_of_file) to code
>>
>>     numwords = sum(1 for w in words_of_file(thefilepath))
>  .
>  .
>  .
> There are times when
>
>     numwords = len(list(words_of_file(thefilepath)))
>
> will be advantageous.

Can you please give some examples? None comes readily to mind...

> For that matter, would it be an advantage for len() to operate
> on iterables? It could be faster and thriftier on memory than
> either of the above, and my first impression is that it's
> sufficiently natural not to offend those of us suspicious of
> language bloat.

I'd be a bit worried about having len(x) change x's state into an
unusable one. Yes, it happens in other cases (if y in x:), but adding
more such problematic cases doesn't seem advisable to me anyway -- I'd
evaluate this proposal as a -0, even taking into account the potential
optimizations to be garnered by having some iterables expose __len__
(e.g., a genexp such as (f(x) for x in foo), without an if-clause, might
be optimized to delegate __len__ to foo -- again, there may be semantic
alterations lurking that make this optimization a bit iffy).

Alex
May 14 '06 #11
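
To make the state-change concern concrete, here is a small sketch in Python 3: counting a generator leaves it exhausted. The second half uses operator.length_hint, which was added to the standard library later (Python 3.4) and so postdates this thread; it lets an iterator advertise an estimated size without being consumed, and falls back to a default when no hint is offered.

```python
import operator

gen = (n * 2 for n in range(4))
count = sum(1 for _ in gen)      # counting consumes the generator
leftover = list(gen)             # nothing is left to iterate over
print(count, leftover)

# A list iterator offers a length hint and is not consumed by the query.
it = iter([1, 2, 3])
hint = operator.length_hint(it)
print(hint, list(it))

# A generator offers no hint, so the supplied default comes back.
fresh = (n * 2 for n in range(4))
print(operator.length_hint(fresh, -1))
```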
George Sakkis <ge***********@gmail.com> wrote:
> Paul Rubin wrote:
>> cl****@lairds.us (Cameron Laird) writes:
>>> For that matter, would it be an advantage for len() to operate
>>> on iterables?
>>
>>     print len(itertools.count())
>>
>> Ouch!!
>
> How is this worse than list(itertools.count()) ?

It's a slightly worse trap because list(x) ALWAYS iterates on x (just
like "for y in x:"), while len(x) MAY OR MAY NOT iterate on x (under
Cameron's proposal; it currently never does).

Yes, there are other subtle traps of this ilk already in Python, such as
"if y in x:" -- this, too, may or may not iterate. But the fact that a
potential problem exists in some corner cases need not be a good reason
to extend the problem to higher frequency;-).
Alex
May 14 '06 #12
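
A short Python 3 sketch of the "if y in x:" trap mentioned above. A membership test on a container leaves the container untouched, while the same test on an iterator consumes items up to and including the match. The variable names are illustrative only.

```python
items = [1, 2, 3, 4]

# Membership on a container: the list is unchanged afterwards.
found_in_list = 3 in items
print(found_in_list, items)

# Membership on an iterator: it advances until it finds a match.
it = iter(items)
found_in_iter = 3 in it
remaining = list(it)
print(found_in_iter, remaining)   # only the items after the match remain
```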
In article <1h**************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
> Cameron Laird <cl****@lairds.us> wrote:
>> In article <1hfarom.1lfetjc18leddeN%al***@mac.com>,
>> Alex Martelli <al***@mac.com> wrote:
>>  .
>>  .
>>  .
>>> My preference would be (with the original definition for
>>> words_of_file) to code
>>>
>>>     numwords = sum(1 for w in words_of_file(thefilepath))
>>  .
>>  .
>>  .
>> There are times when
>>
>>     numwords = len(list(words_of_file(thefilepath)))
>>
>> will be advantageous.
>
> Can you please give some examples? None comes readily to mind...

May 15 '06 #13
In article <1h**************************@mac.com>,
Alex Martelli <al***@mac.com> wrote:
May 15 '06 #14
George Sakkis wrote:
(snip)
> def length(iterable):
>     try: return len(iterable)
>     except:

      except TypeError:

>         i = 0
>         for x in iterable: i += 1
>         return i

(snip)
May 15 '06 #15
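
A sketch of why the narrower handler matters, in Python 3 syntax. BrokenContainer is a made-up class whose __len__ has a bug; with a bare except the bug is hidden and the helper quietly falls back to iterating, while except TypeError lets the real error surface. The function names length_bare and length_narrow are illustrative only.

```python
def length_bare(iterable):
    try:
        return len(iterable)
    except:                       # catches every exception, bugs included
        return sum(1 for _ in iterable)

def length_narrow(iterable):
    try:
        return len(iterable)
    except TypeError:             # only "object has no len()"
        return sum(1 for _ in iterable)

class BrokenContainer:
    """Hypothetical container with a faulty __len__."""
    def __len__(self):
        raise RuntimeError("size unavailable")
    def __iter__(self):
        return iter(())

print(length_bare(BrokenContainer()))     # bug masked; counts zero items
try:
    length_narrow(BrokenContainer())
except RuntimeError as exc:
    print("propagated:", exc)
```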
