Homegrown wordcount

I've coded a little word counting routine that handles a reasonably
wide range of inputs. How could it be made to cover more, though
admittedly more remote, possibilities such as nested lists of lists,
items for which the string representation is a string containing lists
etc. etc. without significantly increasing the complexity of the
program?

Thomas Philips

def wordcount(input):

    from string import whitespace

    #Treat iterable inputs differently
    if "__iter__" in dir(input):
        wordList =(" ".join([str(item) for item in input])).split()
    else:
        wordList = [str(input)]

    #Remove any words that are just whitespace
    for i,word in enumerate(wordList):
        while word and word[-1] in whitespace:
            word = word[:-1]
        wordList[i] = word
    wc = len(filter(None,wordList)) #Filter out any empty strings
    return wc
Jul 18 '05 #1
Something like this?

def wordcount(input, sep=" "):
    global words
    if isinstance(input, str):
        words+=len([x.strip() for x in input.split(sep)])
        return words
    else:
        for item in input:
            wordcount(item)

    return words

#
# Test with a string
#
words=0
print wordcount("This is a test") # String test
words=0
print wordcount(["This is a test", "This is a test"]) # List test
words=0
print wordcount([["This is a test","This is a test"],
                 ["This is a test","This is a test"]]) # List of lists
words=0
data=[["this is a test"],["this", "is", "a", "test"],"This is a test"]
print wordcount(data)

HTH,
Larry Bates
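
A note on the separator, as a small sketch in present-day Python 3 (the posts in this thread are Python 2): splitting on an explicit " " keeps an empty string for every extra space, so runs of spaces inflate the count, whereas split() with no argument collapses any run of whitespace. The sample text is made up for illustration.

```python
# Illustrative input with a double space and a trailing newline (made up).
text = "This  is a test\n"

# Explicit separator: every single space is a boundary, so the double
# space yields an empty string, and the newline stays attached to "test".
by_space = text.split(" ")

# No separator: runs of any whitespace are collapsed, and leading or
# trailing whitespace is ignored.
by_whitespace = text.split()

print(by_space)                            # ['This', '', 'is', 'a', 'test\n']
print(by_whitespace)                       # ['This', 'is', 'a', 'test']
print(len(by_space), len(by_whitespace))   # 5 4
```

Calling strip() on each piece does not change how many pieces there are, so it has no effect on the count either way.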
Jul 18 '05 #2
On Fri, Jun 11, 2004 at 11:05:32AM -0700, Thomas Philips wrote:
> I've coded a little word counting routine that handles a reasonably
> wide range of inputs. How could it be made to cover more, though
> admittedly more remote, possibilities such as nested lists of lists,
> items for which the string representation is a string containing lists
> etc. etc. without significantly increasing the complexity of the
> program?

Hello,

Such 'magical' behaviour is error prone and causes many a headache when
debugging. Some might think that even this is too much:
    #Treat iterable inputs differently
    if "__iter__" in dir(input):
        wordList =(" ".join([str(item) for item in input])).split()
    else:
        wordList = [str(input)]

Myself included. Rather than making this function more complex, why not
write a few wrapper functions where you need them?
David.
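
One way to read the wrapper suggestion, as a minimal Python 3 sketch: keep a core function that only accepts a string, and put each extra input shape in its own small wrapper. The names count_words and count_words_in_items are hypothetical and do not come from the thread.

```python
def count_words(text):
    """Core routine: count whitespace-separated words in one string."""
    return len(text.split())


def count_words_in_items(items):
    """Wrapper (hypothetical): total words across a flat iterable of strings."""
    return sum(count_words(item) for item in items)


print(count_words("This is a test"))                        # 4
print(count_words_in_items(["This is a test", "one two"]))  # 6
```

Each function does one obvious thing, and passing the wrong kind of argument fails loudly instead of being silently converted with str().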

--
"Science is what we understand well enough to explain to a
computer. Art is everything else we do."
-- Donald Knuth

Jul 18 '05 #3
An embarrassing mistake on my part: I should have typed

    #Treat iterable inputs differently
    if "__iter__" in dir(input):
        wordList =(" ".join([str(item) for item in input])).split()
    else:
        wordList = str(input).split()

I wish I knew how to treat all possible inputs in a uniform fashion,
but I'm nowhere near there as yet, hence the question. That said, it
addresses the situations that arise in practice fairly well, though I
am sure it can be sped up substantially.

Thomas Philips
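
For anyone trying the corrected routine on Python 3, here is a hedged port as a sketch. One assumption to flag: in Python 3 a str does have an __iter__ attribute (it did not in Python 2), so the string case has to be tested first, otherwise the join would put a space between every character. The parameter name value is mine.

```python
def wordcount(value):
    """Python 3 sketch of the corrected routine (not the original code)."""
    # Strings must be handled before the __iter__ test in Python 3.
    if isinstance(value, str):
        words = value.split()
    elif hasattr(value, "__iter__"):
        words = " ".join(str(item) for item in value).split()
    else:
        words = str(value).split()
    # split() with no argument never returns empty or whitespace-only
    # strings, so no separate clean-up loop is needed.
    return len(words)


print(wordcount("This is a test"))        # 4
print(wordcount(["This is", "a test"]))   # 4
print(wordcount(42))                      # 1
print(wordcount([[], "c"]))               # 2, because str([]) is "[]"
```

The last call shows the limitation the original question is about: nested containers are counted through their string representation rather than their contents.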
Jul 18 '05 #4
Larry Bates wrote:
> Something like this?
>
> def wordcount(input, sep=" "):
>     global words
>     if isinstance(input, str):
>         words+=len([x.strip() for x in input.split(sep)])

What's the purpose of stripping the items in the list if you just count
them? Isn't this equivalent to

        words += len(input.split(sep))

>         return words
>     else:
>         for item in input:
>             wordcount(item)
>
>     return words

Removing the global statement and sep param, you get:

def wordcount(input):
    if isinstance(input, str):
        return len(input.split())
    else:
        return sum([wordcount(item) for item in input])

--
Grégoire Dooms
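
The recursive version above can be stretched a little to cover the remoter inputs from the original question. This is a Python 3 sketch under stated assumptions, not code from the thread: non-iterable scalars are counted through str(), bytes are split on ASCII whitespace, and iterating a dict walks its keys only.

```python
from collections.abc import Iterable


def wordcount(value):
    """Recursively count words in strings nested inside iterables."""
    if isinstance(value, str):
        return len(value.split())
    if isinstance(value, (bytes, bytearray)):
        # Assumption: treat bytes as text split on ASCII whitespace.
        return len(value.split())
    if isinstance(value, Iterable):
        # Lists, tuples, sets and generators are walked item by item.
        return sum(wordcount(item) for item in value)
    # Assumption: any other scalar counts via its string form.
    return len(str(value).split())


data = [["this is a test"], ["this", "is", "a", "test"], "This is a test"]
print(wordcount(data))             # 12
print(wordcount([1, 2.5, None]))   # 3
print(wordcount([]))               # 0
```

The string test has to come first: a one-character string iterates to itself, so sending strings down the iterable branch would recurse without end.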
Jul 18 '05 #5
After reading this thread, I decided to embark on a word counting
program of my own. One thing I like to do when learning new programming
languages is to try and emulate some of my favorite UNIX type programs.

That said, to get the count of words in a string, I merely did the
following:
# Beginning of program

import re
import sys

# Right now my simple wc program just reads piped data
if not sys.stdin.isatty(): input_data = sys.stdin.read()

print "number of words:", len(re.findall('[^\s]+', input_data))

# End of program

Though I've only done trivial tests on this up to now, the word count of
this script seems to match that of the wc on my system (RH Linux WS). I
ran some big RFC text files through this too.

There could be some flaws here; I don't know. I'll have to look at it
better when I get back from the gym. If anyone here finds a problem, I'd
be interested in hearing it.

Like I said, I love using these UNIX type programs to learn a new
language. It helps me learn things like file I/O, command line
arguments, string manipulation, etc.

Keith P. Boruff
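
A hedged Python 3 take on the same idea, as a sketch rather than a drop-in replacement for wc. Two things it addresses: the program above only assigns input_data when stdin is piped, so running it from a terminal would hit an undefined name at the print, and r"\S+" says the same thing as '[^\s]+' a bit more directly. The function names and the in-memory sample are mine.

```python
import io
import re
import sys


def count_words(text):
    """Count runs of non-whitespace characters, like a simple wc -w."""
    return len(re.findall(r"\S+", text))


def main(stdin=sys.stdin):
    """Return a report line, or a hint if nothing was piped in."""
    if stdin.isatty():
        return "no piped input; nothing to count"
    return "number of words: %d" % count_words(stdin.read())


# Demonstration with an in-memory stream standing in for piped input.
sample = io.StringIO("one two\n  three\tfour\n")
print(main(sample))   # number of words: 4
```

To use it on real piped data, end the script with print(main()) instead of the demonstration lines. For ordinary ASCII text, count_words(text) should agree with len(text.split()).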


Jul 18 '05 #6
