Learning Python via a little word frequency program

Andrew Savige

I'm learning Python by reading David Beazley's "Python Essential Reference"
book and writing a few toy programs. To get a feel for hashes and sorting,
I set myself this little problem today (not homework, BTW):

Given a string containing a space-separated list of names:

names = "freddy fred bill jock kevin andrew kevin kevin jock"

produce a frequency table of names, sorted descending by frequency,
then ascending by name. For the above data, the output should be:

kevin : 3
jock : 2
andrew : 1
bill : 1
fred : 1
freddy : 1

Here's my first attempt:

names = "freddy fred bill jock kevin andrew kevin kevin jock"
freq = {}
for name in names.split():
    freq[name] = 1 + freq.get(name, 0)
deco = zip([-x for x in freq.values()], freq.keys())
deco.sort()
for v, k in deco:
    print "%-10s: %d" % (k, -v)

I'm interested to learn how more experienced Python folks would solve
this little problem. Though I've read about the DSU Python sorting idiom,
I'm not sure I've strictly applied it above ... and the -x hack above to
achieve a descending sort feels a bit odd to me, though I couldn't think
of a better way to do it.

I also have a few specific questions. Instead of:

for name in names.split():
    freq[name] = 1 + freq.get(name, 0)

I might try:

for name in names.split():
    try:
        freq[name] += 1
    except KeyError:
        freq[name] = 1

Which is preferred?

Ditto for:

deco = zip([-x for x in freq.values()], freq.keys())

versus:

deco = zip(map(operator.neg, freq.values()), freq.keys())

Finally, I might replace:

for v, k in deco:
    print "%-10s: %d" % (k, -v)

with:

print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

Any feedback on good Python style, performance tips, good books
to read, etc. is appreciated.

Thanks,
/-\

Jan 9 '08 #1

Peter Otten

Andrew Savige wrote:

> I'm learning Python by reading David Beazley's "Python Essential
> Reference" book and writing a few toy programs. To get a feel for hashes
> and sorting, I set myself this little problem today (not homework, BTW):
>
> Given a string containing a space-separated list of names:
>
> names = "freddy fred bill jock kevin andrew kevin kevin jock"
>
> produce a frequency table of names, sorted descending by frequency,
> then ascending by name. For the above data, the output should be:
>
> kevin : 3
> jock : 2
> andrew : 1
> bill : 1
> fred : 1
> freddy : 1
>
> Here's my first attempt:
>
> names = "freddy fred bill jock kevin andrew kevin kevin jock"
> freq = {}
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
> deco = zip([-x for x in freq.values()], freq.keys())
> deco.sort()
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
>
> I'm interested to learn how more experienced Python folks would solve
> this little problem. Though I've read about the DSU Python sorting
> idiom, I'm not sure I've strictly applied it above ... and the -x hack
> above to achieve a descending sort feels a bit odd to me, though I
> couldn't think of a better way to do it.

You can specify a reverse sort with

deco.sort(reverse=True)

Newer versions of Python have the whole idiom built in:

>>> items = freq.items()
>>> from operator import itemgetter
>>> items.sort(key=itemgetter(1), reverse=True)
>>> for item in items:
...     print "%-10s: %d" % item
...
kevin : 3
jock : 2
bill : 1
andrew : 1
fred : 1
freddy : 1

You can pass an arbitrary function as key. itemgetter(1) is equivalent to

def key(item): return item[1]
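
Note that sorting on the count alone leaves names with equal counts in whatever order they already had, as the session output above shows for "bill" and "andrew". A minimal Python 3 sketch (not from the thread) of a composite key that covers both requirements; the names by_count and by_count_then_name are just illustrative:

```python
from operator import itemgetter

names = "freddy fred bill jock kevin andrew kevin kevin jock"

freq = {}
for name in names.split():
    freq[name] = freq.get(name, 0) + 1

# Sorting on the count alone: ties keep their previous relative order.
by_count = sorted(freq.items(), key=itemgetter(1), reverse=True)

# Composite key: negated count (descending), then name (ascending).
by_count_then_name = sorted(freq.items(), key=lambda item: (-item[1], item[0]))

for name, count in by_count_then_name:
    print("%-10s: %d" % (name, count))
```

The negation trick only works for numeric keys; for non-numeric keys a two-pass stable sort is one alternative.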

> I also have a few specific questions. Instead of:
>
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
>
> I might try:
>
> for name in names.split():
>     try:
>         freq[name] += 1
>     except KeyError:
>         freq[name] = 1
>
> Which is preferred?

I have no strong opinion about that. Generally speaking, try...except is
faster when you have many hits, i.e. the except suite is rarely invoked.
Starting with Python 2.5 you can alternatively use

from collections import defaultdict
freq = defaultdict(int)
for name in names.split():
    freq[name] += 1
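
For comparison, here is a small Python 3 sketch (an illustration, not part of the original replies) that runs the counting variants discussed above side by side and checks that they agree. collections.Counter is not mentioned in the thread; it was added in later Python versions (2.7 and 3.1) and is included here only as a further option. The variable names are made up for the example.

```python
from collections import Counter, defaultdict

names = "freddy fred bill jock kevin andrew kevin kevin jock"
words = names.split()

# 1. dict.get with a default value
freq_get = {}
for name in words:
    freq_get[name] = 1 + freq_get.get(name, 0)

# 2. try/except KeyError
freq_try = {}
for name in words:
    try:
        freq_try[name] += 1
    except KeyError:
        freq_try[name] = 1

# 3. defaultdict(int): missing keys start at int() == 0
freq_default = defaultdict(int)
for name in words:
    freq_default[name] += 1

# 4. Counter (assumption: available, i.e. Python 2.7+ or 3.1+)
freq_counter = Counter(words)

# All four produce the same mapping.
assert freq_get == freq_try == dict(freq_default) == dict(freq_counter)
```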

> Ditto for:
>
> deco = zip([-x for x in freq.values()], freq.keys())
>
> versus:
>
> deco = zip(map(operator.neg, freq.values()), freq.keys())

I think the list comprehension is slightly more readable.

> Finally, I might replace:
>
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
>
> with:
>
> print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

Again, I find the explicit for loop more readable, but sometimes use the
genexp, too.

Peter

Jan 9 '08 #2

Bruno Desthuilliers

Andrew Savige wrote:

> I'm learning Python by reading David Beazley's "Python Essential Reference"
> book and writing a few toy programs. To get a feel for hashes and sorting,
> I set myself this little problem today (not homework, BTW):
>
> Given a string containing a space-separated list of names:
>
> names = "freddy fred bill jock kevin andrew kevin kevin jock"
>
> produce a frequency table of names, sorted descending by frequency,
> then ascending by name. For the above data, the output should be:
>
> kevin : 3
> jock : 2
> andrew : 1
> bill : 1
> fred : 1
> freddy : 1
>
> Here's my first attempt:
>
> names = "freddy fred bill jock kevin andrew kevin kevin jock"
> freq = {}
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
> deco = zip([-x for x in freq.values()], freq.keys())
> deco.sort()
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
>
> I'm interested to learn how more experienced Python folks would solve
> this little problem.

For a one-shot Q&D script:

names = "freddy fred bill jock kevin andrew kevin kevin jock"
freqs = [(- names.count(name), name) for name in set(names.split())]
print "\n".join("%-10s : %s" % (n, -f) for f, n in sorted(freqs))

Now I might choose a very different solution for a more serious
application, depending on detailed specs and intended use of the
"frequency table".

> Though I've read about the DSU Python sorting idiom,
> I'm not sure I've strictly applied it above ...

Perhaps not "strictly" since you don't really "undecorate", but that's
another application of the same principle : provided the appropriate
data structure, sort() (or sorted()) will do the right thing.
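
To make the three DSU steps explicit, here is a minimal Python 3 sketch (an illustration added for clarity, not the poster's code). The freq dictionary is hard-coded and the names decorated and table are hypothetical.

```python
# Counts for the sample data, hard-coded for the illustration.
freq = {"freddy": 1, "fred": 1, "bill": 1, "jock": 2, "kevin": 3, "andrew": 1}

# Decorate: put the sort keys in front of the payload.
decorated = [(-count, name) for name, count in freq.items()]

# Sort: tuples compare element by element, so negated count first, then name.
decorated.sort()

# Undecorate: strip the sort key and restore the original count.
table = [(name, -neg_count) for neg_count, name in decorated]

for name, count in table:
    print("%-10s: %d" % (name, count))
```

In the original attempt the undecoration happens implicitly inside the print loop, via the -v.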

> and the -x hack above to
> achieve a descending sort feels a bit odd to me, though I couldn't think
> of a better way to do it.

The "other" way would be to pass a custom comparison callback to sort,
which would be both slower and more complicated. Your solution is IMHO
the right thing to do here.
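
For reference, a comparison-callback version might look like the following Python 3 sketch. This is an assumption-laden illustration: in the Python 2 of this thread the callback would be passed as the cmp argument of sort(), while Python 3 dropped that argument and offers functools.cmp_to_key instead. The function name compare is made up.

```python
from functools import cmp_to_key

freq = {"freddy": 1, "fred": 1, "bill": 1, "jock": 2, "kevin": 3, "andrew": 1}

def compare(a, b):
    """Order (name, count) pairs by descending count, then ascending name."""
    name_a, count_a = a
    name_b, count_b = b
    if count_a != count_b:
        return count_b - count_a  # negative when a has the higher count
    return (name_a > name_b) - (name_a < name_b)  # -1, 0 or 1

table = sorted(freq.items(), key=cmp_to_key(compare))

for name, count in table:
    print("%-10s: %d" % (name, count))
```

The callback is invoked once per comparison rather than once per element, which is where the extra cost comes from.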

> I also have a few specific questions. Instead of:
>
> for name in names.split():
>     freq[name] = 1 + freq.get(name, 0)
>
> I might try:
>
> for name in names.split():
>     try:
>         freq[name] += 1
>     except KeyError:
>         freq[name] = 1

or a couple other solutions, including a defaultdict (python >= 2.5).

> Which is preferred?

It's a FAQ - or it should be one. Overall, the second one tends to be
faster when there's no exception (i.e. the key already exists), but slower
when exceptions happen. So it mostly depends on what you expect your
dataset to be.

Now note that you don't necessarily need a dict here !-)

> Ditto for:
>
> deco = zip([-x for x in freq.values()], freq.keys())
>
> versus:
>
> deco = zip(map(operator.neg, freq.values()), freq.keys())

As far as I'm concerned, I'd favor the first solution here. Reads better
IMHO

> Finally, I might replace:
>
> for v, k in deco:
>     print "%-10s: %d" % (k, -v)
>
> with:
>
> print "\n".join("%-10s: %d" % (k, -v) for v, k in deco)

That's what I'd do here too - but it depends on context (i.e. for huge
datasets and/or complex formatting, I'd use a for loop).

Jan 9 '08 #3

MRAB

On Jan 9, 12:19 pm, Bruno Desthuilliers <bruno.42.desthuilli...@wtf.websiteburo.oops.com> wrote:

> Andrew Savige wrote:
>
> > I'm learning Python by reading David Beazley's "Python Essential Reference"
> > book and writing a few toy programs. To get a feel for hashes and sorting,
> > I set myself this little problem today (not homework, BTW):
> >
> > Given a string containing a space-separated list of names:
> >
> > names = "freddy fred bill jock kevin andrew kevin kevin jock"
> >
> > produce a frequency table of names, sorted descending by frequency,
> > then ascending by name. For the above data, the output should be:
> >
> > kevin : 3
> > jock : 2
> > andrew : 1
> > bill : 1
> > fred : 1
> > freddy : 1
> >
> > Here's my first attempt:
> >
> > names = "freddy fred bill jock kevin andrew kevin kevin jock"
> > freq = {}
> > for name in names.split():
> >     freq[name] = 1 + freq.get(name, 0)
> > deco = zip([-x for x in freq.values()], freq.keys())
> > deco.sort()
> > for v, k in deco:
> >     print "%-10s: %d" % (k, -v)
> >
> > I'm interested to learn how more experienced Python folks would solve
> > this little problem.
>
> For a one-shot Q&D script:
>
> names = "freddy fred bill jock kevin andrew kevin kevin jock"
> freqs = [(- names.count(name), name) for name in set(names.split())]
> print "\n".join("%-10s : %s" % (n, -f) for f, n in sorted(freqs))

[snip]

That actually prints:

kevin : 3
fred : 2
jock : 2
andrew : 1
bill : 1
freddy : 1

It says that "fred" occurs twice because of "freddy".

names = "freddy fred bill jock kevin andrew kevin kevin jock"
name_list = names.split()
freqs = [(- name_list.count(name), name) for name in set(name_list)]
print "\n".join("%-10s : %s" % (n, -f) for f, n in sorted(freqs))
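
The difference comes down to str.count counting substrings while list.count counts whole elements. A tiny Python 3 sketch (added as illustration; the variable names are hypothetical):

```python
names = "freddy fred bill jock kevin andrew kevin kevin jock"

# str.count counts non-overlapping substring occurrences,
# so the "fred" inside "freddy" is counted as well.
substring_hits = names.count("fred")

# list.count compares whole elements, so only the exact name matches.
token_hits = names.split().count("fred")

print(substring_hits, token_hits)
```

Note that both versions rescan the input once per distinct name, so for large inputs a single counting pass with a dict is likely the better choice.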

Jan 9 '08 #4

Bruno Desthuilliers

MRAB wrote:

> On Jan 9, 12:19 pm, Bruno Desthuilliers <bruno.42.desthuilli...@wtf.websiteburo.oops.com> wrote:

(snip)

> That actually prints:
>
> kevin : 3
> fred : 2
> jock : 2
> andrew : 1
> bill : 1
> freddy : 1
>
> It says that "fred" occurs twice because of "freddy".

Oops! My bad, didn't spot that one :(

Thanks for pointing this out.

Jan 10 '08 #5

rent

import collections

names = "freddy fred bill jock kevin andrew kevin kevin jock"
freq = collections.defaultdict(int)
for name in names.split():
    freq[name] += 1
keys = freq.keys()
keys.sort(key = freq.get, reverse = True)
for k in keys:
    print "%-10s: %d" % (k, freq[k])
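
Sorting on freq.get alone orders by count only, so names with equal counts are not put in alphabetical order. One way to get the secondary order, sketched here in Python 3 as an illustration rather than as the poster's intent, relies on the sort being stable: sort by name first, then by count.

```python
import collections

names = "freddy fred bill jock kevin andrew kevin kevin jock"
freq = collections.defaultdict(int)
for name in names.split():
    freq[name] += 1

# Pass 1: ascending by name (the secondary key).
keys = sorted(freq)

# Pass 2: descending by count (the primary key). The sort is stable, also
# with reverse=True, so names with equal counts stay in alphabetical order.
keys.sort(key=freq.get, reverse=True)

for k in keys:
    print("%-10s: %d" % (k, freq[k]))
```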

Jan 11 '08 #6

Paul Rubin

rent <re******@gmail.com> writes:

> keys = freq.keys()
> keys.sort(key = freq.get, reverse = True)
> for k in keys:
>     print "%-10s: %d" % (k, freq[k])

I prefer (untested):

def snd((x,y)): return y # I wish this was built-in
sorted_freq = sorted(freq.iteritems(), key=snd, reverse=True)
for k,f in sorted_freq:
    print "%-10s: %d" % (k, f)

Jan 11 '08 #7

Mike Meyer

On 11 Jan 2008 03:50:53 -0800 Paul Rubin <"http://phr.cx"@NOSPAM.invalid> wrote:

> rent <re******@gmail.com> writes:
> > keys = freq.keys()
> > keys.sort(key = freq.get, reverse = True)
> > for k in keys:
> >     print "%-10s: %d" % (k, freq[k])
>
> I prefer (untested):
>
> def snd((x,y)): return y # I wish this was built-in

What's wrong with operator.itemgetter?

> sorted_freq = sorted(freq.iteritems(), key=snd, reverse=True)

(still untested)

from operator import itemgetter
sorted_freq = sorted(freq.iteritems(), key=itemgetter(2), reverse=True)

<mike

--
Mike Meyer <mw*@mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

Jan 11 '08 #8

Hrvoje Niksic

Mike Meyer <mw*@mired.org> writes:

> On 11 Jan 2008 03:50:53 -0800 Paul Rubin <"http://phr.cx"@NOSPAM.invalid> wrote:
>
> > rent <re******@gmail.com> writes:
> > > keys = freq.keys()
> > > keys.sort(key = freq.get, reverse = True)
> > > for k in keys:
> > >     print "%-10s: %d" % (k, freq[k])
> >
> > I prefer (untested):
> >
> > def snd((x,y)): return y # I wish this was built-in
>
> What's wrong with operator.itemgetter?
>
> > sorted_freq = sorted(freq.iteritems(), key=snd, reverse=True)
>
> (still untested)
>
> from operator import itemgetter
> sorted_freq = sorted(freq.iteritems(), key=itemgetter(2), reverse=True)

It should be itemgetter(1). See how easy it is to get it wrong? :-)
(Okay, this was too easy a shot to miss out on; I actually like
itemgetter.)
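
To see why the index matters, here is a short Python 3 sketch (added for illustration; pair and second are made-up names). It also shows a snd helper written without tuple parameters, since the def snd((x,y)) form is Python 2 only (tuple parameter unpacking was removed in Python 3).

```python
from operator import itemgetter

pair = ("kevin", 3)  # a (name, count) item, as produced by dict.items()

# Index 1 selects the count, the second element of the pair.
second = itemgetter(1)
print(second(pair))

# Index 2 is out of range for a 2-tuple, so the lookup raises IndexError
# as soon as the key function is called.
try:
    itemgetter(2)(pair)
except IndexError as exc:
    print("IndexError:", exc)

def snd(pair):
    """Return the second element of a 2-tuple."""
    _, y = pair
    return y

print(snd(pair))
```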

Jan 11 '08 #9
