Counting number of each item in a list.

sophie_newbie

I have a list a little something like this:

StringA
StringC
StringB
StringA
StringC
StringD
StringA
....
etc.

Basically I was wondering if there was an easy way to return how many
of each string are in the list, something like this:

StringA - 3
StringB - 1
StringC - 2
StringD - 1

I suppose that the easiest way to do that is to convert it to a 2
dimensional array? Is there any easy way?

Thanks.

Mar 19 '06 #1

Subscribe Post Reply

8008

Paul Rubin

"sophie_newbie" <pa**********@gmail.com> writes:

I suppose that the easiest way to do that is to convert it to a 2
dimensional array? Is there any easy way?

Use a hash table. Untested:

h = 0
for x in your_list:
h[x] = h.get(x, 0) + 1

You can then use something like sorted(h.items()) to get a sorted list
of those counts.

Mar 19 '06 #2

Bruno Desthuilliers

sophie_newbie a écrit :

I have a list a little something like this:

StringA
StringC
StringB
StringA
StringC
StringD
StringA
...
etc.

Basically I was wondering if there was an easy way to return how many
of each string are in the list, something like this:

StringA - 3
StringB - 1
StringC - 2
StringD - 1

There is.

str_list = ['StringA', 'StringC', 'StringB',
'StringA', 'StringC', 'StringD',
'StringA' ]

str_counts = dict((s, str_list.count(s) for s in set(str_list))

Mar 19 '06 #3

sophie_newbie

Hey Bruno,

I got an invalid syntax error when i tried using your "str_counts =
dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
there is a missing bracket or comma? Or maybe I need to import
something.

Thanks so much for your help.

Mar 19 '06 #4

sophie_newbie

Hi Paul,

Ur bit of code works, thanks so much, just for anyone reading this use
'h = {}' instead of h = 0.

Thanks

Mar 19 '06 #5

Kent Johnson

sophie_newbie wrote:

Hey Bruno,

I got an invalid syntax error when i tried using your "str_counts =
dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
there is a missing bracket or comma? Or maybe I need to import
something.

It should be
str_counts = dict((s, str_list.count(s)) for s in set(str_list))

or for Python < 2.4
str_counts = dict([(s, str_list.count(s)) for s in set(str_list)])

Note that this solution iterates str_list once for each element of
str_list - the call to count traverses the entire list to create the
count. I expect Paul Rubin's solution will be dramatically faster for
large lists as it only iterates str_list once.

Kent

Mar 19 '06 #6

Bruno Desthuilliers

sophie_newbie a écrit :

Hey Bruno,

I got an invalid syntax error when i tried using your "str_counts =
dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
there is a missing bracket or comma?
Yes, sorry, see Kent's post for the correction.
Or maybe I need to import
something.

Nope, unless you're running python < 2.4

Mar 19 '06 #7

bearophileHUGS

With CPython this is probably the faster version, the version with
list.count() has to be used only if you are sure to always have a short
input list, because it's O(n^2):

str_list = ['StringA', 'StringC', 'StringB',
'StringA', 'StringC', 'StringD',
'StringA' ]

str_counts = {}
for s in str_list:
if s in str_counts:
str_counts[s] += 1
else:
str_counts[s] = 1

Bye,
bearophile

Mar 19 '06 #8

Bruno Desthuilliers

Kent Johnson a écrit :

sophie_newbie wrote:
Hey Bruno,

I got an invalid syntax error when i tried using your "str_counts =
dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
there is a missing bracket or comma? Or maybe I need to import
something.

It should be
str_counts = dict((s, str_list.count(s)) for s in set(str_list))

Of course, my bad :(
or for Python < 2.4
from sets import Set as set str_counts = dict([(s, str_list.count(s)) for s in set(str_list)])

Note that this solution iterates str_list once for each element of
str_list
once for each *distinct* element or str_list (it iterates over the set
created from str_list).
- the call to count traverses the entire list to create the
count. I expect Paul Rubin's solution will be dramatically faster for
large lists as it only iterates str_list once.

Yeps. But I would benchmark it before choosing one or the other solution.

Mar 19 '06 #9

Bruno Desthuilliers

Bruno Desthuilliers a écrit :

Kent Johnson a écrit :
sophie_newbie wrote:
Hey Bruno,

I got an invalid syntax error when i tried using your "str_counts =
dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe
there is a missing bracket or comma? Or maybe I need to import
something.

It should be
str_counts = dict((s, str_list.count(s)) for s in set(str_list))

Of course, my bad :(
or for Python < 2.4

from sets import Set as set
str_counts = dict([(s, str_list.count(s)) for s in set(str_list)])

Note that this solution iterates str_list once for each element of
str_list

once for each *distinct* element or str_list (it iterates over the set
created from str_list).
- the call to count traverses the entire list to create the count. I
expect Paul Rubin's solution will be dramatically faster for large
lists as it only iterates str_list once.

Yeps. But I would benchmark it before choosing one or the other solution.

And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large lists.

nb: A is mine, B is Paul's and C is bearophile's, and the number after
is the size of the list...

A100 (10000 times): 1.5801050663
B100 (10000 times): 1.87287902832
C100 (10000 times): 0.991976976395
A10000 (100 times): 1.083589077
B10000 (100 times): 1.30713891983
C10000 (100 times): 0.988032817841
A1000000 (10 times): 10.5345788002
B1000000 (10 times): 13.094493866
C1000000 (10 times): 9.67438292503
source:

def countA(lst):
return dict((s, lst.count(s)) for s in set(lst))

def countB(lst):
counts = {}
for s in lst:
counts[s] = counts.get(s, 0)+1
return counts

def countC(lst):
counts = {}
for s in lst:
if s in counts:
counts[s] += 1
else:
counts[s] = 1
return counts

def mklst(ln):
from random import choice
items = ['abc', 'def', 'ghi', 'jkl', 'mno',
'pqr', 'stu', 'vwx', 'yz']
return [choice(items) for i in range(ln)]

lst100 = mklst(100)
lst10000 = mklst(10000)
lst1000000 = mklst(1000000)

def run():
from timeit import Timer

timers = [
('A100',
Timer('countA(lst100)',
'from __main__ import countA, lst100'),
10000),
('B100',
Timer('countB(lst100)',
'from __main__ import countB, lst100'),
10000),
('C100',
Timer('countC(lst100)',
'from __main__ import countC, lst100'),
10000),
('A10000',
Timer('countA(lst10000)',
'from __main__ import countA, lst10000'),
100),
('B10000',
Timer('countB(lst10000)',
'from __main__ import countB, lst10000'),
100),
('C10000',
Timer('countC(lst10000)',
'from __main__ import countC, lst10000'),
100),
('A1000000',
Timer('countA(lst1000000)',
'from __main__ import countA, lst1000000'),
10),
('B1000000',
Timer('countB(lst1000000)',
'from __main__ import countB, lst1000000'),
10),
('C1000000',
Timer('countC(lst1000000)',
'from __main__ import countC, lst1000000'),
10),
]

for name, timer, repeat in timers:
print "%s (%s times): %s" % (name,
repeat,
timer.timeit(repeat))

Mar 19 '06 #10

Paul Rubin

Bruno Desthuilliers <bd*****************@free.quelquepart.fr> writes:

And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large
lists.

nb: A is mine, B is Paul's and C is bearophile's, and the number after
is the size of the list...

Interesting. I wonder if you could try it with a much larger number
of distinct values in the list.

Mar 19 '06 #11

Justin Azoff

Bruno Desthuilliers wrote:

And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large lists.

Your version is only fast on lists with a very small number of unique
elements.

changing mklist to have
items = range(64) instead of the 9 item list and re-timing you will get
"better" results:

A100 (10000 times): 7.63829684258
B100 (10000 times): 1.34028482437
C100 (10000 times): 0.812223911285

A10000 (100 times): 9.78499102592
B10000 (100 times): 1.26520299911
C10000 (100 times): 0.857560873032

A1000000 (10 times): 87.6713900566
B1000000 (10 times): 12.7302949429
C1000000 (10 times): 8.35931396484

--
- Justin

Mar 19 '06 #12

Bruno Desthuilliers

Paul Rubin a écrit :

Bruno Desthuilliers <bd*****************@free.quelquepart.fr> writes:
And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large
lists.

nb: A is mine, B is Paul's and C is bearophile's, and the number after
is the size of the list...

Interesting. I wonder if you could try it with a much larger number
of distinct values in the list.

Please do, this could be interesting...

Mar 19 '06 #13

Bruno Desthuilliers

Justin Azoff a écrit :

Bruno Desthuilliers wrote:
And of course, I was right. My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large lists.

Your version is only fast on lists with a very small number of unique
elements.

changing mklist to have
items = range(64) instead of the 9 item list and re-timing you will get
"better" results:

A100 (10000 times): 7.63829684258
B100 (10000 times): 1.34028482437
C100 (10000 times): 0.812223911285

A10000 (100 times): 9.78499102592
B10000 (100 times): 1.26520299911
C10000 (100 times): 0.857560873032

A1000000 (10 times): 87.6713900566
B1000000 (10 times): 12.7302949429
C1000000 (10 times): 8.35931396484

Lol !-)

So much for my benchmarking skills...

Mar 19 '06 #14

Bruno Desthuilliers

Justin Azoff a écrit :

Bruno Desthuilliers wrote:

My solution seems to be faster than Paul's
one (but slower than bearophile's), be it on small, medium or large lists.

Your version is only fast on lists with a very small number of unique
elements.

changing mklist to have
items = range(64) instead of the 9 item list and re-timing you will get
"better" results:

A100 (10000 times): 7.63829684258
B100 (10000 times): 1.34028482437
C100 (10000 times): 0.812223911285

A10000 (100 times): 9.78499102592
B10000 (100 times): 1.26520299911
C10000 (100 times): 0.857560873032

A1000000 (10 times): 87.6713900566
B1000000 (10 times): 12.7302949429
C1000000 (10 times): 8.35931396484

Lol ! So much for my benchmarking skills :-/

Mar 19 '06 #15

per9000

Dear counters,

I have also encountered the above problem, and I frequently use the two
blocks of code below.

The first one is commented already - except that this version does not
start from an empty dict.

The second "inverts" the first one. Meaning that a dict in word2hits
style:

my_dict['the'] = 120
my_dict['word'] = 4
....
my_dict['gene'] = 4

becomes a dict in hits2words style:

new_dict[4] = ['word','gene', ...]
new_dict[120] = ['the', ...]

I used these to count genes in in abstracts from pubmed the when I
"invented" them (but it took some time to remove them from the code and
use them as functions).

def dict_add(indict,inlist,init=1):
for item in inlist:
if indict.has_key(item):
indict[item] += 1
else:
indict[item] = init
return indict
# end def

def dict_invert(indict):
new_dict = {}
for key in indict.keys():
if new_dict.has_key(indict[key]):
new_dict[indict[key]].append(key)
else:
new_dict[indict[key]] = [key]
return new_dict
#end def

A good idea could be to change the header of the first one (if you want
the option to start counting from zero) into:
def dict_add(inlist,indict={},init=1):

/P9K

Mar 20 '06 #16

Peter Otten

per9000 wrote:

A good idea could be to change the header of the first one (if you want
the option to start counting from zero) into:
def dict_add(inlist,indict={},init=1):

make that

def dict_add(inlist, indict=None, init=1):
if indict is None:
indict = {}

See "5. Mutable default arguments" at
http://zephyrfalcon.org/labs/python_pitfalls.html for an explanation.

Peter

Mar 20 '06 #17

by: Mike | last post by:

I have a mulitpage form that I do not record the information into a dbase until the end. I need to pass the field values form on form to the next. I have the code for automatically generating...

ASP / Active Server Pages

Counting number of selects made

by: A Seel | last post by:

COUNTING NUMBER OF SELECTS MADE table mytable { id, data, hits }

Microsoft SQL Server

Counting number of IO Accesses per DB

by: Sourabh | last post by:

Hello, I am trying to write a VB.NET monitoring application for a MS SQL server. For that I need to know the following: 1. How do I count the number of read/write accesses per database on the...

Microsoft SQL Server

How do I enumerate each item in a combobox?

by: Tony | last post by:

TIA

C# / C Sharp

Adding attribute for each item in a checkbox list control

by: vikram | last post by:

How can i add a user define attribute to each checkbox item in a checkbox list control. I want to add a attribute "Tag". Is it possible

ASP.NET

Datalist- Panel for each item? Possible?

by: footballhead | last post by:

I have a datalist on a page and I'd like to inlcude a panel that I can show or hide for each item in the list. Actually, I only want to show 1 panel at a time. Is this possible? I was thinking...

ASP.NET

For each item in Request.Form

by: Drew | last post by:

I need to iterate through a submitted form, inserting data on each pass. In the past, I have always used form elements that were named with numbers at the end, like this, name1 relationship1...

ASP / Active Server Pages

Multiple checkboxlist for each item in listbox

by: Priya | last post by:

I have to display a checboxlist with someitems when each item in a listbox is selected. I have a listbox with a list of categories and when i click each category from the listbox, the checkbox list...

.NET Framework

SQL for counting number of times a varchar repeated in a table

by: raghunath1981 | last post by:

Hi All, Could you please help me with writing a sql for counting number of times a varchar (1000) repeated in the table. I think I cannot use groupby/distinct etc. Please give me an idea how I...

DB2 Database

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Counting number of each item in a list.

Similar topics