  #1  
October 1st, 2008, 03:56 AM
Newbie
 
Join Date: Sep 2008
Posts: 4
word frequency

I'm trying to read a text file and print each email address with its frequency to its right. The counting part is correct, but the program prints other things as well. I'm not sure how to solve this. I hope someone can enlighten me. Thank you very much.

    import string
    fname = raw_input("Enter a file name: ")
    if len(fname) == 0:
        print "Assuming mbox-short.txt"
        fname = "mbox-short.txt"

    try:
        infile = open(fname, "r")
    except:
        print "File not found:", fname
        exit()

    counts = {}
    for line in infile:
        words = string.split(line)

        if (len(words) > 0 and words[0] == 'From'):
            for w in words:
                counts[w] = counts.get(w,0) + 1
                print w, counts[w]
  #2  
October 1st, 2008, 11:27 AM
bvdet
Expert
 
Join Date: Oct 2006
Location: Nashville, TN
Posts: 1,213

Post a representative sample of your text file, and we can help you solve the problem.
  #3  
October 1st, 2008, 01:13 PM
Newbie
 
Join Date: Sep 2008
Posts: 4

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
by mail.umich.edu (Cyrus v2.2.12) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
Received: from holes.mr.itd.umich.edu (holes.mr.itd.umich.edu [141.211.14.79])
by flawless.mail.umich.edu () with ESMTP id m05EEFR1013674;
Sat, 5 Jan 2008 09:14:15 -0500
Received: FROM paploo.uhi.ac.uk (app1.prod.collab.uhi.ac.uk [194.35.219.184])
BY holes.mr.itd.umich.edu ID 477F90B0.2DB2F.12494 ;
5 Jan 2008 09:14:10 -0500
Received: from paploo.uhi.ac.uk (localhost [127.0.0.1])
by paploo.uhi.ac.uk (Postfix) with ESMTP id 5F919BC2F2;
Sat, 5 Jan 2008 14:10:05 +0000 (GMT)
Message-ID: <200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Received: from prod.collab.uhi.ac.uk ([194.35.219.182])
by paploo.uhi.ac.uk (JAMES SMTP Server 2.1.3) with SMTP ID 899
for <source@collab.sakaiproject.org>;
Sat, 5 Jan 2008 14:09:50 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (nakamura.uits.iupui.edu [134.68.220.122])
by shmi.uhi.ac.uk (Postfix) with ESMTP id A215243002
for <source@collab.sakaiproject.org>; Sat, 5 Jan 2008 14:13:33 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (localhost [127.0.0.1])
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11) with ESMTP id m05ECJVp010329
for <source@collab.sakaiproject.org>; Sat, 5 Jan 2008 09:12:19 -0500
Received: (from apache@localhost)
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit) id m05ECIaH010327
for source@collab.sakaiproject.org; Sat, 5 Jan 2008 09:12:18 -0500
Date: Sat, 5 Jan 2008 09:12:18 -0500
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to stephen.marquard@uct.ac.za using -f
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl
X-Content-Type-Outer-Envelope: text/plain; charset=UTF-8
X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan 5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

Author: stephen.marquard@uct.ac.za
Date: 2008-01-05 09:12:07 -0500 (Sat, 05 Jan 2008)
New Revision: 39772

Modified:
content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl/ContentServiceSqlOracle.java
content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl/DbContentService.java
Log:
SAK-12501 merge to 2-5-x: r39622, r39624:5, r39632:3 (resolve conflict from differing linebreaks for r39622)

----------------------
This automatic notification message was sent by Sakai Collab (https://collab.sakaiproject.org/portal) from the Source site.
You can modify how you receive notifications at My Workspace > Preferences.



From louis@media.berkeley.edu Fri Jan 4 18:10:48 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.97])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Fri, 04 Jan 2008 18:10:48 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
by mail.umich.edu (Cyrus v2.2.12) with LMTPA;
Fri, 04 Jan 2008 18:10:48 -0500
  #4  
October 1st, 2008, 02:10 PM
bvdet
Expert
 
Join Date: Oct 2006
Location: Nashville, TN
Posts: 1,213

I'm not sure what your problem is, but this will print the email addresses and the number of times they occur:
    fn = 'text.txt'
    f = open(fn)

    emailDict = {}
    for line in f:
        # Matches "From addr ..." envelope lines and also "From: addr" header lines
        if line.lower().startswith('from'):
            try:
                # The address is the second whitespace-separated field
                email = line.split()[1]
                emailDict[email] = emailDict.get(email,0)+1
            except IndexError:
                # No second field on this line; skip it
                pass

    f.close()

    # Print the totals once, after the whole file has been read
    for key in emailDict:
        print '%s: %d' % (key, emailDict[key])
I did little to your code except pull the print statements outside the loop.
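For anyone on Python 3, here is a minimal sketch of the same idea, not a definitive version. It assumes you only want the envelope lines whose first word is exactly 'From' (as in the original post), so the 'From:' header line of each message is not counted a second time. The function name count_senders and the inline sample list are made up for illustration; collections.Counter is from the standard library.

```python
from collections import Counter


def count_senders(lines):
    """Count sender addresses on lines of the form 'From addr date...'."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # Require the exact word 'From' so that 'From:' header lines
        # and 'Received: from ...' lines are ignored.
        if len(words) > 1 and words[0] == 'From':
            counts[words[1]] += 1
    return counts


# A few lines taken from the sample posted above (illustration only).
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n",
    "Received: from murder (mail.umich.edu [141.211.14.90])\n",
    "From: stephen.marquard@uct.ac.za\n",
    "From louis@media.berkeley.edu Fri Jan 4 18:10:48 2008\n",
]

# Print once, after counting has finished.
for email, n in count_senders(sample).items():
    print('%s: %d' % (email, n))
```

To run it on a real file, pass the open file object instead of the list, for example count_senders(open('mbox-short.txt')), since iterating a file yields its lines.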