
How to modify my code to get every word and its previous word from a file? Please help

I wrote code that finds the most frequent words in a file.
Now I want to implement bigram probability by modifying that code. For every token (word) in a text file, how can I get the token, its previous token (previous word), and the frequency and probability of that pair, and put each one in a cell of a table?

For example, if the text file contains:
"Every man has a price. Every woman has a price."

First token (word) is "Every"; previous token (previous word) is none (no previous word).
Second token is "man"; previous token is "Every".
Third token is "has"; previous token is "man".
Fourth token is "a"; previous token is "has".
Fifth token is "price"; previous token is "a".

Sixth token is "Every"; previous token is none (a new sentence starts).
Seventh token is "woman"; previous token is "Every".
Eighth token is "has"; previous token is "woman".
Ninth token is "a"; previous token is "has".
Tenth token is "price"; previous token is "a".
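A minimal sketch of this first step in Python 3, assuming that sentences end at '.', '!' or '?' and that a word is a run of word characters or apostrophes (case is kept, so "Every" stays capitalized). The helper names `sentence_tokens` and `token_pairs` are hypothetical. The previous token is reset to `None` at every sentence start, matching the sixth token above.

```python
import re


def sentence_tokens(text):
    """Split text into sentences, then each sentence into word tokens.

    Assumed rules: a sentence ends at '.', '!' or '?'; a word is a run of
    word characters or apostrophes. Case is preserved.
    """
    tokenized = [re.findall(r"[\w']+", s) for s in re.split(r"[.!?]+", text)]
    return [words for words in tokenized if words]  # drop empty sentences


def token_pairs(text):
    """Return (token, previous_token) pairs; previous_token is None
    for the first word of every sentence."""
    pairs = []
    for words in sentence_tokens(text):
        previous = None  # no previous word at the start of a sentence
        for word in words:
            pairs.append((word, previous))
            previous = word
    return pairs


sample = "Every man has a price. Every woman has a price."
for n, (token, previous) in enumerate(token_pairs(sample), start=1):
    print(n, token, previous)
```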


Frequency of "has a" is 2 (it occurs twice: once in the first sentence and once in the second)
Frequency of "a price" is 2 (it occurs twice: once in the first sentence and once in the second)
Frequency of "Every man" is 1 (it occurs only once)
Frequency of "man has" is 1 (it occurs only once)
Frequency of "Every woman" is 1 (it occurs only once)
Frequency of "woman has" is 1 (it occurs only once)
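These pair counts can be sketched with `collections.Counter`, assuming the text has already been split into sentences of words. Pairing each list with itself shifted by one, `zip(words, words[1:])`, yields every (previous, word) pair inside a sentence, so no pair crosses a sentence boundary ("price Every" is not counted, as in the frequencies above). The function name `bigram_counts` is hypothetical.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs inside each sentence.

    `sentences` is assumed to be a list of tokenized sentences (lists of
    words); pairs never span two sentences.
    """
    counts = Counter()
    for words in sentences:
        # zip(words, words[1:]) yields (previous, current) for each position
        counts.update(zip(words, words[1:]))
    return counts


sentences = [["Every", "man", "has", "a", "price"],
             ["Every", "woman", "has", "a", "price"]]
for (previous, word), freq in bigram_counts(sentences).most_common():
    print('"%s %s": %d' % (previous, word, freq))
```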

Probability of "has a" is 2/10 (frequency of "has a" divided by the total number of words)
Probability of "a price" is 2/10 (frequency of "a price" divided by the total number of words)
Probability of "Every man" is 1/10 (frequency of "Every man" divided by the total number of words)
Probability of "man has" is 1/10 (frequency of "man has" divided by the total number of words)
Probability of "Every woman" is 1/10 (frequency of "Every woman" divided by the total number of words)
Probability of "woman has" is 1/10 (frequency of "woman has" divided by the total number of words)
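A sketch of the probability step under the definition given above: pair frequency divided by the total number of words (10 in the example). For comparison it also computes the usual maximum-likelihood bigram estimate used in bigram language models, P(word | previous) = count(previous, word) / count(previous). That second value is an addition, not part of the definition above. `Fraction` keeps the results exact, but it reduces 2/10 to 1/5. The function name `bigram_probabilities` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def bigram_probabilities(sentences):
    """Return {(previous, word): (relative_freq, conditional_prob)}.

    relative_freq    = count(previous, word) / total number of words
                       (the definition used in the question)
    conditional_prob = count(previous, word) / count(previous)
                       (the standard maximum-likelihood bigram estimate)
    """
    unigrams = Counter(w for words in sentences for w in words)
    bigrams = Counter(p for words in sentences for p in zip(words, words[1:]))
    total_words = sum(unigrams.values())
    return {
        (prev, word): (Fraction(n, total_words), Fraction(n, unigrams[prev]))
        for (prev, word), n in bigrams.items()
    }


sentences = [["Every", "man", "has", "a", "price"],
             ["Every", "woman", "has", "a", "price"]]
for (prev, word), (rel, cond) in bigram_probabilities(sentences).items():
    print('"%s %s": %s of all words, P(%s | %s) = %s' % (prev, word, rel, word, prev, cond))
```

With the question's definition the six values add up to 8/10 rather than 1, because the example has 8 word pairs but 10 words.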


```python
# A look at the Tkinter Text widget:
# use Ctrl+C to copy, Ctrl+X to cut selected text,
# Ctrl+V to paste, and Ctrl+/ to select all.
# Count the words in every file under a chosen folder and show the
# ten most frequent ones, by decreasing frequency.

import os
import string
import tkinter as tk
from tkinter import filedialog

TOP_N = 10  # how many of the most frequent words to display


def most_frequent_word():
    folder = filedialog.askdirectory()
    if not folder:  # the user cancelled the dialog
        return

    word_freq = {}
    for dirpath, dirs, files in os.walk(folder):
        text1.insert(tk.END, 'Found %d dirs and %d files\n' % (len(dirs), len(files)))
        for idx, name in enumerate(files):
            print('File #%d: %s' % (idx + 1, name))
            with open(os.path.join(dirpath, name), encoding='utf-8', errors='replace') as ff:
                text = ff.read()
            for word in text.split():
                # normalize case and strip surrounding punctuation
                word = word.lower().strip(string.punctuation)
                if word:
                    # build the dictionary
                    word_freq[word] = word_freq.get(word, 0) + 1

    # create a list of (freq, word) tuples, highest frequency first;
    # this runs once, after every file has been counted
    freq_list = sorted(((freq, word) for word, freq in word_freq.items()), reverse=True)

    # print the first TOP_N items
    for tup in freq_list[:TOP_N]:
        print('%s times: %s' % tup)
        text1.insert(tk.END, '%s times: %s\n' % tup)


root = tk.Tk(className='most_frequent_word')
# text area: width in characters, height in lines
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# the function named in `command` runs when the button is clicked
button1 = tk.Button(root, text='Browse', command=most_frequent_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()
```
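One possible way to extend the directory walk to bigrams, sketched in Python 3 and kept free of Tkinter so it can be tested on its own. It assumes the same sentence and word rules as a simple regex tokenizer (split on '.', '!', '?'; keep runs of word characters), counts pairs across all files, and returns rows of (token, previous token, frequency, probability), with probability defined as in the question (frequency divided by total words). The helpers `file_sentences`, `bigram_rows` and `format_table` are hypothetical names, not part of any library.

```python
import os
import re
from collections import Counter


def file_sentences(path):
    """Read one file and return its sentences as lists of words
    (assumed rules: split on . ! ?, keep runs of word characters)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    tokenized = [re.findall(r"[\w']+", s) for s in re.split(r"[.!?]+", text)]
    return [words for words in tokenized if words]


def bigram_rows(directory):
    """Walk `directory` and return (token, previous, frequency, probability)
    rows, most frequent first; probability = frequency / total words."""
    unigrams, bigrams = Counter(), Counter()
    for dirpath, _dirs, files in os.walk(directory):
        for name in files:
            for words in file_sentences(os.path.join(dirpath, name)):
                unigrams.update(words)
                bigrams.update(zip(words, words[1:]))  # pairs stay inside a sentence
    total = sum(unigrams.values())
    return [(word, prev, n, n / total)
            for (prev, word), n in bigrams.most_common()]


def format_table(rows):
    """Render rows as a fixed-width text table, one cell per column."""
    lines = ["%-15s %-15s %9s %11s" % ("Token", "Previous", "Frequency", "Probability")]
    for token, prev, freq, prob in rows:
        lines.append("%-15s %-15s %9d %11.4f" % (token, prev, freq, prob))
    return "\n".join(lines) + "\n"
```

In the GUI, the button callback could then show the table with `text1.insert(tk.END, format_table(bigram_rows(path)))`, where `path` is the directory returned by `askdirectory()`. A fixed-width text table is used here because a Text widget has no cells; a real grid would need a different widget.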
May 16 '08 #1