My code is trying to collect every single word and every word pair (each Token (word) together with its PreviousToken (previous word)) from multiple files and count the frequency of both. It works for single words, but counting word pairs raises an error:
line 50, in most_frequant_word
word1+= ' ' + word_list[ix+1]
IndexError: list index out of range
import __future__
import Tkinter as tk
import os, glob
import sys
import string
import re
import tkFileDialog

def most_frequant_word():
    browser= tkFileDialog.askdirectory()
    word_freq={}
    word_freq1={}
    count11=0
    for root, dirs, files in os.walk(browser):
        text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
        text1.insert(tk.INSERT, "\n")
        for idx, file in enumerate(files):
            ff = open (os.path.join(root, file), "r")
            text = ff.read ( )
            ff.close ( )
            word_list = text.split()
            my_list = text.split()
            count11=len(word_list)+count11
            text1.insert(tk.INSERT, "total number of tokens %s" % pair_list)
            text1.insert(tk.INSERT, "\n")
            for ix, word in enumerate(word_list):
                word = word.lower()
                word = word.rstrip('.,/"\ -_;\[](){} ')
                # build the dictionary
                word1=word
                word1+= ' ' + word_list[ix+1]
                count = word_freq.get(word, 0)
                word_freq[word] = count + 1
                count1 = word_freq1.get(word1,0)
                word_freq1[word1] = count1 + 1
    # create a list of (freq, word) tuples
    freq_list = [(word,freq ) for freq,word in word_freq.items()]
    freq_list1 = [(word1,freq1 ) for freq1,word1 in word_freq.items()]
    # sort the list by the first element in each tuple (default)
    freq_list.sort(reverse=True)
    freq_list1.sort(reverse=True)
    for n, tup in enumerate(freq_list1):
        text1.insert(tk.INSERT, "%s times: %s" % tup)
        text1.insert(tk.INSERT, "\n")

root = tk.Tk(className = " most_frequant_word")
# text entry field, width=width chars, height=lines text
v1 = tk.StringVar()
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# function listed in command will be executed on button click
button1 = tk.Button(root, text='Brows', command=most_frequant_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()
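The traceback points at `word_list[ix+1]`: `enumerate` also visits the last word in the list, and there is no word after it. Below is a minimal sketch of that failure in isolation, together with one possible way to build the pairs without indexing past the end. The sample sentence and the names `last_ix`, `overran` and `pairs` are assumptions made for illustration; they are not part of the program.

```python
# Hypothetical sample data standing in for text.split() on one file.
word_list = "every man has a price".split()

# On the final pass of the loop ix == len(word_list) - 1, so
# word_list[ix + 1] looks one position past the end of the list.
last_ix = len(word_list) - 1
try:
    word_list[last_ix + 1]
    overran = False
except IndexError:
    overran = True

# Pairing the list with a copy of itself shifted by one stops at the
# shorter sequence, so every pair has both of its words.
pairs = [first + ' ' + second
         for first, second in zip(word_list, word_list[1:])]
```

With this approach the last word simply starts no pair, which may or may not be the behaviour wanted at the end of a file.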
For example, if the text file content is
"Every man has a price. Every woman has a price."
First Token(word) is "Every" PreviousToken(Previous word) is none (no previous)
Second Token(word) is "man" PreviousToken(Previous word) is "Every"
Third Token(word) is "has" PreviousToken(Previous word) is "man"
Fourth Token(word) is "a" PreviousToken(Previous word) is "has"
Fifth Token(word) is "price" PreviousToken(Previous word) is "a"
Sixth Token(word) is "Every" PreviousToken(Previous word) is none (no previous, new sentence)
Seventh Token(word) is "woman" PreviousToken(Previous word) is "Every"
Eighth Token(word) is "has" PreviousToken(Previous word) is "woman"
Ninth Token(word) is "a" PreviousToken(Previous word) is "has"
Tenth Token(word) is "price" PreviousToken(Previous word) is "a"
Frequency of "has a" is 2 (occurs in both the first and the second sentence)
Frequency of "a price" is 2 (occurs in both the first and the second sentence)
Frequency of "Every man" is 1 (occurs only once)
Frequency of "man has" is 1 (occurs only once)
Frequency of "Every woman" is 1 (occurs only once)
Frequency of "woman has" is 1 (occurs only once)
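The counting described in this example could be sketched roughly as follows. This is only an illustration under stated assumptions: a sentence is taken to end at `.`, `!` or `?`, a pair never spans two sentences, words are lowercased as in the program, and the function name `count_tokens_and_pairs` is hypothetical. `collections.Counter` needs Python 2.7 or later.

```python
import re
from collections import Counter

def count_tokens_and_pairs(text):
    """Count single words and 'previous word + word' pairs in text.

    Assumption: a sentence ends at '.', '!' or '?', and the first
    word of a sentence has no previous word.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in re.split(r'[.!?]+', text):
        # Lowercase, then keep runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        word_freq.update(words)
        # zip stops at the shorter list, so the last word starts no pair.
        pair_freq.update(prev + ' ' + word
                         for prev, word in zip(words, words[1:]))
    return word_freq, pair_freq

sample = "Every man has a price. Every woman has a price."
word_freq, pair_freq = count_tokens_and_pairs(sample)
for pair, freq in pair_freq.most_common():
    print("%d times: %s" % (freq, pair))
```

To cover several files, the same function could be called on the text of each file and the returned counters added together with `Counter.update`.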
Please, I need help.