Bytes IT Community

My code is trying to get double words from multiple files but gives an error, please help

P: 17
How can I get every token (word) and previous token (previous word) from multiple files, along with the frequency of each two-word pair?

My code tries to collect every single word and every double word (each token together with its previous token) from multiple files and count the frequency of both. It works for single words, but the double-word part gives this error:

line 50, in most_frequant_word
word1+= ' ' + word_list[ix+1]
IndexError: list index out of range


import Tkinter as tk
import os
import tkFileDialog

def most_frequant_word():
    browser = tkFileDialog.askdirectory()
    word_freq = {}    # single word -> count
    word_freq1 = {}   # word pair -> count
    count11 = 0       # total number of tokens across all files
    for root, dirs, files in os.walk(browser):
        text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
        text1.insert(tk.INSERT, "\n")
        for idx, file in enumerate(files):
            ff = open(os.path.join(root, file), "r")
            text = ff.read()
            ff.close()
            word_list = text.split()
            my_list = text.split()
            count11 = len(word_list) + count11
            text1.insert(tk.INSERT, "total number of tokens %s" % count11)
            text1.insert(tk.INSERT, "\n")
            for ix, word in enumerate(word_list):
                word = word.lower()
                word = word.rstrip('.,/"\\ -_;[](){} ')
                # build the dictionaries
                word1 = word
                # this is the line that raises IndexError on the last word,
                # where ix+1 is past the end of the list
                word1+= ' ' + word_list[ix+1]
                count = word_freq.get(word, 0)
                word_freq[word] = count + 1
                count1 = word_freq1.get(word1, 0)
                word_freq1[word1] = count1 + 1
    # create lists of (freq, word) tuples
    freq_list = [(freq, word) for word, freq in word_freq.items()]
    freq_list1 = [(freq1, word1) for word1, freq1 in word_freq1.items()]
    # sort by the first element in each tuple (the count), highest first
    freq_list.sort(reverse=True)
    freq_list1.sort(reverse=True)
    for n, tup in enumerate(freq_list1):
        text1.insert(tk.INSERT, "%s times: %s" % tup)
        text1.insert(tk.INSERT, "\n")

root = tk.Tk(className = " most_frequant_word")
# text entry field, width=width chars, height=lines text
v1 = tk.StringVar()
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# function listed in command will be executed on button click
button1 = tk.Button(root, text='Browse', command=most_frequant_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()

This is what the code is supposed to do.
For example, if the text file content is
"Every man has a price. Every woman has a price."

First token (word) is "Every", previous token (previous word) is none (no previous word)
Second token (word) is "man", previous token (previous word) is "Every"
Third token (word) is "has", previous token (previous word) is "man"
Fourth token (word) is "a", previous token (previous word) is "has"
Fifth token (word) is "price", previous token (previous word) is "a"

Sixth token (word) is "Every", previous token (previous word) is none (no previous word)
Seventh token (word) is "woman", previous token (previous word) is "Every"
Eighth token (word) is "has", previous token (previous word) is "woman"
Ninth token (word) is "a", previous token (previous word) is "has"
Tenth token (word) is "price", previous token (previous word) is "a"


Frequency of "has a" is 2 (occurs in both the first and second sentences)
Frequency of "a price" is 2 (occurs in both the first and second sentences)
Frequency of "Every man" is 1 (occurs only once)
Frequency of "man has" is 1 (occurs only once)
Frequency of "Every woman" is 1 (occurs only once)
Frequency of "woman has" is 1 (occurs only once)

Please, I need help.
May 18 '08 #1
5 Replies


Expert 100+
P: 849
First, please post only one thread per question; this should probably have gone in your other thread. Your error occurs because when the loop reaches the last element of the list, ix+1 points one position past the end of the list, which raises the IndexError.
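
To make that concrete, here is a minimal sketch, assuming a short hypothetical word list in place of the words read from the files. It shows the overrun on the last index and one possible way to pair each word with its successor without indexing at all. It is an illustration, not the only fix.

```python
# Hypothetical sample data; the original post reads its words from files.
word_list = "every man has a price".split()

# Valid indexes run from 0 to len(word_list) - 1, so on the last pass
# word_list[ix + 1] asks for an element that does not exist.
last_ix = len(word_list) - 1
try:
    word_list[last_ix + 1]
    overran = False
except IndexError:
    overran = True

# zip() stops at the shorter sequence, so pairing the list with a copy
# shifted by one matches every word with its successor and never runs
# past the end.
pairs = [a + ' ' + b for a, b in zip(word_list, word_list[1:])]
```

With five words there are only four pairs, which is why the loop over all five indexes fails on its final pass.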
May 18 '08 #2

P: 17
OK, but how can I solve it?
May 18 '08 #3

jlm699
100+
P: 314
if ix == len(my_list - 1):
    break
May 19 '08 #4

Expert 100+
P: 849
if ix == len(my_list - 1):
    break
Surely you mean:
if ix == len(my_list)-1:
    break
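
Where the guard goes matters. Below is a rough sketch of the counting loop with the check in place, assuming a hypothetical helper named count_words_and_pairs that takes an already split word list. The single-word count is updated before the guard, so the last word is still counted. Only the pair is skipped.

```python
def count_words_and_pairs(word_list):
    """Return (single-word counts, adjacent-pair counts) as two dicts."""
    word_freq = {}
    pair_freq = {}
    for ix, word in enumerate(word_list):
        # Count the single word first so the last word is not lost.
        word_freq[word] = word_freq.get(word, 0) + 1
        # The last word has no successor, so stop before building a pair.
        if ix == len(word_list) - 1:
            break
        pair = word + ' ' + word_list[ix + 1]
        pair_freq[pair] = pair_freq.get(pair, 0) + 1
    return word_freq, pair_freq

# Hypothetical sample input
words, pairs = count_words_and_pairs("a b a b a".split())
```

If the guard were the first statement in the loop body, the final word of every file would be left out of the single-word counts.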
May 19 '08 #5

jlm699
100+
P: 314
Oh yeah, absolutely. Thanks for the catch, I think I was typing that as fast as I could while working...
May 19 '08 #6
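
The example in the original question also treats the first word of each sentence as having no previous word, which the guard alone does not handle. A possible sketch follows, assuming that sentences end at '.', '!' or '?' and that words are lower-cased before counting. Both choices, and the helper name pair_counts_by_sentence, are assumptions made for illustration.

```python
import re

def pair_counts_by_sentence(text):
    """Count adjacent word pairs without pairing across sentence ends."""
    pair_freq = {}
    # Assumption: '.', '!' and '?' mark the end of a sentence.
    for sentence in re.split(r'[.!?]+', text):
        # Lower-case each token and strip surrounding punctuation.
        words = [w.strip('.,/"\\-_;[](){} ').lower() for w in sentence.split()]
        words = [w for w in words if w]
        # zip() pairs each word with its successor inside this sentence only.
        for prev, cur in zip(words, words[1:]):
            key = prev + ' ' + cur
            pair_freq[key] = pair_freq.get(key, 0) + 1
    return pair_freq

counts = pair_counts_by_sentence(
    "Every man has a price. Every woman has a price.")
```

On the sample sentence this gives 2 for "has a" and "a price" and 1 for each of the other four pairs, with keys in lower case, and no "price every" pair is produced.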
