My code is trying to get double words from multiple files but gives an error, please help

How can I get every token (word) and previous token (previous word) from multiple files, along with the frequency of each two-word pair?

My code is trying to get all single words and double words (every token and its previous token) from multiple files and get the frequency of both. It works for single words, but for double words it gives this error:

line 50, in most_frequant_word
word1+= ' ' + word_list[ix+1]
IndexError: list index out of range


import Tkinter as tk
import os
import tkFileDialog

def most_frequant_word():
    browser = tkFileDialog.askdirectory()
    word_freq = {}     # single word -> count
    word_freq1 = {}    # "word next_word" -> count
    count11 = 0        # running total of tokens over all files
    for root, dirs, files in os.walk(browser):
        text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
        text1.insert(tk.INSERT, "\n")
        for idx, file in enumerate(files):
            ff = open(os.path.join(root, file), "r")
            text = ff.read()
            ff.close()
            word_list = text.split()
            my_list = text.split()
            count11 = len(word_list) + count11
            text1.insert(tk.INSERT, "total number of tokens %s" % count11)
            text1.insert(tk.INSERT, "\n")
            for ix, word in enumerate(word_list):
                word = word.lower()
                word = word.rstrip('.,/"\\ -_;[](){} ')
                # build the dictionaries
                word1 = word
                # the IndexError is raised here: on the last word, ix+1 is past the end
                word1 += ' ' + word_list[ix+1]
                count = word_freq.get(word, 0)
                word_freq[word] = count + 1
                count1 = word_freq1.get(word1, 0)
                word_freq1[word1] = count1 + 1
    # create lists of (freq, word) tuples once all files have been read
    freq_list = [(freq, word) for word, freq in word_freq.items()]
    freq_list1 = [(freq1, word1) for word1, freq1 in word_freq1.items()]
    # sort the lists by the first element in each tuple (the frequency)
    freq_list.sort(reverse=True)
    freq_list1.sort(reverse=True)
    for n, tup in enumerate(freq_list1):
        text1.insert(tk.INSERT, "%s times: %s" % tup)
        text1.insert(tk.INSERT, "\n")

root = tk.Tk(className=" most_frequant_word")
# text entry field, width=width chars, height=lines text
v1 = tk.StringVar()
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# function listed in command will be executed on button click
button1 = tk.Button(root, text='Browse', command=most_frequant_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()
What the code is supposed to do:
For example, if the text file content is
"Every man has a price. Every woman has a price."

First token (word) is "Every", previous token (previous word) is none (no previous)
Second token (word) is "man", previous token (previous word) is "Every"
Third token (word) is "has", previous token (previous word) is "man"
Fourth token (word) is "a", previous token (previous word) is "has"
Fifth token (word) is "price", previous token (previous word) is "a"

Sixth token (word) is "Every", previous token (previous word) is none (no previous)
Seventh token (word) is "woman", previous token (previous word) is "Every"
Eighth token (word) is "has", previous token (previous word) is "woman"
Ninth token (word) is "a", previous token (previous word) is "has"
Tenth token (word) is "price", previous token (previous word) is "a"


Frequency of "has a" is 2 (occurs in both the first and the second sentence)
Frequency of "a price" is 2 (occurs in both the first and the second sentence)
Frequency of "Every man" is 1 (occurs one time only)
Frequency of "man has" is 1 (occurs one time only)
Frequency of "Every woman" is 1 (occurs one time only)
Frequency of "woman has" is 1 (occurs one time only)

Please, I need help.
May 18 '08 #1
Laharl
849 Expert 512MB
First, please only post one thread per question. This should probably have gone in your other thread. Your error occurs because when you get to the last element of the list, using ix+1 means that you're outside the list, thus giving you an error.
May 18 '08 #2
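
A minimal Python 3 sketch of the failure described in the reply above, using a made-up five-word list rather than the poster's files: on the final iteration `ix` is the last valid index, so `ix+1` points one past the end.

```python
word_list = "every man has a price".split()

last_index = len(word_list) - 1    # 4, the index of "price"
try:
    word_list[last_index + 1]      # same access as word_list[ix+1] on the last pass
    failed = False
except IndexError:
    failed = True

# Stopping one short of the end keeps ix + 1 inside the list.
pairs = [word_list[ix] + ' ' + word_list[ix + 1]
         for ix in range(len(word_list) - 1)]
print(failed, pairs)
```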
alivip
17
OK, but how can I solve it?
May 18 '08 #3
jlm699
314 100+
if ix == len(my_list - 1):
    break
May 19 '08 #4
Laharl
849 Expert 512MB
if ix == len(my_list - 1):
    break

Surely you mean:

if ix == len(my_list)-1:
    break
May 19 '08 #5
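
Where the guard goes inside the loop matters. The sketch below (Python 3, with a hypothetical helper name `count_words_and_pairs` and without the punctuation stripping from the original code) counts the single word first and only then checks for the last index, so the final word still lands in the single-word counts. Placing the `break` at the very top of the loop body would skip it.

```python
def count_words_and_pairs(word_list):
    word_freq = {}   # single word -> count
    pair_freq = {}   # "word next_word" -> count
    for ix, word in enumerate(word_list):
        # Count the single word before the guard so the last word is included.
        word_freq[word] = word_freq.get(word, 0) + 1
        if ix == len(word_list) - 1:
            break    # no following word, so no pair to build
        pair = word + ' ' + word_list[ix + 1]
        pair_freq[pair] = pair_freq.get(pair, 0) + 1
    return word_freq, pair_freq

words, pairs = count_words_and_pairs("every man has a price".split())
print(words)
print(pairs)
```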
jlm699
314 100+
Oh yeah, absolutely. Thanks for the catch, I think I was typing that as fast as I could whilst working...
May 19 '08 #6