Bytes | Software Development & Data Engineering Community

How can I modify my code to get every word and its previous word from a file? Please help

I wrote code to get the most frequent words in a file.
I want to implement bigram probability by modifying the code to do the following:
How can I get every token (word), its previous token (previous word), and the frequency and probability of each pair
from a text file, and put each one in a cell of a table?

For example, if the text file contains
"Every man has a price. Every woman has a price."

First token (word) is "Every"; previous token (previous word) is none (no previous word)
Second token is "man"; previous token is "Every"
Third token is "has"; previous token is "man"
Fourth token is "a"; previous token is "has"
Fifth token is "price"; previous token is "a"

Sixth token is "Every"; previous token is none (a new sentence starts)
Seventh token is "woman"; previous token is "Every"
Eighth token is "has"; previous token is "woman"
Ninth token is "a"; previous token is "has"
Tenth token is "price"; previous token is "a"
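A minimal Python 3 sketch of the token/previous-token pairing above. It assumes that a sentence ends at '.', '!' or '?' and that a token is a run of letters, digits, underscores or apostrophes; `token_pairs` is a hypothetical helper name, not part of the original code. As in the example, the previous token is `None` at the start of each sentence. For the sample text it yields ten pairs, and the seventh is `('Every', 'woman')`, because the second sentence reads "Every woman has a price".

```python
import re


def token_pairs(text):
    """Return (previous_token, token) pairs for every token in `text`.

    previous_token is None for the first token of each sentence.
    """
    pairs = []
    # Assumption: a sentence ends at '.', '!' or '?'
    for sentence in re.split(r'[.!?]+', text):
        # Assumption: a token is a run of word characters or apostrophes
        tokens = re.findall(r"[\w']+", sentence)
        previous = None
        for token in tokens:
            pairs.append((previous, token))
            previous = token
    return pairs


if __name__ == '__main__':
    sample = "Every man has a price. Every woman has a price."
    for n, (previous, token) in enumerate(token_pairs(sample), start=1):
        print(n, repr(token), 'previous:', repr(previous))
```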


Frequency of "has a" is 2 (occurs twice, once in each sentence)
Frequency of "a price" is 2 (occurs twice, once in each sentence)
Frequency of "Every man" is 1 (occurs once)
Frequency of "man has" is 1 (occurs once)
Frequency of "Every woman" is 1 (occurs once)
Frequency of "woman has" is 1 (occurs once)
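One way to get the frequencies listed above is `collections.Counter` over adjacent word pairs. This is a Python 3 sketch with a hypothetical helper name, `bigram_frequencies`. It assumes sentences end at '.', '!' or '?' and tokens are runs of word characters or apostrophes, keeps the original capitalisation, and never pairs words across a sentence boundary, so "price Every" is not counted. That boundary rule is an assumption chosen to match the counts in the question.

```python
import re
from collections import Counter


def bigram_frequencies(text):
    """Count adjacent (previous, token) pairs within each sentence."""
    counts = Counter()
    # Assumption: a sentence ends at '.', '!' or '?'
    for sentence in re.split(r'[.!?]+', text):
        tokens = re.findall(r"[\w']+", sentence)
        # zip pairs each token with the next one: (w1, w2), (w2, w3), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == '__main__':
    sample = "Every man has a price. Every woman has a price."
    for (previous, token), freq in bigram_frequencies(sample).most_common():
        print('"%s %s": %d' % (previous, token, freq))
```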

Probability of "has a" is 2/10 (frequency of "has a" divided by the total number of words, 10)
Probability of "a price" is 2/10 (frequency of "a price" divided by the total number of words)
Probability of "Every man" is 1/10 (frequency of "Every man" divided by the total number of words)
Probability of "man has" is 1/10 (frequency of "man has" divided by the total number of words)
Probability of "Every woman" is 1/10 (frequency of "Every woman" divided by the total number of words)
Probability of "woman has" is 1/10 (frequency of "woman has" divided by the total number of words)
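The question defines a pair's probability as its frequency divided by the total word count. A conventional maximum-likelihood bigram model instead conditions on the previous word: P(token | previous) = count(previous, token) / count(previous), which for "has a" gives 2/2 = 1 rather than 2/10. The Python 3 sketch below computes both so they can be compared. `bigram_table` is a hypothetical name, and the sentence and token rules (a sentence ends at '.', '!' or '?'; a token is a run of word characters or apostrophes) are assumptions.

```python
import re
from collections import Counter


def bigram_table(text):
    """Return (rows, total_words).

    Each row is (previous, token, frequency, p_total, p_conditional):
      p_total       = frequency / total number of words (the question's definition)
      p_conditional = count(previous, token) / count(previous) (usual MLE bigram)
    """
    word_counts = Counter()
    pair_counts = Counter()
    # Assumption: a sentence ends at '.', '!' or '?'
    for sentence in re.split(r'[.!?]+', text):
        tokens = re.findall(r"[\w']+", sentence)
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))

    total_words = sum(word_counts.values())
    rows = []
    for (previous, token), freq in pair_counts.most_common():
        rows.append((previous, token, freq,
                     freq / total_words,
                     freq / word_counts[previous]))
    return rows, total_words


if __name__ == '__main__':
    rows, total = bigram_table("Every man has a price. Every woman has a price.")
    for previous, token, freq, p_total, p_cond in rows:
        print('"%s %s": %d/%d = %.2f   P(%s | %s) = %.2f'
              % (previous, token, freq, total, p_total, token, previous, p_cond))
```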


```python
# Count the words in every file under a chosen directory and show the
# five most frequent ones, by decreasing frequency, in a Tkinter Text widget.
# Python 3 version (Tkinter -> tkinter, tkFileDialog -> tkinter.filedialog).
import os
import tkinter as tk
from tkinter import filedialog


def most_frequent_word():
    folder = filedialog.askdirectory()
    if not folder:  # the user cancelled the dialog
        return

    word_freq = {}
    for root_dir, dirs, files in os.walk(folder):
        text1.insert(tk.INSERT, 'Found %d dirs and %d files\n' % (len(dirs), len(files)))
        for idx, filename in enumerate(files):
            print('File #%d: %s' % (idx + 1, filename))
            # errors='ignore' skips bytes that are not valid UTF-8
            with open(os.path.join(root_dir, filename), 'r',
                      encoding='utf-8', errors='ignore') as ff:
                text = ff.read()
            for word in text.split():
                # normalise case and strip punctuation from both ends
                word = word.lower().strip('.,/"-_;[](){}')
                if word:  # skip tokens that were pure punctuation
                    word_freq[word] = word_freq.get(word, 0) + 1

    # Build (freq, word) tuples once, after every file has been read,
    # and sort by frequency, highest first.
    freq_list = sorted(((freq, word) for word, freq in word_freq.items()),
                       reverse=True)

    # Show the first five items.
    for freq, word in freq_list[:5]:
        print('%s times: %s' % (freq, word))
        text1.insert(tk.INSERT, '%s times: %s\n' % (freq, word))


root = tk.Tk(className=' most_frequent_word')
# text area: width in characters, height in lines
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# the function named in command= runs when the button is clicked
button1 = tk.Button(root, text='Browse', command=most_frequent_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()
```
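To apply this to the directory-walking code, one option is to collect pair counts inside the same `os.walk` loop instead of single-word counts. A minimal Python 3 sketch follows; `bigram_rows_for_directory` and `format_table` are hypothetical names, words are lowercased as in the original code, sentences are assumed to end at '.', '!' or '?', and probability uses the question's definition (pair frequency divided by the total number of words across all files). Rows come out in the column order Token, PreviousToken, Frequency, Probability.

```python
import os
import re
from collections import Counter


def bigram_rows_for_directory(folder):
    """Walk `folder` and return (token, previous_token, frequency, probability) rows."""
    word_total = 0
    pair_counts = Counter()
    for root_dir, dirs, files in os.walk(folder):
        for filename in files:
            path = os.path.join(root_dir, filename)
            # Assumption: plain-text files; undecodable bytes are skipped
            with open(path, 'r', encoding='utf-8', errors='ignore') as f:
                text = f.read()
            # Pairs are counted within a sentence, never across one
            for sentence in re.split(r'[.!?]+', text):
                tokens = [t.lower() for t in re.findall(r"[\w']+", sentence)]
                word_total += len(tokens)
                pair_counts.update(zip(tokens, tokens[1:]))
    # probability = pair frequency / total number of words (question's definition)
    return [(token, previous, freq, freq / word_total)
            for (previous, token), freq in pair_counts.most_common()]


def format_table(rows):
    """Lay the rows out as fixed-width text columns."""
    header = ('Token', 'PreviousToken', 'Frequency', 'Probability')
    lines = ['%-15s %-15s %9s %11s' % header]
    for token, previous, freq, prob in rows:
        lines.append('%-15s %-15s %9d %11.3f' % (token, previous, freq, prob))
    return '\n'.join(lines)
```

Inside the button callback, something like `text1.insert(tk.INSERT, format_table(bigram_rows_for_directory(path)))`, with `path` being the directory chosen in the dialog, would show the result as fixed-width text. For real table cells, a `tkinter.ttk.Treeview` widget with one column per field is an alternative that is not shown here.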
May 16 '08 #1