Bytes | Software Development & Data Engineering Community

How to modify my code to get every word and its previous word from a file? Please help

I wrote code to get the most frequent words in a file.
I want to implement bigram probability by modifying the code to do the following:
How can I get every token (word) and its previous token (previous word), along with their frequency and probability,
from a text file, and put each one in a cell of a table?

For example, if the text file contains
"Every man has a price. Every woman has a price."

First token (word) is "Every"; previous token (previous word) is none (no previous word).
Second token (word) is "man"; previous token (previous word) is "Every".
Third token (word) is "has"; previous token (previous word) is "man".
Fourth token (word) is "a"; previous token (previous word) is "has".
Fifth token (word) is "price"; previous token (previous word) is "a".

Sixth token (word) is "Every"; previous token (previous word) is none (no previous word).
Seventh token (word) is "woman"; previous token (previous word) is "Every".
Eighth token (word) is "has"; previous token (previous word) is "woman".
Ninth token (word) is "a"; previous token (previous word) is "has".
Tenth token (word) is "price"; previous token (previous word) is "a".
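A minimal sketch of the token/previous-token pairing described above, in Python 3. It assumes a sentence ends at '.', '!' or '?' and that a word is a run of letters or apostrophes; `token_pairs` is a hypothetical helper name, not part of the original code. Case is kept as in the example.

```python
import re


def token_pairs(text):
    """Return (token, previous_token) pairs; previous_token is None at a sentence start."""
    pairs = []
    # Assumption: '.', '!' and '?' end a sentence, so "previous" resets there
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        previous = None
        for word in words:
            pairs.append((word, previous))
            previous = word
    return pairs


for token, previous in token_pairs("Every man has a price. Every woman has a price."):
    print(token, previous)
```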


Frequency of "has a" is 2 (it occurs twice, once in each sentence).
Frequency of "a price" is 2 (it occurs twice, once in each sentence).
Frequency of "Every man" is 1 (it occurs only once).
Frequency of "man has" is 1 (it occurs only once).
Frequency of "Every woman" is 1 (it occurs only once).
Frequency of "woman has" is 1 (it occurs only once).
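The frequencies above can be counted with `collections.Counter` by pairing each word with the word after it. This is a sketch under the same assumptions as before (sentences end at '.', '!' or '?'; words are letter runs), which is why no pair such as "price Every" crosses the sentence boundary. `bigram_counts` is a hypothetical name.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count (previous_word, word) pairs, never crossing a sentence boundary."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        # zip(words, words[1:]) yields each adjacent pair within one sentence
        counts.update(zip(words, words[1:]))
    return counts


for (previous, word), freq in bigram_counts(
        "Every man has a price. Every woman has a price.").most_common():
    print('"%s %s": %d' % (previous, word, freq))
```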

Probability of "has a" is 2/10 (frequency of "has a" divided by the total number of words).
Probability of "a price" is 2/10 (frequency of "a price" divided by the total number of words).
Probability of "Every man" is 1/10 (frequency of "Every man" divided by the total number of words).
Probability of "man has" is 1/10 (frequency of "man has" divided by the total number of words).
Probability of "Every woman" is 1/10 (frequency of "Every woman" divided by the total number of words).
Probability of "woman has" is 1/10 (frequency of "woman has" divided by the total number of words).
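A sketch of the probability as defined in the question (bigram frequency divided by the total number of words, 10 here), using `fractions.Fraction` so the values stay exact. For comparison it also shows the more conventional bigram estimate P(word | previous) = C(previous word) / C(previous); that second function is a swapped-in technique, not what the question defines, and here C(previous) is taken as the number of bigrams starting with `previous`. All function names are hypothetical.

```python
import re
from collections import Counter
from fractions import Fraction


def sentence_words(text):
    """Yield the word list of each sentence (assumes '.', '!' or '?' end a sentence)."""
    for sentence in re.split(r"[.!?]+", text):
        yield re.findall(r"[A-Za-z']+", sentence)


def bigram_probabilities(text):
    """Probability as defined in the question: bigram frequency / total number of words."""
    counts = Counter()
    total_words = 0
    for words in sentence_words(text):
        total_words += len(words)
        counts.update(zip(words, words[1:]))
    return {pair: Fraction(n, total_words) for pair, n in counts.items()}


def conditional_probabilities(text):
    """Conventional estimate: C(previous word) / C(bigrams starting with previous)."""
    pair_counts = Counter()
    for words in sentence_words(text):
        pair_counts.update(zip(words, words[1:]))
    previous_counts = Counter()
    for (previous, _word), n in pair_counts.items():
        previous_counts[previous] += n
    return {pair: Fraction(n, previous_counts[pair[0]])
            for pair, n in pair_counts.items()}


sample = "Every man has a price. Every woman has a price."
print(bigram_probabilities(sample)[("has", "a")])
print(conditional_probabilities(sample)[("Every", "man")])
```

Note that `Fraction` reduces automatically, so 2/10 is displayed as 1/5.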


```python
# A look at the Tkinter Text widget:
# use Ctrl+C to copy, Ctrl+X to cut selected text,
# Ctrl+V to paste, and Ctrl+/ to select all.
# Count the words in every file under a chosen directory and
# show the first ten items by decreasing frequency.

import os
import tkinter as tk
from tkinter import filedialog

TOP_N = 10  # how many of the most frequent words to display


def most_frequent_word():
    folder = filedialog.askdirectory()
    if not folder:  # the user cancelled the dialog
        return

    word_freq = {}
    for root_dir, dirs, files in os.walk(folder):
        text1.insert(tk.END, 'Found %d dirs and %d files\n' % (len(dirs), len(files)))
        for idx, name in enumerate(files):
            print('File #%d: %s' % (idx + 1, name))
            with open(os.path.join(root_dir, name), 'r') as ff:
                text = ff.read()
            for word in text.split():
                # normalise case and strip trailing punctuation
                word = word.lower().rstrip('.,/"\\ -_;[](){}')
                if word:
                    # build the dictionary
                    word_freq[word] = word_freq.get(word, 0) + 1

    # create a list of (freq, word) tuples once, after all files are read,
    # and sort it by frequency, highest first
    freq_list = [(freq, word) for word, freq in word_freq.items()]
    freq_list.sort(reverse=True)

    for tup in freq_list[:TOP_N]:
        print('%s times: %s' % tup)
        text1.insert(tk.END, '%s times: %s\n' % tup)


root = tk.Tk(className='most_frequent_word')
# text area: width in characters, height in lines
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# the function listed in command is executed on button click
button1 = tk.Button(root, text='Browse', command=most_frequent_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()
```
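One way to produce the table the question asks for, sketched under assumptions: the text of each file is first collected into a list (for example the `text` read from each file in the `os.walk` loop above), bigrams are not counted across sentence boundaries, and the "table" is plain text with padded columns that can be inserted into the Text widget. `bigram_rows` and `format_table` are hypothetical names; case is kept as in the example, so add `.lower()` if "Every" and "every" should be merged.

```python
import re
from collections import Counter


def bigram_rows(texts):
    """Build (token, previous_token, frequency, probability) rows from several texts."""
    counts = Counter()
    total_words = 0
    for text in texts:
        # Assumption: '.', '!' or '?' end a sentence
        for sentence in re.split(r"[.!?]+", text):
            words = re.findall(r"[A-Za-z']+", sentence)
            total_words += len(words)
            counts.update(zip(words, words[1:]))
    rows = []
    # most_common() orders bigrams by decreasing frequency
    for (previous, token), freq in counts.most_common():
        rows.append((token, previous, freq, freq / total_words))
    return rows


def format_table(rows):
    """Render the rows as fixed-width text columns, one row per line."""
    lines = ['%-12s %-14s %9s %11s' % ('Token', 'PreviousToken', 'Frequency', 'Probability')]
    for token, previous, freq, prob in rows:
        lines.append('%-12s %-14s %9d %11.2f' % (token, previous, freq, prob))
    return '\n'.join(lines) + '\n'


print(format_table(bigram_rows(["Every man has a price. Every woman has a price."])))
```

Inside the button callback, something like `text1.insert(tk.END, format_table(bigram_rows(texts)))` would display the table; if the columns do not line up, creating the widget with a monospaced font such as `font='TkFixedFont'` should help.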
May 16 '08
