How can I modify my code to get every word and its previous word from a file? Please help.

I wrote code that finds the most frequent words in a file.
I want to implement bigram probability by modifying the code to do the following:
get every Token (word) together with its PreviousToken (previous word), the frequency of that pair, and its probability
from a text file, and put each value in its own cell of a table.

For example, if the text file content is:
"Every man has a price. Every woman has a price."

First Token (word) is "Every"; PreviousToken (previous word) is none (no previous word)
Second Token (word) is "man"; PreviousToken (previous word) is "Every"
Third Token (word) is "has"; PreviousToken (previous word) is "man"
Fourth Token (word) is "a"; PreviousToken (previous word) is "has"
Fifth Token (word) is "price"; PreviousToken (previous word) is "a"

Sixth Token (word) is "Every"; PreviousToken (previous word) is none (start of a new sentence)
Seventh Token (word) is "woman"; PreviousToken (previous word) is "Every"
Eighth Token (word) is "has"; PreviousToken (previous word) is "woman"
Ninth Token (word) is "a"; PreviousToken (previous word) is "has"
Tenth Token (word) is "price"; PreviousToken (previous word) is "a"
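
The pairing described above can be sketched as follows. This is only a minimal illustration, not part of the posted program: the function name `token_pairs` is hypothetical, and it assumes that a sentence ends at '.', '!' or '?' and that the previous token is reset to none at the start of each sentence. It is written so that it should run on Python 2.7 as well as Python 3.

```python
import re

def token_pairs(text):
    """Return a list of (previous_token, token) pairs.

    previous_token is None for the first word of each sentence.
    Assumption: a sentence ends at '.', '!' or '?'.
    """
    pairs = []
    for sentence in re.split(r'[.!?]+', text):
        previous = None  # no previous word at the start of a sentence
        for token in re.findall(r"[A-Za-z0-9']+", sentence):
            pairs.append((previous, token))
            previous = token
    return pairs

sample = "Every man has a price. Every woman has a price."
pairs = token_pairs(sample)
for previous, token in pairs:
    print("Token: %s, PreviousToken: %s" % (token, previous))
```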

Frequency of "has a" is 2 (occurs twice: in the first and second sentences)
Frequency of "a price" is 2 (occurs twice: in the first and second sentences)
Frequency of "Every man" is 1 (occurs only once)
Frequency of "man has" is 1 (occurs only once)
Frequency of "Every woman" is 1 (occurs only once)
Frequency of "woman has" is 1 (occurs only once)
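
One possible way to get these pair frequencies is `collections.Counter` (available from Python 2.7). The sketch below is an assumption about how it could be done, not the posted code: `bigram_counts` is a hypothetical name, words keep the case they have in the text (as in the example above), and pairs are not counted across sentence boundaries.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count (previous_token, token) pairs within each sentence."""
    counts = Counter()
    for sentence in re.split(r'[.!?]+', text):
        tokens = re.findall(r"[A-Za-z0-9']+", sentence)
        # zip pairs each token with the token that follows it
        counts.update(zip(tokens, tokens[1:]))
    return counts

sample = "Every man has a price. Every woman has a price."
counts = bigram_counts(sample)
for (previous, token), freq in counts.most_common():
    print('Frequency of "%s %s" is %d' % (previous, token, freq))
```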

Probability of "has a" is 2/10 (frequency of "has a" divided by the total number of words, 10)
Probability of "a price" is 2/10 (frequency of "a price" divided by the total number of words)
Probability of "Every man" is 1/10 (frequency of "Every man" divided by the total number of words)
Probability of "man has" is 1/10 (frequency of "man has" divided by the total number of words)
Probability of "Every woman" is 1/10 (frequency of "Every woman" divided by the total number of words)
Probability of "woman has" is 1/10 (frequency of "woman has" divided by the total number of words)
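
The table I am after could be built roughly like this. It is a sketch under the definition used above (pair frequency divided by the total number of words); `bigram_table` is a hypothetical name, and the rows are printed as plain text columns rather than placed in a real table widget.

```python
import re
from collections import Counter

def bigram_table(text):
    """Return rows of (previous_token, token, frequency, probability).

    Probability follows the definition in this post:
    pair frequency divided by the total number of words.
    """
    bigrams = Counter()
    total_words = 0
    for sentence in re.split(r'[.!?]+', text):
        tokens = re.findall(r"[A-Za-z0-9']+", sentence)
        total_words += len(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    rows = []
    for (previous, token), freq in bigrams.most_common():
        rows.append((previous, token, freq, float(freq) / total_words))
    return rows

sample = "Every man has a price. Every woman has a price."
rows = bigram_table(sample)
print("%-14s %-10s %-5s %s" % ("PreviousToken", "Token", "Freq", "Prob"))
for previous, token, freq, prob in rows:
    print("%-14s %-10s %-5d %.2f" % (previous, token, freq, prob))
```

Dividing by the count of the previous word instead of the total word count would give the conditional probability that "bigram probability" usually refers to; only the divisor in the last loop would change.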


# A look at the Tkinter Text widget.
# Use ctrl+c to copy, ctrl+x to cut selected text,
# ctrl+v to paste, and ctrl+/ to select all.
# Count the words in every file under a chosen folder and show
# the five most frequent ones.

import os
import Tkinter as tk
import tkFileDialog

def most_frequent_word():
    # ask for a folder; an empty result means the dialog was cancelled
    browser = tkFileDialog.askdirectory()
    if not browser:
        return

    word_freq = {}
    for dirpath, dirs, files in os.walk(browser):
        text1.insert(tk.INSERT, 'Found %d dirs and %d files\n' % (len(dirs), len(files)))
        for idx, filename in enumerate(files):
            print 'File #%d: %s' % (idx + 1, filename)
            ff = open(os.path.join(dirpath, filename), "r")
            text = ff.read()
            ff.close()

            for word in text.split():
                # normalise case and strip surrounding punctuation
                word = word.lower().strip('.,/"\\ -_;[](){}')
                if not word:
                    continue
                # build the dictionary
                word_freq[word] = word_freq.get(word, 0) + 1

    # create a list of (freq, word) tuples
    freq_list = [(freq, word) for word, freq in word_freq.items()]
    # sort the list by the first element in each tuple, highest first
    freq_list.sort(reverse=True)

    # show the first five items
    for freq, word in freq_list[:5]:
        print "%s times: %s" % (freq, word)
        text1.insert(tk.INSERT, "%s times: %s\n" % (freq, word))

root = tk.Tk(className=" most_frequent_word")

# text widget: width in characters, height in lines
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()

# the function given in command is executed on button click
button1 = tk.Button(root, text='Browse', command=most_frequent_word)
button1.pack(pady=5)

text1.focus()
root.mainloop()
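
As a starting point for the change I am asking about, the counting loop of the script above might be adapted along these lines. This is a rough sketch, not a tested replacement: `count_bigrams_in_folder` is a hypothetical name, the Tkinter parts are left out, words are lowercased as in the script, and a temporary folder with one sample file stands in for the folder chosen in the dialog.

```python
import os
import re
import tempfile
from collections import Counter

def count_bigrams_in_folder(folder):
    """Walk folder and count (previous_token, token) pairs in every file."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            with open(os.path.join(dirpath, filename), "r") as handle:
                text = handle.read().lower()
            # pairs are counted inside each sentence only
            for sentence in re.split(r'[.!?]+', text):
                tokens = re.findall(r"[a-z0-9']+", sentence)
                counts.update(zip(tokens, tokens[1:]))
    return counts

# demo folder standing in for the one picked with askdirectory()
demo_folder = tempfile.mkdtemp()
with open(os.path.join(demo_folder, "sample.txt"), "w") as handle:
    handle.write("Every man has a price. Every woman has a price.")

demo_counts = count_bigrams_in_folder(demo_folder)
for (previous, token), freq in demo_counts.most_common(5):
    print("%d times: %s %s" % (freq, previous, token))
```

Each printed line could instead be passed to `text1.insert(tk.INSERT, ...)` in the same way the script shows single-word counts.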

May 16 '08 #1
