Bytes IT Community

Computing unigram and bigram probabilities using Python

I have two text files. I have to calculate the unigram (monogram) probabilities and then, in the next step, the bigram probabilities, using the word counts of the first file for the words that occur in the second file. For this, I first have to write a function that calculates the total number of words and the number of unique words in the file, because the unigram probability of each unique word is the number of times it occurs divided by the total number of words. Finally, the results have to be written to a new file. The code I wrote (it only computes the unigrams) doesn't work. How can I change it to work correctly, and how can I calculate the bigram probabilities?
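For reference, the standard maximum-likelihood estimates for these two quantities are given below, with $c(\cdot)$ counting occurrences in the first file and $N$ its total number of words. The number of unique words does not appear in the formula; it is only the number of distinct $w$ for which $P(w)$ is computed. The example line uses a made-up first file containing `the cat the dog`.

```latex
P(w) = \frac{c(w)}{N}, \qquad
P(w_i \mid w_{i-1}) = \frac{c(w_{i-1}\, w_i)}{c(w_{i-1})}

% Example: first file = "the cat the dog", so N = 4
P(\text{the}) = \frac{2}{4} = 0.5, \qquad
P(\text{cat} \mid \text{the}) = \frac{c(\text{the cat})}{c(\text{the})} = \frac{1}{2}
```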

```python
def CalculateMonoGram(file1, file2):
    with open(file1, encoding="utf_8") as f1:
        counts = {}
        s1 = f1.read()
        x1 = s1.split()
        for word in x1:
            counts[word] = counts.get(word, 0) + 1

        total = sum(counts.values())

    with open(file2, encoding="utf_8") as f2:
        s2 = f2.read()
        x2 = s2.split()

    monogram = []
    for item in x2:
        monogram[item] = counts(item) / total

    with open("LexiconMonogram.txt", "w", encoding="utf_8") as f3:
        f3.write(monogram)
```
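The code above fails for three reasons: `monogram = []` is a list, which cannot be indexed by a word; `counts` is a dict, so `counts(item)` raises a `TypeError` (it has to be `counts.get(item, 0)` or `counts[item]`); and `f3.write()` accepts only a string, not a container. A minimal corrected sketch follows, assuming whitespace tokenization (as with the original `split()`) and UTF-8 files. It keeps the original function name and output file name; the helper `count_words` and the `out_path` parameter are hypothetical additions, and the tab-separated output format is an assumption.

```python
from collections import Counter


def count_words(path):
    """Hypothetical helper: return (word counts, total words, unique words) for a file.

    Tokenizes on whitespace, like the original str.split().
    """
    with open(path, encoding="utf_8") as f:
        words = f.read().split()
    counts = Counter(words)
    return counts, len(words), len(counts)


def CalculateMonoGram(file1, file2, out_path="LexiconMonogram.txt"):
    """Estimate P(w) = count(w) / total from file1 for every word in file2."""
    counts, total, _unique = count_words(file1)

    with open(file2, encoding="utf_8") as f2:
        words2 = f2.read().split()

    # A dict, not a list, so entries can be keyed by the word itself.
    monogram = {}
    for item in words2:
        # Counter returns 0 for unseen words; guard against an empty file1.
        monogram[item] = counts[item] / total if total else 0.0

    # write() needs a string, so emit one "word<TAB>probability" line per word.
    with open(out_path, "w", encoding="utf_8") as f3:
        for word, prob in monogram.items():
            f3.write(f"{word}\t{prob}\n")

    return monogram
```

Words of the second file that never occur in the first get probability 0; smoothing (for example add-one) is the usual remedy if that matters for later steps.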
May 18 '15
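For the bigram step, a common maximum-likelihood estimate is P(w2 | w1) = count(w1 w2) / count(w1), with both counts taken from the first file. The sketch below assumes whitespace tokenization and treats each file as one continuous token sequence (so pairs can span line breaks); the names `CalculateBiGram` and `LexiconBigram.txt` are hypothetical and not from the original post. It scores every adjacent word pair of the second file.

```python
from collections import Counter


def CalculateBiGram(file1, file2, out_path="LexiconBigram.txt"):
    """Estimate P(w2 | w1) = count(w1 w2) / count(w1) from file1 for each pair in file2."""
    with open(file1, encoding="utf_8") as f1:
        words1 = f1.read().split()
    with open(file2, encoding="utf_8") as f2:
        words2 = f2.read().split()

    unigram_counts = Counter(words1)
    # zip(words, words[1:]) pairs every word with the word that follows it.
    bigram_counts = Counter(zip(words1, words1[1:]))

    bigram = {}
    for w1, w2 in zip(words2, words2[1:]):
        denom = unigram_counts[w1]
        # If w1 never occurs in file1 there is no estimate, so use 0.0.
        bigram[(w1, w2)] = bigram_counts[(w1, w2)] / denom if denom else 0.0

    # One "w1 w2<TAB>probability" line per distinct pair.
    with open(out_path, "w", encoding="utf_8") as f3:
        for (w1, w2), prob in bigram.items():
            f3.write(f"{w1} {w2}\t{prob}\n")

    return bigram
```

Using the unigram count as the denominator is the textbook form; it differs from the number of pairs starting with `w1` only for the last word of the file.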