Computing unigram and bigram probability using Python

I have two text files. I need to calculate the unigram (monogram) probability, and in the next step the bigram probability, of the first file in terms of how often the words are repeated in the second file. For this, I first have to write a function that counts the total number of words and the unique words in a file, because the unigram probability of each word is its number of occurrences divided by the total number of words. Finally, the result should be written to a new file. The code I wrote (it only computes the unigram) doesn't work. How can I change it so that it works correctly? And how can I calculate the bigram probability?
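For reference, the maximum-likelihood estimates usually meant by unigram and bigram probability are written out below. The notation is an assumption and does not come from the post: $c(\cdot)$ is a count taken from the file that supplies the statistics, and $N$ is the total number of word tokens in that file.

```latex
P(w) = \frac{c(w)}{N},
\qquad
P(w_i \mid w_{i-1}) = \frac{c(w_{i-1}\, w_i)}{c(w_{i-1})}
```

For example, a word that occurs 2 times in a 6-word file gets a unigram probability of 2/6, roughly 0.33.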

def CalculateMonoGram (file1, file2):
    with open (file1, encoding="utf_8") as f1:
        counts={}
        s1=f1.read()
        x1=s1.split()
        for word in x1:
            counts[word]=counts.get(word,0)+1

        total=sum(counts.values())

    with open (file2, encoding="utf_8") as f2:
        s2=f2.read()
        x2=s2.split()

    monogram=[]
    for item in x2:
        monogram[item]=counts(item)/total

    with open ("LexiconMonogram.txt", "w", encoding="utf_8") as f3:
        f3.write(monogram)
May 18 '15 #1
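The posted function fails for three reasons: `monogram` is a list but is indexed by a word, `counts(item)` calls the dictionary instead of looking the word up, and `f3.write(monogram)` is given something that is not a string. Below is a minimal sketch of one way to fix this and extend it to bigrams. It assumes, as the posted code does, that the counts come from the first file and that the words to look up come from the second. The helper names (`read_words`, `unigram_probabilities`, `bigram_probabilities`, `write_table`) and the tab-separated output format are hypothetical choices, not part of the original post.

```python
from collections import Counter


def read_words(path):
    """Return the whitespace-separated tokens of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return f.read().split()


def unigram_probabilities(corpus_words, query_words):
    """Map each distinct query word to count(word) / total corpus words."""
    counts = Counter(corpus_words)
    total = len(corpus_words)
    if total == 0:
        return {}  # nothing to estimate from an empty corpus
    # dict.fromkeys drops repeated query words but keeps first-seen order;
    # a Counter returns 0 for words that never occur in the corpus.
    return {word: counts[word] / total for word in dict.fromkeys(query_words)}


def bigram_probabilities(corpus_words):
    """Map each adjacent pair (w1, w2) to count(w1 w2) / count(w1)."""
    pair_counts = Counter(zip(corpus_words, corpus_words[1:]))
    # Assumption: w1 is counted only where it starts a pair (the last token
    # is excluded), so the probabilities of all words following w1 sum to 1.
    first_counts = Counter(corpus_words[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def write_table(path, table):
    """Write one 'key<TAB>probability' line per entry."""
    with open(path, "w", encoding="utf-8") as f:
        for key, prob in table.items():
            label = " ".join(key) if isinstance(key, tuple) else key
            f.write(f"{label}\t{prob:.6f}\n")
```

With these helpers, `write_table("LexiconMonogram.txt", unigram_probabilities(read_words(file1), read_words(file2)))` would produce the output file named in the question, and `bigram_probabilities(read_words(file1))` would give the conditional probability of each adjacent word pair in the first file. Splitting on whitespace leaves punctuation and capitalisation untouched, so "Cat" and "cat," count as different words unless the text is normalised first.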