Bytes IT Community

[Python] Read .txt file and analyze

P: 1
Hello all;

I'm working on Huffman coding of an arbitrary .txt file, so first I need to read the file and analyse its text.
I need output like this table:
****************************
letter    frequency (how many times the same letter is repeated)    Huffman code (this will come later)

************************

I started with:
    f = open('test.txt', 'r')    # open test.txt
    for lines in f:
        print lines              # to check that everything works
Can anyone help me?
Dec 4 '10 #1
4 Replies


Sean Pedersen
P: 30
    def countLetters(line, letter):
        ret = 0
        for character in line:
            if character == letter: ret += 1
        return ret

    for line in open("file.txt"):
        line = line.strip()
        print line
        print "Has", countLetters(line, "a"), "of letter a."
        print "Has", countLetters(line, "e"), "of letter e."
        print "Has", countLetters(line, "i"), "of letter i."
        print "Has", countLetters(line, "o"), "of letter o."
        print "Has", countLetters(line, "u"), "of letter u."
        print "Has", countLetters(line, "y"), "of letter y."
        print
Dec 5 '10 #2

P: 8
Modifying Sean's Code:

    #!/usr/bin/env python3

    def countLetters(line, letter):
        # count how many times `letter` occurs in `line` (case-sensitive)
        ret = 0
        for character in line:
            if character == letter: ret += 1
        return ret

    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    # you can automate this using ASCII codes too

    for line in open("file.txt", 'r'):
        line = line.strip()  # remove leading and trailing whitespace
        print(line)
        for letter in alphabet:
            print('Has {0} of letter {1}'.format(countLetters(line, letter), letter))

output:

gbfchjwshfcjkwhndfnxh;iquw;qemiziqmeuzngyegbfyewgy bqgzeqzydglndhlqhd;jkjnenjcejnrcjercbvvbdbggngnjmm nmsnvsdmsnfsfcsf>mNmvbmsdbfluaheregnctfbaxzmasqojm wqi;htrugttgp
Has 3 of letter a
Has 9 of letter b
Has 7 of letter c
Has 7 of letter d
Has 10 of letter e
Has 9 of letter f
Has 12 of letter g
Has 8 of letter h
Has 4 of letter i
Has 9 of letter j
Has 2 of letter k
Has 3 of letter l
Has 11 of letter m
Has 13 of letter n
Has 1 of letter o
Has 1 of letter p
Has 8 of letter q
Has 4 of letter r
Has 8 of letter s
Has 4 of letter t
Has 4 of letter u
Has 4 of letter v
Has 5 of letter w
Has 2 of letter x
Has 4 of letter y
Has 5 of letter z
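
The counts above are per line and case-sensitive (the capital "N" in the sample is not counted as "n"). As a minimal sketch of totals for a whole text, assuming Python 3: the function name `letter_counts` and the sample string are made up for illustration, and the string would normally come from something like `open("file.txt").read()`. It also shows one way to build the alphabet from ASCII codes, as the comment in the code suggests.

```python
import string

# Build the alphabet from ASCII codes; this yields the same 26 letters
# as the standard-library constant string.ascii_lowercase.
alphabet_from_codes = ''.join(chr(code) for code in range(ord('a'), ord('z') + 1))

def letter_counts(text, alphabet=string.ascii_lowercase):
    """Return {letter: count} for every letter in `alphabet`, ignoring case."""
    lowered = text.lower()
    return {letter: lowered.count(letter) for letter in alphabet}

# Hypothetical sample text standing in for the contents of file.txt.
sample = "Hello, Huffman!"
counts = letter_counts(sample)

for letter in alphabet_from_codes:
    if counts[letter]:  # skip letters that never occur
        print('Has {0} of letter {1}'.format(counts[letter], letter))
```

Calling `str.count` once per letter scans the text 26 times, which is fine for small files; a single-pass tally is the better choice for large ones.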
Dec 27 '10 #3

P: 2
    #-------------------------------------------------#
    # Set Variables                                   #
    #-------------------------------------------------#

    infile = open("file.txt")                  # input file
    whitelist = ('a','b','c','d','e','f','g')  # letters to report
    letters = {}                               # maps each character to its count

    #-------------------------------------------------#
    #  Functions                                      #
    #-------------------------------------------------#

    def count_letter(c):
      if c in letters:
        letters[c] += 1  # character already seen: add one to its count
      else:
        letters[c] = 1   # first occurrence: add the character to the dictionary

    def print_letters(letters):
      for k,v in letters.items():
        if k in whitelist:
          print "Has %s of letter %s" % (v,k) # print out count for each whitelisted letter

    #-------------------------------------------------#
    #  Run code                                       #
    #-------------------------------------------------#

    for line in infile:         # for each line in input file
      for letter in line:       # for each character in line
        count_letter(letter)    # tally a count of each character

    print_letters(letters)
Here I use a more Pythonic syntax, which means fewer lines of code. If you count every character and whitelist only the ones you're concerned with, the code can easily be modified in the future.
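
The same count-everything-then-whitelist idea can also be sketched with `collections.Counter` from the standard library. This is only an illustration, assuming Python 3; the function name `count_characters` and the sample lines are hypothetical, and in practice the lines would come from iterating over the open file.

```python
from collections import Counter

def count_characters(lines):
    """Tally every character across an iterable of lines in one pass."""
    tally = Counter()
    for line in lines:
        tally.update(line)  # updating with a string counts each character
    return tally

whitelist = 'abcdefg'  # letters to report

# Hypothetical sample standing in for the lines of file.txt.
sample_lines = ["a bad cafe\n", "faded badge\n"]
tally = count_characters(sample_lines)

for letter in whitelist:
    # A Counter returns 0 for characters it has never seen.
    print("Has %s of letter %s" % (tally[letter], letter))
```

Changing which letters are reported only requires editing `whitelist`; the tally itself already holds every character, including spaces and newlines.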

Hope this helps!
Dec 31 '10 #4

P: 8
Very neat Michael
Dec 31 '10 #5
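
For the "Huffman code" column that the original question asks for, here is a minimal sketch of one common way to derive codes from a frequency table, assuming Python 3 and the standard-library `heapq` module. The function name `huffman_codes` and the sample text are made up for illustration. Ties between equal weights are broken arbitrarily, so the exact bit strings may differ from those produced by other implementations even though the code lengths are still optimal.

```python
import heapq
from collections import Counter

def huffman_codes(frequencies):
    """Build a {symbol: bitstring} table from a {symbol: count} mapping."""
    if not frequencies:
        return {}
    # Each heap entry is (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(sorted(frequencies.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs a one-bit code.
        return {symbol: "0" for symbol in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees: prefix one side with 0, the other with 1.
        weight1, _, left = heapq.heappop(heap)
        weight2, _, right = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in left.items()}
        merged.update({symbol: "1" + code for symbol, code in right.items()})
        heapq.heappush(heap, (weight1 + weight2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical sample text standing in for the contents of the .txt file.
text = "abracadabra"
frequencies = Counter(text)
codes = huffman_codes(frequencies)

print("letter  frequency  Huffman code")
for symbol, count in sorted(frequencies.items(), key=lambda item: (-item[1], item[0])):
    print("{0:<7} {1:<10} {2}".format(symbol, count, codes[symbol]))
```

Frequent letters end up with short codes and rare letters with longer ones, and no code is a prefix of another, which is what lets the encoded bit stream be decoded unambiguously.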
