<kr*********@gmail.com> wrote:
This sounds suspiciously like a homework assignment.
I don't think you'll get much help for this one, unless
you show some code you wrote yourself already with a specific
question about problems you're having....
Well, you have a point. I will make it more specific.
I have got something like this:
import os, os.path
# "wyswietlanie drzewa" is Polish for "displaying the tree".
def wyswietlanie_drzewa(dir_path):
    # Recursively list a folder and its subfolders, printing every path;
    # the recursion stops when it reaches a file.
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        print(full_path)
        if os.path.isdir(full_path):
            wyswietlanie_drzewa(full_path)
My question is: how do I get word frequencies from these files?
I would be glad of any help.
You may want to consider os.walk as an alternative way to get all files;
it's easy to wrap it into a generator yielding all files in the subtree.
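A minimal sketch of such a generator, assuming Python 3; the name iter_files is hypothetical and not from the thread. Like the recursive function above, os.walk descends into every subdirectory, but it hands back the file names directly, so no explicit recursion is needed:

```python
import os

def iter_files(root):
    # Hypothetical helper: yield the full path of every file under root.
    # os.walk yields (dirpath, dirnames, filenames) for root and each subdirectory.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Usage: `for path in iter_files("some/folder"): print(path)`. By default os.walk does not follow symbolic links to directories.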
This, I would think, is the proper factoring in Python: have a generator
yielding each file, and a function taking a file and returning the word
frequencies for that one file. This neatly separates the two halves of
the task -- and you can easily factor things down further...
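One way the two halves might fit together, as a sketch only: iter_files, file_word_frequencies and tree_word_frequencies are hypothetical names, and lowercasing the words and reading the files as UTF-8 are assumptions, not part of the thread. collections.Counter is used here in place of a hand-built dict:

```python
import collections
import os
import re

WORD_RE = re.compile(r"\w+")  # runs of letters, digits and underscores

def iter_files(root):
    # Yield every file path in the subtree rooted at root.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def file_word_frequencies(path):
    # Per-file half: return a Counter of the (lowercased) words in one file.
    counts = collections.Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            counts.update(WORD_RE.findall(line.lower()))
    return counts

def tree_word_frequencies(root):
    # Combine the per-file results into one total for the whole tree.
    total = collections.Counter()
    for path in iter_files(root):
        total.update(file_word_frequencies(path))
    return total
```

Because each half is a separate function, you can test the per-file counting on one file before pointing it at a whole directory tree.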
Given a text file, you can iterate on it: the items are the lines. Given
a line, you can extract all words in it and iterate on those: look at
the re module, and the \w feature of regular-expression pattern strings.
So, a generator that turns a file into a stream of words is also an easy
sub-task to accomplish.
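A possible sketch of that words generator, assuming Python 3; iter_words is a hypothetical name. It accepts any iterable of lines, so an open text file works as well as a list of strings:

```python
import re

WORD_RE = re.compile(r"\w+")

def iter_words(lines):
    # Hypothetical generator: yield each word found in an iterable of lines.
    for line in lines:
        # \w+ matches runs of letters, digits and underscores
        for match in WORD_RE.finditer(line):
            yield match.group()
```

Usage: `with open(path) as f: words = iter_words(f)`. Note that `\w` treats an apostrophe as a separator, so "it's" yields "it" and "s". In Python 3, `\w` on str patterns also matches non-ASCII letters such as Polish "ż" or "ł".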
Given a stream of words, and a set of "interesting words", it's easy to
count the occurrences of interesting words. There, I'll supply that
part, to entice you to write the others, and thereby perhaps learn some
Python...:
def count_interesting_words(all_words, interesting_words):
    d = dict.fromkeys(interesting_words, 0)
    for word in all_words:
        if word in d: d[word] += 1
    return d
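A short usage sketch, assuming Python 3: the sample lines are made up, and lowercasing each word (so that "The" and "the" count together) is an assumption, not part of the original reply. The word stream here is a generator expression over re.findall:

```python
import re

def count_interesting_words(all_words, interesting_words):
    # Start every interesting word at zero so words that never occur still appear.
    d = dict.fromkeys(interesting_words, 0)
    for word in all_words:
        if word in d:
            d[word] += 1
    return d

lines = ["the cat sat on the mat\n", "The dog sat too\n"]
# Made-up word stream: every \w+ run, lowercased.
words = (w.lower() for line in lines for w in re.findall(r"\w+", line))
result = count_interesting_words(words, ["the", "sat", "bird"])
print(result)  # {'the': 3, 'sat': 2, 'bird': 0}
```

If you want counts for every word rather than a fixed set, `collections.Counter(words)` does the same counting in one step.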
Alex