473,395 Members | 1,641 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Getting word frequencies from files which are in folder.

Hello to all,

I'm beginer in learning Python I wish somebody help me with solving
this problem. I would like to read all text files wchich are in some
folder. For this text files I need to make some word frequencies using
defined words like "buy", "red", "good". If some file don't have that
word will get "0" for this frequency. It shoud be stored in array. If
I have alredy got frequencies for every file in folder, my array wrote
to text file.

I will be very gratefully for receiving any help.

Apr 4 '07 #1
5 2105
kr*********@gmail.com wrote:
Hello to all,

I'm beginer in learning Python I wish somebody help me with solving
this problem. I would like to read all text files wchich are in some
folder. For this text files I need to make some word frequencies using
defined words like "buy", "red", "good". If some file don't have that
word will get "0" for this frequency. It shoud be stored in array. If
I have alredy got frequencies for every file in folder, my array wrote
to text file.
This sounds suspiciously like a homework assignment.
I don't think you'll get much help for this one, unless
you show some code you wrote yourself already with a specific
question about problems you're having....

--Irmen
Apr 4 '07 #2
This sounds suspiciously like a homework assignment.
I don't think you'll get much help for this one, unless
you show some code you wrote yourself already with a specific
question about problems you're having....
Well you have some right. I will make it more specific.
I have got something like that:

import os, os.path

def wyswietlanie_drzewa(dir_path):
#function is reading folders and sub folders until it gets to a file.
for name in os.listdir(dir_path):
full_path = os.path.join(dir_path, name)
print full_path
if os.path.isdir(full_path):
wyswietlanie_drzewa(full_path)

My question is how to get word frequencies from this files?
I will be glad to get any help.

Krisbee

Apr 4 '07 #3

<kr*********@gmail.comwrote in message
news:11**********************@e65g2000hsc.googlegr oups.com...
|
| My question is how to get word frequencies from this files?
| I will be glad to get any help.

Go to
http://groups.google.com/group/comp.lang.python/topics
and search on "count word frequency" and you will find several previous
posts on this topic.

tjr

Apr 4 '07 #4
On Apr 4, 2:07 pm, krisbee1...@gmail.com wrote:
My question is how to get word frequencies from this files?
I will be glad to get any help.
--files have a read(), readline(), and readlines() method
--strings have a split() method, which splits the string on
whitespace(e.g. spaces)
--lists have a count() method

Apr 5 '07 #5
<kr*********@gmail.comwrote:
This sounds suspiciously like a homework assignment.
I don't think you'll get much help for this one, unless
you show some code you wrote yourself already with a specific
question about problems you're having....

Well you have some right. I will make it more specific.
I have got something like that:

import os, os.path

def wyswietlanie_drzewa(dir_path):
#function is reading folders and sub folders until it gets to a file.
for name in os.listdir(dir_path):
full_path = os.path.join(dir_path, name)
print full_path
if os.path.isdir(full_path):
wyswietlanie_drzewa(full_path)

My question is how to get word frequencies from this files?
I will be glad to get any help.
You may want to consider os.walk as an alternative way to get all files;
it's easy to wrap it into a generator yielding all files in the subtree.

This, I would think, is the proper factoring in Python: have a generator
yielding each file, and a function taking a file and returning the word
frequencies for that one file. This neatly separates the two halves of
the task -- and you can easily factor things down further...

Give a text file, you can iterate on it: the items are the lines. Given
a line, you can extract all words in it and iterate on those: look at
the re module, and the \w feature of regular-expression pattern strings.
So, a generator that turns a file into a stream of words is also an easy
sub-task to accomplish.

Given a stream of words, and a set of "interesting words", it's easy to
count the occurrences of interesting words. There, I'll supply that
part, to entice you to write the others, and thereby perhaps learn some
Python...:

def count_interesting_words(all_words, interesting_words):
d = dict.fromkeys(interesting_words, 0)
for word in all_words:
if word in d: d[word] += 1
return d
Alex
Apr 5 '07 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

41
by: Ruby Tuesday | last post by:
Hi, I was wondering if expert can give me some lite to convert my word table into access database. Note: within each cell of my word table(s), some has multi-line data in it. In addition, there...
1
by: ken | last post by:
I'm using VB.Net to process information out of a word document. If the document fails a test, I would like to close it and move it to an error folder. However, when I try to do that (see below) it...
5
by: jpr | last post by:
Hello, I have a form with a cbo which get's its data from a table. This combo returns names of MS Word files in the following path: C:\shares\files\*.dot I would like to open these files...
2
by: reb0101 | last post by:
hey all, I would very much appreciate any help or ideas on how to do this as I am stumped. I need to develop an access database to track documents but also link to them. I’ll explain what it...
1
by: tnt84 | last post by:
I want to write a program that reads a text file and prints out the word frequencies using structure but I don't know exactly what the word frequency is and how can I write that program using...
12
by: Steve | last post by:
I've been building an application that will merge fields in a text file with a word template, save the resulting word file out to the user's hard drive, and then email the file as an attachment. ...
6
by: SteveM | last post by:
Hi, I am needing some help/advice on how to display a word document in my ASP.NET web pages that can update itself from a word document located on the server. The idea here is that when the user...
3
by: ArmageddonAsh | last post by:
I'm trying to make an application that will allow the user to enter data into a flexgrid (that's done) and then save the data from that flexgrid into a CSV file but even though the file is made none...
13
by: lawpoop | last post by:
Hello all - I have a two part question. First of all, I have a website under /home/user/www/. The index.php and all the other website pages are under /home/user/www/. For functions that are...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.