Getting word frequencies from files in a folder

Hello to all,

I'm a beginner learning Python, and I hope somebody can help me solve
this problem. I would like to read all the text files in some
folder. For these text files I need to compute word frequencies for
predefined words like "buy", "red", "good". If a file doesn't contain a
word, that word gets "0" as its frequency. The frequencies should be
stored in an array. Once I have the frequencies for every file in the
folder, the array should be written to a text file.

I would be very grateful for any help.

Apr 4 '07 #1
This sounds suspiciously like a homework assignment.
I don't think you'll get much help for this one, unless
you show some code you wrote yourself already with a specific
question about problems you're having....

--Irmen
Apr 4 '07 #2
Well, you have a point. I will make it more specific.
I have got something like this:

import os

def wyswietlanie_drzewa(dir_path):
    # Walks through folders and subfolders, printing each path,
    # until it reaches the files.
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        print(full_path)
        if os.path.isdir(full_path):
            wyswietlanie_drzewa(full_path)

My question is: how do I get word frequencies from these files?
I will be glad to get any help.

Krisbee

Apr 4 '07 #3

<kr*********@gmail.com> wrote in message
news:11**********************@e65g2000hsc.googlegroups.com...
|
| My question is how to get word frequencies from these files?
| I will be glad to get any help.

Go to
http://groups.google.com/group/comp.lang.python/topics
and search on "count word frequency" and you will find several previous
posts on this topic.

tjr

Apr 4 '07 #4
On Apr 4, 2:07 pm, krisbee1...@gmail.com wrote:
| My question is how to get word frequencies from these files?
| I will be glad to get any help.
-- files have read(), readline(), and readlines() methods
-- strings have a split() method, which splits the string on
whitespace (e.g. spaces)
-- lists have a count() method
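
The three hints above can be combined into a short sketch. This is only an illustration, not code from the thread: the function name count_words_in_text and the sample sentence are made up, and the matching is case-sensitive and does not strip punctuation.

```python
def count_words_in_text(text, wanted):
    """Return how often each word in `wanted` occurs in `text`, in the same order."""
    words = text.split()  # str.split() with no argument splits on any whitespace
    return [words.count(w) for w in wanted]  # list.count() counts exact matches


# For a real file, the text would come from the file's read() method, e.g.:
#     with open(path) as f:
#         text = f.read()
sample = "buy the red car and buy the red bike"
counts = count_words_in_text(sample, ["buy", "red", "good"])
print(counts)  # [2, 2, 0]
```

Calling list.count() once per word rescans the whole list each time, which is fine for a handful of words but slow for long word lists.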

Apr 5 '07 #5
You may want to consider os.walk as an alternative way to get all files;
it's easy to wrap it into a generator yielding all files in the subtree.
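
A minimal sketch of such a generator, assuming the files of interest end in ".txt". The name iter_text_files and the suffix parameter are illustrative and do not come from the thread.

```python
import os


def iter_text_files(root, suffix=".txt"):
    """Yield the full path of every file under `root` whose name ends with `suffix`."""
    # os.walk yields a (dirpath, dirnames, filenames) tuple for each directory in the tree.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

It could be used as "for path in iter_text_files(some_folder): ...", which replaces the hand-written recursion over os.listdir.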

This, I would think, is the proper factoring in Python: have a generator
yielding each file, and a function taking a file and returning the word
frequencies for that one file. This neatly separates the two halves of
the task -- and you can easily factor things down further...

Given a text file, you can iterate on it: the items are the lines. Given
a line, you can extract all words in it and iterate on those: look at
the re module, and the \w feature of regular-expression pattern strings.
So, a generator that turns a file into a stream of words is also an easy
sub-task to accomplish.
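
One possible shape for that generator, as a sketch only. The names WORD_RE and iter_words are made up here, and lowercasing each word is an added assumption so that "Buy" and "buy" count as the same word.

```python
import re

WORD_RE = re.compile(r"\w+")  # \w matches letters, digits, and the underscore


def iter_words(lines):
    """Yield each word, lowercased, from an iterable of lines (e.g. an open text file)."""
    for line in lines:
        for word in WORD_RE.findall(line):
            yield word.lower()


print(list(iter_words(["Buy the RED one,\n", "it's good!\n"])))
# ['buy', 'the', 'red', 'one', 'it', 's', 'good']
```

Note that \w+ splits "it's" into "it" and "s"; a different pattern would be needed if apostrophes should stay inside words.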

Given a stream of words, and a set of "interesting words", it's easy to
count the occurrences of interesting words. There, I'll supply that
part, to entice you to write the others, and thereby perhaps learn some
Python...:

def count_interesting_words(all_words, interesting_words):
    # Start every interesting word at 0 so absent words still appear.
    d = dict.fromkeys(interesting_words, 0)
    for word in all_words:
        if word in d:
            d[word] += 1
    return d
Alex
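
As a usage sketch, the function can be fed any iterable of words. The function is repeated here so the example runs on its own; the sample word list and the variable names are made up for illustration.

```python
def count_interesting_words(all_words, interesting_words):
    # Start every interesting word at 0 so absent words still appear.
    d = dict.fromkeys(interesting_words, 0)
    for word in all_words:
        if word in d:
            d[word] += 1
    return d


interesting = ["buy", "red", "good"]
words = ["buy", "a", "red", "hat", "or", "buy", "nothing"]
freq = count_interesting_words(words, interesting)

# Turn the dict into a list in a fixed order, one number per interesting word.
row = [freq[w] for w in interesting]
print(row)  # [2, 1, 0]
```

One such row per file could then be written as one line of the output text file.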
Apr 5 '07 #6
