opening files via a directory

Hello,

I was browsing to see if I could find something similar to my problem, but I couldn't find anything.

I have a script that counts every word in a file and reports how many times each word occurs. I now have a directory containing about 60 text files that I need to run this script on. Since I'm not really a star at programming, I made a script that puts all those files into one file, and then that one file runs through the counting script.

What I actually want is to skip that extra step: to adjust the counting script so that it loops through the files in the given directory and counts the words in all of them, so that my output is the same as if they were in one file.

Below is what I have so far. Too bad I'm just not that good at combining scripts.

```python
import re

lineList = open(r'/blablabla/bla/bla/file.txt').readlines()
pat = r"\w+"  # one or more word characters
wordList = []

# Collect every word in the file, lower-cased
for line in lineList:
    wordList += [w.lower() for w in re.findall(pat, line)]

# Map each word to the number of times it occurs
wordCnt = [wordList.count(w) for w in wordList]
dd = dict(zip(wordList, wordCnt))

# Report only the words that occur more than 40 and fewer than 200 times
for item in dd:
    if dd[item] > 40 and dd[item] < 200:
        print("Word '%s' occurs %d times." % (item, dd[item]))
```
I discovered that to work with directories you have to import os.
Then, instead of the lineList = open(...) line, I should set
```python
dirname = r'c:/blablabla/bla/bla/'
```
and then do something with os.path(dirname) or something like that.
I'm getting really confused and stressed by this. Could any of you give me a hint to help me out?
Oct 16 '07 #1
bartonc
6,596 Expert 4TB
This ought to give you a pretty good start:
```python
from glob import glob
import os

def ProcessFile(fileName):
    # Placeholder: just print the file's absolute path
    print(os.path.abspath(fileName))

# Call ProcessFile on every .py file in the current directory
for fname in glob(os.path.join(".", "*.py")):
    ProcessFile(fname)
```
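As a minimal sketch of how a glob loop like this could be combined with the counting script from the first post: the version below tallies the words of every matching file into a single counter, so the result is the same as if all the files had been concatenated first. The function name count_words_in_dir, the *.txt pattern, and the use of collections.Counter (Python 2.7 or later) are assumptions of this sketch, not part of the original posts.

```python
import os
import re
from collections import Counter
from glob import glob


def count_words_in_dir(dirname, pattern="*.txt"):
    """Return one Counter holding the word counts of all matching files.

    Hypothetical helper: the name and the default pattern are assumptions.
    """
    counts = Counter()
    for fname in glob(os.path.join(dirname, pattern)):
        # The with block closes each file as soon as it has been read
        with open(fname) as f:
            text = f.read()
        # Same word pattern and lower-casing as the original script
        counts.update(w.lower() for w in re.findall(r"\w+", text))
    return counts
```

The result behaves like the dd dictionary in the original script, so the same filter on counts between 40 and 200 can be applied to it unchanged.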
Oct 16 '07 #2
rhitam30111985
112 100+
To list all the files in a particular directory, you can use the os module as follows:

```python
import os

directory = r'c:/blablabla/bla/bla/'  # folder that holds the text files
file_list = os.listdir(directory)
```
This returns a list of the names of the entries in that directory (files and subdirectories). Then you can just iterate through them and put all the words in a dictionary:

```python
wordlist = {}
for item in file_list:
    # Read the whole file and split it into whitespace-separated words
    f = open(directory + '/' + item).read()
    f = f.split()
    for word in f:
        # Count purely alphabetic words only
        if word.isalpha():
            if word not in wordlist:
                wordlist[word] = 1
            else:
                wordlist[word] += 1
```
But I guess there are better solutions out there. Note that this code ignores any word followed by a comma, a full stop, etc., because isalpha() is false for such tokens, so you have to get rid of the punctuation first to get the correct word count. Now that's another exercise.
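One way to approach that exercise, as a sketch: strip leading and trailing punctuation from each token before testing it with isalpha(). The helper names clean_word and count_words are hypothetical, and this only handles punctuation at the ends of a token; words with punctuation inside them, such as "don't", are still skipped.

```python
import string


def clean_word(token):
    """Hypothetical helper: remove punctuation from both ends of a token."""
    return token.strip(string.punctuation)


def count_words(text):
    """Count the alphabetic words in text, ignoring surrounding punctuation."""
    wordlist = {}
    for token in text.split():
        word = clean_word(token)
        if word.isalpha():
            # get() returns 0 the first time a word is seen
            wordlist[word] = wordlist.get(word, 0) + 1
    return wordlist
```

Unlike the first script in this thread, this does not lower-case the words, so "Hello" and "hello" are counted separately unless word.lower() is applied as well.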
Oct 16 '07 #3
bartonc
6,596 Expert 4TB
"To list all the files in a particular directory, you can use the os module as follows:"
Thank you. I learned something about os.listdir(): It does list all the files in a directory.
"But I guess there are better solutions out there."
```python
# that's bad practice!
# I encourage you to use:
f = open(directory + '/' + item)
text = f.read()
f.close()
```
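For what it's worth, on Python 2.6 and later (or 2.5 with from __future__ import with_statement) a with block gives the same guarantee without the explicit close(): the file is closed when the block ends, even if read() raises. A minimal sketch; the function name read_text is made up for illustration:

```python
import os


def read_text(directory, item):
    """Return the contents of directory/item, closing the file afterwards."""
    path = os.path.join(directory, item)
    with open(path) as f:
        return f.read()
```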
Oct 16 '07 #4
rhitam30111985
112 100+
I thought readlines() would return the lines of text in the file as elements of a list. What I am trying to do here is return each word as an element of a list 'f'.
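To make the difference concrete, here is a small sketch. It uses an in-memory file (io.StringIO) with made-up sample text so that it runs without touching the disk; a real file object behaves the same way for these two calls.

```python
import io

# Made-up sample text, standing in for the contents of a file
sample = "the quick brown fox\njumps over the lazy dog\n"

# readlines() returns one list element per line, newline included
lines = io.StringIO(sample).readlines()

# read().split() returns one list element per whitespace-separated word
words = io.StringIO(sample).read().split()

print(lines)
print(words)
```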
Oct 16 '07 #5
bartonc
6,596 Expert 4TB
"I thought readlines() would return the lines of text in the file as elements of a list. What I am trying to do here is return each word as an element of a list 'f'."
You are correct. Sorry. I'll take it back.
Oct 16 '07 #6
