
opening files via a directory

P: 3
Hello,

I was browsing to see if I could find something similar to my problem, but I couldn't find anything.

I have a script that counts every word in a file and reports how many times each word occurs. I also have a directory containing about 60 text files that I need to run this script on. Since I'm not really a star at programming, I made a second script that puts all those files into one file, and then I run the counting script on that one file.

What I actually want is to skip that extra step: adjust the counting script so that it loops through the files in the given directory and counts the words in all of them, so that my output is the same as if they were in one file.

Below is what I have so far. Too bad I'm just not that good at combining scripts.

    import re

    # Read every line of the single combined file
    lineList = open(r'/blablabla/bla/bla/file.txt').readlines()
    pat = r"\w+"
    wordList = []

    # Collect every word, lower-cased
    for line in lineList:
        wordList += [w.lower() for w in re.findall(pat, line)]

    # Map each word to the number of times it occurs
    wordCnt = [wordList.count(w) for w in wordList]
    dd = dict(zip(wordList, wordCnt))

    # Report only words that occur more than 40 and fewer than 200 times
    for item in dd:
        if dd[item] > 40 and dd[item] < 200:
            print("Word '%s' occurs %d times." % (item, dd[item]))

I discovered that to work with directories you have to import os. Then, instead of lineList = open(...), I should set

    dirname = r'c:/blablabla/bla/bla/'

and then do something with os.path(dirname) or something like that.
I'm really getting confused/stressed by this. Could any of you perhaps give me a subtle hint to help me out?
Oct 16 '07 #1
5 Replies


bartonc
Expert 5K+
P: 6,596
This ought to give you a pretty good start:

    from glob import glob
    import os

    def ProcessFile(fileName):
        # Placeholder: show the absolute path; put the real per-file work here
        print(os.path.abspath(fileName))

    # Loop over every .py file in the current directory
    for fname in glob(os.path.join(".", "*.py")):
        ProcessFile(fname)
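
As a rough illustration of how the glob loop could be combined with the counting script from the question, here is a minimal sketch. It is written in Python 3 syntax, which is an assumption (the thread itself uses Python 2), and the names count_words_in_file and count_words_in_dir, as well as the "*.txt" pattern, are hypothetical choices rather than anything from the original posts.

```python
import glob
import os
import re
from collections import Counter

WORD_PATTERN = re.compile(r"\w+")  # same pattern as the script in the question


def count_words_in_file(file_name, counts):
    """Add the lower-cased words of one file to the shared Counter."""
    with open(file_name) as handle:
        for line in handle:
            counts.update(w.lower() for w in WORD_PATTERN.findall(line))


def count_words_in_dir(dir_name, pattern="*.txt"):
    """Tally words across every file in dir_name that matches pattern."""
    counts = Counter()
    for file_name in glob.glob(os.path.join(dir_name, pattern)):
        count_words_in_file(file_name, counts)
    return counts
```

Because every file feeds the same Counter, the totals should match what the original script produces on the concatenated file; the 40-to-200 filter from the question can then be applied to the returned counts.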
Oct 16 '07 #2

rhitam30111985
100+
P: 112
To list all the files in a particular directory, you can use the os module as follows:

    import os
    # directory is the path of the folder you want to scan
    file_list = os.listdir(directory)

This returns a list of the names in that directory. Then you can just iterate through them and put all the words in a dictionary:

    wordlist = {}
    for item in file_list:
        f = open(directory + '/' + item).read()
        f = f.split()
        for word in f:
            # Only purely alphabetic tokens are counted
            if word.isalpha():
                if word not in wordlist:
                    wordlist[word] = 1
                else:
                    wordlist[word] += 1

But I guess there are better solutions out there. Note that this code will ignore any word followed by a comma, a full stop, etc., so you have to get rid of those first to get the correct word count. Now that's another exercise.
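
One possible way to approach that exercise is sketched below, in Python 3 syntax (an assumption; the thread uses Python 2). It keeps the os.listdir and dictionary approach, but strips leading and trailing punctuation from each token before testing it. The function name count_words is hypothetical, and lower-casing the words is borrowed from the script in the question rather than from this post.

```python
import os
import string


def count_words(directory):
    """Count alphabetic words in every regular file directly inside directory."""
    word_counts = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # os.listdir also returns sub-directories
        with open(path) as handle:
            for token in handle.read().split():
                # Strip surrounding punctuation so "word," counts as "word"
                word = token.strip(string.punctuation).lower()
                if word.isalpha():
                    word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts
```

Tokens with punctuation in the middle, such as "it's", still fail the isalpha() test and are skipped, so this is only a partial answer to the punctuation problem.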
Oct 16 '07 #3

bartonc
Expert 5K+
P: 6,596
rhitam30111985 wrote: "To list all the files in a particular directory, you can use the os module as follows:"
Thank you. I learned something about os.listdir(): it does list all the files in a directory.
rhitam30111985 wrote: "But I guess there are better solutions out there."
    # that's bad practice!
    # I encourage you to use:
    f = open(directory + '/' + item)
    text = f.read()
    f.close()
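
A related idiom, offered here only as a sketch, is the with statement, which closes the file automatically even if reading raises an exception. It is available without a __future__ import from Python 2.6 onward; the helper name read_text is hypothetical.

```python
def read_text(path):
    """Return the full contents of path; the file is closed on exit."""
    with open(path) as handle:
        return handle.read()
```

Inside the loop, this would be called as text = read_text(directory + '/' + item).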
Oct 16 '07 #4

rhitam30111985
100+
P: 112
I thought readlines would return the lines of text in the file as elements of a list. What I am trying to do here is return each word as an element of a list 'f'.
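
To make the distinction concrete, here is a small sketch in Python 3 syntax (an assumption; the thread uses Python 2). An in-memory io.StringIO object stands in for a real file, and the sample text is made up for the illustration.

```python
import io

# In-memory stand-in for an open text file
sample = io.StringIO("the quick brown\nfox jumps\n")

lines = sample.readlines()     # one list element per line, newline included
sample.seek(0)                 # rewind before reading again
words = sample.read().split()  # one list element per whitespace-separated word

print(lines)  # ['the quick brown\n', 'fox jumps\n']
print(words)  # ['the', 'quick', 'brown', 'fox', 'jumps']
```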
Oct 16 '07 #5

bartonc
Expert 5K+
P: 6,596
rhitam30111985 wrote: "I thought readlines would return the lines of text in the file as elements of a list. What I am trying to do here is return each word as an element of a list 'f'."
You are correct. Sorry. I'll take it back.
Oct 16 '07 #6
