Bytes | Software Development & Data Engineering Community

Searching for more than one word in multiple files

I have successfully created a program that searches for a word in multiple files, but now I need to be able to search for more than one word. I have added code from a previous discussion to my original program, but I am unsure how the two parts should fit together. Can someone clear this up for me?

    #!C:\PYTHON25\PYTHON.EXE

    import os
    import re
    dir_name= r'c:\Python25\books\books\books'
    word=raw_input("Enter a word to search for: ")
    word2=raw_input("Enter a second word to search for: ")
    keyList = ['word', 'word2']
    entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name) if os.path.isfile(os.path.join(dir_name, fn))]
    for file_name in entryList:
        for line in file(file_name).readlines():
            if word in line:
                print line
    patt = re.compile('|'.join(keyList), re.IGNORECASE)
    for fn in dir_name:
        f = open(fn)
        for line in f:
            if patt.search(line.lower()):
                print line
        f.close()

Jun 27 '07 #1
bartonc
Here's one way:

    import os
    import re
    dir_name = r'c:\Python25\books\books\books'

    ##word2 = raw_input("Enter a second word to search for: ")
    ###removed quotes#
    ##keyList = [word, word2]

    def FindWord(word, fileList):
        for file_name in fileList:
            for line in file(file_name).readlines():
                if word in line:
                    print line

    def FindWords(wordList, fileList):
        patt = re.compile('|'.join(wordList), re.IGNORECASE)
        for fn in dir_name:
            f = open(fn)
            for line in f.readlines(): # added .readlines()
                if patt.search(line.lower()): # probably don't need .lower()
                    print line
            f.close()


    entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name)
                 if os.path.isfile(os.path.join(dir_name, fn))]

    words = raw_input("Enter one or more words to search for: ")
    keyList = words.split()
    if len(keyList) > 1:
        FindWords(keyList, entryList)
    else:
        FindWord(words, entryList)

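FindWords works by joining the search words with '|', which in a regular expression means "match any one of these alternatives". Below is a minimal Python 3 sketch of just that step (the thread's code is Python 2.5). The function name build_any_pattern is hypothetical, and passing each word through re.escape is an assumption added here, not something from the thread: it keeps characters such as '.' or '+' in a search word from being read as regex syntax.

```python
import re


def build_any_pattern(words):
    """Compile one case-insensitive regex that matches any word in `words`."""
    # re.escape() makes punctuation in a search word match literally.
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(alternatives, re.IGNORECASE)


patt = build_any_pattern(["bird", "goat", "c++"])
print(bool(patt.search("A Goat ate my homework")))  # True
print(bool(patt.search("Written in C++")))          # True
print(bool(patt.search("Nothing relevant here")))   # False
```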
Jun 28 '07 #2
Here is what I have now. It searches on one word just fine, but when I add a second word it gives me an error. I added a print statement to see whether it was splitting the input, and it is. I have listed the error I keep getting at the bottom. I can't figure out what is wrong.
Thanks for all your help.

    import os
    import re
    dir_name = r'c:\Python25\books\books\books'


    def FindWord(word, fileList):
        for file_name in fileList:
            for line in file(file_name).readlines():
                if word in line:
                    print line

    def FindWords(wordList, fileList):
        patt = re.compile('|'.join(wordList), re.IGNORECASE)
        for fn in dir_name:
            f = open(fn)
            for line in f.readlines():
                if patt.search(line):
                    print line
            f.close()


    entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name)
                 if os.path.isfile(os.path.join(dir_name, fn))]

    words = raw_input("Enter one or more words to search for: ")
    keyList = words.split()
    print keyList
    if len(keyList) > 1:
        FindWords(words, entryList)
    else:
        FindWord(words, entryList)

    Enter one or more words to search for: bird goat tree
    ['bird', 'goat', 'tree']

    Traceback (most recent call last):
      File "C:/Python25/searchtest.py", line 29, in <module>
        FindWords(words, entryList)
      File "C:/Python25/searchtest.py", line 15, in FindWords
        f = open(fn)
    IOError: [Errno 2] No such file or directory: 'c'
Jun 28 '07 #3
bvdet
You have left out some code. Look at this and then look at your code:

    >>> dir_name = r'c:\Python25\books\books\books'
    >>> for fn in dir_name:
    ...     print fn
    ...
    c
    :
    \
    P
    y
    t
    h
    o
    n
    2
    5
    \
    b
    o
    o
    k
    s
    \
    b
    o
    o
    k
    s
    \
    b
    o
    o
    k
    s
    >>>
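The point of the demonstration is that a for loop over a string visits one character at a time, so open(fn) receives 'c'. The loop has to run over the list of paths (entryList, passed in as fileList) instead. Here is a small Python 3 sketch of building that list; list_files is a hypothetical helper, and the temporary directory only stands in for the real books folder.

```python
import os
import tempfile


def list_files(dir_name):
    """Return full paths of the regular files directly inside dir_name."""
    paths = []
    for fn in os.listdir(dir_name):       # fn is a file *name*, not a character
        full = os.path.join(dir_name, fn)
        if os.path.isfile(full):           # skip sub-directories
            paths.append(full)
    return paths


# Iterating over a string yields single characters:
print([ch for ch in "c:\\books"][:3])      # ['c', ':', '\\']

# Demo: a throw-away directory holding one file and one sub-directory.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "a.txt"), "w").close()
os.mkdir(os.path.join(demo_dir, "sub"))
print(list_files(demo_dir))                # only the path ending in a.txt
```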
Jun 28 '07 #4
bartonc
My bad. Sorry. It should be:

    def FindWords(wordList, fileList):
        patt = re.compile('|'.join(wordList), re.IGNORECASE)
        for fn in fileList:
            f = open(fn)
            for line in f.readlines():
                if patt.search(line):
                    print line
            f.close()

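For comparison, here is the corrected FindWords idea as a Python 3 sketch. A few details are assumptions of this sketch rather than part of bartonc's code: each file is opened in a with block, so it is closed even if an error occurs partway through; the function returns (file name, line) pairs instead of printing them, which makes the result easier to test or post-process; and the words go through re.escape. find_words is a hypothetical name.

```python
import os
import re
import tempfile


def find_words(word_list, file_list):
    """Return (file name, line) pairs for lines containing any word in word_list."""
    patt = re.compile("|".join(re.escape(w) for w in word_list), re.IGNORECASE)
    hits = []
    for fn in file_list:               # loop over the paths, not over dir_name
        with open(fn) as f:            # closed automatically when the block ends
            for line in f:             # a file object yields one line at a time
                if patt.search(line):
                    hits.append((fn, line.rstrip("\n")))
    return hits


# Demo: one temporary file standing in for a book.
path = os.path.join(tempfile.mkdtemp(), "book.txt")
with open(path, "w") as out:
    out.write("A bird sang\nThe goat slept\nNothing here\n")

for fn, line in find_words(["bird", "goat"], [path]):
    print(line)
```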
Jun 28 '07 #5
Thanks guys.
I got it to the point where it searches for all the words, but I need to fine-tune it some more.
First of all, when I put in a word like "eat", it finds everything containing those letters, like "beat". Is there a way to make it match only the exact word?

Also, it is bringing up every line that contains any one of the words. Is there a way to change it so that it only prints the lines that contain all of the words?
Jun 29 '07 #6
Smygis
#!C:\PYTHON25\PYTHON.EXE

should always be

#!/usr/bin/env python

and never anything else.

Unlike Windows, which picks the program that runs a file based on its file extension, *nix systems read the first line of a file before executing it.

If that line begins with #!, the system runs the interpreter named on that line and passes it the path of the script as an argument. With #!/usr/bin/env python, env looks up python on the PATH and runs it, in our case, Python.
Jun 29 '07 #7
bartonc
It would be very helpful if you would get in the habit of posting the working code (especially if you still have questions). It helps others figure out this type of problem when they get stuck and it helps us see what the heck you're talking about.

That said:
The "fine tuning" comes down to learning the Regular Expression language, and I'm not sure that I'm ready to start calling this the Python/Regex Forum just yet. Regular-Expressions.info is a good place to start with that.
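For the "eat" versus "beat" problem, one widely used regex tool is the \b word-boundary anchor, which matches only where a word character (letter, digit or underscore) meets a non-word character or the start or end of the string. A minimal Python 3 sketch, not taken from the thread; exact_word_pattern is a hypothetical name:

```python
import re


def exact_word_pattern(words):
    """Compile a regex matching any of `words` only as a whole word."""
    body = "|".join(re.escape(w) for w in words)
    # The (?:...) group makes both \b anchors apply to every alternative.
    return re.compile(r"\b(?:%s)\b" % body, re.IGNORECASE)


patt = exact_word_pattern(["eat"])
print(bool(patt.search("Birds eat seeds")))       # True
print(bool(patt.search("Don't beat the drum")))   # False
```

Without the group, r"\beat|tree\b" would mean "\beat" or "tree\b", so only one side of each word would be anchored.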
Jun 29 '07 #8

It's not much different from what is listed above:

    import os
    import re
    dir_name = r'c:\Python25\books\books\books'

    def FindWord(word, fileList):
        for file_name in fileList:
            for line in file(file_name).readlines():
                if word in line:
                    print line

    def FindWords(wordList, fileList):
        patt = re.compile('|'.join(wordList), re.IGNORECASE)
        for fn in fileList:
            f = open(fn)
            for line in f.readlines():
                if patt.search(line):
                    print line
            f.close()

    entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name)
                 if os.path.isfile(os.path.join(dir_name, fn))]

    words = raw_input("Enter one or more words to search for: ")
    keyList = words.split()
    if len(keyList) > 1:
        FindWords(keyList, entryList)
    else:
        FindWord(words, entryList)

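A side note on this version: the separate FindWord branch is not strictly needed, because '|'.join() of a one-item list is just that word, so the same compiled pattern handles one search word or several (with the difference that one-word searches also become case-insensitive). A Python 3 sketch over an in-memory list of lines; matching_lines and the sample lines are hypothetical, and re.escape plus the empty-input guard are additions made here:

```python
import re


def matching_lines(lines, words):
    """Yield each line that contains any of `words` (one word or several)."""
    if not words:              # an empty pattern would match every line
        return
    # "|".join(["bird"]) == "bird", so a single word needs no special case.
    patt = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    for line in lines:
        if patt.search(line):
            yield line


text = ["a bird", "a goat", "a tree"]
print(list(matching_lines(text, "bird".split())))       # ['a bird']
print(list(matching_lines(text, "bird goat".split())))  # ['a bird', 'a goat']
```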
Jun 29 '07 #9
bvdet
Given a file name and a key word list, this function will print any line that contains a word in the key word list:

    import re

    def matchAnyWord(fn, keyList):
        # A key word only matches when it is not directly preceded or
        # followed by a letter, so 'eat' matches "eat" but not "beat".
        patt = re.compile('(?<![a-z])%s(?![a-z])' % '(?![a-z])|(?<![a-z])'.join(keyList), re.IGNORECASE)
        f = open(fn)
        for line in f:
            if patt.search(line.lower()):
                print line
        f.close()

Given a file name and a key word list, this function will print any line that contains all of the words in the key word list:

    def matchAllWords(fn, keyList):
        # One pattern per key word; keys are lower-cased because the
        # patterns are searched against line.lower()
        pattList = [re.compile('(?<![a-z])%s(?![a-z])' % key.lower()) for key in keyList]
        f = open(fn)
        for line in f:
            matchList = []
            for patt in pattList:
                matchList.append(patt.search(line.lower()))
            # search() returns None for a key word missing from the line
            if None not in matchList:
                print line
        f.close()
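matchAllWords builds matchList and then checks that None is not in it. The built-in all() expresses the same test directly and stops at the first key word that is missing. A Python 3 sketch of just that check; line_has_all is a hypothetical name, and compiling with re.IGNORECASE instead of lower-casing the line, as well as escaping the keys, are changes made here:

```python
import re


def line_has_all(line, patterns):
    """True if every compiled pattern matches somewhere in line."""
    return all(p.search(line) for p in patterns)


keys = ["thread", "needle"]
# Same lookaround idea as matchAllWords: no letter directly before or after.
# With re.IGNORECASE, [a-z] in the lookarounds also covers A-Z.
patterns = [re.compile(r"(?<![a-z])%s(?![a-z])" % re.escape(k), re.IGNORECASE)
            for k in keys]

print(line_has_all("The thread was threaded through a needle", patterns))         # True
print(line_has_all("The thread was threaded through several needles", patterns))  # False
```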
Jun 29 '07 #10
Great, thanks, bvdet. I'll try to incorporate that.

Jun 29 '07 #11
bvdet
Here is an interactive exercise:

    >>> keyList = ['thread', 'needle']
    >>> patt = re.compile('(?<![a-z])%s(?![a-z])' % '(?![a-z])|(?<![a-z])'.join(keyList), re.IGNORECASE)
    >>> patt.search('The thread was threaded through several needles')
    <_sre.SRE_Match object at 0x00DB6138>
    >>> print patt.search('The threads were threaded through several needles')
    None
    >>> pattList = [re.compile('(?<![a-z])%s(?![a-z])' % key) for key in keyList]
    >>> for patt in pattList:
    ...     print patt.search('The thread was threaded through several needles')
    ...
    <_sre.SRE_Match object at 0x00DB62F8>
    None
    >>> for patt in pattList:
    ...     print patt.search('The thread was threaded through a needle')
    ...
    <_sre.SRE_Match object at 0x00DB6288>
    <_sre.SRE_Match object at 0x00DB6288>
    >>>

The regular expression was modified with a negative lookbehind, (?<![a-z]), and a negative lookahead, (?![a-z]), so that a key word is not matched when it is immediately preceded or followed by a letter in the set [a-z].
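These lookarounds count only letters as part of a word. Python's \b anchor also treats digits and the underscore as word characters, so the two approaches can disagree on text such as "eat2" or "eat_it". A small Python 3 comparison with made-up sample strings:

```python
import re

word = "eat"
lookaround = re.compile(r"(?<![a-z])%s(?![a-z])" % word, re.IGNORECASE)
boundary = re.compile(r"\b%s\b" % word, re.IGNORECASE)

for text in ["eat", "beat", "eat2", "eat_it"]:
    print(text, bool(lookaround.search(text)), bool(boundary.search(text)))
# Output:
# eat True True
# beat False False
# eat2 True False
# eat_it True False
```

Which behavior is preferable depends on the data being searched; for ordinary prose the two usually agree.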
Jun 29 '07 #12
