searching for words in a file

26
If I have a large number of files how would I search for a particular word in all of the files then have the system print out all the lines in those files that have the word that I am looking for.

What would have to be different if I wanted to do another program that would search for two words of my choice and then only print out the lines in those files that contain both those words.
Jun 19 '07 #1
bartonc
6,596 Expert 4TB
If I have a large number of files how would I search for a particular word in all of the files then have the system print out all the lines in those files that have the word that I am looking for.

What would have to be different if I wanted to do another program that would search for two words of my choice and then only print out the lines in those files that contain both those words.
1) Use the glob module to collect the names of the files you want to search. Open each file and read it into a buffer. Close the file. Use the re module to create a Regular Expression object. Print the (file name?) that glob just gave you (or the buffer if you meant "print the file") if your Regular Expression gets a match.

2) Build your Regular Expression with variables.
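The steps above can be sketched roughly as follows. This is only an illustration, written for Python 3 (the thread itself uses Python 2); the function name `grep_files` and the throwaway demo files are assumptions made for the example, not part of the original advice.

```python
import glob
import os
import re
import tempfile


def grep_files(pattern, word):
    """Return (file name, line) pairs for every line that contains `word`."""
    # Build the regular expression from a variable; re.escape keeps any
    # punctuation in the search word from being read as regex syntax.
    regex = re.compile(re.escape(word))
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:  # the file is closed when the block ends
            for line in handle:
                if regex.search(line):
                    hits.append((os.path.basename(path), line.rstrip("\n")))
    return hits


# Demo on a temporary directory so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w") as f:
    f.write("Moses went up\nnothing here\n")
with open(os.path.join(demo_dir, "b.txt"), "w") as f:
    f.write("and Moses said\n")

# glob.escape protects any special characters in the directory name itself.
glob_hits = grep_files(os.path.join(glob.escape(demo_dir), "*.txt"), "Moses")
for name, line in glob_hits:
    print(name, line)
```

Reading line by line instead of loading the whole file into a buffer makes it easy to print only the matching lines, which is what the question asks for.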
Jun 19 '07 #2
texas22
26
I am not sure what you mean by the glob module or anything? What I am trying to do is to write a script that would search 60 files for a certain word and then print out all the lines that have that word, then I want to also in a different program write a script that given two words, would search for those words then print out the lines in those files that contain both those words.
Jun 19 '07 #3
texas22
26
What I have are about 70 different files in a folder called "files" In that folder each file if opened in a program like word contains a list of sentences. So what I need to do is to be able to type in a search word and then have the program return all the sentences out of all the files that contain that word.
Jun 19 '07 #4
bvdet
2,851 Expert Mod 2GB
What I have are about 70 different files in a folder called "files" In that folder each file if opened in a program like word contains a list of sentences. So what I need to do is to be able to type in a search word and then have the program return all the sentences out of all the files that contain that word.
Compile a list of file names in memory. The os module can be used for that. Create a for loop to iterate on the list of file names:
```python
for file_name in fileList:
```
Open and iterate on each file:
```python
for line in file_object:
```
Use the 'in' operator to test if the word is contained in line.
```python
if word in line:
    print line
```
For multiple keywords, a regex solution can be implemented. This pattern matches any line that contains at least one of the keywords:
```python
import re

keyList = ['word1', 'word2']
# re.IGNORECASE already makes the match case-insensitive,
# so the line does not need to be lower-cased first.
patt = re.compile('|'.join(keyList), re.IGNORECASE)
for fn in fileList:
    f = open(fn)
    for line in f:
        if patt.search(line):
            print line
    f.close()
```
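A pattern built with `'|'.join(...)` matches a line that contains either keyword. The second half of the original question asks for lines that contain both words, which could be sketched as below. This is an illustration in Python 3 (the thread uses Python 2), and the helper names `contains_all` and `contains_any` are made up for the example.

```python
import re


def contains_all(line, words):
    """True when every word occurs somewhere in the line (case-insensitive)."""
    lowered = line.lower()
    return all(word.lower() in lowered for word in words)


def contains_any(line, words):
    """True when at least one word occurs in the line (case-insensitive)."""
    patt = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return patt.search(line) is not None


sample_lines = [
    "Moses spoke to Aaron",
    "Moses went alone",
    "Aaron waited",
    "nobody here",
]
key_words = ["moses", "aaron"]

both_words = [ln for ln in sample_lines if contains_all(ln, key_words)]
either_word = [ln for ln in sample_lines if contains_any(ln, key_words)]
print(both_words)
print(either_word)
```

Both helpers do substring matching, so "the" would also match "other". If whole words are needed, one option is to wrap each escaped keyword in `\b` word boundaries.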
Jun 20 '07 #5
texas22
26
All of the files I need to search through are in a folder called 'files' on my c: drive so how would I tell it to look in the folder and then look through each of those text files and search for a word that I tell it to
Jun 21 '07 #6
bvdet
2,851 Expert Mod 2GB
All of the files I need to search through are in a folder called 'files' on my c: drive so how would I tell it to look in the folder and then look through each of those text files and search for a word that I tell it to
This returns a list of entries in 'dir_name':
```python
>>> import os
>>> dir_name = r'C:\files'
>>> entryList = os.listdir(dir_name)
```
Subdirectories are included in the list. Iterate on the list to search for your keywords.
Jun 21 '07 #7
bartonc
6,596 Expert 4TB
All of the files I need to search through are in a folder called 'files' on my c: drive so how would I tell it to look in the folder and then look through each of those text files and search for a word that I tell it to
Hi texas22. I'm glad to see that bvdet has provided the os module call needed for your purposes.

I'd like to explain what I see going on in this thread:
1) You have provided lots of words describing your goal; That's good.
2) I have replied with lots of words (instead of code) describing a solution; That's due to #1, above
3) bvdet has provided the bare minimum to get you going; That's the way things work around here. We help (not do it for you).
4) You have not shown any attempt (by posting code) to make this work yourself; That's not good.

So give it an honest try. You'll see the level of participation from members go up!
Jun 21 '07 #8
I'm kind of in Texas22's stage here. You all know what you are talking about but it's all Greek to me. You assume we would know what a glob module is when I don't have the concept of what needs to happen. How can we try it? I don't mean to offend and I really appreciate your help but could you simplify your explanation for us python toddlers?


Jun 21 '07 #9
texas22
26
Thanks, for all the input and yes you are right I am trying to use the help that I am given so I will start posting my code in order to become more involved. My big problem is just getting started and then trying to figure out what each step means by seeing it so anyways here is what I got so far. Being new to programming and especially no prior knowledge whatsoever on python this is a big learning curve for me. The comments I put next to each line is what I am assuming each line does so if you could clarify for me.

```python
import os

dir_name = r'C:\books' #This tells the program where to look#

entryList = os.listdir(dir_name) # not sure what exactly this does#

for file_name in fileList: #not sure what this does but when I run the program says this is not defined#
    for line in file_object: #what does this line exactly do#
        if word in line:
            print line
```

So guess my question is, is there a place in the program that will prompt the user for the word they want to search for. What does it take to prompt the user for such a thing cause as far as I can tell there doesn't seem to be a place that does that. Right now I guess all this will do is pretty much tell the program where to look for the files but it won't yet conduct the search. Am I seeing it right or am I way off base.
Thanks for your help
Jun 21 '07 #10
I have the following code. It prints out the list of files inside the folder but doesn't seem to be searching the files.

```python
#!C:\PYTHON25\PYTHON.EXE

import os
dir_name = r'C:\python25\books\books\books'
entryList = os.listdir(dir_name)
print entryList
searchWord = "Moses"
for file_name in entryList:
    for line in file_name:
        if searchWord in line:
            print line
```


Am I even close? Do I have to open and close each file?

Thanks for your help.
Jun 21 '07 #11
ilikepython
844 Expert 512MB
```python
entryList = os.listdir(dir_name)
```
That line lists all the files and folders in dir_name.

In the next code you have your variable names mixed up a bit:
```python
for file_name in fileList:      # should be entryList instead of fileList
    for line in file_object:    # should be "for line in file(file_name).readlines():"
        if word in line:
            print line
```
So the corrected code would look like this:
```python
dir_name = r"C:\Books"
entryList = os.listdirs(dir_name)

for file_name in entryList:                     # iterate over files in entryList
    for line in file(file_name).readlines():    # iterate over lines in each file
        if word in line:
            print line
```
If you want to prompt the user for a word to search for, use raw_input().
Where to put it depends on how often you want to change the search word. If the word stays the same for the whole run of the program, you can just add this line at the top of your file:
```python
word = raw_input("Enter a word to search for: ")
```
Jun 21 '07 #12
ilikepython
844 Expert 512MB
You are iterating over the actual string of the file name. You want to iterate over the file object. Please see my previous reply and use python code tags ([code=python ][/code ]).
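The difference can be seen in a small sketch. It is written for Python 3 rather than the Python 2 used in this thread, and the temporary file and its contents are invented for the demonstration.

```python
import os
import tempfile

# Create a small file so the example is self-contained.
demo_dir = tempfile.mkdtemp()
file_name = os.path.join(demo_dir, "1ch")
with open(file_name, "w") as f:
    f.write("Moses went up\nthe people waited\n")

# Looping over a string yields its characters one at a time...
name_only = "1ch"
pieces_of_name = [piece for piece in name_only]

# ...while looping over an open file object yields its lines.
with open(file_name) as handle:
    lines_of_file = [line for line in handle]

print(pieces_of_name)
print(lines_of_file)
```

So `if searchWord in line` was being tested against single characters of the file name, which can never contain a whole word like "Moses".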
Jun 21 '07 #13
texas22
26
[code=python ]
import os

dir_name = r"C:\books"
word = raw_input("Enter a word to search for: ")
entryList = os.listdirs(dir_name)

for file_name in entryList:
    for line in file(file_name).readlines():
        if word in line:
            print line
[/code ]

This is what I have so far. When I run it I get the error:
"AttributeError: 'module' object has no attribute 'listdirs'"

What does this mean and am I on the right track?
Jun 21 '07 #14
texas22
26
Sorry guys, I should not have troubled you with that one. I just had to take the 's' off of the line:
```python
entryList = os.listdir(dir_name) # no 's' on the end of listdir
```
The error I'm getting now is:
"for line in file(file_name).readlines():
IOError: [Errno 2] No such file or directory: '1ch'"

The '1ch' is the name of one of the files contained in the folder books
Jun 21 '07 #15
bvdet
2,851 Expert Mod 2GB
You are on the correct path. You have an extra space in your closing code tag. It is 'os.listdir' instead of 'os.listdirs'. You need to exclude subdirectories from the file list. os.listdir returns bare file names, so the full path must be used to open the files unless the directory is the current working directory. Try this:
```python
import os
dir_name = r'C:\my_directory'
word = raw_input("Enter a word to search for: ")
entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name) if os.path.isfile(os.path.join(dir_name, fn))]

for file_name in entryList:
    for line in file(file_name).readlines():
        if word in line:
            print line
```
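To see why the bare name '1ch' cannot be opened, here is a small sketch, assuming Python 3 (the thread uses Python 2) and a temporary directory standing in for the books folder. `os.listdir` returns names relative to the directory it was given, while `open` resolves relative names against the current working directory.

```python
import os
import tempfile

# A stand-in for the books folder: one file and one subdirectory.
books_dir = tempfile.mkdtemp()
with open(os.path.join(books_dir, "1ch"), "w") as f:
    f.write("Moses went up\n")
os.mkdir(os.path.join(books_dir, "subfolder"))

# listdir gives bare names, with no directory part, and includes folders.
names = sorted(os.listdir(books_dir))
print(names)

# Joining each name onto the directory gives a path that open() can find
# from anywhere; isfile() drops the subdirectory.
full_paths = [
    os.path.join(books_dir, name)
    for name in names
    if os.path.isfile(os.path.join(books_dir, name))
]
print(full_paths)

# Opening the bare name only works if the script happens to be running
# inside books_dir, which is usually not the case.
print("current directory:", os.getcwd())
```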
Jun 21 '07 #16
I changed the second line below to read the lines but I am getting the error message below.

```python
for file_name in entryList:
    for line in file(file_name).readlines():
        if searchWord in line:
            print line
```
Traceback (most recent call last):
  File "C:/Python25/search.py", line 10, in <module>
    for line in file(file_name).readlines():
IOError: [Errno 2] No such file or directory: '1ch'

I double checked the file address and it is printing the file name out so I think the address is ok. I also opened the file in VIM and it seems ok. What am I doing wrong?
Jun 21 '07 #17
ghostdog74
511 Expert 256MB
Check the file names in those directories for spaces.
Jun 22 '07 #18
check the file names in those directories, whether there are spaces in them.

None of the files have a space in the name.
Jun 22 '07 #19
bvdet
2,851 Expert Mod 2GB
None of the files have a space in the name.
Try this:
```python
open(os.path.join(dir_name, file_name)).readlines()
```
Jun 22 '07 #20
Try this:
```python
open(os.path.join(dir_name, file_name)).readlines()
```
I am not sure where to place that so I put it before the loop and got an error that file_name wasn't defined. I placed it inside the loop and it didn't make a difference.
Jun 22 '07 #21
bvdet
2,851 Expert Mod 2GB
I am not sure where to place that so I put it before the loop and got an error that file_name wasn't defined. I placed it inside the loop and it didn't make a difference.
```python
import os
dir_name = 'your_directory'
word = raw_input("Enter a word to search for: ")
entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name) if os.path.isfile(os.path.join(dir_name, fn))]

for file_name in entryList:
    for line in file(file_name).readlines():
        if word in line:
            print line
```
OR
```python
import os
dir_name = r'C:\SDS2_7.0\macro\New Versions'
word = raw_input("Enter a word to search for: ")
entryList = []
for fn in os.listdir(dir_name):
    fn = os.path.join(dir_name, fn)
    if os.path.isfile(fn):
        entryList.append(fn)

# See above for the rest of the code
```
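For anyone reading this on a newer Python, the same idea could be wrapped in a function along these lines. This is only a sketch for Python 3; the name `search_directory`, the returned tuple layout, and the demo files are assumptions for illustration and not something posted in the thread.

```python
import os
import tempfile


def search_directory(dir_name, word):
    """Return (path, line number, line) for each line that contains word."""
    matches = []
    for entry in sorted(os.listdir(dir_name)):
        path = os.path.join(dir_name, entry)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # "with" closes the file even if something goes wrong while reading;
        # errors="replace" avoids a crash on bytes that cannot be decoded.
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if word in line:
                    matches.append((path, number, line.rstrip("\n")))
    return matches


# Demo data in a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "1ch"), "w") as f:
    f.write("Moses went up\nnothing to see\n")
with open(os.path.join(demo_dir, "2ch"), "w") as f:
    f.write("no match\nthen Moses spoke\n")
os.mkdir(os.path.join(demo_dir, "sub"))

results = search_directory(demo_dir, "Moses")
for path, number, line in results:
    print("%s:%d: %s" % (os.path.basename(path), number, line))
```

Returning the matches instead of printing them inside the loop makes it easier to reuse the function, for example to filter the results again for a second word.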
Jun 22 '07 #22
texas22
26
Thanks for all the help guys
Jun 22 '07 #23
That worked, now I need to work on more than one search word. Thanks!

Jun 22 '07 #24
bvdet
2,851 Expert Mod 2GB
That worked, now I need to work on more than one search word. Thanks!
See post #5 this thread.
Jun 22 '07 #25
bartonc
6,596 Expert 4TB
I'm kind of in Texas22's stage here. You all know what you are talking about but it's all Greek to me. You assume we would know what a glob module is when I don't have the concept of what needs to happen. How can we try it? I don't mean to offend and I really appreciate your help but could you simplify your explanation for us python toddlers?
As you are new to the Python Forum here on TheScripts (and I really hate to be the "enforcer"), I have allowed your posts to remain in this thread even though (and I hope you can see it, too) it makes quite a mess when there are two people asking questions in the same thread. I'm anxious for this question to be resolved so that you two "toddlers" can start new threads of your own.

Please feel free to hit that "start discussion" button any ol' time. And, please, don't jump in on someone else's thread in the future.

Thanks.

By the way, we actually have some Posting Guidelines which may help you get to know some of the rules and policies better.
Jun 23 '07 #26
Sorry, I saw this line in the guidelines and thought I should use Texas22's thread rather than start an identical one.

"If you wish to post a question do not post it in a discussion created by someone else unless it is about exactly the same problem."

Thanks


Jun 25 '07 #27
bartonc
6,596 Expert 4TB
Sorry, I thought this line in the guidelines and thought I should use Texas22's rather than start an identical one.

"If you wish to post a question do not post it in a discussion created by someone else unless it is about exactly the same problem."

Thanks
OK. I'll buy that. No hard feelings, I hope.
Jun 25 '07 #28
