Script that searches for text file

I am very new to UNIX. I need a script that searches for the text file with the most occurrences of a given keyword.
May 11 '07 #1
savanm
85
Hi,

I'm not quite sure what you need. This lists every .txt file under pathname:

    find pathname -name '*.txt' -exec ls -l {} \;
May 11 '07 #2
savanm
85
find pathname -name '*.txt'
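The quotes around '*.txt' matter. Without them the shell may expand the pattern against the current directory before find ever sees it. A small sketch with a throwaway directory (the file names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical demo tree: one .txt file at the top level, one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/top.txt" "$demo/sub/nested.txt"

# Unquoted: the shell expands *.txt to top.txt first, so find only looks for that name.
unquoted=$(cd "$demo" && find . -name *.txt | sort)

# Quoted: find receives the pattern itself and applies it in every directory.
quoted=$(cd "$demo" && find . -name '*.txt' | sort)

echo "unquoted:"; echo "$unquoted"
echo "quoted:";   echo "$quoted"
rm -rf "$demo"
```

With two or more .txt files in the current directory, the unquoted form would not just miss files: find would receive extra arguments and report an error.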
May 11 '07 #3
Thanks for your assistance. Below is what I am being asked:

Due Day 7 (Monday):

Create a script that searches for a text file with most occurrences of a given keyword. For instance, if I would like to search for a script with most usages of if statement, I would execute:

find_most_relevant.sh /home/yevgeniy/myscripts if

The script find_most_relevant.sh should take two arguments. The first one is the directory where text files are located (recursive search is optional for this assignment). The second argument is a keyword. The output of the script should either state:

No file with word <keyword> found in the directory <directory-name>.

Or

file <filename>: found X occurrences of word <keyword>

(for the file with most occurrences of keyword only)

Hint 1: If you would like, you can use command grep -o key filename | wc -l to count number of occurrences in a single file.

Hint 2: My version of full-credit script is only 22 lines long. Yours can be either more or less than that - just don't assume that this assignment asks for anything terribly complicated.
May 11 '07 #4
ghostdog74
511 Expert 256MB
    #!bin/sh
    awk -v search='if' '{
    n=gsub(search, "");
    count+=n;calc[FILENAME]=count;
    }
    END{
    for (i in calc) print "File: " i " Count: " calc[i]
    }' *.sh

I'll leave you to do the rest.
May 12 '07 #5
Thanks a million, this will help me a lot.
May 12 '07 #6
prn
254 Expert 100+
I couldn't resist:

    grep -c "$2" "$1"/* | sort -n -t':' -k2 | tail -1 | cut -f1 -d':'

Paul
May 22 '07 #7
ghostdog74
511 Expert 256MB
grep -c counts only the lines that match the search word; it does not count repeated occurrences of the word on the same line. For example, given a file containing the line

    this is a line but this is last part of the line.

the command

    grep -c "line" "file"

will give a count of 1, not 2.
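A quick way to see the difference, as a sketch using a temporary file that holds the example line. It assumes a grep that supports -o (GNU and BSD grep do), as in Hint 1 of the assignment:

```shell
#!/bin/sh
# Throwaway file holding the example line from this post.
sample=$(mktemp)
echo 'this is a line but this is last part of the line.' > "$sample"

# grep -c reports how many lines match.
lines=$(grep -c 'line' "$sample")

# grep -o prints each match on its own line, so wc -l counts occurrences.
hits=$(grep -o 'line' "$sample" | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $hits"
rm -f "$sample"
```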
May 23 '07 #8
prn
254 Expert 100+
Good catch, ghostdog.

It looks like there's no simple one-liner for this. Actually I find a couple more problems, now that I look more closely. I decided to see what I could come up with using perl and I came up with this (hardly a one-liner):

  1. #! /usr/bin/perl
  2. use strict;
  3. my $dirname = $ARGV[0];
  4. my $pat = $ARGV[1];
  5. opendir DIR, $dirname or die "could not open $dirname: $!\n";
  6. my @files = grep !/^\./, readdir DIR;
  7. closedir DIR;
  8. my %counts;
  9. foreach my $file (@files) {
  10.   my $cnt=0;
  11.   open IN, '<', "$dirname/$file" or die "Could not open $dirname/$file for read: $!\n";
  12.   while (<IN>){ $cnt += s/$pat/$pat/g; }
  13.   close IN;
  14.   $counts{$file} = $cnt;
  15. }
  16. foreach my $k (reverse sort {$counts{$a} <=> $counts{$b}} keys %counts) {print "$counts{$k} \t $k \n" };
I saved this as count.pl and, since there were not many occurrences of "if" in the directory where I was going to test it, I ran it with the line:
    ./count.pl . for
and got output like:
    3        count.pl
    3        count.pl~
    2        test.txt
    2        count.sh
    ...
Note that count.pl is the script itself and you can see the string "for" in lines 9, 11 and 16.
I also tried your (ghostdog74's) script, suitably modified:
    #! /bin/sh
    awk -v search='for' '{
      n=gsub(search, "");
      count+=n;calc[FILENAME]=count;
    }
    END{
      for (i in calc) print "File: " i " Count: " calc[i]
    }' *
(Note that I changed the search target to "for" and the fileglob to "*" instead of "*.sh". I also added the leading slash to "/bin/sh".)
I got output like this:
    ...
    File: count.sh Count: 8
    File: test.txt Count: 14
    ...
    File: count.pl Count: 3
    ...
    File: count.pl~ Count: 6
(Omitting several others to show just the same files as the top four from the Perl output.)
The files did not change between these two runs, yet the awk script gives counts that are wildly at variance with the correct answers. You can see the actual files count.pl and count.sh in this post: they contain 3 and 2 instances of the string "for" respectively, not 3 and 8. The previous version of the Perl script, count.pl~, also contains 3 instances, not 6. I don't know what is happening here.

By now, I should point out why I have been talking about 'the string "for"' rather than 'the word "for"'. The perl script, at least, has also been finding and counting "foreach", "format" or "performance" as instances of "for", which is probably not what the OP wanted. The most straightforward fix is probably to replace line 12 in count.pl with
    while (<IN>){ $cnt += s/\b$pat\b/$pat/g; }
where "\b" is a zero-width "word boundary" assertion: it matches at the position between a word character and a non-word character (whitespace, punctuation, the start or end of a line, etc.). Now count.pl outputs:
    2        count.sh
    ...
    1        test.txt
    ...
    1        count.pl
    1        count.pl~
for the same four files and this count is only for the actual word "for" with none of the spurious matches from the other version above.
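The same string-versus-word distinction can be seen with grep, as a rough sketch. The -w option (available in GNU and BSD grep) limits matches to whole words; the sample lines are made up:

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'foreach my $file (@files) {' 'open for read' 'format performance' > "$sample"

# Substring matches: foreach, for, format, performance.
as_string=$(grep -o 'for' "$sample" | wc -l | tr -d ' ')

# Whole-word matches only: the lone "for" on the second line.
as_word=$(grep -ow 'for' "$sample" | wc -l | tr -d ' ')

echo "string: $as_string, word: $as_word"
rm -f "$sample"
```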

So now I know where I went wrong before, but I'm not at all sure what is wrong with the awk. I may have done something to it, but I don't know what. I copied the script from post 5, pasted it into a script, and made only the 3 changes I listed. We may be using different awks, but that seems odd too. The one I am testing with is on a box running Fedora Core 6 and for version information:
    rpm -qa | grep awk
    gawk-3.1.5-14.fc6
Best Regards,
Paul
May 23 '07 #9
Motoma
3,237 Expert 2GB
Yikes! Homework question!
May 23 '07 #10
prn
254 Expert 100+
Yep. That's what it looks like, but OTOH, it's long expired by now so we can have some fun with it. I found Ghostdog's suggestion interesting and challenging and he pointed out a big flaw in my first try and ...

So maybe it's not so bad, especially not anymore.

JMHO,
Paul
May 23 '07 #11
ghostdog74
511 Expert 256MB
I think it's because, in the awk script I wrote, the count variable is not reset when the next file starts, so it accumulates across files. I originally wrote the script to total all occurrences of the search pattern across all the *.sh files (whereas I think the OP only wants a count per file?). Anyway, the "count" variable should be reset for each file to get the correct output. Hope that clarifies things a bit.
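A minimal sketch of that fix, assuming the goal is one total per file. Instead of resetting a shared counter, it adds each line's matches straight into calc[FILENAME]. The wrapper name count_per_file is made up for illustration:

```shell
#!/bin/sh
# count_per_file PATTERN FILE...
# Prints "File: NAME Count: N" for every file that has at least one input line.
count_per_file() {
    search=$1
    shift
    awk -v search="$search" '
    {
        # gsub returns the number of replacements made on this line;
        # adding it to calc[FILENAME] keeps a separate total for each file.
        calc[FILENAME] += gsub(search, "")
    }
    END {
        for (f in calc) print "File: " f " Count: " calc[f]
    }' "$@"
}
```

Called as count_per_file if *.sh, it would print one line per script. The order of those lines is whatever awk's for-in loop happens to produce, so pipe the output through sort if the order matters.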
May 23 '07 #12
Motoma
3,237 Expert 2GB
Fair enough; however, this is a very common homework question for an Intro to Unix professor to ask. I may end up deleting the thread on the principle of forcing people to actually learn Linux (instead of just feigning it).

Anyway, I'm interested in how this concludes, so finish up the discussion and I will probably bump this off at a later time.
May 23 '07 #13
Motoma
3,237 Expert 2GB
On ghostdog74's point that grep -c counts matching lines rather than individual occurrences: you could always sed the file first, replacing all spaces with newlines, so that every word ends up on its own line.
*shrug*
Just a thought.
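A rough sketch of that idea. It uses tr instead of sed, a substitution made here because writing a newline in a sed replacement is not portable across sed implementations:

```shell
#!/bin/sh
sample=$(mktemp)
echo 'this is a line but this is last part of the line.' > "$sample"

# Put every whitespace-separated token on its own line, then let grep -c
# count the lines that contain the keyword.
count=$(tr -s '[:space:]' '\n' < "$sample" | grep -c 'line')

echo "$count"
rm -f "$sample"
```

A token that contains the keyword twice (say, "lineline") would still count only once, so this is close to, but not the same as, grep -o piped to wc -l.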
May 23 '07 #14
prn
254 Expert 100+
I imagine it is something like that, but what puzzles me is why the numbers don't keep increasing in that case. I get numbers going up and down all over the place.
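One possible explanation, offered as a guess rather than a diagnosis: awk does not define the order in which for (i in calc) visits the array, so running totals can be printed in an order that has nothing to do with the order in which the files were read. The figures quoted above (3, 6, 8 and 14 for count.pl, count.pl~, count.sh and test.txt) do rise in glob order, which fits. The sketch below reproduces the effect with three made-up files:

```shell
#!/bin/sh
d=$(mktemp -d)
printf 'for\nfor\nfor\n' > "$d/a"   # 3 occurrences
printf 'for\n'           > "$d/b"   # 1 occurrence
printf 'for\nfor\n'      > "$d/c"   # 2 occurrences

# Same logic as the script from post #5: count is never reset between files,
# so each file is left holding the running total at the time it was last read.
totals=$(awk -v search='for' '{
    n = gsub(search, ""); count += n; calc[FILENAME] = count
}
END { for (i in calc) print calc[i] }' "$d/a" "$d/b" "$d/c")

# Sorting numerically recovers the read order: 3, then 3+1, then 3+1+2.
echo "$totals" | sort -n
rm -rf "$d"
```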

While I'm back and at it, I decided to take the hint in the original (presumably HW) statement of the problem and have:
    for f in "$1"/*; do c=$(grep -wo "$2" "$f" | wc -l); printf '%s\t%s\n' "$c" "$f"; done | sort -nr | head -1 | cut -f2
Not too bad. :)
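Stretched out to produce the two messages the assignment asks for, the same idea might look like the sketch below. It is only one way to do it: the function name and variable names are made up, it is not recursive, it assumes a grep with -o and -w, and on a tie the first file found wins.

```shell
#!/bin/sh
# find_most_relevant DIR KEYWORD
# Looks only at regular files directly inside DIR (no recursion).
find_most_relevant() {
    dir=$1
    keyword=$2
    best_file=
    best_count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -o prints one match per line, -w keeps whole words only;
        # tr strips the padding some wc implementations add.
        c=$(grep -ow -- "$keyword" "$f" | wc -l | tr -d ' ')
        if [ "$c" -gt "$best_count" ]; then
            best_count=$c
            best_file=$f
        fi
    done
    if [ -z "$best_file" ]; then
        echo "No file with word $keyword found in the directory $dir."
    else
        echo "file $best_file: found $best_count occurrences of word $keyword"
    fi
}
```

Adding a last line find_most_relevant "$1" "$2" and saving the file as find_most_relevant.sh would let it be run as shown in the assignment text.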

Paul
May 23 '07 #15
Hi,
I am new to Perl.
I need a script that searches for a keyword in a directory containing subdirectories. The subdirectories contain the log files.
I need to search the log files for the keyword and count its occurrences in each log file where it is found.
I need to display each filename along with the count.
I hope someone can help me.
Thanks in advance.
I tried the Perl script above, but it says read permission denied.
Can you please guide me?
Aug 10 '07 #16
It works if I copy the script into the desired directory and run it there.
I need to run it for all the subdirectories. I also need the output in a text file, because when it runs I can only see the results for a few log files in the command prompt.
I hope someone can help me.

It's kind of urgent.

Thanks in advance.
Aug 10 '07 #17
prn
254 Expert 100+
Hi bhumikas,

Evidently, you have figured out why you were having problems with read permission, so I won't comment further on that. The perl script I posted above in post #9 of this thread does not dig down into subdirectories. It was posted simply as an illustration, not to solve your specific problem. Notice that I commented that you could call it with a command line like:
    ./count.pl . for
where the "." is a reference to the current directory. If you want to run the same script in the current directory and all of its subdirectories, the simplest modification is simply to call it with:
    find . -type d -exec ./count.pl {} for \;
And, of course, remember that you can always capture the output of ANY unix/linux process by redirecting it to a file. For example:
    find . -type d -exec ./count.pl {} for >foo.txt \;
Naturally, I could equally well modify the perl script, but unix/linux has always had a "tools" approach that makes it unnecessary to rewrite everything just to do normal tasks.
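For the case raised in post #16 (a count per log file, across subdirectories, saved to a file), the same tools approach can be sketched without Perl. This is only an illustration: count_in_tree is a made-up name, and it assumes a grep that supports -o and -w (GNU and BSD grep do).

```shell
#!/bin/sh
# count_in_tree DIR KEYWORD
# Prints "COUNT<TAB>PATH" for every regular file under DIR that contains
# KEYWORD as a whole word, highest count first.
count_in_tree() {
    find "$1" -type f -exec sh -c '
        keyword=$1
        shift
        for f in "$@"; do
            c=$(grep -ow -- "$keyword" "$f" | wc -l | tr -d " ")
            if [ "$c" -gt 0 ]; then
                printf "%s\t%s\n" "$c" "$f"
            fi
        done
    ' sh "$2" {} + | sort -rn
}
```

Something like count_in_tree . error > /tmp/results.txt would then write the counts to a file; the keyword and the output path here are placeholders.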

HTH,
Paul
Aug 13 '07 #18
