Script that searches for text file

heine6ken
3 New Member
I am very new to UNIX. I need a script that searches for a text file with most occurrences of a given keyword.
May 11 '07 #1
savanm
85 New Member
hi,

I can't quite tell what you need.

find pathname -name *.txt -exec ls -l {} \;
May 11 '07 #2
savanm
85 New Member
find pathname -name '*.txt'
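The only difference between the two find commands above is the quoting of the pattern. A minimal sketch of why it matters, assuming a POSIX shell; the helper name list_text_files is made up for illustration and does not come from the thread:

```shell
#!/bin/sh
# list_text_files is a hypothetical helper name, not something from the thread.
# It prints every regular file ending in .txt under the given directory.
list_text_files() {
    # Quoting '*.txt' hands the pattern to find unchanged. Unquoted, the shell
    # would expand it first if the current directory held any .txt files.
    find "$1" -type f -name '*.txt'
}
```

Called as list_text_files /some/dir, it descends into subdirectories as well, which the assignment treats as optional.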
May 11 '07 #3
heine6ken
3 New Member
Thanks for your assistance. Below is what I am being asked:

Due Day 7 (Monday):

Create a script that searches for a text file with the most occurrences of a given keyword. For instance, if I would like to search for the script with the most usages of the if statement, I would execute:

find_most_relevant.sh /home/yevgeniy/myscripts if

The script find_most_relevant.sh should take two arguments. The first one is the directory where the text files are located (recursive search is optional for this assignment). The second argument is a keyword. The output of the script should either state:

No file with word <keyword> found in the directory <directory-name>.

Or

file <filename>: found X occurrences of word <keyword>

(for the file with most occurrences of keyword only)

Hint 1: If you would like, you can use the command grep -o key filename | wc -l to count the number of occurrences in a single file.

Hint 2: My version of the full-credit script is only 22 lines long. Yours can be either more or less than that - just don't assume that this assignment asks for anything terribly complicated.
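For reference, the steps in the assignment could be sketched roughly as below. This is only one possible reading, not the instructor's 22-line script: the function names count_occurrences and find_most_relevant are invented here, the search is non-recursive, the keyword is treated as a grep basic regular expression, and the files are assumed to be plain text.

```shell
#!/bin/sh
# Hypothetical sketch of the assignment; function names are illustrative.

# Count every occurrence of a keyword in one file (Hint 1).
# Usage: count_occurrences KEYWORD FILE
count_occurrences() {
    # -o prints each match on its own line, so wc -l counts matches,
    # not matching lines. -- protects keywords that start with a dash.
    grep -o -- "$1" "$2" | wc -l | tr -d ' '
}

# Usage: find_most_relevant DIRECTORY KEYWORD (non-recursive)
# The body runs in a subshell so its variables stay local.
find_most_relevant() (
    dir=$1 keyword=$2 best_file= best_count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
        n=$(count_occurrences "$keyword" "$f")
        if [ "$n" -gt "$best_count" ]; then
            best_count=$n best_file=$f
        fi
    done
    if [ "$best_count" -eq 0 ]; then
        echo "No file with word $keyword found in the directory $dir."
    else
        echo "file $best_file: found $best_count occurrences of word $keyword"
    fi
)
```

A call such as find_most_relevant /home/yevgeniy/myscripts if would then print one of the two messages from the assignment. On a tie, this sketch keeps the first file it saw.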
May 11 '07 #4
ghostdog74
511 Recognized Expert Contributor
#!bin/sh
awk -v search='if' '{
n=gsub(search, "");
count+=n;calc[FILENAME]=count;
}
END{
for (i in calc) print "File: " i " Count: " calc[i]
}' *.sh

I'll leave you to do the rest.
May 12 '07 #5
heine6ken
3 New Member
Thanks a million, this will help me a lot.
May 12 '07 #6
prn
254 Recognized Expert Contributor
I couldn't resist:

grep -c $2 $1 | sort -n -t':' +1 | tail -1 | cut -f1 -d':'
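On a current system the one-liner may need two adjustments: +1 is the obsolete way of naming a sort key, which POSIX replaced with -k, and grep wants file names rather than a bare directory. A hedged sketch of the same idea follows; the name most_matching_lines is made up, -H assumes GNU or BSD grep, and file names containing a colon would confuse it.

```shell
#!/bin/sh
# most_matching_lines is a hypothetical name.
# Usage: most_matching_lines DIRECTORY KEYWORD
# Prints the file in DIRECTORY with the most lines containing KEYWORD.
most_matching_lines() {
    # -H prefixes every count with its file name, even with a single file.
    # Errors about subdirectories are discarded.
    # sort -t: -k2,2n sorts numerically on the count after the colon.
    grep -c -H -- "$2" "$1"/* 2>/dev/null |
        sort -t: -k2,2n | tail -n 1 | cut -d: -f1
}
```

Like the original, this ranks files by matching lines, not by total occurrences.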
Paul
May 22 '07 #7
ghostdog74
511 Recognized Expert Contributor
Quoting prn:
I couldn't resist:
grep -c $2 $1 | sort -n -t':' +1 | tail -1 | cut -f1 -d':'
Paul

grep -c counts only the lines that contain the search word; it does not count how many times the word appears on a line.
For example, given a file containing:
this is a line but this is last part of the line.

grep -c "line" "file"
will give a count of 1, not 2.
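The two ways of counting can be put side by side. This is a small sketch with invented helper names, assuming a grep that supports -o (GNU and BSD grep do):

```shell
#!/bin/sh
# Hypothetical helper names, used only to contrast the two ways of counting.

# Number of lines that contain the pattern at least once.
# Usage: count_matching_lines PATTERN FILE
count_matching_lines() {
    grep -c -- "$1" "$2"
}

# Number of individual matches, several per line if need be.
# Usage: count_matches PATTERN FILE
count_matches() {
    grep -o -- "$1" "$2" | wc -l | tr -d ' '
}
```

On a file holding just the example line above, count_matching_lines line file prints 1 and count_matches line file prints 2.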
May 23 '07 #8
prn
254 Recognized Expert Contributor
Quoting ghostdog74:
grep -c counts only matching lines with the search word, but it will not count number of the same word that appears on a line..

Good catch, ghostdog.

It looks like there's no simple one-liner for this. Actually I find a couple more problems, now that I look more closely. I decided to see what I could come up with using perl and I came up with this (hardly a one-liner):

  1. #! /usr/bin/perl
  2. use strict;
  3. my $dirname = $ARGV[0];
  4. my $pat = $ARGV[1];
  5. opendir DIR, $dirname or die "could not open $dirname: $!\n";
  6. my @files = grep !/^\./, readdir DIR;   # skip dotfiles
  7. closedir DIR;
  8. my %counts;
  9. foreach my $file (@files) {
  10.   my $cnt=0;
  11.   open IN, "<$dirname/$file" or die "Could not open $file for read: $!\n";
  12.   while (<IN>){ $cnt += s/$pat/$pat/g; }   # s///g returns the number of replacements
  13.   close IN;
  14.   $counts{$file} = $cnt;
  15. }
  16. foreach my $k (reverse sort {$counts{$a} <=> $counts{$b}} keys %counts) {print "$counts{$k} \t $k \n" };
I saved this as count.pl and, since there were not many occurrences of "if" in the directory where I was going to test it, I ran it with the line:
./count.pl . for
and got output like:
3        count.pl
3        count.pl~
2        test.txt
2        count.sh
...
Note that count.pl is the script itself and you can see the string "for" in lines 9, 11 and 16.
I also tried your (ghostdog74's) script, suitably modified:
#! /bin/sh
awk -v search='for' '{
  n=gsub(search, "");
  count+=n;calc[FILENAME]=count;
}
END{
  for (i in calc) print "File: " i " Count: " calc[i]
}' *
(Note that I changed the search target to "for" and the fileglob to "*" instead of "*.sh". I also added the leading slash to "/bin/sh".)
I got output like this:
...
File: count.sh Count: 8
File: test.txt Count: 14
...
File: count.pl Count: 3
...
File: count.pl~ Count: 6
(Omitting several others to show just the same files as the top four from the Perl output.)
The files did not change between these two runs, yet the awk script is giving counts that are wildly at variance with the correct answers. You can see the actual files count.pl and count.sh in this post, and they contain 3 and 2 instances of the string "for" respectively, not 3 and 8. The previous version of the perl script, count.pl~, also contains 3 instances, not 6. I don't know what is happening here.
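One explanation that fits these numbers: the awk script never resets count between files, so calc[FILENAME] ends up holding a running total over every file read so far. That would give count.pl 3, count.pl~ 3 + 3 = 6 and, if count.sh is read next, 6 + 2 = 8 (taking the Perl count of 2 for count.sh). A minimal sketch of a per-file version follows. The function name count_per_file is made up for illustration, and the fix assumes the running total is the only problem.

```shell
#!/bin/sh
# count_per_file is a hypothetical name.
# Usage: count_per_file KEYWORD FILE...
# Prints one "File: NAME Count: N" line per file that was read.
# The body runs in a subshell so its variables stay local.
count_per_file() (
    search=$1
    shift
    awk -v search="$search" '
        # gsub returns the number of substitutions made on this line.
        # Adding it straight into calc[FILENAME] keeps each file separate.
        { calc[FILENAME] += gsub(search, "") }
        END { for (f in calc) print "File: " f " Count: " calc[f] }
    ' "$@"
)
```

The order of the output lines is whatever awk's for-in loop produces, so it may differ from the order of the arguments. Empty files produce no line at all.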

By now, I should point out why I have been talking about 'the string "for"' rather than 'the word "for"'. The perl script, at least, has also been finding and counting "foreach", "format" or "performance" as instances of "for", which is probably not what the OP wanted. The most straightforward fix is probably to replace line 12 in count.pl with
  while (<IN>){ $cnt += s/\b$pat\b/$pat/g; }
where "\b" is a zero-width "word boundary" assertion: it matches at the edge between a word character and whitespace, punctuation, a line end, etc. Now count.pl outputs:
2        count.sh
...
1        test.txt
...
1        count.pl
1        count.pl~
for the same four files and this count is only for the actual word "for" with none of the spurious matches from the other version above.
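The same whole-word restriction can be sketched in shell with grep's -w option. The name count_whole_words is invented here, and the sketch assumes a grep with both -o and -w, such as GNU or BSD grep:

```shell
#!/bin/sh
# count_whole_words is a hypothetical name.
# Usage: count_whole_words WORD FILE
# Assumes a grep with the -o and -w options, such as GNU or BSD grep.
count_whole_words() {
    # -w only accepts matches that are not flanked by letters, digits or
    # underscores, much like wrapping the pattern in \b ... \b in Perl.
    grep -ow -- "$1" "$2" | wc -l | tr -d ' '
}
```

With this, "foreach", "format" and "performance" no longer add to the count for "for".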

So now I know where I went wrong before, but I'm not at all sure what is wrong with the awk. I may have done something to it, but I don't know what. I copied the script from post 5, pasted it into a script, and made only the 3 changes I listed. We may be using different awks, but that seems odd too. The one I am testing with is on a box running Fedora Core 6 and for version information:
rpm -qa | grep awk
gawk-3.1.5-14.fc6
Best Regards,
Paul
May 23 '07 #9
Motoma
3,237 Recognized Expert Specialist
Yikes! Homework question!
May 23 '07 #10
