grep or a simple script?

netrom
10
Since I found this very helpful forum and friendly people, I have a small request.

I use AIX Unix and would like to know if there's a simple way to compare two files. Should I use grep or a script?

I have 2 text files, let's name them file1 and file2.

What I need:

I want to read file1 line by line, take a certain field (or better, 2 fields) from each line, and look up that information in file2.

To be more descriptive: I'd like to extract field1 at positions 10-15 and field2 at positions 25-31 from file1 and check whether the same information appears anywhere in file2. If not, I'd like to output that field1 & field2 from file1 don't exist in file2.

let's say file1 contains something like:

123232 3232 2323 2323123123213 trterert

and file2 e.g.

123232 3232 2323 2323123123213 XXXXXXXX

and let's say I want 123232 & trterert from file1 to be searched in file2 ==> I get an "error" because the script would find only 123232 and not trterert.....

I'd be more than happy if someone can help me out or recommend a link or appropriate command or script.

I just need to check some record(s) in one file - and if they're available in the other file...

Any suggestions please? Thanks.
Apr 9 '08 #1
ashitpro
542 Expert 512MB
Check the script below:

for a_res in `awk '{print $1 $5}' file1`
do
        for b_res in `awk '{print $1 $5}' file2`
        do
                # POSIX test uses a single "="; the quotes guard against empty values
                if [ "$a_res" = "$b_res" ]
                then
                        echo "match found"
                fi
        done
done

Here $1 and $5 are the numbers of the columns you want to match.
You can change them to suit your requirements.
Apr 9 '08 #2
netrom
10
Thanks... and what if not every record has the same column layout? Is it possible to extract those 2 fields from certain positions in the first file and look them up in the second file?
Apr 9 '08 #3
ashitpro
542 Expert 512MB
How would you extract 'those 2 fields' from the first file?
I mean, there must be some logic.

In the code above we know that the 1st and 5th fields are the ones to check.
Now suppose your column layout is something like this:

1st record: 5 columns
2nd record: 3 columns
3rd record: 7 columns... etc.

On what basis would we find the fields in each record?
Apr 9 '08 #4
netrom
10
It's like this, file#1:

[line 1]124323432423 423423423423423423443243243432432424
[line 2]432432432 4234234 434324324 4343423432423423423

Extract only, for example, positions 1-6 and then 20-27, that is, 2 strings from this file...

So from line 1: string1=124323 and string2=2342342
and THEN look up these 2 strings in file#2, that is, check whether any line in file#2 contains both string1 and string2 from file#1.

I'm sorry for being unclear - I'm really an amateur and can't explain it in programming terms :-)

thanks for all your help!
Apr 9 '08 #5
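The lookup described in this post (take two fixed-position substrings from each line of file#1 and check whether any single line of file#2 contains both) can be sketched in one awk pass with substr() and index(). This is a minimal sketch under stated assumptions, not a solution tested on AIX: the function name find_pairs and the default positions (start 1, length 6 and start 20, length 8) are made up for illustration, a POSIX awk is assumed, and file#2 is assumed to be non-empty and small enough to hold in memory.

```shell
#!/bin/sh
# find_pairs FILE1 FILE2 [START1 LEN1 START2 LEN2]
# Hypothetical helper: for every line of FILE1, take two fixed-position
# substrings and report whether some line of FILE2 contains both of them.
# Assumes FILE2 is non-empty (the NR == FNR test relies on that).
find_pairs() {
    awk -v s1="${3:-1}" -v l1="${4:-6}" -v s2="${5:-20}" -v l2="${6:-8}" '
        # The first file named on the command line is FILE2: remember each line.
        NR == FNR { hay[++n] = $0; next }
        # Everything after that is FILE1: extract both fields by position.
        {
            a = substr($0, s1, l1)
            b = substr($0, s2, l2)
            found = 0
            for (i = 1; i <= n; i++) {
                # index() is a plain substring test, so no regex escaping is needed.
                if (index(hay[i], a) > 0 && index(hay[i], b) > 0) { found = 1; break }
            }
            printf "%s & %s %s\n", a, b, (found ? "found" : "NOT found")
        }
    ' "$2" "$1"
}
```

A call such as find_pairs file1 file2 1 6 20 8 prints one line per record of file1, ending in either "found" or "NOT found". Because substr() works on the untouched record, spaces inside the line do not shift the positions.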
ashitpro
542 Expert 512MB
Check this out.
Here we extract two fields (1-6 and 20-27) from each line and look in the other file for their occurrence.
We are using the grep command, so just make sure that file2 is always present, otherwise... bang!

for line in `cat file1`
do
        f1=""
        f2=""
        for (( i = 1 ; i <= 6 ; i++ ))
        do
                p=`echo $line | cut -c $i`
                f1=$f1$p
        done

        echo "First word is :$f1"

        for (( i = 20 ; i <= 27 ; i++ ))
        do
                p=`echo $line | cut -c $i`
                f2=$f2$p
        done

        echo "second word is :$f2"

        res=`grep "$f1.*$f2" file2`

        # non-empty output means grep found at least one matching line
        if [ -n "$res" ]
        then
                echo "match found...for line:$line"
        fi

done
Apr 9 '08 #6
netrom
10
When I inserted the code and ran it, it said:

0403-057 Syntax error at line 5 : `(' is not expected.

Can this be fixed please?
Apr 9 '08 #7
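The 0403-057 message is what one would expect if the C-style "for (( ... ))" loop is read by a shell that does not support it; that loop form is a bash/ksh93 extension, and the assumption here is that the script ran under an older Korn or Bourne-style shell on AIX. Since cut accepts a character range, the per-character loops can be avoided altogether. A minimal sketch (the helper name substr_range is made up for illustration):

```shell
#!/bin/sh
# substr_range STRING FROM TO
# Hypothetical helper: print characters FROM..TO of STRING with a single
# cut call instead of a per-character "for ((...))" loop.
substr_range() {
    printf '%s\n' "$1" | cut -c "$2-$3"
}

# Sample record taken from the thread.
line='124323432423 423423423423423423443243243432432424'
f1=$(substr_range "$line" 1 6)      # replaces the first loop
f2=$(substr_range "$line" 20 27)    # replaces the second loop
echo "First word is :$f1"
echo "second word is :$f2"
```

Quoting "$1" inside the helper keeps any spaces in the record where they are, so the character positions stay meaningful.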
prn
254 Expert 100+
Given file1 =

124323432423 423423423423423423443243243432432424
432432432 4234234 789789789 58769507-67986785765

and file2 =

[line 1]The quick brown fox jumps over the lazy dog.
[line 2]124323432423 423423423423423423443243243432432424
[line 3]Jackdaws love my big sphinx of quartz.
[line 4]Pack my box with five dozen liquor jugs.

How about something like:

#! /bin/bash

PATFILE="file1"
TESTFILE="file2"

cat $PATFILE | while read LINE
do
        STR1=`echo $LINE | cut -c1-6`
        STR2=`echo $LINE | cut -c21-27`
        echo "str1 is $STR1     str2 is $STR2"
        RESULT=`grep $STR1.*$STR2 $TESTFILE`
        echo "result is $RESULT"
done

with the result:

[prn@deimos ~]$ netrom.sh
str1 is 124323  str2 is 2342342
result is [line 2]124323432423 423423423423423423443243243432432424
str1 is 432432  str2 is 9789789
result is

You can then modify that for the results you actually want.

HTH,
Paul
Apr 9 '08 #8
ashitpro
542 Expert 512MB
This one is great.
By the way, are you using 'bash' or something else?
Apr 9 '08 #9
netrom
10
Thanks a lot for your help. I just tried the latest example. Since every line is read into $LINE, the positions changed and I had to adjust the from-to ranges in the cut command. The results were still not correct, though: it somehow changed the look of each line, perhaps because of spaces, slashes and various other characters.

Is there another way, such as skipping the line-by-line read and instead cutting directly with the cut command, then finding string1 and string2 in file2?

Thanks again.
Apr 9 '08 #10
prn
254 Expert 100+
Ashitpro: Yes, my example uses bash, but it ought to work with the Bourne shell or Korn shell too.

Netrom: I don't understand at all what you mean by "the output of every line gets into $LINE". I sort of gather that you must mean the character positions in the lines of file1 are not constant. Is that it?

If that is what you mean, then is there some other way of determining what strings you are looking for? There are lots of ways to isolate substrings from a larger string, but neither I nor the script can read your mind. There absolutely must be some way to recognize the strings you need. The algorithm can be quite complex, but there must be one. I'm going to assume that the strings in your example are not the real data here. Can you post real data? Or if the real data is confidential (and that would not be at all surprising), perhaps you could post somewhat sanitized data? It is just impossible to suggest a way to extract the strings without some clue about what the strings look like.

Best Regards,
Paul
Apr 9 '08 #11
ghostdog74
511 Expert 256MB
# more file
123232 3232 2323 2323123123213 trterert
# more file1
123232 3232 2323 2323123123213 XXXXXXXX
123232 3232 2323 2323123123213 trterert
# ./test.sh
123232 or trterert not in line 1
123232 3232 2323 2323123123213 trterert
# cat test.sh
#!/bin/sh

awk 'FNR==NR{ a[FNR]=$0;next }
{
 for( i in a ){
    if  ( ( a[i] ~ $1) && ( a[i] ~ $NF ) ) {
        print $0
    }else {
        print $1 " or "$NF " not in line "FNR
    }
 }
}
' file1 file
Apr 9 '08 #12
netrom
10
OK, Paul, I know what you mean, so I've uploaded 2 test files, each one with 5 records/lines, at: http://thetechiebuddy.com/hm/

The filenames are file1_test and file2_test. I've altered the records, so they're not real data. What I'd like:

Go through file1_test line by line and get string1 (length=16, positions 63-78) and string2 (length=6, positions 183-188)

...then

search for those 2 strings in each line of file2_test, e.g. string1 is 1234567890123456 and string2 is 998877

and check whether file2_test has any line that contains both of these strings (at this moment it doesn't matter at what positions the strings are found in file2_test - the aim is to find the 2 strings together in one or more lines, if there are any, of course)....

The output could then say: string1 and string2 found in file2_test, or better, string1 and string2 NOT found (anywhere) in file2_test.

Many thanks Paul! You're very helpful.
Apr 9 '08 #13
netrom
10
will try this as well. many thanks!
Apr 9 '08 #14
prn
254 Expert 100+
Hi Netrom,

OK. I've downloaded those two files and made the corresponding changes in the script. It might have helped if you had used an example where there was at least one match, though.

Here's a script:

#! /bin/bash

PATFILE="file1_test"
TESTFILE="file2_test"

cat $PATFILE | while read LINE
do
        STR1=`echo $LINE | cut -c63-78`
        STR2=`echo $LINE | cut -c183-188`
        echo "str1 is $STR1     str2 is $STR2"
        RESULT=`grep $STR1.*$STR2 $TESTFILE`
        # quoted so that a multi-word result does not break the test
        if [ -n "$RESULT" ]; then
                echo $STR1 and $STR2 both found in \"$RESULT\"
        else
                echo $STR1 and $STR2 not found in \"$TESTFILE\"
        fi
        echo
done

and its output is:
[prn@deimos ~]$ netrom.sh
str1 is 7020800034800000 str2 is 348000
7020800034800000 and 348000 not found in "file2_test"

str1 is 7295700034800000 str2 is 348000
7295700034800000 and 348000 not found in "file2_test"

str1 is 9851500034800000 str2 is 348000
9851500034800000 and 348000 not found in "file2_test"

str1 is 3840000034800000 str2 is 348000
3840000034800000 and 348000 not found in "file2_test"

str1 is 0760100034800000 str2 is 348000
0760100034800000 and 348000 not found in "file2_test"
Is this what you wanted?

Best Regards,
Paul
Apr 9 '08 #15
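One detail of the grep pattern is worth noting: STR1.*STR2 matches only when string1 appears before string2 on the line, and both strings are interpreted as regular expressions. Since the request says the positions in file2_test do not matter, an order-independent, literal-text check may be closer to what was asked. A minimal sketch, assuming a grep that supports the standard -F and -e options (the helper name both_in_file is made up for illustration):

```shell
#!/bin/sh
# both_in_file STR1 STR2 FILE
# Hypothetical helper: exit 0 if at least one line of FILE contains both
# strings, in either order. -F treats them as literal text, not as regexes.
both_in_file() {
    # The first grep keeps lines containing STR1; the second keeps those that
    # also contain STR2. The pipeline's status is that of the second grep.
    grep -F -e "$1" "$3" | grep -F -e "$2" > /dev/null
}
```

Inside the loop it could be used as: if both_in_file "$STR1" "$STR2" "$TESTFILE"; then echo found; else echo "not found"; fi. Note that an empty string matches every line, so empty fields would need a separate check.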
netrom
10
I'm eager to test it Paul! Will let you know.

Many thanks!
Apr 10 '08 #16
netrom
10
Here's the problem, Paul:

When I just run cut on file1_test, I get different output:

$ cut -c 63-78 file1_test

4543534554344637
5243534543534017
5345345345344370
5454353453445505
5445345435342216

and then the second cut:

$ cut -c 183-188 file1_test

970232
R63908
941080
711285
053621

But when I run your script, the output ($STR1 and $STR2) is different. So it seems that when you read the file with cat and put the result in $LINE, it comes out a bit different and actually 'changes' the structure of the records....

Is there a solution to this?

Thanks in advance.
Apr 10 '08 #17
prn
254 Expert 100+
Hi Netrom,

I haven't managed to figure out just exactly what the problem is, but I think I can confidently say that the script does not 'change' the file structure.

It looks a lot like it has something to do with the character encoding in the file. I also note that when I examine the file with "less" it looks different from what I see in emacs or vi. It looks like you are Magyar (Hungarian for the English speakers :) ) so it does not surprise me that you are using a different character encoding, but what I don't yet understand is why it shows up differently in the script from when you just type the command at the prompt. I looked at the environment by typing "env" at the prompt and by putting the same command in the script, and could not find any interesting differences.

At the moment, I don't understand what could be causing this difference when the same command is run in the script and at the prompt. It ought to be something about the environment, but that does not seem to be the case. In particular, the LANG environment variable, which in my case is LANG=en_US, is the same in both situations.

I'll undoubtedly have more thoughts in a while, but right now, I'm puzzled.

Paul
Apr 10 '08 #18
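A possible explanation for the difference, offered as a hedged note and not as a confirmed diagnosis of these particular files: an unquoted $LINE is split on whitespace by the shell, and echo rejoins the words with single spaces; a plain "read LINE" also strips leading blanks and interprets backslashes. In fixed-width records that contain runs of spaces, every later column then shifts left, which would match the symptom of cut giving different results at the prompt and in the loop. A minimal sketch of a loop that keeps each record intact (the helper name cut_fields is made up for illustration):

```shell
#!/bin/sh
# cut_fields FILE FROM1 TO1 FROM2 TO2
# Hypothetical helper: print the two fixed-position fields of every record,
# separated by "|", keeping the record exactly as it appears in FILE.
cut_fields() {
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    while IFS= read -r line; do
        # Quoting "$line" stops the shell from collapsing runs of spaces.
        s1=$(printf '%s\n' "$line" | cut -c "$2-$3")
        s2=$(printf '%s\n' "$line" | cut -c "$4-$5")
        printf '%s|%s\n' "$s1" "$s2"
    done < "$1"
}

# Demonstration of the effect on a record with a run of four spaces:
rec='AB    CDEF'
echo $rec   | cut -c7-10    # unquoted: spaces collapsed, columns shift left
echo "$rec" | cut -c7-10    # quoted: the original columns are preserved
```

With this approach the ranges passed to the helper can be the same ones that work with cut directly on the file, for example cut_fields file1_test 63 78 183 188.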
netrom
10
Thank you very much Paul.
Apr 15 '08 #19
