Bytes IT Community

perl string comparison

P: 89
Hi,
I have one column of strings in the first file, and a second file that has 5 columns on each line. My basic objective is to find out whether each item (line) of the first file appears in the 3rd column of the second file.
I tried the following logic. It might be a bit of a roundabout way, but as a beginner I am trying it as follows.

The column in the first file contains data like this:

```
NS008_456_R0030_3008
```
The data in the second file looks like this:

```
+   test   NS008_456_R0030_3008   67   223
```
My logic is as follows:

I read the first file into an array, and for each item I open the second file, scan through each line, and check whether the array element is equal to $v[2] (the 3rd column) of the second file. The logic seems to work, although the search takes a long time.

But I treated NS008_456_R0030_3008 as a string literal, and my if statement is as follows:

```perl
if ($rawdata[0] eq $v[2]) {
    # do something here
}
```
But it does not seem to work. Is anything wrong with treating the data as a string literal, or is some further manipulation needed for the string comparison when I read the file contents into an array? Please let me know. Regards
Jan 12 '09 #1


numberwhun
Expert Mod 2.5K+
P: 3,503
@lilly07
Can you please post the rest of your code so that we can see how you got to this point? That will give us a better understanding all around.

Regards,

Jeff
Jan 12 '09 #2

KevinADC
Expert 2.5K+
P: 4,059
If you compare strings using "eq", they must be an exact match, including spaces, control characters, and the upper/lower case of any alphabetic characters. My guess is that you need to chomp() the records in the first file before comparing them to records in the second file, but why make us guess? Post the code. ;)
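A small self-contained illustration of the exact-match point, as a sketch (the values are just the sample ID from this thread): a line read with <DAT> keeps its trailing newline, so "eq" fails until the string is chomped, and "eq" is case-sensitive unless both sides are passed through lc().

```perl
use strict;
use warnings;

my $from_file   = "NS008_456_R0030_3008\n";  # what <DAT> returns: the newline is kept
my $from_column = "NS008_456_R0030_3008";    # what split() gives for the 3rd column

my $before = ($from_file eq $from_column) ? "match" : "no match";
chomp $from_file;                             # strip the trailing newline in place
my $after  = ($from_file eq $from_column) ? "match" : "no match";
print "before chomp: $before\nafter chomp:  $after\n";

# eq is also case-sensitive; compare lc() of both sides to ignore case.
my $same_ignoring_case = lc("ns008_456_r0030_3008") eq lc($from_column);
print "case-insensitive match: ", ($same_ignoring_case ? "yes" : "no"), "\n";
```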
Jan 12 '09 #3

P: 89
Thanks, Kevin. I tried chomping the data from the first file, but it is still not working. My code is below. It does not report any compilation errors. I didn't copy and paste it; I retyped it.

```perl
#!/usr/bin/perl
$first_data = "first.txt";
open(DAT, $first_data) || die("Could not open file!");
@search_data = <DAT>;
$searchSize = scalar(@search_data);

$second_file = "second.txt";
for ($count = 0; $count < $searchSize; $count++) {
    open(RF, $second_file) || die("Could not open file!");

    $find_raw = @search_data[$count];
    $find = chomp $find_raw;

    while ($line = <RF>) {
        chomp $line;
        @v = split(/\s+/, $line);

        if ($v[2] eq $find) {
            print "$line \n";
        }
    }

    close RF;
}
```
Jan 12 '09 #4

P: 89
Actually, the program works if I change the following code:

```perl
$find_raw = @search_data[$count];
$find = chomp $find_raw;
```

to this:

```perl
$find_raw = @search_data[$count];
chomp $find_raw;
```

Is there a trick or a shorter way to do this kind of search? It takes a long time. Thanks.
Jan 13 '09 #5

KevinADC
Expert 2.5K+
P: 4,059
When you assign the return value of chomp() to a scalar, you get the total number of characters chomp() removed, not the chomped string. So in your case $find was probably either 0 or 1.
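A quick sketch of that behaviour, using the sample ID from this thread: chomp() edits its argument in place and returns a count. The last two lines show a common idiom, chomp(my $copy = $line), which copies and chomps in one statement and leaves the original untouched.

```perl
use strict;
use warnings;

my $find_raw = "NS008_456_R0030_3008\n";
my $find = chomp $find_raw;     # $find gets the number of characters removed, not the string
print "chomp returned: $find\n";
print "chomped string: $find_raw\n";

# Copy and chomp in one statement; chomp acts on the new variable only.
my $line = "NS008_456_R0030_3008\n";
chomp(my $clean = $line);
print "clean: $clean\n";
```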

This line:

$find_raw = @search_data[$count];

should be:

$find_raw = $search_data[$count];

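Using @ for a single array element gives you a one-element slice; it is discouraged, and with warnings enabled Perl tells you it is "better written as" $search_data[...]. Use $ for a single array element and @ for multiple array elements.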
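To make the sigil rule concrete, a short sketch (the array contents are just sample IDs from this thread): $ with square brackets fetches one element, while @ with square brackets takes a slice of one or more elements.

```perl
use strict;
use warnings;

my @search_data = ("NS008_456_R0030_3008", "NS008_456_R0030_3678", "NS008_456_R0030_3690");

my $one  = $search_data[1];        # $ with [] : a single element
my @some = @search_data[0, 2];     # @ with [] : a slice of several elements

print "one:  $one\n";
print "some: @some\n";
```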
Jan 13 '09 #6

KevinADC
Expert 2.5K+
P: 4,059
If neither file is too big you can do something like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first_data = "first.txt";
open(DAT, $first_data) or die "Could not open file: $!";
my @search_data = <DAT>;
close DAT;
chomp @search_data;

my $second_file = "second.txt";
open(RF, $second_file) or die "Could not open file: $!";
while (my $line = <RF>) {
   chomp $line;
   my $v = (split(/\s+/, $line))[2];   # 3rd column, split once per line
   next unless defined $v;             # skip lines with fewer than 3 columns
   foreach my $find (@search_data) {
      if ($v eq $find) {
         print "Found '$find' in second.txt at line number $. : [$line] \n";
         last;
      }
   }
}
close RF;
```
Jan 13 '09 #7

P: 89
Yes Kevin, you are right: initially, after chomping, the value was 1, which is why I changed the code as I did.

My objective is to find all the entries of the first file's column that are present in the second file and print them, so "last;" may not work in my case. I was just wondering whether I am doing this in a roundabout way. Thanks again.
Cheers
Jan 13 '09 #8

KevinADC
Expert 2.5K+
P: 4,059
Try the code I posted. "last" ends the "foreach" loop once an element of the array is found in the current line of the file. It then moves on to the next line of the file and searches the entire array again. This whole process could probably be sped up considerably by using a hash and/or the Memoize module.

Memoize - perldoc.perl.org
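A minimal sketch of the hash idea, assuming the first file holds one ID per line and the ID sits in the 3rd whitespace-separated column of the second file. Each file is read only once, and every matching line is collected, so no "last" is needed. The subroutine names build_lookup and matching_lines are hypothetical, and in-memory filehandles stand in for first.txt and second.txt so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line into a hash used as a set.
sub build_lookup {
    my ($fh) = @_;
    my %wanted;
    while (my $id = <$fh>) {
        chomp $id;
        $wanted{$id} = 1 if length $id;   # skip blank lines
    }
    return \%wanted;
}

# Hypothetical helper: return every line whose 3rd column is in the lookup.
sub matching_lines {
    my ($fh, $wanted) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        my $col3 = (split ' ', $line)[2];   # split ' ' ignores leading whitespace
        push @hits, $line if defined $col3 && exists $wanted->{$col3};
    }
    return @hits;
}

# Demo: in-memory filehandles in place of first.txt and second.txt.
my $first  = "NS008_456_R0030_3008\n";
my $second = "+   test   NS008_456_R0030_3008   67   223\n"
           . "+   ghi    NS008_456_R0030_3678   17   678\n";
open(my $first_fh,  '<', \$first)  or die "Cannot open first: $!";
open(my $second_fh, '<', \$second) or die "Cannot open second: $!";

my $wanted = build_lookup($first_fh);
my @hits   = matching_lines($second_fh, $wanted);
print "$_\n" for @hits;
```

With real files, open them with open(my $fh, '<', 'first.txt') or die ... and pass those handles instead. A hash lookup costs roughly the same no matter how many IDs the first file has, which is where the speed-up over rescanning comes from.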
Jan 13 '09 #9

P: 89
Hi Kevin, thanks for your help.

Basically, my data file (the second file) looks as follows:

```
+   test   NS008_456_R0030_3008   67   223
+   ghi    NS008_456_R0030_3678   17   678
+   ggl    NS008_456_R0030_3678   17   270
+   ghi    NS008_456_R0030_3672   17   209
+   ghi    NS008_456_R0030_3690   17   280
+   ghi    NS008_456_R0030_3690   15   267
```
My objective is to find the records whose 3rd-column value appears more than once. For example, in the above case,

```
+   ghi    NS008_456_R0030_3678   17   678
+   ggl    NS008_456_R0030_3678   17   270
```

and

```
+   ghi    NS008_456_R0030_3690   17   280
+   ghi    NS008_456_R0030_3690   15   267
```

are the candidate records I am interested in.

My logic is as follows:
1. I put the 3rd and 4th columns into a hash and checked all the values in the hash. If a value is more than 1, I treat those as multiple records and store the 3rd-column value (for example NS008_456_R0030_3690) in a file ($first_file). Then I search for those records in the second file as I explained before. But this takes an enormous amount of time, because the file is huge and the search is exhaustive.
Is there any way to pick the records up from the second file directly? I need the records whose 3rd column has multiple entries. Please let me know.
I also tried your code, and the script is still running, so I thought I should explain the whole picture.
Thanks.
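One way to pick the duplicate records straight out of the second file, without the intermediate file or the nested search, is to count the 3rd-column values in one pass and print the matching lines in a second pass. This is a sketch, assuming whitespace-separated columns as in the sample above; the subroutine names count_col3 and duplicate_lines are hypothetical, and an in-memory filehandle holding the sample rows stands in for second.txt. Memory use grows with the number of distinct IDs, not with the number of lines.

```perl
use strict;
use warnings;

# Hypothetical helper, pass 1: count how often each 3rd-column value occurs.
sub count_col3 {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        my $col3 = (split ' ', $line)[2];
        $count{$col3}++ if defined $col3;
    }
    return \%count;
}

# Hypothetical helper, pass 2: keep lines whose 3rd-column value occurred more than once.
sub duplicate_lines {
    my ($fh, $count) = @_;
    my @dups;
    while (my $line = <$fh>) {
        chomp $line;
        my $col3 = (split ' ', $line)[2];
        push @dups, $line if defined $col3 && $count->{$col3} > 1;
    }
    return @dups;
}

# Demo data: the sample rows from this thread, one per line.
my $data = join '', map { "$_\n" } (
    "+   test   NS008_456_R0030_3008   67   223",
    "+   ghi    NS008_456_R0030_3678   17   678",
    "+   ggl    NS008_456_R0030_3678   17   270",
    "+   ghi    NS008_456_R0030_3672   17   209",
    "+   ghi    NS008_456_R0030_3690   17   280",
    "+   ghi    NS008_456_R0030_3690   15   267",
);
open(my $fh, '<', \$data) or die "Cannot open data: $!";

my $count = count_col3($fh);
seek($fh, 0, 0) or die "Cannot rewind: $!";   # rewind for the second pass
my @dups = duplicate_lines($fh, $count);
print "$_\n" for @dups;
```

For the real file, open second.txt with open(my $fh, '<', 'second.txt') or die ...; the seek call rewinds it for the second pass just as it does here. If the lines should come out grouped by ID, collecting them in a hash of arrays in a single pass is an alternative, at the cost of holding the whole file in memory.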
Jan 13 '09 #10

P: 89
I would also like to know whether a shell script would do the job.
Jan 13 '09 #11
