Bytes | Developer Community

Compare Two csv files using perl

Vasuki Masilamani
Hi,

Can anyone help me write a Perl script to compare two CSV files and pick out the records that differ?

Any responses would be appreciated.

Thanks,
Vasuki
May 17 '07 #1
KevinADC
4,059 Expert 2GB
Post your current code and someone will probably help.
May 17 '07 #2
I tried and got the whole script written. It works fine now. Please find the script below.

    use strict;
    use warnings;

    my $f1      = 'C:\Vasuki\chm_dirx_bud_28.csv';
    my $f2      = 'C:\Vasuki\chm_dirx_bud_29.csv';
    my $outfile = 'C:\Vasuki\chm_dirx_bud.csv';

    open my $fh1, '<', $f1 or die "Could not open $f1: $!\n";
    open my $fh2, '<', $f2 or die "Could not open $f2: $!\n";

    my @outlines;    # lines of file 1 that have no identical line in file 2

    while (my $outer_text = <$fh1>) {
        my $found = 0;

        # Rewind file 2 and rescan it from the top for every line of file 1.
        seek($fh2, 0, 0);

        while (my $inner_text = <$fh2>) {
            if ($outer_text eq $inner_text) {
                $found = 1;
                print "Match Found \n";
                last;
            }
        }

        if (!$found) {
            print "No Match Found \n";
            push @outlines, $outer_text;
        }
    }

    open my $out, '>', $outfile or die "Cannot open $outfile for writing: $!\n";
    print {$out} @outlines;
    close $out;

    close $fh1;
    close $fh2;

This script runs very slowly when there are a large number of records. Can anyone suggest some ideas to tune it? Thanks in advance.
May 17 '07 #3
miller
1,089 Expert 1GB
Well, of course it's slow. You're scanning through a large portion of file2 for every line in file1. This means that your execution time grows with the product of the two file sizes, i.e. roughly with the square of the file size.

Ignoring your current algorithm for now, though, I would suggest that you look into a CPAN module to do this for you.

cpan Text::Diff


The fact that your files are CSV files is irrelevant to what you're trying to do, so just treat this as a plain file comparison. I don't know what type of output this module provides, but I'm almost certain that it can be adapted to achieve the results you desire.

- Miller
May 17 '07 #4
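For readers who want to try the Text::Diff suggestion, here is a minimal sketch. It assumes Text::Diff is installed from CPAN (it is not part of core Perl), and the helper name unified_diff is made up for this illustration. The output is a unified diff, so it still needs post-processing if you only want the differing records.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a unified diff of two line lists as a string,
# or undef if Text::Diff (a CPAN module) is not installed.
sub unified_diff {
    my ($lines_a, $lines_b) = @_;    # array refs of lines
    return undef unless eval { require Text::Diff; 1 };
    return Text::Diff::diff($lines_a, $lines_b, { STYLE => 'Unified' });
}

# Example call with in-memory lines; file names can be passed to diff() too.
my $report = unified_diff(["a,1\n", "b,2\n"], ["a,1\n", "b,3\n"]);
print defined $report ? $report : "Text::Diff is not installed\n";
```

Note that diff output depends on line order, so two files holding the same records in a different order will still show differences.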
KevinADC
4,059 Expert 2GB
If the file isn't too large, I would try reading the first file into a hash and then incrementing the hash entries while reading the second file. I think Text::Diff might be overkill if it's just a simple comparison of matching lines between the two files. Text::Diff also has the unfortunate behavior of slurping all files into memory, which may or may not be a problem.
May 17 '07 #5
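The hash idea from post #5 could look roughly like the sketch below. The function name lines_only_in_first is an assumption made for this example, not something from the thread. Each file is read once, so the work grows with the total number of lines instead of the product of the two file sizes, at the cost of holding the first file's distinct lines in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $file_a that never appear in
# $file_b, in the order they first occur in $file_a.
sub lines_only_in_first {
    my ($file_a, $file_b) = @_;

    # Pass 1: remember every distinct line of the first file with a count of 0.
    my (%hits, @order);
    open my $fh_a, '<', $file_a or die "Could not open $file_a: $!\n";
    while (my $line = <$fh_a>) {
        push @order, $line unless exists $hits{$line};
        $hits{$line} = 0;
    }
    close $fh_a;

    # Pass 2: increment the count for each line of the second file we know.
    open my $fh_b, '<', $file_b or die "Could not open $file_b: $!\n";
    while (my $line = <$fh_b>) {
        $hits{$line}++ if exists $hits{$line};
    }
    close $fh_b;

    # Lines whose count is still 0 had no identical line in the second file.
    return grep { $hits{$_} == 0 } @order;
}

# Usage (paths are placeholders):
# my @unmatched = lines_only_in_first('old.csv', 'new.csv');
```

Unlike the nested-loop script, this version reports a line that is duplicated in the first file only once. Keep a per-line array instead of @order if every occurrence matters.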
AdrianH
1,251 Expert 1GB
The easiest way is to use something that is already made.

Try using diff. It is a Unix utility and is designed for this sort of work.

Of course, it will not work if the records are not in the same order, in which case you would have to go back to Perl.


Adrian
May 18 '07 #6
AdrianH
1,251 Expert 1GB
Rethinking this, if the key is at the beginning of each line, you could sort both files and then use diff.


Adrian
May 18 '07 #7
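The sort-then-compare idea from post #7 can also be sketched in Perl itself, for cases where no diff utility is at hand. This is an illustration under assumptions: the name sorted_diff is invented here, whole lines are compared as plain strings, and both files fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts both line lists, then walks them in step and
# collects the lines that are unique to each side.
sub sorted_diff {
    my ($left, $right) = @_;    # array refs of lines
    my @l = sort @$left;
    my @r = sort @$right;
    my (@only_left, @only_right);
    my ($i, $j) = (0, 0);

    while ($i < @l && $j < @r) {
        my $cmp = $l[$i] cmp $r[$j];
        if    ($cmp < 0) { push @only_left,  $l[$i++] }
        elsif ($cmp > 0) { push @only_right, $r[$j++] }
        else             { $i++; $j++ }    # same line on both sides
    }

    # Whatever remains on either side has no partner on the other.
    push @only_left,  @l[$i .. $#l];
    push @only_right, @r[$j .. $#r];
    return (\@only_left, \@only_right);
}
```

Sorting costs on the order of n log n comparisons, which is still far cheaper than rescanning the second file for every line of the first.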
KevinADC
4,059 Expert 2GB
Why are you assuming unix? Looks like windows to me.

$f1 = 'C:\Vasuki\chm_dirx_bud_28.csv';
May 18 '07 #8
AdrianH
1,251 Expert 1GB
I'm not assuming Unix. There are GNU ports of Unix utilities all over the place.


Adrian
May 18 '07 #9
KevinADC
4,059 Expert 2GB
True enough

(filler for message too short)
May 18 '07 #10
ghostdog74
511 Expert 256MB
You can try memory mapping.
May 20 '07 #11
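Post #11 does not say how to memory-map the files. One option, offered here as an assumption, is Perl's :mmap PerlIO layer, which is available on platforms that support mmap(). The sketch below uses it to load a file's lines into a lookup hash; the helper name line_set_mmap is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $path through the :mmap PerlIO layer and
# returns a hash ref whose keys are the distinct lines of the file.
sub line_set_mmap {
    my ($path) = @_;
    open my $fh, '<:mmap', $path or die "Could not open $path: $!\n";
    my %seen;
    while (my $line = <$fh>) {
        $seen{$line} = 1;
    }
    close $fh;
    return \%seen;
}

# Usage (path is a placeholder):
# my $set = line_set_mmap('new.csv');
# print "present\n" if $set->{"a,1\n"};
```

Memory mapping changes only how the bytes are read. The comparison algorithm still decides how the run time scales, so it helps most when combined with a single-pass approach such as the hash lookup.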
ad4x2l
1
csvdiff, a GPL Perl tool
Sep 27 '07 #12
