Compare two CSV files using Perl

18 New Member

Hi,

Can anyone help me write a Perl script to compare two CSV files and pick out the records that differ?

Any responses would be appreciated.

Thanks,
Vasuki

May 17 '07 #1

KevinADC

4,059

Recognized Expert Specialist

Post your current code and someone will probably help.

May 17 '07 #2

Vasuki Masilamani

New Member

I tried and got the entire script working. It works fine now. Please find the script below.

 
use strict;
use warnings;

my $f1      = 'C:\Vasuki\chm_dirx_bud_28.csv';
my $f2      = 'C:\Vasuki\chm_dirx_bud_29.csv';
my $outfile = 'C:\Vasuki\chm_dirx_bud.csv';

open my $fh1, '<', $f1 or die "Could not open $f1: $!\n";
open my $fh2, '<', $f2 or die "Could not open $f2: $!\n";

my @outlines;

# For every line of file 1, rescan file 2 from the top for an identical line.
while (my $outer_text = <$fh1>) {
    my $found = 0;

    seek($fh2, 0, 0);    # rewind file 2 before each scan

    while (my $inner_text = <$fh2>) {
        if ($outer_text eq $inner_text) {
            $found = 1;
            print "Match Found \n";
            last;
        }
    }

    if (!$found) {
        print "No Match Found \n";
        push @outlines, $outer_text;    # line exists only in file 1
    }
}

close $fh1;
close $fh2;

open my $out, '>', $outfile or die "Cannot open $outfile for writing: $!\n";
print $out @outlines;
close $out;

This script runs very slowly when there is a large number of records. Can anyone suggest ways to tune it? Thanks in advance.

May 17 '07 #3

miller

1,089

Recognized Expert Top Contributor

Well, of course it's slow. You're scanning through a large portion of file2 for every line in file1. This means that your execution time grows with the product of the two file sizes, which is roughly the square of the size when the files are similar in length.

Ignoring your current algorithm for now, though, I would suggest that you look into a CPAN module to do this for you.

cpan Text::Diff

The fact that your files are CSV files is irrelevant for what you're trying to do, so just treat this as a plain file comparison. I don't know what type of output this module will provide, but I'm almost certain that it can be adapted to achieve the results you desire.
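For reference, here is a minimal sketch of what calling Text::Diff looks like. The sample data is hypothetical and stands in for the two CSV files; with real files, the two file names can be passed to `diff` instead of the scalar references. The module has to be installed from CPAN first.

```perl
use strict;
use warnings;
use Text::Diff;    # CPAN module, not part of core Perl

# Hypothetical sample data standing in for the two CSV files.
my $old = "id,name\n1,alice\n2,bob\n";
my $new = "id,name\n1,alice\n2,robert\n";

# diff() accepts file names, scalar refs, array refs or filehandles
# and returns the differences as one string.
my $report = diff \$old, \$new, { STYLE => 'Unified' };

print $report;
```

In the unified style, lines present only in the first input are prefixed with `-` and lines present only in the second with `+`, so the changed records can be picked out of the report with a simple pattern match.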

- Miller

May 17 '07 #4

KevinADC

4,059

Recognized Expert Specialist

If the file isn't too large, I would try reading the first file into a hash and just increment the hash values while reading the second file. I think Text::Diff might be overkill if it's just a simple comparison of matching lines between the two files. Text::Diff also has the unfortunate behavior of slurping all files into memory, which may or may not be a problem.
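A minimal sketch of the hash approach, assuming the first file fits in memory and that records are compared as whole lines, as in the original script. The sub name `lines_only_in_first` is hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Return the lines of $file_a that have no identical line in $file_b.
# Hypothetical helper; assumes $file_a fits comfortably in memory.
# Unlike the nested-loop script, a line repeated in $file_a is reported once.
sub lines_only_in_first {
    my ($file_a, $file_b) = @_;
    my (%count, @order);

    # Read the first file into a hash: line => times seen in the second file.
    open my $fh_a, '<', $file_a or die "Could not open $file_a: $!\n";
    while (my $line = <$fh_a>) {
        next if exists $count{$line};
        $count{$line} = 0;
        push @order, $line;    # remember the original order
    }
    close $fh_a;

    # Increment the counter for each line of the second file that is also in the first.
    open my $fh_b, '<', $file_b or die "Could not open $file_b: $!\n";
    while (my $line = <$fh_b>) {
        $count{$line}++ if exists $count{$line};
    }
    close $fh_b;

    # Lines whose counter is still zero were never matched.
    return grep { $count{$_} == 0 } @order;
}
```

Each file is read exactly once, so the running time grows with the combined size of the two files instead of their product. The result can be written out with `print $out lines_only_in_first($f1, $f2);`.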

May 17 '07 #5

AdrianH

1,251

Recognized Expert Top Contributor

The easiest way is to use something that is already made.

Try using diff. It is a Unix utility and is designed for this sort of work.

Of course, it will not work if the records are not in the same order, in which case you would have to go back to Perl.

Adrian

May 18 '07 #6

AdrianH

1,251

Recognized Expert Top Contributor

Rethinking this, if the key is at the beginning of the line, you could sort and then use diff.
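The sort-then-compare idea can also be sketched in Perl itself. This is not the diff utility: it swaps in a merge-style walk over the two sorted lists, which reports the lines that appear on only one side. The sub name `sorted_diff` is hypothetical, and the sketch assumes both sets of lines fit in memory for sorting.

```perl
use strict;
use warnings;

# Compare two lists of lines after sorting them, walking both in step.
# Returns two array refs: lines only in the first list, lines only in the second.
# Hypothetical helper; assumes both lists fit in memory.
sub sorted_diff {
    my ($lines_a, $lines_b) = @_;
    my @left  = sort @$lines_a;
    my @right = sort @$lines_b;
    my (@only_left, @only_right);
    my ($i, $j) = (0, 0);

    while ($i < @left && $j < @right) {
        my $cmp = $left[$i] cmp $right[$j];
        if    ($cmp < 0) { push @only_left,  $left[$i++]  }    # missing from right
        elsif ($cmp > 0) { push @only_right, $right[$j++] }    # missing from left
        else             { $i++; $j++ }                        # same line in both
    }

    # Whatever is left over in either list has no counterpart.
    push @only_left,  @left[$i .. $#left];
    push @only_right, @right[$j .. $#right];

    return (\@only_left, \@only_right);
}
```

Because each record starts with its key, a changed record shows up once in each result list under the same key, which makes the before and after versions easy to pair up.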

Adrian

May 18 '07 #7

KevinADC

4,059

Recognized Expert Specialist

Why are you assuming Unix? Looks like Windows to me.

$f1 = 'C:\Vasuki\chm_dirx_bud_28.csv';

May 18 '07 #8

AdrianH

1,251

Recognized Expert Top Contributor

I'm not assuming Unix. There are GNU ports of Unix utilities all over the place.

Adrian

May 18 '07 #9

KevinADC

4,059

Recognized Expert Specialist

True enough

(filler for message too short)

May 18 '07 #10

ghostdog74

511

Recognized Expert Contributor

You can try memory mapping.

May 20 '07 #11

ad4x2l

New Member

csvdiff, a GPL Perl tool

Sep 27 '07 #12
