Compare Two csv files using perl

Vasuki Masilamani
Hi,

Can anyone help me write a Perl script to compare two CSV files and pick out the records that differ?

Any responses would be appreciated.

Thanks,
Vasuki
May 17 '07 #1
KevinADC
4,059 Expert 2GB
Post your current code and someone will probably help.
May 17 '07 #2
I tried and got the whole script working. It works fine now. Please find the script below.

    use strict;
    use warnings;

    my $f1      = 'C:\Vasuki\chm_dirx_bud_28.csv';
    my $f2      = 'C:\Vasuki\chm_dirx_bud_29.csv';
    my $outfile = 'C:\Vasuki\chm_dirx_bud.csv';

    open my $in1, '<', $f1 or die "Could not open $f1: $!\n";
    open my $in2, '<', $f2 or die "Could not open $f2: $!\n";

    my @outlines;

    # For every line of the first file, rescan the second file from the top.
    while (my $outer_text = <$in1>) {
        my $found = 0;

        seek($in2, 0, 0);    # rewind the second file

        while (my $inner_text = <$in2>) {
            if ($outer_text eq $inner_text) {
                $found = 1;
                print "Match Found \n";
                last;
            }
        }

        if (!$found) {
            print "No Match Found \n";
            push @outlines, $outer_text;    # keep lines missing from the second file
        }
    }

    open my $out, '>', $outfile or die "Cannot open $outfile for writing: $!\n";
    print {$out} @outlines;
    close $out;

    close $in1;
    close $in2;

This script runs very slowly when there is a large number of records. Can anyone suggest some ideas to tune it? Thanks in advance.
May 17 '07 #3
miller
1,089 Expert 1GB
Well, of course it's slow. You're scanning through a large portion of file2 for every line in file1. This means that your execution time grows with the square of the size of the files.

Ignoring your current algorithm for now though, I would suggest that you look into a CPAN module to do this for you.

cpan Text::Diff

The fact that your files are CSV files is irrelevant for what you're trying to do, so just go back to simple file comparison. I don't know what type of output this module will provide, but I'm almost certain that it can be adapted in such a way as to achieve the results you desire.

- Miller
May 17 '07 #4
KevinADC
4,059 Expert 2GB
If the file isn't too large, I would try reading the first file into a hash and just incrementing the hash values while reading the second file. I think Text::Diff might be overkill if it's just a simple comparison of matching lines between the two files. Text::Diff also has the unfortunate behavior of slurping all files into memory, which may or may not be a problem.
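
A minimal sketch of the hash idea described above, for anyone who wants to try it. The sub name `unmatched_lines` and the sample data are made up for illustration and are not from the thread. It assumes the first file fits in memory and that lines are compared exactly as the original script does, line endings included.

```perl
use strict;
use warnings;

# Return the lines read from $fh1 that never appear in $fh2.
# Hypothetical helper: the name and interface are assumptions.
sub unmatched_lines {
    my ($fh1, $fh2) = @_;
    my (%count, @order);

    # First pass: remember every line of the first file with a count of 0.
    while (my $line = <$fh1>) {
        push @order, $line;
        $count{$line} = 0;
    }

    # Second pass: bump the count for each line the second file shares.
    while (my $line = <$fh2>) {
        $count{$line}++ if exists $count{$line};
    }

    # Lines still at 0 were never seen in the second file.
    return grep { $count{$_} == 0 } @order;
}

# Usage with in-memory filehandles standing in for the two CSV files.
my $old = "a,1\nb,2\nc,3\n";
my $new = "a,1\nc,3\nd,4\n";
open my $fh1, '<', \$old or die "Cannot open in-memory file: $!";
open my $fh2, '<', \$new or die "Cannot open in-memory file: $!";
print unmatched_lines($fh1, $fh2);    # prints "b,2"
```

Each file is read once, so the run time grows roughly linearly with the number of lines instead of with their product. The price is holding the first file's lines in memory.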
May 17 '07 #5
AdrianH
1,251 Expert 1GB
The easiest way is to use something that is already made.

Try using diff. It is a Unix utility and is designed for this sort of work.

Of course, it will not work if the records are not in the same order, in which case you would have to go back to Perl.


Adrian
May 18 '07 #6
AdrianH
1,251 Expert 1GB
Rethinking this, if the key is at the beginning of each line, you could sort the files and then use diff.
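
If diff is not at hand, the same sort-then-compare idea can be sketched in Perl. This is a stand-in for sort plus diff, not a reimplementation of either: it sorts both sets of lines and walks them in step. The sub name `sorted_diff` and the sample records are hypothetical, and the sketch assumes both files fit in memory.

```perl
use strict;
use warnings;

# Sort both line lists, then walk them in lockstep and collect the lines
# that appear on one side only. Hypothetical helper for illustration.
sub sorted_diff {
    my ($left, $right) = @_;    # array refs of lines
    my @l = sort @$left;
    my @r = sort @$right;
    my (@only_left, @only_right);
    my ($i, $j) = (0, 0);

    while ($i < @l && $j < @r) {
        my $cmp = $l[$i] cmp $r[$j];
        if    ($cmp == 0) { $i++; $j++ }                      # same line on both sides
        elsif ($cmp < 0)  { push @only_left,  $l[$i++] }      # missing from the right
        else              { push @only_right, $r[$j++] }      # missing from the left
    }

    # Whatever remains on either side has no partner.
    push @only_left,  @l[$i .. $#l];
    push @only_right, @r[$j .. $#r];
    return (\@only_left, \@only_right);
}

my ($gone, $added) = sorted_diff(
    [ "b,2\n", "a,1\n", "c,3\n" ],
    [ "c,3\n", "d,4\n", "a,1\n" ],
);
print "Only in first:  $_" for @$gone;     # b,2
print "Only in second: $_" for @$added;    # d,4
```

Unlike a hash lookup, this pairs duplicate lines one-to-one, so a line that occurs twice in one file and once in the other is reported once.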


Adrian
May 18 '07 #7
KevinADC
4,059 Expert 2GB
Why are you assuming Unix? Looks like Windows to me.

$f1 = 'C:\Vasuki\chm_dirx_bud_28.csv';
May 18 '07 #8
AdrianH
1,251 Expert 1GB
I'm not assuming Unix. There are GNU ports of Unix utilities all over the place.


Adrian
May 18 '07 #9
KevinADC
4,059 Expert 2GB
True enough

(filler for message too short)
May 18 '07 #10
ghostdog74
511 Expert 256MB
You can try memory mapping.
May 20 '07 #11
ad4x2l
1
csvdiff, a GPL Perl tool
Sep 27 '07 #12
