Compare two files in perl

I have a problem. I am trying to compare two text files that contain a large amount of data. I have written a Perl script to cross-check the two files, but it takes a very long time. The code works fine on a small amount of data. The sample files are attached here.

I want the first line of chr.txt to be checked against every line in exon.txt, and the process repeated until every line of chr.txt has been checked against the lines of exon.txt.

This is the code I developed.

    use strict;
    use warnings;

    my $file1 = "exon.txt";
    my $file2 = "chr.txt";

    open(FILE1, '<', $file1) || die "couldn't open $file1: $!";
    open(FILE2, '<', $file2) || die "couldn't open $file2: $!";

    open(OUT, '>', "result.txt") || die "couldn't open result.txt: $!";

    my @arr1 = <FILE1>;
    my @arr2 = <FILE2>;
    chomp @arr2;    # strip newlines once so $cEnd prints cleanly

    foreach my $arr1 (@arr1) {

        chomp $arr1;
        my ($eChr, $eStart, $eEnd, $eCat) = split(/\t/, $arr1);

        foreach my $arr2 (@arr2) {

            my ($cChr, $cStart, $cEnd) = split(/\t/, $arr2);
            # report chr.txt ranges that lie inside an exon on the same chromosome
            if (($cChr eq $eChr) && ($cStart >= $eStart) && ($cEnd <= $eEnd)) {
                print OUT "$cChr\t$cStart\t$cEnd\t$eCat\t$eStart\t$eEnd\n";
            }
        }
    }
    close(FILE1);
    close(FILE2);
    close OUT;

Attached Files
File Type: txt chr.txt (75 Bytes, 850 views)
File Type: txt exon.txt (76 Bytes, 739 views)
Feb 7 '14 #1
RonB
589 Expert Mod 512MB
You're looping over the data too many times.

Load the first file (exon.txt) into a HoAoH (hash of arrays of hashes) where the key is the "chr" value and each hash ref holds the rest of that line's data. Then loop over chr.txt line by line, checking for the existence of the "chr" key.

The sample data you posted won't produce any matching results, but presumably your real data set will.

  1. #!/usr/bin/perl
  2.  
  3. use strict;
  4. use warnings;
  5.  
  6. my $file1 = "exon.txt";
  7. open my $exon_fh, '<', $file1 or die "couldn't open $file1 $!";
  8.  
  9. my %exon;
  10. while (my $line = <$exon_fh>) {
  11.     next if $line =~ /^\s*$/;
  12.     chomp $line;
  13.     my ($chr,$start,$end,$cat) = split(/\t/, $line);
  14.     push @{$exon{$chr}}, {
  15.         start => $start,
  16.         end   => $end,
  17.         cat   => $cat,
  18.     };
  19. }
  20. close $exon_fh;
  21.  
  22. my $file2 = "chr.txt";
  23. open my $chr_fh, '<', $file2 or die "couldn't open $file2 $!";
  24.  
  25. while (my $line = <$chr_fh>) {
  26.     next if $line =~ /^\s*$/;
  27.     chomp $line;
  28.     my ($chr,$start,$end) = split(/\t/, $line);
  29.     next unless exists $exon{$chr};
  30.  
  31.     foreach my $exon ( $exon{$chr} ) {
  32.         if ($start >= $exon{start} && $end <= $exon{end} ) {
  33.             print join("\t", $chr, $start, $exon{start}, $end <= $exon{end}) . "\n";
  34.         }
  35.     }
  36. }
  37. close $chr_fh;
  38.  
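
The hash-of-arrays-of-hashes layout described in this reply can be sketched on in-memory data, without the files. This is only a minimal illustration: the three exon rows are made-up sample values, and `containing_exons` is a hypothetical helper name that does not appear in the thread. It shows how each chromosome key maps to an array reference, and how each record in that array is read through its own hash reference.

```perl
use strict;
use warnings;

# Hypothetical tab-separated exon rows: chr, start, end, category.
my @exon_lines = (
    "chr1\t100\t200\texonA",
    "chr1\t300\t400\texonB",
    "chr2\t50\t150\texonC",
);

# Build the hash of arrays of hashes, keyed by chromosome.
my %exon;
for my $line (@exon_lines) {
    my ($chr, $start, $end, $cat) = split /\t/, $line;
    push @{ $exon{$chr} }, { start => $start, end => $end, cat => $cat };
}

# Return the exon records on $chr that fully contain the range $start..$end.
# (Hypothetical helper, named here for illustration only.)
sub containing_exons {
    my ($chr, $start, $end) = @_;
    return () unless exists $exon{$chr};
    # @{ ... } walks the records; $_->{...} reads a field of one record.
    return grep { $start >= $_->{start} && $end <= $_->{end} } @{ $exon{$chr} };
}

for my $hit (containing_exons('chr1', 120, 180)) {
    print join("\t", 'chr1', 120, 180, @{$hit}{qw(cat start end)}), "\n";
}
```

Each chr.txt line then costs one hash lookup plus a scan of the exons on that chromosome only, rather than a scan of every exon line.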
Feb 7 '14 #2
RonB
I just noticed an error in the print statement. It should be:
    print join("\t", $chr, $start, $exon{cat}, $exon{start}, $exon{end}) . "\n";
Feb 7 '14 #3
raj14
Thanks for the help, RonB. But when I run this script, it reports an error: "Use of uninitialized value".

Can you explain this part?

    push @{$exon{$chr}}, ;{
            start => $start,
            end   => $end,
            cat   => $cat,
        };
Feb 17 '14 #4
RonB
Which part do you want explained? The syntax errors that you added to the code I gave you or what the code should do without your syntax errors?
Feb 17 '14 #5
raj14
@RonB

The error is "Use of uninitialized value in numeric ge (>=)", so I guess the syntax has a problem. This is the part of your code that has the error; I attach it here.

    next if $line =~ /^\s*$/;
            chomp $line;
            my ($chr,$start,$end,$cat) = split(/\t/, $line);
            push @{$exon{$chr}}, ;{
                start => $start,
                end  => $end,
                cat  => $cat,
            };
Feb 18 '14 #6
RonB
Remove the semicolon in this line:
    push @{$exon{$chr}}, ;{
Feb 18 '14 #7
raj14
It still produces the same error.
Feb 18 '14 #8
RonB
You didn't say which line the warning message was referring to.

The only line in the code I gave that does that numerical test is this one (line 32):
    if ($start >= $exon{start} && $end <= $exon{end} ) {
You need to dump those four variables (via the Data::Dumper module) to see which one is undefined.
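
A minimal sketch of that dump, using one made-up exon record and one made-up chr.txt range (both are assumptions, not data from the thread). One likely reading of the warning is that `$exon{start}` looks up the key "start" in the hash `%exon` rather than a field of the loop variable `$exon`, and that the loop visits the array reference itself instead of its elements. The second half shows a dereferenced form of the same test for comparison; `$rec` and `@matches` are illustrative names.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# Hypothetical data standing in for exon.txt and one line of chr.txt.
my %exon = ( chr1 => [ { start => 100, end => 200, cat => 'exonA' } ] );
my ($chr, $start, $end) = ('chr1', 120, 180);

# As posted: $exon{start} and $exon{end} read keys of %exon, which holds
# only chromosome names, so both dump as undef.
print Dumper($start, $exon{start}, $end, $exon{end});

# Dereferenced: visit each record in the array and read its fields
# through the hash reference.
my @matches;
foreach my $rec ( @{ $exon{$chr} } ) {
    print Dumper($start, $rec->{start}, $end, $rec->{end});
    push @matches, $rec if $start >= $rec->{start} && $end <= $rec->{end};
}
```

If the first dump shows `undef` for the second and fourth values, the data loaded correctly and the undefined values come from how the loop reads the structure.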
Feb 18 '14 #9
