reading and merging two files at the same time

P: 6
hi

I have two files.

First file format (4,882 rows):
userId country operatingSystem

Second file format (400,000 rows):
userId time

I need to merge them as follows: if a userId appears in both files, the line from the second file should become:
userId time country operatingSystem

I wrote the program in Matlab, but it takes a long time, and I'd like to know how to do it in Perl. Is it possible to read from two files at the same time in Perl?

Here is Matlab code:

    % Compare every user row against every Xtrain row: 4882 x 400,000 iterations.
    for i = 1:4882
        for j = 1:400000
            % Assumes userId is the first column of both matrices.
            if user(i,1) == Xtrain(j,1)
                Xtrain(j,:) = user(i,:);
            end
        end
    end
Thanks for help
Mar 4 '08 #1
7 Replies


eWish
Expert 100+
P: 971
I would suggest that you open and read the contents of file 2 and store the data in a hash, with the userId as the key and the time as the value. Then read the other file line by line and split each line into its 3 parts. While looping through the file, check whether the first element of the split exists as a key in the hash. If it does, write the merged line to your output file in the format you desire.
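The steps above can be sketched roughly as follows. This is only an illustration, not a finished script: the helper name merge_by_userid and the sample rows are made up for the example, the input is assumed to be whitespace-delimited, and in-memory strings stand in for the real files so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "userId time" lines from $times_fh into a hash,
# then walks "userId country operatingSystem" lines from $users_fh and writes
# the merged rows to $out_fh.
sub merge_by_userid {
    my ($times_fh, $users_fh, $out_fh) = @_;

    # Pass 1: userId => time.
    my %time_for;
    while (my $line = <$times_fh>) {
        my ($uid, $time) = split ' ', $line;
        next unless defined $time;          # skip blank or short lines
        $time_for{$uid} = $time;
    }

    # Pass 2: look each userId up in the hash (no inner loop needed).
    while (my $line = <$users_fh>) {
        chomp $line;
        my ($uid, $country, $os) = split ' ', $line, 3;
        next unless defined $os && exists $time_for{$uid};
        print {$out_fh} join("\t", $uid, $time_for{$uid}, $country, $os), "\n";
    }
    return;
}

# Made-up sample data; real code would open the two files instead.
my $times = "1234 12.3.45\n3452 03.2.23\n";
my $users = "3444 india sunsolaris\n3452 eng windows\n1234 usa linux\n";
my $out   = '';

open(my $times_fh, '<', \$times) or die "Can't read times: $!";
open(my $users_fh, '<', \$users) or die "Can't read users: $!";
open(my $out_fh,   '>', \$out)   or die "Can't write output: $!";
merge_by_userid($times_fh, $users_fh, $out_fh);
close $out_fh;

print $out;
```

Each hash lookup takes roughly constant time, so the work grows with the total number of lines rather than with the product of the two file sizes, which is where the nested Matlab loops lose their time.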

--Kevin
Mar 4 '08 #2

eWish
Expert 100+
P: 971
If you get stuck or have questions on what I posted just let us know and we will try to help.

--Kevin
Mar 5 '08 #3

P: 3
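    use strict;
    use warnings;

    # Load file2.txt (userId time) into a hash keyed by userId.
    open(my $in2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
    my %hash1 = ();
    while (<$in2>)
    {
        my ($uid, $time) = split /\s/, $_;
        next unless defined $time;      # skip blank or short lines
        $hash1{$uid} = $time;
    }
    close $in2;

    open(my $in1, '<', 'file1.txt') or die "Can't open file1.txt: $!";
    # Open the output once, before the loop: opening with ">" truncates the file.
    open(my $out, '>', 'output.txt') or die "Can't open output.txt: $!";
    while (<$in1>)
    {
        my ($uid, $country, $osname) = split /\s/, $_;
        next unless defined $osname;
        if (exists $hash1{$uid})
        {
            print $out "$uid\t$hash1{$uid}\t$country\t$osname\n";
        }
    }
    close $in1;
    close $out;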
Mar 5 '08 #4

P: 3
    use strict;
    use warnings;

    # Load file2.txt (userId time) into a hash keyed by userId.
    open(my $in2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
    my %hash1 = ();
    while (<$in2>)
    {
        my ($uid, $time) = split /\s/, $_;
        next unless defined $time;      # skip blank or short lines
        $hash1{$uid} = $time;
    }
    close $in2;

    open(my $in1, '<', 'file1.txt') or die "Can't open file1.txt: $!";
    # Write to a separate output file rather than back into file2.txt,
    # and open it once, before the loop, so it is not truncated on every line.
    open(my $out, '>', 'output.txt') or die "Can't open output.txt: $!";
    while (<$in1>)
    {
        my ($uid, $country, $osname) = split /\s/, $_;
        next unless defined $osname;
        if (exists $hash1{$uid}) {
            print $out "$uid\t$hash1{$uid}\t$country\t$osname\n";
        }
    }
    close $in1;
    close $out;

_DATA_
file1:
3444 india sunsolaris
3456 japan sun os
3452 eng windows
3224 germany ubuntu
1234 usa linux

file2:
1234 12.3.45
3452 03.2.23

I think this will work, but I wrote it in a hurry, so please check it with some other combinations before implementing.
Mar 5 '08 #5

P: 12
Given this data:

_DATA_
file1:
3444 india sunsolaris
3456 japan sun os
3452 eng windows
3224 germany ubuntu
1234 usa linux

this line is going to cause one minor problem:

    my ($uid,$country,$osname)=split /\s/,$_ ;

With an $osname that should be "sun os", you will get an $osname of simply "sun". The "os" is discarded because the 4th value returned by the split is not captured in your assignment.

There are probably a few ways to solve this problem. Off the top of my head, the one that comes to mind is something like:

    my @temparray = split;                 # split $_ on whitespace
    my $uid = shift @temparray;
    my $country = shift @temparray;
    my $osname = join(' ', @temparray);    # rejoin whatever is left

Since you've chosen to split on '\s' instead of ' ', rejoining with a single space may alter the $osname if you wish to preserve tabs or other forms of whitespace.
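To make the difference concrete, here is a small sketch comparing the two split patterns, plus split's optional third argument (a field limit), which is one more way to keep a multi-word OS name intact. The sample strings are only for illustration.

```perl
use strict;
use warnings;

my $line = "3456 japan sun os";

# /\s/ with no limit: every whitespace character separates a field,
# so "sun os" ends up in two separate fields.
my @all_fields = split /\s/, $line;

# The special pattern ' ' with a limit of 3: at most three fields,
# and the third keeps the rest of the line.
my ($uid, $country, $osname) = split ' ', $line, 3;

# With doubled spaces, /\s/ produces empty fields; ' ' treats a run of
# whitespace as a single separator.
my $gappy       = "1234  usa  linux";
my @with_blanks = split /\s/, $gappy;
my @collapsed   = split ' ', $gappy;

print scalar(@all_fields), " fields from /\\s/\n";
print "osname with limit: $osname\n";
print scalar(@with_blanks), " fields vs ", scalar(@collapsed), " fields\n";
```

When a limit is used, the last field also keeps any trailing newline, so chomp the line first when reading from a file.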

Another possibility (which I prefer) is to do something like:

    # (.*) captures the rest of the line; \.* would only match literal dots.
    my ($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/;

There are problems with both approaches, depending on how well formed your data is. For instance, I'm still assuming that all countries are expressed with a single-word name, not multiple words as in "united states" (since I see you using "usa" already). I also assume that all instances of $uid are expressed as numbers and that all instances of $country are alphanumeric only. In each case the match could be made more general by using \S in place of either \d or \w.
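As a rough illustration of that trade-off, the sketch below runs a strict pattern and a looser \S-based one against a made-up line whose uid is not purely numeric and whose country contains a hyphen. The sample line is hypothetical, and the use of \s+ (rather than a single \s) in the loose pattern is an extra assumption, made so that runs of whitespace are tolerated.

```perl
use strict;
use warnings;

my $strict = qr/^(\d+)\s(\w+)\s(.*)$/;       # digits, then one "word"
my $loose  = qr/^(\S+)\s+(\S+)\s+(.*)$/;     # any non-whitespace tokens

# Hypothetical line: non-numeric uid and a hyphenated country code.
my $line = "ab12 uk-gb windows xp";

# In list context a failed match returns an empty list.
my @strict_fields = $line =~ $strict;
my @loose_fields  = $line =~ $loose;

printf "strict captured %d fields, loose captured %d fields\n",
    scalar @strict_fields, scalar @loose_fields;
print join('|', @loose_fields), "\n";
```

The looser pattern accepts more input, but it also checks less, so which one fits depends on how much validation you want from the parse step.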

Evaluating the assignment inside a conditional of some sort would also allow you to check the format of the file to some degree as you go.

    $linenum++;
    my ($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/;
    unless (defined $uid) {
        print STDERR "Skipping line number $linenum as it does not seem to conform to 'uid country osname' format expectations.\n";
        next;    # assumes this code sits inside the while loop over the file
    }

Mar 5 '08 #6

P: 6
mmm in fact yes I'm stuck!

It's my first time to deal with hashes and I still can't get it!
Mar 5 '08 #7

eWish
Expert 100+
P: 971
Here is a simple script that should do what you are wanting. I have added comments to help explain what is happening.

    #!/usr/bin/perl -T

    use strict;
    use warnings;

    my $source_file1 = 'path/to/somefile1.txt';
    my $source_file2 = 'path/to/somefile2.txt';
    my $dest_file = 'path/to/somefile3.txt';

    my %hash = ();

    # Open the file that contains the userid and time data.
    open(my $FILE1, '<', $source_file1) or die "Can't open $source_file1: $!\n";
    while (my $line = <$FILE1>) {

        # Get rid of the newline character.
        # chomp gets its own statement: "chomp my (...) = split ..." does not
        # parse as chomp(my (...) = split ...).
        chomp $line;

        # Split the line into userid and time. Splitting on the tab. Change if not a tab delimited file.
        my ($userid, $time) = split(/\t/, $line);
        next unless defined $time;    # skip blank or short lines

        # Add the userid to the hash.
        # The key is the userid and the time will be the value.
        $hash{$userid} = $time;
    }
    close($FILE1);

    # Open the file that contains the userid, country and operatingSystem data.
    open(my $FILE2, '<', $source_file2) or die "Can't open $source_file2: $!\n";

    # Open the new file where the data will be stored.
    # This will overwrite the entire file if it exists.
    # If it does not exist it will be created.
    open(my $FILE3, '>', $dest_file) or die "Can't open $dest_file: $!\n";
    while (my $line = <$FILE2>) {

        # Get rid of the newline character.
        chomp $line;

        # Split the line into userid, country, operatingSystem.
        # Splitting on the tab. Change if not a tab delimited file.
        my ($userid, $country, $operatingSystem) = split(/\t/, $line);
        next unless defined $operatingSystem;

        # Only keep the userids that also appear in the first file.
        # A direct hash lookup avoids scanning every key for each line.
        if (exists $hash{$userid}) {

            # Create the line of data formatted however it is desired.
            my $data = qq{$userid\t$hash{$userid}\t$country\t$operatingSystem\n};

            # Print $data to the file.
            print $FILE3 $data;
        }
    }

    # Close the two files.
    close($FILE3);
    close($FILE2);
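One thing to watch: a hash keyed on userid holds a single time per user, so if the same userid appears on several lines of the time file, only the last time survives. If that can happen in your data (an assumption, since the thread does not say), a variation is to index the smaller user file instead and stream the time file. The sketch below is that swapped-around variant, not the script above; the helper name annotate_times and the sample rows are hypothetical, and in-memory strings stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: index the small "userid country operatingSystem" input,
# then stream the large "userid time" input so every time row is kept.
sub annotate_times {
    my ($users_fh, $times_fh, $out_fh) = @_;

    my %user_info;    # userid => [country, operatingSystem]
    while (my $line = <$users_fh>) {
        chomp $line;
        my ($userid, $country, $os) = split /\t/, $line;
        next unless defined $os;
        $user_info{$userid} = [$country, $os];
    }

    while (my $line = <$times_fh>) {
        chomp $line;
        my ($userid, $time) = split /\t/, $line;
        # Rows whose userid is not in the user file are dropped here.
        next unless defined $time && exists $user_info{$userid};
        print {$out_fh} join("\t", $userid, $time, @{ $user_info{$userid} }), "\n";
    }
    return;
}

# Made-up tab-delimited sample data; user 1234 appears twice in the times.
my $users  = "1234\tusa\tlinux\n3456\tjapan\tsun os\n";
my $times  = "1234\t10:00\n9999\t10:05\n1234\t11:30\n";
my $merged = '';

open(my $users_fh, '<', \$users)  or die "Can't read users: $!";
open(my $times_fh, '<', \$times)  or die "Can't read times: $!";
open(my $out_fh,   '>', \$merged) or die "Can't write output: $!";
annotate_times($users_fh, $times_fh, $out_fh);
close $out_fh;

print $merged;
```

This also keeps the hash small (a few thousand users rather than hundreds of thousands of time rows), and the output follows the order of the time file.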
--Kevin
Mar 5 '08 #8
