reading and merging two files at the same time

hi

I have two files.
First file format (4,882 rows):
userId country operatingSystem


Second file format (400,000 rows):
userId time


I need to merge them as follows:
If the userId is the same in both files, the line in the second file should become:
userId time country operatingSystem

I wrote the program in MATLAB, but it takes a long time, and I'd like to know how to do it in Perl. Is it possible to read from two files at the same time in Perl?

Here is Matlab code:

    for i = 1:4882
        for j = 1:400000
            if (user(i,:) == Xtrain(j));
               Xtrain(j,:) = user(i,:);
            end
        end
    end

Thanks for help
Mar 4 '08 #1
eWish
971 Expert 512MB
I would suggest that you open file 2, read its contents, and store the data in a hash with the userid as the key and the time as the value. Then read the other file line by line and split each line into its three parts. For each line, check whether the first element of the split exists as a key in the hash; if it does, write the combined record to your output file in the format you want.
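
The idea can be sketched roughly as follows. This is only a sketch under a few assumptions that are not in the original question: the fields are whitespace-separated, the sub name `merge_lines` is made up for illustration, and the lookup hash is built from the smaller userId/country/operatingSystem file rather than from the time file, so that every row of the large file is kept even if a userId appears in it more than once. It works on arrays of lines to stay self-contained; with real files the lines would come from file handles instead.

```perl
use strict;
use warnings;

# Hypothetical helper: merge "userId time" lines with
# "userId country operatingSystem" lines.
sub merge_lines {
    my ($user_lines, $time_lines) = @_;

    # Build the lookup table from the small file: userId => [country, os].
    my %info;
    for my $line (@$user_lines) {
        (my $text = $line) =~ s/\s+\z//;    # work on a copy, trim the newline
        my ($uid, $country, $os) = split ' ', $text, 3;
        next unless defined $os;            # skip blank or short lines
        $info{$uid} = [$country, $os];
    }

    # Walk the large file and append country/os when the userId is known.
    my @merged;
    for my $line (@$time_lines) {
        my ($uid, $time) = split ' ', $line;
        next unless defined $time;          # skip blank or short lines
        next unless exists $info{$uid};     # no match in the small file
        push @merged, join("\t", $uid, $time, @{ $info{$uid} });
    }
    return @merged;
}

# Sample data in the shape described in the question.
my @users = ("1234 usa linux\n", "3452 eng windows\n");
my @times = ("1234 12.3.45\n", "9999 01.1.01\n", "1234 12.4.00\n");
print "$_\n" for merge_lines(\@users, \@times);
```

Each lookup in the hash takes roughly constant time, so the whole merge is a single pass over each file instead of the nested loop from the MATLAB version.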

--Kevin
Mar 4 '08 #2
eWish
971 Expert 512MB
If you get stuck or have questions on what I posted just let us know and we will try to help.

--Kevin
Mar 5 '08 #3
    use strict;
    use warnings;

    # Read file2 (userId time) into a hash keyed by userId.
    open(HANDLE2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
    my %hash1 = ();

    while (<HANDLE2>) {
        my ($uid, $time) = split /\s/, $_;
        $hash1{$uid} = $time;
    }
    close HANDLE2;

    open(HANDLE1, '<', 'file1.txt') or die "Can't open file1.txt: $!";

    # Open the output once, before the loop. Opening it with '>' inside
    # the loop would truncate the file on every line.
    open(HANDLE, '>', 'output.txt') or die "Can't open output.txt: $!";

    while (<HANDLE1>) {
        my ($uid, $country, $osname) = split /\s/, $_;

        if (exists $hash1{$uid}) {
            print HANDLE "$uid\t$hash1{$uid}\t$country\t$osname\n";
        }
    }
    close HANDLE1;
    close HANDLE;
Mar 5 '08 #4
    use strict;
    use warnings;

    # Read file2 (userId time) into a hash keyed by userId.
    open(HANDLE2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
    my %hash1 = ();

    while (<HANDLE2>) {
        my ($uid, $time) = split /\s/, $_;
        $hash1{$uid} = $time;
    }
    close HANDLE2;

    open(HANDLE1, '<', 'file1.txt') or die "Can't open file1.txt: $!";

    # Write to a separate output file, opened once before the loop.
    # Printing to HANDLE2 would write into the input file itself.
    open(HANDLE, '>', 'output.txt') or die "Can't open output.txt: $!";

    while (<HANDLE1>) {
        my ($uid, $country, $osname) = split /\s/, $_;
        if (exists $hash1{$uid}) {
            print HANDLE "$uid\t$hash1{$uid}\t$country\t$osname\n";
        }
    }
    close HANDLE1;
    close HANDLE;

_DATA_
file1:
3444 india sunsolaris
3456 japan sun os
3452 eng windows
3224 germany ubuntu
1234 usa linux

file2:
1234 12.3.45
3452 03.2.23

I think this will work, but I wrote it in a hurry, so please test it with some other combinations before using it.
Mar 5 '08 #5
Given this data:

_DATA_
file1:
3444 india sunsolaris
3456 japan sun os
3452 eng windows
3224 germany ubuntu
1234 usa linux
This line is going to cause one minor problem:

my ($uid,$country,$osname)=split /\s/,$_ ;
When $osname should be "sun os", you will get just "sun"; the "os" is discarded because the fourth value returned by the split is not captured in your assignment.

There are probably a few ways to solve this problem. Off the top of my head, the one that comes to mind is something like:

    my @temparray = split;
    my $uid = shift @temparray;
    my $country = shift @temparray;
    my $osname = join(' ', @temparray);

Since you've chosen to split on '\s' instead of ' ', this may alter the $osname if you wish to preserve tabs or other forms of whitespace.
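
A variation on the same idea, not taken from the post above, is to use the optional LIMIT argument of `split`, which leaves everything after the second field in one piece, so the whitespace inside the operating system name is kept as it was. The sub name `parse_user_line` is hypothetical, and the sketch assumes the first two fields never contain whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "userId country osname" line into three
# fields, keeping any whitespace inside the osname untouched.
sub parse_user_line {
    my ($line) = @_;          # $line is a copy, so it is safe to modify
    $line =~ s/\s+\z//;       # drop the trailing newline and whitespace

    # split ' ' skips leading whitespace and splits on runs of whitespace;
    # the limit of 3 stops splitting after the second field.
    my ($uid, $country, $osname) = split ' ', $line, 3;
    return ($uid, $country, $osname);
}

my ($uid, $country, $osname) = parse_user_line("3456 japan sun os\n");
print "$uid|$country|$osname\n";
```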

Another possibility (which I prefer) is to do something like:

    my ($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/;

There are problems with both approaches, depending on how well-formed your data is. For instance, I'm still assuming that all countries are expressed with a single-word name, not multiple words as in "united states" (since I see you using "usa" already). I also assume that all instances of $uid are expressed as numbers and that all instances of $country are alphanumeric only. In each case the matches could be generalized by using \S in place of either \d or \w.
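
As a rough illustration of that last point, the generalized match could look something like the sketch below. The sub name `parse_loose` is made up, and two choices here are assumptions rather than part of the suggestion above: fields may be separated by more than one whitespace character (`\s+`), and trailing whitespace is trimmed from the last field.

```perl
use strict;
use warnings;

# Hypothetical helper: accept any non-whitespace text for the first two
# fields and take the rest of the line as the third.
# In list context it returns the three captures, or an empty list
# when the line does not have at least three fields.
sub parse_loose {
    my ($line) = @_;
    return $line =~ /^(\S+)\s+(\S+)\s+(.*\S)\s*$/;
}

my @fields = parse_loose("abc-1 u.s.a sun os\n");
print join('|', @fields), "\n" if @fields;
```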

Evaluating the assignment inside a conditional of some sort would also allow you to check the format of the file to some degree as you go.

    $linenum++;
    my ($uid, $country, $osname);
    unless (($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/) {
        print STDERR "Skipping line number $linenum as it does not seem to conform to 'uid country osname' format expectations.\n";
        next;
    }

Mar 5 '08 #6
mmm in fact yes I'm stuck!

It's my first time dealing with hashes and I still don't get it!
Mar 5 '08 #7
eWish
971 Expert 512MB
Here is a simple script that should do what you are wanting. I have added comments to help explain what is happening.

    #!/usr/bin/perl -T

    use strict;
    use warnings;

    my $source_file1 = 'path/to/somefile1.txt';
    my $source_file2 = 'path/to/somefile2.txt';
    my $dest_file    = 'path/to/somefile3.txt';

    my %hash = ();

    # Open the file that contains the userid and time data.
    open(my $FILE1, '<', $source_file1) || die "Can't open $source_file1: $!\n";
    while (my $line = <$FILE1>) {

        # Get rid of the newline character.
        chomp $line;

        # Split the line into userid and time. Splitting on the tab.
        # Change if not a tab delimited file.
        my ($userid, $time) = split(/\t/, $line);

        # Add the userid to the hash.
        # The key is the userid and the time will be the value.
        $hash{$userid} = $time;
    }
    close($FILE1);

    # Open the file that contains the userid, country and operatingSystem data.
    open(my $FILE2, '<', $source_file2) || die "Can't open $source_file2: $!\n";

    # Open the new file where the data will be stored.
    # This will overwrite the entire file if it exists.
    # If it does not exist it will be created.
    open(my $FILE3, '>', $dest_file) || die "Can't open $dest_file: $!\n";
    while (my $line = <$FILE2>) {

        # Get rid of the newline character.
        chomp $line;

        # Split the line into userid, country, operatingSystem.
        # Splitting on the tab. Change if not a tab delimited file.
        my ($userid, $country, $operatingSystem) = split(/\t/, $line);

        # Only write the userids that also appear in the first file.
        # A direct hash lookup avoids scanning every key for each line.
        if (exists $hash{$userid}) {

            # Create the line of data formatted however it is desired.
            my $data = qq{$userid\t$hash{$userid}\t$country\t$operatingSystem\n};

            # Print $data to the file.
            print $FILE3 $data;
        }
    }

    # Close the two files.
    close($FILE3);
    close($FILE2);
--Kevin
Mar 5 '08 #8
