Hi,
I have two files.
First file format (4882 rows):
userId country operatingSystem
Second file format (400,000 rows):
userId time
I need to merge them as follows:
if the userId is the same in both files, then each row of the second file should look like this:
userId time country operatingSystem
I wrote the program in Matlab, but it takes a long time, and I'd like to know how to do it in Perl. Is it possible to read from two files at the same time in Perl?
Here is the Matlab code:
for i = 1:4882
    for j = 1:400000
        if (user(i,:) == Xtrain(j))
            Xtrain(j,:) = user(i,:);
        end
    end
end
Thanks for your help.
I would suggest that you open and read the contents of file 2 and store the data in a hash, with the userId as the key and the time as the value. Then read the other file line by line and split each line into its 3 parts. While looping through the file, check whether the first element of the split exists as a key in the hash. If it does, write the line to your output file in the format you want.
--Kevin
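A minimal Perl sketch of the hash-lookup idea follows. It rests on an assumption not stated in the thread: because the second file has 400,000 rows but the first describes only 4882 users, a userId probably repeats in the second file, and a hash keyed on that file would keep only one time per user. The sketch therefore hashes the smaller file (userId to country and operating system) and streams the larger one, writing one output row per row of the second file, which is the format asked for. For scale, the nested Matlab loop performs 4882 × 400,000 = 1,952,800,000 comparisons, whereas this reads each file once. The subroutine names `load_users` and `merge_times` and the command-line usage are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup table from the small file (userId country operatingSystem):
# userId => "country<TAB>operatingSystem".
sub load_users {
    my ($fh) = @_;
    my %info;
    while (my $line = <$fh>) {
        chomp $line;
        # A limit of 3 keeps a multi-word OS such as "sun os" in one field.
        my ($uid, $country, $os) = split ' ', $line, 3;
        next unless defined $os;    # skip blank or incomplete lines
        $info{$uid} = "$country\t$os";
    }
    return \%info;
}

# Stream the large file (userId time) and append country/OS to each row
# whose userId appears in the lookup table; other rows are dropped.
sub merge_times {
    my ($info, $in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my ($uid, $time) = split ' ', $line;
        next unless defined $time && exists $info->{$uid};
        print {$out} "$uid\t$time\t$info->{$uid}\n";
    }
}

# Hypothetical usage: perl merge.pl file1.txt file2.txt output.txt
if (@ARGV == 3) {
    my ($users_file, $times_file, $out_file) = @ARGV;
    open my $ufh, '<', $users_file or die "Can't open $users_file: $!";
    open my $tfh, '<', $times_file or die "Can't open $times_file: $!";
    open my $ofh, '>', $out_file   or die "Can't open $out_file: $!";
    merge_times(load_users($ufh), $tfh, $ofh);
    close $ofh or die "Can't close $out_file: $!";
}
```

Only the 4882-entry hash is held in memory; the 400,000-row file is read one line at a time, so both files are effectively being read "at the same time".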
If you get stuck or have questions on what I posted just let us know and we will try to help.
--Kevin
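use strict;
use warnings;

# Read file2 (userId time) into a hash keyed on userId.
open(HANDLE2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
my %hash1 = ();
while (<HANDLE2>) {
    chomp;
    my ($uid, $time) = split /\s/, $_;
    $hash1{$uid} = $time;
}
close HANDLE2;

open(HANDLE1, '<', 'file1.txt') or die "Can't open file1.txt: $!";
# Open the output file once, outside the loop: reopening it with '>' on every line would truncate it each time.
open(HANDLE, '>', 'output.txt') or die "Can't open output.txt: $!";
while (<HANDLE1>) {
    chomp;
    my ($uid, $country, $osname) = split /\s/, $_;
    if (exists $hash1{$uid}) {
        print HANDLE "$uid\t$hash1{$uid}\t$country\t$osname\n";
    }
}
close HANDLE1;
close HANDLE;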
_DATA_
file1:
3444 india sunsolaris
3456 japan sun os
3452 eng windows
3224 germany ubuntu
1234 usa linux
file2:
1234 12.3.45
3452 03.2.23
I think this will work, but I wrote it in a hurry, so please test it with some other data combinations before relying on it.
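To check the approach against the sample data without creating any files, Perl's in-memory filehandles (opening a reference to a scalar) can stand in for file1.txt and file2.txt. This sketch reproduces the posted logic on the sample above; the variable names are illustrative, and it splits file1 with a limit of 3 (a change from the post) so that a value like "sun os" would stay whole.

```perl
use strict;
use warnings;

# Sample data from the post, held in strings instead of files.
my $file1 = "3444 india sunsolaris\n3456 japan sun os\n3452 eng windows\n3224 germany ubuntu\n1234 usa linux\n";
my $file2 = "1234 12.3.45\n3452 03.2.23\n";

# Same approach as the script: hash file2 (uid => time), then scan file1.
open my $in2, '<', \$file2 or die "Can't open in-memory file2: $!";
my %hash1;
while (<$in2>) {
    chomp;
    my ($uid, $time) = split ' ';
    $hash1{$uid} = $time;
}
close $in2;

# Collect the output in a string so it is easy to inspect.
my $output = '';
open my $out, '>', \$output or die "Can't open in-memory output: $!";
open my $in1, '<', \$file1 or die "Can't open in-memory file1: $!";
while (<$in1>) {
    chomp;
    my ($uid, $country, $osname) = split ' ', $_, 3;
    print {$out} "$uid\t$hash1{$uid}\t$country\t$osname\n" if exists $hash1{$uid};
}
close $in1;
close $out;

print $output;
```

With this data only userIds 3452 and 1234 appear in both files, so two rows are printed, in file1's order. Editing the two strings is a quick way to try other combinations.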
Given this data:
_DATA_
file1:
3444 india sunsolaris
3456 japan sun os
3452 eng windows
3224 germany ubuntu
1234 usa linux
This line is going to cause one minor problem:
my ($uid,$country,$osname)=split /\s/,$_ ;
In the case of an $osname that should be "sun os", you will get an $osname of simply "sun", and the "os" will be discarded, as the 4th positional return from the split is not captured in your assignment.
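A quick way to see the problem, plus one more fix not mentioned in the thread: split accepts an optional LIMIT as its third argument, and with a limit of 3 everything after the second separator stays in the last field. When split is assigned to a list of three scalars without a limit, Perl supplies an implicit limit of 4, which is why "os" lands in a fourth field that nothing captures.

```perl
use strict;
use warnings;

my $line = "3456 japan sun os\n";
chomp $line;

# Without a limit, the fourth field ("os") has no variable to land in.
my ($uid, $country, $osname) = split /\s/, $line;
print "without limit: [$osname]\n";    # prints "without limit: [sun]"

# With a limit of 3, split stops after three fields and keeps the rest together.
my ($uid2, $country2, $osname2) = split /\s/, $line, 3;
print "with limit 3:  [$osname2]\n";   # prints "with limit 3:  [sun os]"
```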
There are probably a few ways to solve this problem. Off the top of my head, the one that comes to mind is something like:
my @temparray = split;
my $uid = shift @temparray;
my $country = shift @temparray;
my $osname = join(' ', @temparray);
Since you've chosen to split on '\s' instead of ' ', this may alter the $osname if you wish to preserve tabs or other forms of whitespace.
Another possibility (which I prefer) is to do something like:
my ($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/;
There are problems with both approaches, depending on how well formed your data is. For instance, I'm still assuming that all countries are expressed with a single-word name, not multiple words as in "united states" (since I see you using "usa" already). I also assume that all instances of $uid are expressed as numbers and that all instances of $country are expressed as alphanumeric only. In each case the matches could be made more general by using \S in place of either \d or \w.
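A sketch of the more general pattern described above, with \S+ in place of \d+ and \w+. It also uses \s+ between fields so that runs of spaces or tabs are accepted, which is an extra assumption beyond the post. The helper name `parse_user_line` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# \S+ accepts any non-whitespace id or country; \s+ accepts one or more
# spaces/tabs between fields; (.*) keeps the rest (e.g. "sun os") whole.
sub parse_user_line {
    my ($line) = @_;
    return $line =~ /^(\S+)\s+(\S+)\s+(.*)$/;    # empty list if no match
}

for my $line ("3456 japan sun os", "3452\teng\twindows", "garbage") {
    my ($uid, $country, $osname) = parse_user_line($line);
    if (defined $uid) {
        print "uid=$uid country=$country os=$osname\n";
    } else {
        print "no match: $line\n";
    }
}
```

Countries with spaces ("united states") would still be split wrongly; that needs a different delimiter in the data, not a cleverer regex.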
Evaluating the assignment inside a conditional of some sort would also allow you to check the format of the file to some degree as you go:
$linenum++;
# Declared outside the condition so the values are still in scope after the unless block.
my ($uid, $country, $osname);
unless (($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/) {
    print STDERR "Skipping line number $linenum as it does not seem to conform to 'uid country osname' format expectations.\n";
    next;
}
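The same check can be wrapped in a loop to see it in action. This sketch relies on a failed match returning an empty list, so the list assignment evaluates to 0 (false), and it collects only the lines that fit. The sub name `parse_lines` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Parse "uid country osname" lines, reporting and skipping any that do not fit.
sub parse_lines {
    my @lines = @_;
    my @records;
    my $linenum = 0;
    for (@lines) {
        $linenum++;
        my ($uid, $country, $osname);
        # List assignment in boolean context counts the captured values; 0 means no match.
        unless (($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/) {
            print STDERR "Skipping line number $linenum as it does not seem to conform to 'uid country osname' format expectations.\n";
            next;
        }
        push @records, [$uid, $country, $osname];
    }
    return @records;
}

# The middle line has no numeric uid, so it is reported on STDERR and skipped.
my @recs = parse_lines("3444 india sunsolaris", "united states linux", "1234 usa linux");
print scalar(@recs), " records parsed\n";
```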
Mmm, in fact, yes, I'm stuck!
It's my first time dealing with hashes and I still don't get it!
Here is a simple script that should do what you want. I have added comments to help explain what is happening.
#!/usr/bin/perl -T
use strict;
use warnings;

my $source_file1 = 'path/to/somefile1.txt';   # userid and time
my $source_file2 = 'path/to/somefile2.txt';   # userid, country and operatingSystem
my $dest_file    = 'path/to/somefile3.txt';

my %hash = ();

# Open the file that contains the userid and time data.
open(my $FILE1, '<', $source_file1) or die "Can't open $source_file1: $!\n";
while (my $line = <$FILE1>) {
    # Get rid of the newline characters.
    chomp $line;
    # Split the line into userid and time. Splitting on the tab. Change if not a tab delimited file.
    my ($userid, $time) = split(/\t/, $line);
    # Add the userid to the hash.
    # The key is the userid and the time will be the value.
    $hash{$userid} = $time;
}
close($FILE1);

# Open the file that contains the userid, country and operatingSystem data.
open(my $FILE2, '<', $source_file2) or die "Can't open $source_file2: $!\n";

# Open the new file where the data will be stored.
# This will overwrite the entire file if it exists.
# If it does not exist it will be created.
open(my $FILE3, '>', $dest_file) or die "Can't open $dest_file: $!\n";
while (my $line = <$FILE2>) {
    # Get rid of the newline characters.
    chomp $line;
    # Split the line into userid, country, operatingSystem.
    # Splitting on the tab. Change if not a tab delimited file.
    my ($userid, $country, $operatingSystem) = split(/\t/, $line);
    # Only grab the userids that match a userid from the first file.
    # exists() is a direct hash lookup, so there is no need to scan every key with grep.
    if (exists $hash{$userid}) {
        # Create the line of data formatted however it is desired.
        my $data = qq{$userid\t$hash{$userid}\t$country\t$operatingSystem\n};
        # Print $data to the file.
        print $FILE3 $data;
    }
}

# Close the two files.
close($FILE3);
close($FILE2);
--Kevin
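One caveat with the script above, under an assumption about the data: if a userid appears more than once in the userid/time file (likely, given 400,000 rows for 4882 users), `$hash{$userid} = $time` keeps only the last time seen. A hash of arrays keeps every time. This is a minimal sketch with a hypothetical `load_times` helper and made-up sample rows.

```perl
use strict;
use warnings;

# Collect every time seen for a userid instead of keeping only the last one.
sub load_times {
    my ($fh) = @_;
    my %times;
    while (my $line = <$fh>) {
        chomp $line;
        my ($userid, $time) = split /\t/, $line;
        next unless defined $time;
        push @{ $times{$userid} }, $time;    # autovivifies the array on first use
    }
    return \%times;
}

my $data = "1234\t12.3.45\n3452\t03.2.23\n1234\t13.0.00\n";
open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
my $times = load_times($fh);
close $fh;

# One output row per stored time for user 1234.
print "1234\t$_\n" for @{ $times->{1234} };
```

In the main loop, the `exists` check stays the same, and the print becomes a loop over `@{ $hash{$userid} }` so each stored time gets its own output row.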