Dear all,
I am new to Perl.
I have a large matrix. The easiest way to describe it is as a calendar: the days run along the top, the week numbers run down the side, and the dates fill the rest of the table.
Imagine this calendar populated with different data but laid out the same way, with tens of thousands of columns across and tens of thousands of rows down. How would I parse this huge dataset into a hash? I know how to do it for a small example, where you can define the associative parts in the Perl code, but my file is too large for that. How do you make the top line and the side column the keys of the hash?
Kevin
A bit more information:
I was reading the guidelines, and want to make it clear that I am not asking for code to be written for me. I am just confused because I am new and all the hash examples define the data within the Perl code. My file is too large for that, so how do you define the keys and values using the first row and the first column? The toy examples confuse me because they do not show the step of reading a file in.
Your help would be much appreciated.
Kevin
I have this, which reads a two-column file into a hash, as I was looking at examples of hashes:
#!/usr/bin/perl -w
use strict;
use warnings;

my $index1;
my $index2;
my %hashname;
my $hashname;

open (LIST1, "test.txt") || die "File not found\n";
while (<LIST1>) {
    ($index1, $index2) = split(/\t/, $_);
    $hashname{$index1} = $index2;

    print $index1;
}
close(LIST1);
But I want this type of script for a huge dataset with tens of thousands of columns and rows. So my index1 is my first column, and index2 is the row at the top, rather than the second column as in this case.
Kevin
KevinADC 4,059
Recognized Expert Specialist
Show a sample of the data, with a few rows and columns and explain what you want to do. I don't understand what you said above: "So my index1 is my first column, and index2 is my row at the top, rather than the second column in this case".
0 A B C D E F G --> thousands more
A 1 0 5 1 2 3 1
B 6 0 2 1 1 1 1
C 1 0 2 1 2 1 1
D 2 0 1 2 1 1 1
E 1 1 0 1 1 1 1
F 0 1 0 1 2 1 9
G 0 1 3 2 1 1 1
thousands more
I have to go through the hash of hashes, so the top line is one hash ref and the column down the side is another. I have to pick out each number that is 2 or more, and print the letters of its row and column rather than the number itself.
Example:
A D is 2, so it qualifies (A on top row, D in column)
C A is 5, so it qualifies
I need to print out A\tD
C\tA
and so on....
I am getting confused about how to read it in properly. All the examples of HoHs are small toy examples (Barney, Fred, etc.) that are loaded in the code rather than large datasets in a file.
Thanks,
Kevin
KevinADC 4,059
Recognized Expert Specialist
Unless there is some requirement that I am missing, I see no need for any complex data structure to do this task. Process the input file line by line:
use strict;
use warnings;

open (LIST, "test.txt") || die "Cannot open test.txt: $!\n";
my $header = <LIST>; # read first line of file
chomp $header;
my @colms = split(/\s+/,$header);
shift @colms; # remove the 0 in the first position so array starts with "A".
# print "@colms\n";
while (<LIST>) {
    chomp;
    my @row = split(/\s+/,$_);
    my $this_row = shift @row; # row label from the first column
    for my $i (0..$#row) {
        next if $row[$i] < 2; # skip values below 2
        print "$colms[$i]\t$this_row\n";
    }
}
close LIST;
Why do you want to use something like a hash of hashes?
Hi Kevin,
Thank you for the code; it worked on my file and gave the correct output. Well done.
I just wanted to learn how to use a hash of hashes. I am learning Perl and trying to practise different ways of solving a problem. To be honest, I have always used arrays in Perl and never a hash of hashes, so I wanted to try one, but with this example dataset I became confused.
I could understand the examples on the internet, but they always loaded the data within the script. I understand the use of easy toy examples, but they didn't help when it came to a huge file.
From looking at perl examples on the internet I could see how to split a row into, for example, 5 fields with this particular line:-
while( <FILE> ) {
( $firstname, $surname, $date, $age, $number ) = split( ':' );
I was confused about how to do it for a row of thousands of fields, but I guess just loading it into an array was the best way.
Thank you for your help, it is much appreciated.
Kevin
KevinADC 4,059
Recognized Expert Specialist
OK, I see, you were more interested in the mechanics of building the data structure from a file than anything else. If you look at this page: http://perldoc.perl.org/perldsc.html
You will see headings that start with "Generation of....", which show how to build records from files. They don't show the open(FH, 'file.txt') part but just use while (<>) to show that some input is coming in from a file. Of course they are rather simple examples, but hopefully they help.
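For the matrix in this thread, building a hash of hashes from a file could look roughly like the sketch below. It is only an illustration under stated assumptions: the sub name read_matrix and the $sample string are made up for the example, the sample rows are the ones posted above, and the data is read from an in-memory filehandle so the script runs without test.txt. With a real file you would pass a handle opened on that file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a hash of hashes from a whitespace-separated
# matrix. The first line holds the column labels; the first field of every
# later line is the row label. Returns (\%matrix, \@column_labels).
sub read_matrix {
    my ($fh) = @_;
    my %matrix;

    my $header = <$fh>;
    return ( \%matrix, [] ) unless defined $header;

    my @cols = split ' ', $header;
    shift @cols;    # drop the corner cell ("0" in the sample data)

    while ( my $line = <$fh> ) {
        my ( $row, @values ) = split ' ', $line;
        next unless defined $row;    # skip blank lines

        # Hash slice: pair every column label with the value in that position,
        # giving $matrix{row label}{column label} = value.
        @{ $matrix{$row} }{@cols} = @values;
    }
    return ( \%matrix, \@cols );
}

# Sample rows from the thread, held in a string (an assumption made so that
# the sketch is self-contained).
my $sample = <<'END';
0 A B C D E F G
A 1 0 5 1 2 3 1
B 6 0 2 1 1 1 1
C 1 0 2 1 2 1 1
D 2 0 1 2 1 1 1
E 1 1 0 1 1 1 1
F 0 1 0 1 2 1 9
G 0 1 3 2 1 1 1
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ( $matrix, $cols ) = read_matrix($fh);
close $fh;

# Print "column<TAB>row" for every cell holding 2 or more.
for my $row ( sort keys %$matrix ) {
    for my $col (@$cols) {
        print "$col\t$row\n" if $matrix->{$row}{$col} >= 2;
    }
}
```

The keys never appear in the code itself: the top line supplies the inner keys and the first field of each row supplies the outer key. Bear in mind that a matrix with tens of thousands of rows and columns held entirely in a hash of hashes may need a great deal of memory, which is one reason the line-by-line version is the lighter choice for this particular task.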