Large Matrix into a hash of a hash

Dear all,

I am new to Perl.

I have a large matrix. The easiest way to describe it is as a calendar: days along the top, week numbers down the side, and dates filling the rest of the table.

Imagine this calendar populated with different data but laid out the same way, only with tens of thousands of columns and tens of thousands of rows. How would I parse this huge dataset into a hash? I know how to do it for a small example, where you can define the associative parts in the Perl code, but my file is too large for that. How do you define the top line and the side column as the keys of the hash?

Kevin
Jun 23 '08 #1
A bit more information:

I was reading the guidelines and want to make it clear that I am not asking for code to be written for me. I am new to Perl, and all the hash examples I have found define the data within the Perl code. My file is too large for that, so how do you define the keys and values using the first row and the first column? The toy examples confuse me because they do not show the step of reading a file in.

Your help would be much appreciated.

Kevin
Jun 23 '08 #2
KevinADC
4,059 Expert 2GB
Use the open() function to open the file and read its contents into your Perl script: http://perldoc.perl.org/perlopentut.html
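For example, a minimal sketch of that idea, assuming a hypothetical helper named count_lines (made up for illustration) and the file name test.txt used later in this thread. The three-argument form of open() with a lexical filehandle reads the file one line at a time, so the whole file never has to sit in memory at once:

```perl
use strict;
use warnings;

# Hypothetical helper (not from perlopentut): read a file line by line
# and return how many lines it contains.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;    # strip the trailing newline
        $count++;       # replace this with the real per-line processing
    }
    close $fh;
    return $count;
}

# Usage, assuming test.txt exists in the current directory:
# print count_lines('test.txt'), " lines\n";
```

The body of the while loop is where each line would be split into fields and stored or tested.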
Jun 23 '08 #3
Here is a script I have that feeds a two-column file into a hash; I wrote it while looking at hash examples:

#!/usr/bin/perl
use strict;
use warnings;

my %hashname;

# Read a two-column, tab-separated file into a hash keyed on the first column.
open my $list1, '<', 'test.txt' or die "Cannot open test.txt: $!\n";
while (my $line = <$list1>) {
    chomp $line;    # keep the newline out of the stored value
    my ($index1, $index2) = split /\t/, $line;
    $hashname{$index1} = $index2;

    print $index1;
}
close $list1;
But I want this type of script for a huge dataset with 10,000s of columns each way. So my index1 is my first column, and index2 is my row at the top, rather than the second column in this case.

Kevin
Jun 23 '08 #4
KevinADC
Show a sample of the data, with a few rows and columns and explain what you want to do. I don't understand what you said above: "So my index1 is my first column, and index2 is my row at the top, rather than the second column in this case".
Jun 23 '08 #5
0 A B C D E F G --> thousands more
A 1 0 5 1 2 3 1
B 6 0 2 1 1 1 1
C 1 0 2 1 2 1 1
D 2 0 1 2 1 1 1
E 1 1 0 1 1 1 1
F 0 1 0 1 2 1 9
G 0 1 3 2 1 1 1

thousands more

I have to go through the hash of hashes, where the top line gives one set of keys and the column down the side gives the other. I have to pick out each number that is 2 or more and print the letters of its column and row rather than the number itself.

Example:
A D is 2, so it qualifies (A on the top row, D in the side column)
C A is 5, so it qualifies

I need to print out A\tD
C\tA
and so on....

I am getting confused about how to read it in properly. All the examples of HoHs are small toy examples (Barney, Fred, etc.) that are loaded in the code rather than read from large datasets in a file.

Thanks,

Kevin
Jun 23 '08 #6
KevinADC
Unless there is some requirement that I am missing, I see no need for any complex data structure to do this task. Process the input file line by line:

use strict;
use warnings;

open my $list, '<', 'test.txt' or die "Cannot open test.txt: $!\n";
my $header = <$list>;    # read the first line of the file
chomp $header;
my @colms = split ' ', $header;
shift @colms;    # remove the 0 in the first position so the array starts with "A"
# print "@colms\n";
while (<$list>) {
    chomp;
    my @row = split ' ', $_;
    my $this_row = shift @row;    # row label from the first column
    for my $i (0..$#row) {
        next if $row[$i] < 2;     # skip values below 2
        print "$colms[$i]\t$this_row\n";
    }
}
close $list;
Why do you want to use something like a hash of hashes?
Jun 24 '08 #7
Hi Kevin,

Thank you for the code, it worked with my file with the correct output. Well done.

I just wanted to learn how to use a hash of hashes. I am learning Perl and trying to practise different ways of solving a problem. To be honest, I have always used arrays in Perl and never a hash of hashes, so I wanted to try one, but with this example dataset I became confused.

I could understand the examples on the internet, but they always loaded the data within the script. I understand the use of easy toy examples, but they didn't help when it came to a huge file.

From looking at Perl examples on the internet I could see how to split a row into, for example, 5 fields with this particular line:

while( <FILE> ) {
( $firstname, $surname, $date, $age, $number ) = split( ':' );

I was confused about how to do it for a row with thousands of fields, but I guess just loading it into an array was the best way.

Thank you for your help; it is much appreciated.

Kevin
Jun 24 '08 #8
KevinADC
OK, I see: you were more interested in the mechanics of building the data structure from a file than anything else. If you look at this page:

http://perldoc.perl.org/perldsc.html

You will see headings that start with "Generation of....", which show how to build records from files. They don't show the open(FH, 'file.txt') part but just use while (<>) to show that some input is coming in from a file. Of course they are rather simple examples, but hopefully they help.
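For the matrix in this thread, the same idea might look like the sketch below. It is an illustration under assumptions, not code taken from perldsc: the helper name build_hoh is made up, the sample is the top-left corner of the data posted earlier, and the fields are assumed to be separated by whitespace. The first line supplies the column keys, the first field of each later line supplies the row key, and a hash slice fills in one row at a time.

```perl
use strict;
use warnings;

# Hypothetical helper: build $hoh->{row label}{column label} = value from an
# open filehandle whose first line holds the column labels.
sub build_hoh {
    my ($fh) = @_;
    my $header = <$fh>;
    return {} unless defined $header;    # empty input gives an empty structure
    chomp $header;
    my @cols = split ' ', $header;
    shift @cols;                         # drop the "0" corner cell

    my %hoh;
    while (my $line = <$fh>) {
        chomp $line;
        my ($row_label, @values) = split ' ', $line;
        next unless defined $row_label;  # skip blank lines
        # Hash slice: assign every column key of this row in one statement.
        @{ $hoh{$row_label} }{@cols} = @values;
    }
    return \%hoh;
}

# Sample data: the top-left corner of the matrix posted earlier in the thread.
my $sample = <<'END';
0 A B C
A 1 0 5
B 6 0 2
C 1 0 2
END

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!\n";
my $hoh = build_hoh($fh);
close $fh;

# Print "column<TAB>row" for every value of 2 or more.
for my $row (sort keys %$hoh) {
    for my $col (sort keys %{ $hoh->{$row} }) {
        print "$col\t$row\n" if $hoh->{$row}{$col} >= 2;
    }
}
```

With a real file, open it with open my $fh, '<', 'test.txt' and pass that handle instead. Bear in mind that a 10,000 x 10,000 matrix means 100 million hash entries, so the line-by-line approach earlier in the thread is far lighter on memory when a single pass over the data is all that is needed.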
Jun 25 '08 #9
