Searching in Hashes

P: 98
There is a hash in my program.
The problem is that I need to search in the hash, but some keys are identical, and I need to get the associated values of all these keys.

When I do something like

$value = $hash{ "theKey" };

I obviously only get the value associated with the first key called "theKey". How can I get the others? Some sort of loop?

Thanks
Feb 25 '08 #1
22 Replies


KevinADC
Expert 2.5K+
P: 4,059
Hashes cannot have identical keys; each key has to be unique. Are you using a hash of hashes or an array of hashes?
Feb 25 '08 #2

P: 98
> Hashes cannot have identical keys. Each key has to be unique. Are you using a hash of hashes or array of hashes?

Let me give you some more info; it's an exercise I'm doing.

It's like a webshop with items, each having a product code, a title, a price and a quantity.

I read a file that holds a 'database' of items for sale. It's structured like this, with one line in the text file for every item for sale:

value1|value2|value3|value4

value1 is unique for every item; call it a product code or ID number.
Because of search queries, I need to be able to get a product code from a title (which is provided by the query).

So I need a hash with the item names as keys and the product codes as values.

Now let's say a book and a DVD are for sale with exactly the same name: I need to be able to get both product codes out of my hash.

That's the whole problem.

I have double-checked my hash, and it has two identical keys after reading the file and creating the hash... I printed them all with a foreach loop.

P.S. It's just a simple hash, nothing complicated; I'm new ^^
Feb 25 '08 #3

KevinADC
Expert 2.5K+
P: 4,059
The same hash cannot have two identical keys. Try it and see:

```perl
use strict;
use warnings;

my %hash = (
   foo => 1,
   foo => 2,
   foo => 3
);

# Only one 'foo' key survives; in a list assignment the last value wins,
# so this prints: foo = 3
foreach my $key (keys %hash) {
   print "$key = $hash{$key}\n";
}
```

I don't know what makes you think that your hash does. Post some code so I can take a look at what you are doing.
Feb 26 '08 #4

P: 98
```perl
sub loadInventory{
     open (INVENTORY, $inventoryPath);
     $line = <INVENTORY>;
     while ($line){
        @fields = split(/\|/ , $line);
        ...
        %inventoryCodes = ($fields[1],$fields[0],%inventoryCodes);
    ...
        $line = <INVENTORY>;
     }
}
```
The file being read has, for example, these two lines:

DVD-432|Harry Potter and the Prisoner of Azkaban|25.00|5
BOOK-432|Harry Potter and the Prisoner of Azkaban|15.00|7

Because of the split, the values are:
$fields[0] | $fields[1] | $fields[2] | $fields[3]

So, because of this line:
%inventoryCodes = ($fields[1],$fields[0],%inventoryCodes);

I guess this hash will get two pairs, both with the value "Harry Potter and the Prisoner of Azkaban", but they won't have the same product code... or am I mistaken?

Thanks for the help, by the way.
Feb 26 '08 #5

P: 58
How about storing and printing the data like the following?

```perl
use warnings;
use strict;
use Cwd;

init();

sub init
{
   my %data = ();
   my $directory = getcwd();

   open( my $in, '<', "$directory/Input.txt" )
      or die "Cannot open $directory/Input.txt: $!";

   # Each line looks like: DVD-432|Harry Potter and ...|25.00|5
   while ( my $line = <$in> )
   {
      chomp( $line );

      my ( $type_id, $name, $price, $quantity ) = split( /\s*\|\s*/, $line );
      my ( $type, $id ) = split( /-/, $type_id );   # "DVD-432" -> ("DVD", "432")

      # Nested hashes: id -> type -> field -> value
      $data{$id}{$type}{NAME}     = $name;
      $data{$id}{$type}{PRICE}    = $price;
      $data{$id}{$type}{QUANTITY} = $quantity;
   }
   close( $in );

   foreach my $i ( sort keys %data )
   {
      foreach my $t ( sort keys %{$data{$i}} )
      {
         # if ( $t =~ /DVD/i )
         # if ( $t =~ /book/i )

         if ( $data{$i}{$t}{NAME} =~ /Harry Potter/i )
         {
            print "$i -> $t -> $data{$i}{$t}{NAME} -> $data{$i}{$t}{PRICE} -> $data{$i}{$t}{QUANTITY}\n";
         }
      }
   }
}
```
Feb 26 '08 #6

KevinADC
Expert 2.5K+
P: 4,059
> I guess this hash will get two pairs, both with the value "Harry Potter and the Prisoner of Azkaban", but they won't have the same product code... or am I mistaken?

"Harry Potter and the Prisoner of Azkaban" is a value, not a key. The key is the product ID. Your code could be written much better.
Feb 26 '08 #7

P: 98
@WinblowsME: Thanks, but I'm afraid that is not allowed due to other restrictions.

@KevinADC: I'm aware my code isn't good ^^ I've only been into Perl for a couple of days.

Anyway... if Harry Potter and the Prisoner of Azkaban is a value, how come when I do:

$productCode = $inventoryCodes{'Harry Potter and the Prisoner of Azkaban'};

the $productCode variable is correct? But of course it's only the DVD code, and not both (DVD and BOOK)... I'm confused now ^^
Feb 26 '08 #8

KevinADC
Expert 2.5K+
P: 4,059

My mistake, I misread the code:

%inventoryCodes = ($fields[1],$fields[0],%inventoryCodes);

You reversed fields [1] and [0], making field [1] (the title) the key and field [0] (the product code) the value. The fact remains: hashes cannot have two identical keys, so there is no reason to keep discussing that part of the question. Each key holds exactly one value. In your assignment the existing contents of %inventoryCodes come after the new pair, and in a list assignment the last occurrence of a key wins, so for a repeated title the code that was read first (the DVD) is kept and later ones are discarded. If you need a hash key to have multiple values, you have to use a more complex data structure, like a hash of arrays or whatever is appropriate.
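The effect of that list assignment can be checked in isolation. A minimal sketch, using the two sample lines from this thread: the first hash is built with the same pattern as loadInventory (new pair first, existing pairs after it), the second with a plain element assignment. The hash names are illustrative only.

```perl
use strict;
use warnings;

my @lines = (
    "DVD-432|Harry Potter and the Prisoner of Azkaban|25.00|5",
    "BOOK-432|Harry Potter and the Prisoner of Azkaban|15.00|7",
);

my (%listAssigned, %elementAssigned);
for my $line (@lines) {
    my @fields = split /\|/, $line;

    # Same pattern as loadInventory: the old contents come AFTER the new pair,
    # and the last occurrence of a key wins, so a title that is already
    # stored keeps its first product code.
    %listAssigned = ($fields[1], $fields[0], %listAssigned);

    # Plain element assignment: each new code overwrites the previous one.
    $elementAssigned{ $fields[1] } = $fields[0];
}

my $title = 'Harry Potter and the Prisoner of Azkaban';
print "list assignment:    $listAssigned{$title}\n";     # DVD-432
print "element assignment: $elementAssigned{$title}\n";  # BOOK-432
print "keys: ", scalar(keys %listAssigned), "\n";        # 1
```

Either way, only one product code per title survives.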
Feb 26 '08 #9

P: 98
I believe you about the unique-key part.

I'm trying to figure out how to get around that problem, probably by working with arrays, which is easy, instead of hashes, but that will be slow with big files...

Do you know any way of building a hash in which I can store names like "Harry Potter..." together with each item's product code, given that some items will have identical names but different product codes, AND that I need to find the product codes of all items matching a specified name? (I need to implement a "search" option based on product names.)
Feb 26 '08 #10

KevinADC
Expert 2.5K+
P: 4,059
Don't be offended, but judging by the code you posted, you need to master some basics first. The way you are reading the data from the file is a bit convoluted and slow, and the way you are building the hash is not correct.

Generic example:

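```perl
use strict;
use warnings;

my %hash;
open(my $fh, '<', 'file') or die "Cannot open 'file': $!";
while (<$fh>) {
   chomp;
   my @data = split(/\|/);
   # Key on the title (field 1); push each product code (field 0) onto that title's array.
   push @{$hash{$data[1]}}, $data[0];
}
close $fh;
```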
That will create a hash of arrays from the first two data fields of each line: the title (field 1) is the key, and each product code (field 0) is pushed onto that key's array. To access the array associated with a hash key (assuming you know the exact key):

```perl
foreach my $id (@{$hash{'Harry Potter and the Prisoner of Azkaban'}}) {
     print "$id\n";
}
```
which is really the same as looping over a regular array, except for the extra @{} symbols that dereference the array the hash key points to.
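To see the whole thing end to end, here is a minimal self-contained sketch of the same hash-of-arrays approach. It assumes the file layout from this thread (code|title|price|quantity) and, so that it runs without a file, reads the two sample lines from an in-memory string; with a real file you would pass its path to open instead. The variable names are illustrative.

```perl
use strict;
use warnings;

# The two sample lines from this thread; opening a reference to a string
# gives a filehandle that reads from memory instead of from disk.
my $inventory = <<'END';
DVD-432|Harry Potter and the Prisoner of Azkaban|25.00|5
BOOK-432|Harry Potter and the Prisoner of Azkaban|15.00|7
END

open(my $fh, '<', \$inventory) or die "Cannot open inventory: $!";

my %codesByTitle;    # title => [ product codes ]
while (my $line = <$fh>) {
    chomp $line;
    my ($code, $title) = split /\|/, $line;
    push @{ $codesByTitle{$title} }, $code;   # the array is created on first use
}
close $fh;

my $wanted = 'Harry Potter and the Prisoner of Azkaban';
# "|| []" avoids creating an empty entry when the title is not in the hash.
for my $code (@{ $codesByTitle{$wanted} || [] }) {
    print "$code\n";
}
```

Both product codes now live under the one title key, in file order.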
Feb 27 '08 #11

eWish
Expert 100+
P: 971
If you plan on having your product line growing quickly, then I would suggest that you use a database to handle your data for you. It would simplify things for you greatly.

--Kevin
Feb 27 '08 #12

P: 12
Actually, what you are trying to do is not as ill-advised as some have tried to make out. It sounds like what you really want is to search your database for products that match a given title. A hash is a great way to look things up but, as has been mentioned, hash keys need to be unique.

The first possible workaround is simply to use the product ID as your key. This gives you something unique, but it means that to 'search' you'll have to iterate through all the keys and check for a value that matches the title you are looking for.

My suspicion is that you want to avoid such a search by creating an index with a hash instead. The problem, of course, is that you can't simply use the title as the key of your hash and associate the product ID with it, since each new occurrence of the same title will overwrite the previously associated value.

Instead, what you could do is create what is called a hash of arrays.

Rather than storing a single value under each key, you associate each key with a reference to an array. Instead of making the product ID the value for a given key, you add it to the array associated with that key. This gives you a hash that looks like this:

```perl
my %search_index = (
  'Title One'  => ['book-123', 'dvd-456'],
);
```
Then you can simply look up the title in the hash and find that book-123 and dvd-456 both match it. A further lookup in a second hash that uses the product IDs as its keys will give you the full details for either or both of these matches. This scales fairly well.
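The two-step lookup described above might look like this. A minimal sketch: %search_index follows the post, while %products (product ID => details) and its field names are hypothetical, filled with illustrative values.

```perl
use strict;
use warnings;

# Index: title => list of matching product IDs (as in the post).
my %search_index = (
    'Title One' => ['book-123', 'dvd-456'],
);

# Hypothetical main table: product ID => details (illustrative values only).
my %products = (
    'book-123' => { title => 'Title One', price => '15.00', quantity => 7 },
    'dvd-456'  => { title => 'Title One', price => '25.00', quantity => 5 },
);

# Step 1: one hash lookup finds every ID for the title.
my @ids = @{ $search_index{'Title One'} || [] };

# Step 2: one hash lookup per ID fetches the full record.
for my $id (@ids) {
    my $item = $products{$id};
    print "$id: $item->{title}, $item->{price}, qty $item->{quantity}\n";
}
```

Neither step scans the whole inventory, which is why this approach scales well.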

I'd post some example code, but I'm in a bit of a rush. Still, there are plenty of good guides, and chapters in books, on hashes of arrays and other data structures. One good place to start is

```
perldoc perldsc
```
Feb 27 '08 #13

KevinADC
Expert 2.5K+
P: 4,059
> Instead though, what you could do is create what is called a hash of arrays.

While you explained it in more detail, that is the same suggestion I already made in a previous post. There has been no mention that I noticed of what he is trying to do being ill-advised; it's the way he is going about it that needs work.

Not that there is anything wrong with repeating what others have already suggested, especially since you took the time to elaborate on it.

Hope to see you around here more often.

Regards,
Kevin
Feb 27 '08 #14

P: 98
Thanks for the help and explanation, I appreciate it.
I'm going to try the hash of arrays.

Thanks
Feb 27 '08 #15

P: 12
> There has been no mention of what he is trying to do to be ill advised that I noticed

Perhaps that was a poor choice of words on my part. Still, I took the following:

> If you plan on having your product line growing quickly, then I would suggest that you use a database to handle your data for you. It would simplify things for you greatly.

to mean that the way the original poster intended to solve the problem was somehow deficient, and would be far better served by a database than by a hash internal to the Perl program. That may well be true, depending on the current and expected size of the data and the resources available on the machine that will run the code.

It is unlikely that using a database would actually simplify things for the OP, though. More likely it would complicate them, at least in the short term: selecting a database, learning SQL, and introducing a dependency on additional software that may not already be available on the machine where the script will run...

We already know that the current data is kept in a flat file. At a minimum, a hash of arrays or a hash of hashes is quite likely to be an improvement on that.
Feb 27 '08 #16

P: 98
MySQL is no problem; I'm pretty good with it, but I'm simply not allowed to use it in my assignment.
Feb 27 '08 #17

P: 98
You certainly helped me. I've got it working now.

I have a new problem, but instead of starting a new topic I'll just post it here, since it's about searching in hashes as well.

One question: how can I search for keys in a hash using a regular expression?
Because if I do something like this:

if (exists $inventoryCodes{qr/.*/})

it doesn't find anything...
Feb 28 '08 #18

KevinADC
Expert 2.5K+
P: 4,059

You have to loop through a hash and check each key to find something with a regexp:

```perl
foreach my $key (keys %hash) {
   if ($key =~ /foo/) {
       print "found it: $key\n";
   }
}
```
You can also use grep on the hash keys, but it might not be as efficient, since it always tests every key, unless you want to find all the hash keys that match.
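For comparison, a minimal grep sketch; %hash and the pattern are illustrative. grep tests every key and returns all matches, whereas the loop above could stop at the first hit with last.

```perl
use strict;
use warnings;

my %hash = (
    'Harry Potter and the Prisoner of Azkaban' => ['DVD-432', 'BOOK-432'],
    'The Hobbit'                               => ['BOOK-100'],
);

# Every key matching the pattern (case-insensitive); sorted for a stable
# order, since keys() returns them in no particular order.
my @matches = sort grep { /potter/i } keys %hash;
print "$_\n" for @matches;
```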

Keep in mind that hash keys are always plain strings, so what you posted just checks whether there is a hash key that is literally the string form of the compiled regexp (something like '(?^:.*)' in recent versions of Perl). It does not try to match the hash keys against the regexp, if that's what you were hoping.

But are you really searching for a hash key or a hash value? If it's a value and you don't know the key, you have to loop through the hash and use a regexp to test each value:

```perl
foreach my $key (keys %hash) {
   if ($hash{$key} =~ /foo/) {
       print "found it: $key => $hash{$key}\n";
   }
}
```
Or use grep to find multiple matches.
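A minimal sketch of that grep, assuming a hash keyed by product code with titles as values (the data is illustrative): grep over the keys while testing each key's value, so you get back the keys whose values match.

```perl
use strict;
use warnings;

my %titleByCode = (
    'DVD-432'  => 'Harry Potter and the Prisoner of Azkaban',
    'BOOK-432' => 'Harry Potter and the Prisoner of Azkaban',
    'BOOK-100' => 'The Hobbit',
);

# Keys whose VALUE matches the pattern, sorted for a stable order.
my @codes = sort grep { $titleByCode{$_} =~ /azkaban/i } keys %titleByCode;
print "@codes\n";   # BOOK-432 DVD-432
```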
Feb 28 '08 #19

P: 98
It's the keys I'm comparing. It's to extend the search I built with the help of this topic by adding features like searching for:

'keyword1' AND 'keyword2'

so both keywords need to be in the string representing the key.

Also 'keyword1' OR 'keyword2',

and also searching for a single word in the keys (given that the keys are sentences, e.g. "Harry Potter and ...").

I was just hoping there would be a more efficient way than checking each key and comparing... because that won't give hashes any speed advantage over arrays.
Feb 28 '08 #20

P: 12

OK, so to some extent you want to be able to query all sorts of matching conditions against the data. Eventually, yes, you'll end up wanting an SQL-like syntax to express searches. So stop and ask yourself how much further your search semantics will evolve, and whether you'll pass a point where the code you're writing would have been better served by an SQL database. For simple searches you just don't need one, but the more complex and varied your searches become, the bigger the payoff from the effort of moving to one.

That said, I don't think that what you're trying to do so far has quite reached that point.

For now, though, I think you're best off keeping your original data in a hash with the product ID as the key, and building index hashes (whose values are arrays of product IDs pointing back into that hash) for the searches you expect to be most common.

Assuming your inventory is kept in a hash that looks something like this:

```perl
# Keys such as DVD-432 must be quoted: => only auto-quotes simple identifiers.
my %inventoryDB = (
    'DVD-432'  => ['Harry Potter and the Prisoner of Azkaban', '25.00', '5'],
    'BOOK-432' => ['Harry Potter and the Prisoner of Azkaban', '15.00', '7'],
);
```
You'd build a title index that looked something like this:

```perl
my %titleIDX = (
    'Harry Potter and the Prisoner of Azkaban' => ['DVD-432', 'BOOK-432'],
);
```
And you could even go as far as building a keyword index, by iterating through titles and other such fields, to create indexes like:

```perl
my %keywordIDX = (
    'harry'    => ['DVD-432', 'BOOK-432'],
    'potter'   => ['DVD-432', 'BOOK-432'],
    'prisoner' => ['DVD-432', 'BOOK-432'],
    'azkaban'  => ['DVD-432', 'BOOK-432'],
);
```
Such an index would get rather large, and do so quickly. (Going down this path, at some point you'd do better to start using something like Lucene.)
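Rather than typing such an index by hand, it can be generated from the inventory. A minimal sketch, assuming %inventoryDB is laid out as above (ID => [title, price, quantity]) and that a keyword is any run of word characters, lowercased. AND is answered by keeping IDs that appear under every keyword, OR by merging the ID lists. The helpers search_and and search_or are hypothetical names, and the BOOK-100 entry is illustrative.

```perl
use strict;
use warnings;

my %inventoryDB = (
    'DVD-432'  => ['Harry Potter and the Prisoner of Azkaban', '25.00', '5'],
    'BOOK-432' => ['Harry Potter and the Prisoner of Azkaban', '15.00', '7'],
    'BOOK-100' => ['The Hobbit', '9.99', '3'],
);

# keyword => [ product IDs whose title contains that word ]
my %keywordIDX;
for my $id (sort keys %inventoryDB) {
    my %seen;    # index each word only once per title
    for my $word (grep { length } split /\W+/, lc $inventoryDB{$id}[0]) {
        push @{ $keywordIDX{$word} }, $id unless $seen{$word}++;
    }
}

# Hypothetical helper: IDs whose title contains ALL of the keywords.
sub search_and {
    my @words = map { lc } @_;
    my %count;
    $count{$_}++ for map { @{ $keywordIDX{$_} || [] } } @words;
    return sort grep { $count{$_} == @words } keys %count;
}

# Hypothetical helper: IDs whose title contains ANY of the keywords.
sub search_or {
    my %hit;
    $hit{$_} = 1 for map { @{ $keywordIDX{lc $_} || [] } } @_;
    return sort keys %hit;
}

print join(' ', search_and('potter', 'azkaban')), "\n";  # BOOK-432 DVD-432
print join(' ', search_or('hobbit', 'azkaban')), "\n";   # BOOK-100 BOOK-432 DVD-432
```

Each query costs one hash lookup per keyword instead of a scan over every title, at the price of keeping %keywordIDX in step with %inventoryDB.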

Constructs such as these let you grep or pattern-match across the keys of the hashes. Depending on how you intend to update the information in the database, you could even build arrays of the keys ahead of time, just to avoid repeated calls to keys().

```perl
my @titleKEY = keys %titleIDX;
```
You'd then need to regenerate the indexes and key lists either on every update that changes them, or periodically, accepting that they'd be out of date until the next regeneration. (This is largely a function of how 'live' the data is supposed to be.)
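A full regeneration can be as simple as one function that rebuilds the title index and the cached key list from the main hash, called after each update or on a schedule. A minimal sketch, assuming the %inventoryDB layout above; rebuild_title_index is a hypothetical name and the BOOK-100 entry is illustrative.

```perl
use strict;
use warnings;

my %inventoryDB = (
    'DVD-432'  => ['Harry Potter and the Prisoner of Azkaban', '25.00', '5'],
    'BOOK-432' => ['Harry Potter and the Prisoner of Azkaban', '15.00', '7'],
);

my (%titleIDX, @titleKEY);

# Hypothetical helper: discard the old index and rebuild it from scratch,
# so it cannot disagree with %inventoryDB right after a rebuild.
sub rebuild_title_index {
    %titleIDX = ();
    for my $id (sort keys %inventoryDB) {
        push @{ $titleIDX{ $inventoryDB{$id}[0] } }, $id;
    }
    @titleKEY = keys %titleIDX;    # cached key list for grep
}

rebuild_title_index();

# An update: the index is stale until the next rebuild.
$inventoryDB{'BOOK-100'} = ['The Hobbit', '9.99', '3'];
rebuild_title_index();

my @hits = grep { /hobbit/i } @titleKEY;
print "@hits\n";   # The Hobbit
```

Rebuilding everything is simple and always correct; its cost grows with the inventory, which is the trade-off against incremental updates.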

Very quickly this would lend itself to some object orientation, in which you'd presumably have objects both for individual inventory items and for the inventory database itself. The various instance data accessors would then contain code that knows when an update requires re-indexing some part of the indexes, potentially triggering a cascade of re-indexes along the way.
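A minimal object-oriented sketch of that idea, with hypothetical class and method names (Inventory, set_item, ids_for_title): the inventory object owns both the item data and the title index, and its single mutator updates the index incrementally, so callers never re-index by hand. It handles adding and retitling items only; a fuller version would also need deletion.

```perl
use strict;
use warnings;

package Inventory;    # hypothetical class

sub new {
    my ($class) = @_;
    return bless { items => {}, by_title => {} }, $class;
}

# Add or replace an item, keeping the title index in step.
sub set_item {
    my ($self, $id, $title, $price, $qty) = @_;
    if (my $old = $self->{items}{$id}) {
        # Remove the ID from its old title's list before re-indexing.
        my $list = $self->{by_title}{ $old->{title} };
        @$list = grep { $_ ne $id } @$list;
        delete $self->{by_title}{ $old->{title} } unless @$list;
    }
    $self->{items}{$id} = { title => $title, price => $price, quantity => $qty };
    push @{ $self->{by_title}{$title} }, $id;
    return $self;
}

sub ids_for_title {
    my ($self, $title) = @_;
    return @{ $self->{by_title}{$title} || [] };
}

package main;

my $inv = Inventory->new;
$inv->set_item('DVD-432',  'Harry Potter and the Prisoner of Azkaban', '25.00', 5);
$inv->set_item('BOOK-432', 'Harry Potter and the Prisoner of Azkaban', '15.00', 7);
$inv->set_item('DVD-432',  'Harry Potter (DVD edition)', '25.00', 5);   # retitle

print join(' ', $inv->ids_for_title('Harry Potter and the Prisoner of Azkaban')), "\n";  # BOOK-432
```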

The upshot is that all of this gets very complex, just to save you from simply doing:

```perl
foreach my $key (keys %hash) {
    if ($hash{$key} =~ /foo/) {
        print "found it: $key => $hash{$key}\n";
    }
}
```
I guess what I'm saying is that this last little snippet may not be ideal, but it is always up to date with whatever is in your hash, and it is really simple to code. You don't need to worry about complex re-indexing schemes and so on.
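For the AND / OR keyword searches asked about earlier, the same always-up-to-date loop style works directly on the title keys. A minimal sketch, assuming a title => [IDs] hash like %titleIDX above (contents illustrative); each keyword is matched as a whole word, case-insensitively, and search_titles is a hypothetical name.

```perl
use strict;
use warnings;

my %titleIDX = (
    'Harry Potter and the Prisoner of Azkaban' => ['DVD-432', 'BOOK-432'],
    'The Hobbit'                               => ['BOOK-100'],
);

# Hypothetical helper: titles matching ALL ('and') or ANY ('or') keyword.
sub search_titles {
    my ($mode, @words) = @_;
    # \b...\b anchors each keyword to word boundaries; \Q...\E escapes
    # any regex metacharacters in the user's input.
    my @patterns = map { qr/\b\Q$_\E\b/i } @words;
    my @found;
    for my $title (sort keys %titleIDX) {
        my $hits = grep { $title =~ $_ } @patterns;   # number of keywords matched
        push @found, $title
            if ($mode eq 'and' ? $hits == @patterns : $hits > 0);
    }
    return @found;
}

print "$_\n" for search_titles('and', 'harry', 'azkaban');
print "$_\n" for search_titles('or',  'hobbit', 'azkaban');
```

It is a linear scan, so it never goes stale, at the cost of testing every title on each query.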

If you really need this to be simple to do, simple to extend, AND high-performance (so much so that you're considering multiple arrays with common numerical indexes instead of hashes for performance reasons), then I'm finally willing to admit that you should go ahead and start using an SQL database now.
Mar 3 '08 #21

eWish
Expert 100+
P: 971
minowicz,

What would be the main reason for suggesting that the OP move towards using a database? Also, from a performance standpoint, what gain(s) does a database offer over multiple hashes and/or arrays?

--Kevin
Mar 4 '08 #22

P: 12
Actually, I'm largely opposed to moving this to a database unless one of two conditions is true:
  1. The total size of the database is large enough that it will not readily fit into memory once all of the various indexes and such are considered along with the original data.
  2. The search semantics required are so elaborate, varied, and/or changing through the application life cycle that the OP is likely to have to reinvent the wheel of SQL.

There are probably solutions for either of these situations that would not require a move to an SQL database, but the original question was a fairly simple case. The new requirements in the follow-up suggest that #2 above is becoming likely.

One simple possibility, particularly given the original data format, would be DBD::RAM. More speed and/or a smaller memory footprint could probably be had from DBD::DBM, or possibly DBD::SQLite or DBD::SQLite2, all listed on CPAN. Moving to something like MySQL or PostgreSQL is another possibility if the situation warrants it.

Again, my contention is that this all *could* be done in Perl, but if there are going to be more complex searches, changing search needs, a need to keep the data 'live' for both searches and updates, complex rules for updating only the indexes affected by a given change, and so on... well, there comes a point when it is wise to be lazy and rely on systems others have already built that do such things, and do them very well.
Mar 4 '08 #23
