Searching in Hashes

4,059

Expert 2GB

Hashes can not have identical keys. Each key has to be unique. Are you using a hash of hashes or array of hashes?

Feb 25 '08 #2

Let me give you some more info; it's an exercise I'm working on.

It's like a webshop which has items, a product code, a price and a quantity.

I read a file, which holds a 'database' of items for sale and it's structured like this:
for every item for sale there is a line in the textfile:

value1|value2|value3|value4

value1 is unique for every item. Call it a product code or ID number.
Anyway, because of search queries I need to be able to get a product code from a title (which is provided by the query).

So I need a hash which has the names of the items as keys, and the associated values will be the product codes.

Now let's say there is a book and a DVD for sale with the exact same name: I need to be able to get both product codes out of my hash.

That's the whole problem.

I have double-checked my hash, and it has two identical keys after reading the file and creating the hash... I printed them all with a foreach loop.

P.S. It's just a simple hash, nothing complicated. I'm new ^^

Feb 25 '08 #3

4,059

Expert 2GB

The same hash can not have two identical keys. Try it and see:

my %hash = (
   foo => 1,
   foo => 2,
   foo => 3
);

foreach my $key (keys %hash) {
   print "$key = $hash{$key}\n";
}

I don't know what is leading you to think that your hash has duplicate keys. Post some code so I can take a look at what you are doing.

Feb 26 '08 #4

sub loadInventory{
     open (INVENTORY, $inventoryPath);
     $line = <INVENTORY>;
     while ($line){
        @fields = split(/\|/ , $line);
        ...
        %inventoryCodes = ($fields[1],$fields[0],%inventoryCodes);
        ...
        $line = <INVENTORY>;
     }
}

The file to be read contains, for example, these two lines:

DVD-432|Harry Potter and the Prisoner of Azkaban|25.00|5
BOOK-432|Harry Potter and the Prisoner of Azkaban|15.00|7

After the split, the values are:
$fields[0] | $fields[1] | $fields[2] | $fields[3]

So, because of this line:
%inventoryCodes = ($fields[1],$fields[0],%inventoryCodes);

I guess this hash will get two pairs, both with the value "Harry Potter and the Prisoner of Azkaban", but they won't have the same product code... or am I mistaken?

Thanks for the help btw

Feb 26 '08 #5

WinblowsME

How about storing and printing the data like the following?

use strict;
use warnings;
use Cwd;

init();

sub init {
   my %data;
   my $directory = getcwd();

   open( my $in, '<', "$directory/Input.txt" )
      or die "Cannot open $directory/Input.txt: $!";

   while ( my $line = <$in> ) {
      chomp $line;

      # Each line looks like: DVD-432|Title|25.00|5
      my ( $type_id, $name, $price, $quantity ) = split /\s*\|\s*/, $line;
      my ( $type, $id ) = split /-/, $type_id;

      $data{$id}{$type}{NAME}     = $name;
      $data{$id}{$type}{PRICE}    = $price;
      $data{$id}{$type}{QUANTITY} = $quantity;
   }
   close $in;

   foreach my $i ( sort keys %data ) {
      foreach my $t ( sort keys %{ $data{$i} } ) {
         # Alternative filters on the product type:
         # if ( $t =~ /DVD/i )
         # if ( $t =~ /book/i )

         if ( $data{$i}{$t}{NAME} =~ /Harry Potter/i ) {
            print "$i -> $t -> $data{$i}{$t}{NAME} -> $data{$i}{$t}{PRICE} -> $data{$i}{$t}{QUANTITY}\n";
         }
      }
   }
}

Feb 26 '08 #6

4,059

Expert 2GB

I guess this hash will get two pairs, both with the value "Harry Potter and the Prisoner of Azkaban", but they won't have the same product code... or am I mistaken?

"Harry Potter and the Prisoner of Azkaban" is a value, not a key. The key is the product ID. Your code could be written much better.

Feb 26 '08 #7

@WinblowsME: Thanks, but I'm afraid that is not allowed due to other restrictions.

@KevinADC: I'm aware my code isn't good ^^ I've only been using Perl for a couple of days.

Anyway, if "Harry Potter and the Prisoner of Azkaban" is a value, how come when I do:

$productCode = $inventoryCodes{'Harry Potter and the Prisoner of Azkaban'};

the $productCode variable is correct? But of course it's only the DVD code, and not both (DVD and BOOK)... I'm confused now ^^

Feb 26 '08 #8

4,059

Expert 2GB

My mistake, I misread the code:

%inventoryCodes = ($fields[1],$fields[0],%inventoryCodes);

You reversed fields [1] and [0], making field [1] the key and field [0] the value. The fact remains that a hash can not have two identical keys, so there is no reason to continue discussing that part of the question. When two lines in the file share a title, only one product code is kept for that key. With a plain assignment such as $hash{$key} = $value the last value read from the file wins; with your list assignment the old hash contents come last in the list and override the new pair, so the first value read is the one that survives, which is why you only get the DVD code. If you need a hash key to have multiple values you have to use a more complex data structure, like a hash of arrays or whatever is appropriate.
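
A quick way to see which value survives is to build the hash both ways from the two sample lines. This is only a sketch: the names %prepend and %assign are made up for the comparison, and the records are hard-coded instead of being read from the file.

```perl
use strict;
use warnings;

# Two sample records sharing a title (taken from the example file above).
my @records = (
    [ 'DVD-432',  'Harry Potter and the Prisoner of Azkaban' ],
    [ 'BOOK-432', 'Harry Potter and the Prisoner of Azkaban' ],
);

my ( %prepend, %assign );
for my $rec (@records) {
    my ( $code, $title ) = @$rec;

    # List assignment with the new pair first and the old contents last.
    # Later pairs in the list win, so the value stored first survives.
    %prepend = ( $title, $code, %prepend );

    # Plain element assignment: the most recent value wins.
    $assign{$title} = $code;
}

my $title = 'Harry Potter and the Prisoner of Azkaban';
print scalar( keys %prepend ), " key, value $prepend{$title}\n";    # 1 key, value DVD-432
print scalar( keys %assign ),  " key, value $assign{$title}\n";     # 1 key, value BOOK-432
```

Either way there is exactly one entry for the title, so one of the two product codes is always lost.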

Feb 26 '08 #9

I believe you about the unique key part.

I'm trying to figure out how I'm going to get around that problem. Probably by working with arrays instead of hashes, which is easy, but that will be slow with big files.

Do you know any way of creating a hash system in which I can store the names, like "Harry Potter...", and the product code of the item? Knowing that there will be items with identical names but different product codes, AND that I need to be able to find the product codes of all items matching a specified name? (Because I need to implement a "search" option based on product names.)

Feb 26 '08 #10

4,059

Expert 2GB

Don't be offended, but judging by the code you posted you need to master some basics first. The way you are reading the data in from the file is a bit convoluted and slow, and the way you are building the hash is not correct.

Generic example:

my %hash;
open(my $fh, '<', 'file') or die "Cannot open file: $!";
while (my $line = <$fh>) {
   chomp $line;
   my @data = split(/\|/, $line);
   # key: field 1 (the title); value: array of field 0 (the product codes)
   push @{$hash{$data[1]}}, $data[0];
}
close $fh;

That will create a hash of arrays from the first two data fields of each line. To access the array that is associated with the hash key (assumes you know the name of the hash key):

foreach my $id (@{$hash{'Harry Potter'}}) {
     print "$id\n";
}

which is really the same as looping over a regular array, with the exception of some extra symbols @{} to dereference the array that the hash key points to.

Feb 27 '08 #11

eWish

971

Expert 512MB

If you plan on having your product line growing quickly, then I would suggest that you use a database to handle your data for you. It would simplify things for you greatly.

--Kevin

Feb 27 '08 #12

Actually, what you are trying to do is not so ill advised as some have tried to make out. It sounds like what you really want to do is search through your database for products that match a given title. A hash is a great way to search for things, but unfortunately, as has been mentioned, the hash keys need to be unique.

The first possible work-around for this would be to simply use the product ID as your key. This gets you something unique, but means that to 'search' you'll have to iterate through all the keys and check for a value that matches the title for which you are searching.
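
A minimal sketch of that first work-around, assuming a hash keyed by product ID whose values are just the titles. The names %title_for and ids_for_title are made up for illustration, and the third record is invented sample data.

```perl
use strict;
use warnings;

# Hypothetical inventory keyed by product ID; each value is the title.
my %title_for = (
    'DVD-432'  => 'Harry Potter and the Prisoner of Azkaban',
    'BOOK-432' => 'Harry Potter and the Prisoner of Azkaban',
    'BOOK-101' => 'Some Other Title',
);

# Walk every key and collect the product IDs whose title matches exactly.
sub ids_for_title {
    my ($wanted) = @_;
    my @found = grep { $title_for{$_} eq $wanted } keys %title_for;
    return sort @found;
}

my @ids = ids_for_title('Harry Potter and the Prisoner of Azkaban');
print "@ids\n";    # BOOK-432 DVD-432
```

Every search touches every record, which is the cost the index described next tries to avoid.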

My suspicion is that you want to avoid such a search by instead creating an index with a hash. The problem of course is that you can't simply use the title as the key to your hash and associate the product ID with it, since each new occurrence of the same title will result in overwriting the previously associated value.

Instead though, what you could do is create what is called a hash of arrays.

Rather than simply storing the value in the hash under a certain key, you instead could have each key associated with a reference to an array. Instead of putting the product ID into the value for a given key, you would add it to the array associated with the key. This gives you a hash that looks like this:

my %search_index = (
  'Title One'  => ['book-123', 'dvd-456'],
);

Then you'll simply be able to look for the key in the hash and find out that book-123 and dvd-456 both match that title. A further lookup into the hash that uses the product IDs as its keys will give you the full details for either or both of these matches. This scales out fairly well.

I'd post some example code, but I'm in a bit of a rush. Still, there are plenty of good guides and chapters in books on hashes of arrays and other data structures. One good place to start is

perldoc perldsc
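
For what it's worth, the two-hash idea could be sketched roughly like this. The sample lines are hard-coded instead of read from the file, and the names %inventory and the title/price/quantity fields are assumptions made for the example, not anything from the original exercise.

```perl
use strict;
use warnings;

# Sample lines in the id|title|price|quantity layout described earlier.
my @lines = (
    'DVD-432|Harry Potter and the Prisoner of Azkaban|25.00|5',
    'BOOK-432|Harry Potter and the Prisoner of Azkaban|15.00|7',
);

my ( %inventory, %search_index );
for my $line (@lines) {
    my ( $id, $title, $price, $quantity ) = split /\|/, $line;

    # Full record, keyed by the unique product ID.
    $inventory{$id} = { title => $title, price => $price, quantity => $quantity };

    # Title index: each title maps to an array of product IDs.
    push @{ $search_index{$title} }, $id;
}

# Step 1: title -> product IDs. Step 2: product ID -> full record.
my $query = 'Harry Potter and the Prisoner of Azkaban';
for my $id ( @{ $search_index{$query} || [] } ) {
    my $item = $inventory{$id};
    print "$id: $item->{price} ($item->{quantity} in stock)\n";
}
```

The "|| []" keeps the loop from creating an empty index entry when the title is not in the index at all.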

Feb 27 '08 #13

4,059

Expert 2GB

Instead though, what you could do is create what is called a hash of arrays.

While you explained it in more detail, that is the same suggestion I already made in a previous post. There has been no mention of what he is trying to do to be ill advised that I noticed, it's the way he is going about doing it that needs work.

Not that there is anything wrong with repeating what others have already suggested, especially since you took the time to elaborate on it more.

Hope to see you around here more often.

Regards,
Kevin

Feb 27 '08 #14

Thanks for the help and explanation, I appreciate it.
I'm gonna try the hash of arrays

thanks

Feb 27 '08 #15

There has been no mention of what he is trying to do to be ill advised that I noticed

Perhaps that was a poor choice of words on my part. Still, I took the following:

If you plan on having your product line growing quickly, then I would suggest that you use a database to handle your data for you. It would simplify things for you greatly.

To mean that how the original poster intended to solve the problem was somehow deficient and far better served by using a database rather than a hash internal to the perl program. This may well be true, depending on the current and expected sizes of the database and the resources available on the machine that will be running the code.

It is unlikely that the use of a database would actually simplify things for the OP though. Instead it is likely to complicate things, at least in the short term. Selecting a database to use, learning SQL, and introducing a dependency on additional software that may not already be available on the machine where the script will run...

We already know that the current data is kept in a flat file. At a minimum, a hash of arrays or a hash of hashes is quite likely to be an improvement on that.

Feb 27 '08 #16

MySQL is no problem, I'm pretty good with it, but I'm simply not allowed to use it in my assignment.

Feb 27 '08 #17

You certainly helped me. I got it working now.

I have a new problem, but I'm not going to start a new topic. I'll just post it here, since it's about searching in hashes as well.

One question: how can I search for keys in a hash using a regular expression?
Because if I do something like this:

if (exists $inventoryCodes{qr/.*/})

It doesn't find anything...

Feb 28 '08 #18

4,059

Expert 2GB

You have to loop through a hash and check each key to find something with a regexp:

foreach my $key (keys %hash) {
   if ($key =~ /foo/) {
       print "found it: $key\n";
   }
}

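You can also use grep on the hash keys. grep always scans every key, so it is less efficient than a loop you can exit early if you only need the first match, but it is convenient when you want all the hash keys that are similar in some way.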
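
A small sketch of the grep version, using a made-up title index as sample data (the second and third titles are invented for the example):

```perl
use strict;
use warnings;

# Hypothetical title index (title => list of product codes).
my %hash = (
    'Harry Potter and the Prisoner of Azkaban' => [ 'DVD-432', 'BOOK-432' ],
    'Harry Potter and the Goblet of Fire'      => ['BOOK-433'],
    'Some Other Title'                         => ['DVD-500'],
);

# grep returns every key that matches the pattern (case-insensitive here).
my @matches = sort grep { /harry potter/i } keys %hash;

# Print each matching title with the product codes stored under it.
print "$_ => @{ $hash{$_} }\n" for @matches;
```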

Keep in mind that hash keys have to be simple strings, so what you posted just checks whether the hash contains a key that is literally the string form of the compiled regexp (something like '(?-xism:.*)', depending on the Perl version). It does not try to match the hash keys against the regexp, if that's what you were hoping.
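
You can check this for yourself with a few lines. This is just a sketch with a one-entry sample hash; the exact text printed for the key depends on the Perl version, so no output is shown for it.

```perl
use strict;
use warnings;

my %inventoryCodes = ( 'Harry Potter' => 'DVD-432' );

my $re  = qr/.*/;
my $key = "$re";    # the string Perl actually uses as the hash key

# The lookup is by that exact string, so nothing is found.
my $found = exists $inventoryCodes{$re} ? 'yes' : 'no';
print "key looked up: $key, found: $found\n";

# Storing under the compiled regexp creates an ordinary string key.
$inventoryCodes{$re} = 'odd entry';
print "now it exists\n" if exists $inventoryCodes{$key};
```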

But are you really searching for a hash key or a hash value? If it's a value and you don't know the name of the key you have to loop through the hash and use a regexp to search each value:

foreach my $key (keys %hash) {
   if ($hash{$key} =~ /foo/) {
       print "found it: $key\n";
   }
}

Or use grep to find multiple matches.

Feb 28 '08 #19

it's the keys I'm comparing. It's to expand the search I made with the help of this topic by adding functionality like searching for:

'keyword 1' AND 'keyword2'

so both keywords need to be in the string representing the key.

and also 'keyword 1' OR 'keyword2'

and also searching for one word in keys (given that the keys are sentences, e.g. "Harry Potter and ...")

I was just hoping there would be a more efficient way than just checking each key and comparing... because that won't give hashes a speed advantage over arrays.

Feb 28 '08 #20

OK, so to some extent, you really want to be able to query all sorts of possible matching situations against the data. Eventually, yes, you are going to end up wanting an SQL-like syntax to express searches. So you really need to stop and ask yourself how much farther your search semantics are going to evolve and if you're going to pass a point where the code you're writing would have been better served with an SQL database. For simple searches, you just don't need one, but the more complex and varied your searches are going to become, the more of a payoff you're going to get from the investment in going to the effort.

That said, I don't think that what you're trying to do so far has quite reached that point.

At this point though I think you're best off keeping your original data in a hash with the product ID as the key and building arrays that index into that hash for the various searches you expect to be most common.

Assuming your inventory is kept in a hash that looks something like this:

my %inventoryDB = (
    'DVD-432'  => ['Harry Potter and the Prisoner of Azkaban', '25.00', '5'],
    'BOOK-432' => ['Harry Potter and the Prisoner of Azkaban', '15.00', '7'],
);

You'd build a title index that looked something like this:

my %titleIDX = (
    'Harry Potter and the Prisoner of Azkaban' => ['DVD-432', 'BOOK-432'],
);

And you could even go so far as to build a keywords index by iterating through titles and other such things to create indexes like:

my %keywordIDX = (
    'harry' => ['DVD-432', 'BOOK-432'],
    'potter' => ['DVD-432', 'BOOK-432'],
    'prisoner' => ['DVD-432', 'BOOK-432'],
    'azkaban' => ['DVD-432', 'BOOK-432'],
);

Though such an index would get rather large and do so quickly. (You'd end up doing better to start using something like Lucene at some point going down this path.)
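
Building such a keyword index from the inventory hash might look roughly like the sketch below. The stop-word list (%stop) is an assumption of mine to keep words like "and" and "the" out of the index; adjust or drop it as needed.

```perl
use strict;
use warnings;

my %inventoryDB = (
    'DVD-432'  => [ 'Harry Potter and the Prisoner of Azkaban', '25.00', '5' ],
    'BOOK-432' => [ 'Harry Potter and the Prisoner of Azkaban', '15.00', '7' ],
);

# Hypothetical stop-word list; words in it are left out of the index.
my %stop = map { $_ => 1 } qw(and the of);

my %keywordIDX;
for my $id ( sort keys %inventoryDB ) {
    my $title = $inventoryDB{$id}[0];
    my %seen;    # avoid listing an ID twice for a repeated word

    # Lower-case the title and pull out each word.
    for my $word ( lc($title) =~ /(\w+)/g ) {
        next if $stop{$word} || $seen{$word}++;
        push @{ $keywordIDX{$word} }, $id;
    }
}

print "$_ => @{ $keywordIDX{$_} }\n" for sort keys %keywordIDX;
```

Because the product IDs are processed in sorted order, each keyword's list comes out sorted as well.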

Constructs such as these allow you to do things like grep or pattern match across the keys of the hashes. Depending on how you intend to update the information in the database, you could even build arrays of the keys ahead of time just to avoid repetitive calls to keys().

my @titleKEY = keys %titleIDX;

You'd then need to regenerate indexes and arrays with lists of keys either on every update that would change them, or on a periodic basis, understanding that they'd be out of date until such an update occurred. (This is largely a function of how 'live' the data is supposed to be.)

Very quickly this would lend itself to a bit of object orientation in which you'd presumably have objects for both individual inventory items and for the inventory database itself. Then the various instance data accessors would have code that would understand when an update would require a re-index of some part of the indexes, potentially triggering a cascade of re-indexes along the way.
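
As a rough sketch of that idea, here is a tiny class that keeps the item hash and the title index in step. Everything in it (the Inventory package, set_item, remove_item, ids_for_title) is hypothetical and only meant to show the shape such code could take.

```perl
use strict;
use warnings;

# Hypothetical class: keeps the item hash and the title index in step.
package Inventory;

sub new {
    my ($class) = @_;
    return bless { items => {}, title_idx => {} }, $class;
}

# Add or replace an item, re-indexing only the titles affected.
sub set_item {
    my ( $self, $id, $title, $price, $quantity ) = @_;
    $self->remove_item($id) if exists $self->{items}{$id};
    $self->{items}{$id} = { title => $title, price => $price, quantity => $quantity };
    push @{ $self->{title_idx}{$title} }, $id;
    return;
}

# Remove an item and drop its ID from the title index.
sub remove_item {
    my ( $self, $id ) = @_;
    my $item = delete $self->{items}{$id} or return;
    my $ids  = $self->{title_idx}{ $item->{title} };
    @$ids = grep { $_ ne $id } @$ids;
    delete $self->{title_idx}{ $item->{title} } unless @$ids;
    return;
}

# Look up the product IDs stored under a title (empty list if none).
sub ids_for_title {
    my ( $self, $title ) = @_;
    return @{ $self->{title_idx}{$title} || [] };
}

package main;

my $inv = Inventory->new;
$inv->set_item( 'DVD-432',  'Harry Potter and the Prisoner of Azkaban', '25.00', 5 );
$inv->set_item( 'BOOK-432', 'Harry Potter and the Prisoner of Azkaban', '15.00', 7 );
$inv->remove_item('DVD-432');
print join( ', ', $inv->ids_for_title('Harry Potter and the Prisoner of Azkaban') ), "\n";    # BOOK-432
```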

The upshot is that this all gets very complex in an effort to save you from simply doing:

foreach my $key (keys %hash) {
    if ($hash{$key} =~ /foo/) {
        print "found it: $key\n";
    }
}

I guess what I'm saying is that this last little code snippet may not be ideal, but it is always up to date with whatever is in your hash, and it is really simple to code. You don't need to worry about complex re-indexing schemes and so on.
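
The same simple scan also covers the AND / OR keyword searches asked about earlier. The sketch below assumes the titles are the hash keys; match_all and match_any are made-up helper names, and the extra titles are invented sample data.

```perl
use strict;
use warnings;

# Hypothetical title index (title => product codes).
my %hash = (
    'Harry Potter and the Prisoner of Azkaban' => [ 'DVD-432', 'BOOK-432' ],
    'Harry Potter and the Goblet of Fire'      => ['BOOK-433'],
    'The Prisoner'                             => ['DVD-500'],
);

# Keys that contain every keyword (AND).
sub match_all {
    my @words = @_;
    my @found;
    KEY: for my $key ( sort keys %hash ) {
        for my $word (@words) {
            next KEY if $key !~ /\Q$word\E/i;    # one miss rules the key out
        }
        push @found, $key;
    }
    return @found;
}

# Keys that contain at least one keyword (OR).
sub match_any {
    my @words = @_;
    my @found;
    KEY: for my $key ( sort keys %hash ) {
        for my $word (@words) {
            if ( $key =~ /\Q$word\E/i ) {
                push @found, $key;
                next KEY;                        # one hit is enough
            }
        }
    }
    return @found;
}

print "AND: $_\n" for match_all( 'harry',  'prisoner' );
print "OR:  $_\n" for match_any( 'goblet', 'prisoner' );
```

The \Q...\E around each keyword makes it match literally, so a search term containing regexp characters does not change the pattern.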

If you really have a need for making this simple to do, simple to extend, AND high performance (so much so that you're looking at using multiple arrays with common numerical indexes rather than hashes for performance reasons) then I'm finally willing to admit that you should go ahead and start using an SQL database now.

Mar 3 '08 #21

eWish

971

Expert 512MB

minowicz,

What would be the main reason you would suggest the OP move towards using a database? Also, from the standpoint of performance, what gain(s) does the database have to offer vs the use of multiple hashes and/or arrays?

--Kevin

Mar 4 '08 #22