hash increases after exists function

Hello All,

I've run into a problem I can't solve myself because I don't know exactly what Perl does when I use the exists function.

My script does the following:
First I use a database to build up a hash; this hash then holds around 800,000 values divided over 2,300 keys.
Then I read a file and check whether each component exists in that hash. If the value exists, the script simply adds 1 to the number of times the value was found. A value can exist in combination with multiple keys.
Now when I count the number of values before and after the counting, the computer comes up with different numbers, which should, in my view, be impossible.
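
For reference, the before/after count described above could be taken with something like the following. This is only a sketch: the data is made up, and count_entries is a hypothetical helper name, not something from the original script. It assumes the hash has the shape exon id => position key => data.

```perl
use strict;
use warnings;

# Made-up miniature of the structure: exon id => position key => data.
my %exon_hash = (
    exon_a => { chr1_100 => { amount => 0 }, chr1_101 => { amount => 0 } },
    exon_b => { chr1_100 => { amount => 0 } },
);

# Hypothetical helper: count every second-level key across all first-level keys.
sub count_entries {
    my ($hash_ref) = @_;
    my $total = 0;
    $total += scalar keys %{ $hash_ref->{$_} } for keys %$hash_ref;
    return $total;
}

my $before = count_entries(\%exon_hash);
# ... the counting loop would run here ...
my $after = count_entries(\%exon_hash);

print "before=$before after=$after\n";
```

If the two numbers differ, something between the two calls has created new position keys.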

A piece of my code looks like this:

    $chrompos = $chromosome."_".$position;
    $test_loc = $position+$lengthreads-1;
    $test_pos = $chromosome."_".$test_loc;
    foreach $exon_id (keys %exon_hash){
        if (exists $exon_hash{$exon_id}{$chrompos}){
            if (exists $exon_hash{$exon_id}{$test_pos}){
                for ($i=0;$i<$lengthreads;$i++){
                    $next_position = $position+$i;
                    $x = $chromosome."_".$next_position;
                    $exon_hash{$exon_id}{$x}{amount}++;
                }
            }
        }
    }

When I print out a chromosomal location ($chrompos), the value is changed after the second exists check. This only happens in very rare cases, but when it happens, values get added to my hash.

Does anyone know what goes wrong here and how to solve it?
Thanks in advance.

Regards
Karel
Oct 20 '08 #1
6 Replies
KevinADC
Expert 2GB
Maybe you want to start this loop at 1 instead of 0:

    for ($i=0;$i<$lengthreads;$i++){

If you add 0 to $position, $next_position is the same as $position, so $x is the same as $chrompos, and then you increment the value of:

    $exon_hash{$exon_id}{$x}{amount}++;

So it looks possible that you might count the above key twice for the same value. I could be totally wrong, but try using 1 as the initial value instead of 0:

    for ($i=1;$i<$lengthreads;$i++){
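
To see which keys each variant of the loop touches, here is a rough sketch. The values for $chromosome, $position and $lengthreads are made up, and visited_keys is a hypothetical helper, not part of the posted script.

```perl
use strict;
use warnings;

# Made-up example values; the real ones come from the input file.
my ($chromosome, $position, $lengthreads) = ('chr1', 100, 4);

# Hypothetical helper: list the position keys the loop builds
# when $i starts at the given index.
sub visited_keys {
    my ($start) = @_;
    my @keys;
    for (my $i = $start; $i < $lengthreads; $i++) {
        push @keys, $chromosome . "_" . ($position + $i);
    }
    return @keys;
}

my @from_zero = visited_keys(0);   # chr1_100 chr1_101 chr1_102 chr1_103
my @from_one  = visited_keys(1);   # chr1_101 chr1_102 chr1_103

print "start 0: @from_zero\n";
print "start 1: @from_one\n";
```

With these values, starting at 0 builds each key from $chrompos (chr1_100) to $test_pos (chr1_103) exactly once, while starting at 1 leaves out $chrompos itself. So the start index changes which positions get incremented, not how many keys exist.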
Oct 20 '08 #2
I tried what you suggested, but it didn't solve the problem: some extra entries still suddenly got created in my hash.

Any other suggestions?

Regards
Karel
Oct 21 '08 #3
numberwhun
Expert Mod 2GB
Can you show what is in your hash and what was expected? Also, can you show your data source?

Regards,

Jeff
Oct 21 '08 #4
KevinADC
Expert 2GB
OK, let's look at these two lines:

    if (exists $exon_hash{$exon_id}{$chrompos}){
        if (exists $exon_hash{$exon_id}{$test_pos}){

When you check for the existence of the key $chrompos in the first line, the key $exon_id will spring into "life" if it did not already exist. The same applies to the next line: when you test for $test_pos, $exon_id will spring into life if it did not already exist. This is called autovivification. The only key that does not get autovivified is the deepest key ($chrompos and $test_pos in this case).
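
The behavior described above can be seen in a small sketch. The hash and key names here are illustrative only and have nothing to do with the real data.

```perl
use strict;
use warnings;

my %h;   # starts out empty

# Testing a second-level key makes Perl create the first-level entry
# (as an empty hash reference) so that it can look inside it.
my $found = exists $h{missing_id}{chr1_100};

print "found: ", ($found ? "yes" : "no"), "\n";             # found: no
print "top-level keys: ", join(",", sort keys %h), "\n";    # top-level keys: missing_id

# Checking each level in turn does not create anything.
my %g;
my $safe = exists $g{missing_id} && exists $g{missing_id}{chr1_100};

print "keys in second hash: ", scalar(keys %g), "\n";       # keys in second hash: 0
```

The short-circuit && in the second test stops before the nested lookup, which is why the second hash stays empty.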

I don't know if that is the problem, but beyond the two suggestions I have now given you, I can't tell what the problem might be just by looking at the code you posted.
Oct 21 '08 #5
Here, if we consider the top-level foreach loop, autovivification should not arise:

    foreach $exon_id (keys %exon_hash)
    {
        if (exists $exon_hash{$exon_id}{$chrompos})
        {
            if (exists $exon_hash{$exon_id}{$test_pos}){

The loop takes from %exon_hash only those keys which already exist in the hash. The first exists check then tests a second-level key under a first-level key that already exists, and the same goes for the next exists check. So I feel that in this tight check mode, where there is clear navigation through key levels 1, 2, 3, autovivification should not come into the picture.
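
A quick way to check this reasoning is to count keys before and after such a loop. The sketch below uses made-up data and assumes every first-level value is already a hash reference.

```perl
use strict;
use warnings;

# Made-up data: every first-level value is already a hash reference.
my %exon_hash = (
    exon_a => { chr1_100 => { amount => 0 } },
    exon_b => { chr1_200 => { amount => 0 } },
);

my $top_before = scalar keys %exon_hash;
my $sub_before = 0;
$sub_before += scalar keys %{ $exon_hash{$_} } for keys %exon_hash;

foreach my $exon_id (keys %exon_hash) {
    # The first level exists, so exists() only inspects the inner hash;
    # the deepest key being tested is never created.
    my $hit = exists $exon_hash{$exon_id}{chr1_999};
}

my $top_after = scalar keys %exon_hash;
my $sub_after = 0;
$sub_after += scalar keys %{ $exon_hash{$_} } for keys %exon_hash;

print "top: $top_before -> $top_after, inner: $sub_before -> $sub_after\n";
```

Both counts stay the same here, which matches the argument that the two exists checks on their own do not add entries.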

Please correct me if my understanding is wrong.

Regards,
Pawan
Oct 23 '08 #6
KevinADC
Expert 2GB
After reading your clear explanation above, I agree with you. Autovivification should not be occurring as the loop advances from the level-one key to the level-two key. As long as a level is not skipped, everything should be OK.
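
One place in the posted loop where a key can still appear without a check is the increment inside the inner for loop: only $chrompos and $test_pos are tested with exists, while every position in between is incremented directly. Whether positions in between can actually be missing in the real data is an assumption, so this may or may not be the cause. A sketch with made-up data (fresh_hash is a hypothetical helper used only to build two identical copies):

```perl
use strict;
use warnings;

my ($chromosome, $position, $lengthreads) = ('chr1', 100, 4);
my $exon_id = 'exon_a';

# Made-up data: only the first and last position of the read are present.
sub fresh_hash {
    return ( exon_a => { chr1_100 => { amount => 0 }, chr1_103 => { amount => 0 } } );
}

# Unguarded increment, as in the posted loop: missing positions get created.
my %unguarded = fresh_hash();
for my $i (0 .. $lengthreads - 1) {
    my $x = $chromosome . "_" . ($position + $i);
    $unguarded{$exon_id}{$x}{amount}++;            # creates chr1_101 and chr1_102
}

# Guarded increment: positions that are not already in the hash are skipped.
my %guarded = fresh_hash();
for my $i (0 .. $lengthreads - 1) {
    my $x = $chromosome . "_" . ($position + $i);
    next unless exists $guarded{$exon_id}{$x};
    $guarded{$exon_id}{$x}{amount}++;
}

print "unguarded: ", scalar(keys %{ $unguarded{$exon_id} }), " keys\n";   # unguarded: 4 keys
print "guarded: ",   scalar(keys %{ $guarded{$exon_id} }),   " keys\n";   # guarded: 2 keys
```

If the extra entries in the real hash turn out to be positions lying between $chrompos and $test_pos, a guard like the one in the second loop would be one way to stop them from being created.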

Good observation Pawan,
Kevin
Oct 23 '08 #7
