By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,594 Members | 3,459 Online
Bytes IT Community
Submit an Article
Got Smarts?
Share your bits of IT knowledge by writing an article on Bytes.

Hashes, Static Variables, Recursive Functions, Serialization

KevinADC
Expert 2.5K+
P: 4,059
This snippet of code provides several examples of programming techniques that can be applied to most programs.
  • using hashes to create unique results
  • static variable
  • recursive function
  • serialization

The code itself generates a list of strings comprised of random numbers. No number will be repeated within a string, and no string will be repeated within the list of strings. Following the code is a brief discussion of how the above programming techniques are used to achieve those requirements.

Expand|Select|Wrap|Line Numbers
  1. use warnings;
  2. use strict;
  3.  
  4. my $length = 7; # the amount of numbers per string
  5. my $max = 1000; # the number of strings
  6. my @strings; # array to store the strings
  7. for (1..$max) {
  8.    push @strings, rand_nums($length);
  9. }
  10. print "$_\n" for @strings;
  11. {
  12.    my %cache; # create a "static" variable
  13.    sub rand_nums {
  14.       my %local_cache; # create a new hash
  15.       my ($length) = @_; # the length of the random number string passed to the sub
  16.       my $serial = int(rand(49))+1; # get the first number for the serialized string
  17.       $local_cache{$serial} = 1; # store the first number in the hash
  18.       for (2..$length) {# start at 2 because we already have the first number
  19.          my $num = int(rand(49))+1; # get a new random number
  20.          redo if exists $local_cache{$num}; # redo the loop if the number already exists
  21.          $local_cache{$num} = 1; # store the number in the hash
  22.          $serial .= "-$num"; # append to the serialized string
  23.       }
  24.       rand_nums($length) if exists $cache{$serial}; # redo the function if the key already exists
  25.       $cache{$serial}=1; # store the new key in the hash (%cache)
  26.       return $serial;
  27.    }
  28. }
  29.  
Whenever you think unique with perl a hash is invariably going to be used. This is because the keys of a hash must be unique. I used two hashes, one to keep track of each string of numbers (%local_cache) to insure there are no repeated numbers in any one string, and another one to insure that all the strings are unique (%cache).

%cache is wrapped in a block {} around the function (rand_nums) to create a perlish version of a C static variable. Declaring it with "my" inside the block makes it lexically scoped to the function but hides it from the rest of the code/file. Since the code snippet above is not doing anything else this is not really necessary, but its an example of how you can create static variables with perl. Perl 5.10 also has a new feature that allows you to create static variables called state.

%local_cache is declared with "my" inside the function because we want to declare a new hash each time the function is called. This insures that each string of numbers has no repeated values. %cache insures that there are no repeated strings of numbers.

$length and $max could easily be used to accept some input to the script making those variables dynamic each time the script is run

Inside the rand_nums() function you see a call back to the function:

rand_nums($length) if exists $cache{$key};

This just tells perl to run the function again if the string already exists. The function runs again, produces another string of numbers and checks again if it already exists or not. This is a simple example of a recursive perl function. If the string of numbers does not already exist the code continues to run and eventually returns a string back to be added to the array @strings.

Something important to keep in mind when using a recursive function is to not introduce a condition that could cause infinite (or deep) recursion. The "warnings" pragma will warn you if such a condition occurs in your perl scripts. In the code above if you set $length to a small number, say 3, you can see for yourself what will happen.

I mentioned serialization at the beginning. Serialization is the joining together of arbitrary strings to make a single string. $serial will be the serialized string, which is the joined list of random number. It starts at this line:

my $serial = int(rand(49))+1; # get the first number for the serialized string

The string is serialized by using the concatenation operator together with the assignment operator:

$serial .= "-$num";

We end up with a string something like this:

3-22-45-9-11-33-48

That string is used as the hash key to make sure there are no duplicated strings in the final list. The reason we don’t join the keys of %local_cache to make the serialized string is because while hashes have no guaranteed order, they don’t have a random order. The resultant strings of random numbers created by joining the hash keys together would no longer be random but would favor the probability of numbers occurring at certain positions in the strings. You will end up with strings that have the same numbers in the same positions over and over even though no two strings will be duplicated.


A list of pragmas and perl builtin functions used in the code:

Perl functions :
Pragmas :
  • strict - Perl pragma to restrict unsafe constructs
  • warnings - Perl pragma to control optional warnings
Jan 2 '09 #1
Share this Article
Share on Google+
6 Comments


KevinADC
Expert 2.5K+
P: 4,059
Comments, corrections, discussions are welcome.
Jan 2 '09 #2

Kelicula
Expert 100+
P: 176
Documentation missing for "redo".
It's a very interesting function.

redo - perldoc.perl.org
Jan 17 '09 #3

KevinADC
Expert 2.5K+
P: 4,059
@Kelicula
Good catch, thanks.
Jan 17 '09 #4

Expert Mod 100+
P: 2,327
Nice article kevin. ++
Feb 22 '09 #5

jsos20
P: 11
@Kelicula
does any one know how to index a file
i have it coming into a array in a different file
for example names.txt where letters are strings as in names
and a phone numbers file phoneNumbs.txt
these two file are stored in a third file eg codes.txt
Mar 30 '10 #6

Expert 100+
P: 785
Ok, here is some feedback as you wish. But if you do not like critical feedback, then don't read further on...

1.) copy&paste style vs. modular programming
It's not a good idea to duplicate source code.
It's better to have the code for the "program-logic", the creation of the random number, lines 16-17 and 19-21, only one time. And then start your for-loop with 1.
There is always a way to achieve that. Mostly a simple rearrange of lines would do. If that doesn't work, you can skip unwanted lines inside your loop as a last resort, by conditional testing the loopcounter for the initial value.
During my sourcecode-cleanups and error-search, I have detected this copy&paste-style in poor code regularly and most often it was the error source. That means one programmer writing it, the second programmer not understanding it fully and only changing the code inside the loop, then the tester not testing for the initial condition and then crashing in production!. I already have seen a code block with over 50 (!) lines on the top, bottom and inside a loop. After understanding that the program-logic was all the same of the slightly different blocks, I modified the block inside the loop slightly and deleted the ones on the top and bottom, and voila, the whole function's size shrank by half!
But let's get back to your example: let's say somebody after you must modify your code by exchanging the logic, to speed up your program: to get a random number not by "rand(49)+1", but by using a complex random number generator that doesn't repeat random numbers. He would probably focus only on the line with the rand() function and replace that with several lines of code. If the program-logic would be there only once, he could do that easily without understanding the surrounding. But now he finds this rand()-function 2 times and he must take a time-consuming analysis to understand that he needs to replace both, doing double work. And what if he only replaced it once? Then the error is hard to find. A tester and error-fixer must use 2 breakpoints, not just one. He must understand what the first programmer did, then what the second one did, to be able to correct the problem. And so on and so on. The code gets more and more complicated. Statistically a function is written once, but overlooked or changed by other programmers more than 10 times during its lifetime. That means the time spend for analysis of the code is much bigger than for writing it. So the best programming technique is to do it correctly from the beginning on (and put many comments inside!), because it pays off.

2.) avoid endless or random recursion
As you already mentioned it, your program will go to endless recursion. It's not a solution to let the Perl compiler handle it, just avoid it from the beginning on. Just imagine someone is using your function in a big project and suddenly there is a stack overflow error! So how do you know if your function or some other project function generated it? The stack overflow could be in another function but it could be the fault of your function! You could have used up the stack (which has space for 1000 recursions) by your function unless there is only space for 3 more recursions and then another function just needs to use only 5 recursions and crashes!
Also a big proroblem is that there is not a maximum recursion deepness of your function, because it depends on the random number generated. You only need to get unlucky. If you roll your dice 20 times and you never got a 3, you will think the 21th roll must do it now. But in reality the chances are always 1/6th, the same as in the first roll. It's extremely difficult to reproduce the error for analysis if the error occurs random and seldom.
If there are only one million different strings possible, and you set up the program to generate one million plus one strings, it will crash during its attempt to generate the last one. And even when it tries to generate the one-millionth string, it will have a recursion deep of average half a million. What a waste of time and memory!
So just write your algorithm by using a loop instead of recursion and the memory isn't wasted. By using a non-repeating random number sequence generator, you wouldn't waste so much time. Or use a shuffle-algorithm: a bucket with all numbers where you draw one every time by random. If the bucket is empty (this corresponds to the case of endless recursion), the program just breaks with an error.

3.) always check input/output
What if I call your function with $length=0? It generates a string with length=1 !!! But I was expecting the empty string. (I would also expect an error message if $max (the number of generated strings) would be != 1 in this case). This is an unwanted effect of your program-logic block duplication. So you have to make an "if length != 0" statement around your first block. If you would have had only one block inside the loop and initialized $serial with the empty string, it would skip the loop and return the right result without further coding.
Most errors happen by programmers not thinking about (and forgetting to test)the extreme values of their functions, like minimum and maximum. A good programming style is to test for illegal input values always at the beginning of the function. A function should work as a black box: the function itself should handle all possibilities instead of making assumptions. And it should be specified (and tested) for all possible input. There should be no need (and waste of time) for a programmer who wishes to use this function to analyze it! (And then maybe implementing various workarounds before calling it, testing for special values ahead of time that might crash the function or lead to unexpected results)
Mar 31 '10 #7