473,385 Members | 1,673 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes and contribute your articles to a community of 473,385 developers and data experts.

Hashes, Static Variables, Recursive Functions, Serialization

4,059 Expert 2GB
This snippet of code provides several examples of programming techniques that can be applied to most programs.
  • using hashes to create unique results
  • static variable
  • recursive function
  • serialization

The code itself generates a list of strings comprised of random numbers. No number will be repeated within a string, and no string will be repeated within the list of strings. Following the code is a brief discussion of how the above programming techniques are used to achieve those requirements.

Expand|Select|Wrap|Line Numbers
  1. use warnings;
  2. use strict;
  4. my $length = 7; # the amount of numbers per string
  5. my $max = 1000; # the number of strings
  6. my @strings; # array to store the strings
  7. for (1..$max) {
  8.    push @strings, rand_nums($length);
  9. }
  10. print "$_\n" for @strings;
  11. {
  12.    my %cache; # create a "static" variable
  13.    sub rand_nums {
  14.       my %local_cache; # create a new hash
  15.       my ($length) = @_; # the length of the random number string passed to the sub
  16.       my $serial = int(rand(49))+1; # get the first number for the serialized string
  17.       $local_cache{$serial} = 1; # store the first number in the hash
  18.       for (2..$length) {# start at 2 because we already have the first number
  19.          my $num = int(rand(49))+1; # get a new random number
  20.          redo if exists $local_cache{$num}; # redo the loop if the number already exists
  21.          $local_cache{$num} = 1; # store the number in the hash
  22.          $serial .= "-$num"; # append to the serialized string
  23.       }
  24.       rand_nums($length) if exists $cache{$serial}; # redo the function if the key already exists
  25.       $cache{$serial}=1; # store the new key in the hash (%cache)
  26.       return $serial;
  27.    }
  28. }
Whenever you think unique with perl a hash is invariably going to be used. This is because the keys of a hash must be unique. I used two hashes, one to keep track of each string of numbers (%local_cache) to insure there are no repeated numbers in any one string, and another one to insure that all the strings are unique (%cache).

%cache is wrapped in a block {} around the function (rand_nums) to create a perlish version of a C static variable. Declaring it with "my" inside the block makes it lexically scoped to the function but hides it from the rest of the code/file. Since the code snippet above is not doing anything else this is not really necessary, but its an example of how you can create static variables with perl. Perl 5.10 also has a new feature that allows you to create static variables called state.

%local_cache is declared with "my" inside the function because we want to declare a new hash each time the function is called. This insures that each string of numbers has no repeated values. %cache insures that there are no repeated strings of numbers.

$length and $max could easily be used to accept some input to the script making those variables dynamic each time the script is run

Inside the rand_nums() function you see a call back to the function:

rand_nums($length) if exists $cache{$key};

This just tells perl to run the function again if the string already exists. The function runs again, produces another string of numbers and checks again if it already exists or not. This is a simple example of a recursive perl function. If the string of numbers does not already exist the code continues to run and eventually returns a string back to be added to the array @strings.

Something important to keep in mind when using a recursive function is to not introduce a condition that could cause infinite (or deep) recursion. The "warnings" pragma will warn you if such a condition occurs in your perl scripts. In the code above if you set $length to a small number, say 3, you can see for yourself what will happen.

I mentioned serialization at the beginning. Serialization is the joining together of arbitrary strings to make a single string. $serial will be the serialized string, which is the joined list of random number. It starts at this line:

my $serial = int(rand(49))+1; # get the first number for the serialized string

The string is serialized by using the concatenation operator together with the assignment operator:

$serial .= "-$num";

We end up with a string something like this:


That string is used as the hash key to make sure there are no duplicated strings in the final list. The reason we don’t join the keys of %local_cache to make the serialized string is because while hashes have no guaranteed order, they don’t have a random order. The resultant strings of random numbers created by joining the hash keys together would no longer be random but would favor the probability of numbers occurring at certain positions in the strings. You will end up with strings that have the same numbers in the same positions over and over even though no two strings will be duplicated.

A list of pragmas and perl builtin functions used in the code:

Perl functions :
Pragmas :
  • strict - Perl pragma to restrict unsafe constructs
  • warnings - Perl pragma to control optional warnings
Jan 2 '09 #1
6 9576
4,059 Expert 2GB
Comments, corrections, discussions are welcome.
Jan 2 '09 #2
176 Expert 100+
Documentation missing for "redo".
It's a very interesting function.

redo - perldoc.perl.org
Jan 17 '09 #3
4,059 Expert 2GB
Good catch, thanks.
Jan 17 '09 #4
2,460 Expert Mod 2GB
Nice article kevin. ++
Feb 22 '09 #5
does any one know how to index a file
i have it coming into a array in a different file
for example names.txt where letters are strings as in names
and a phone numbers file phoneNumbs.txt
these two file are stored in a third file eg codes.txt
Mar 30 '10 #6
785 Expert 512MB
Ok, here is some feedback as you wish. But if you do not like critical feedback, then don't read further on...

1.) copy&paste style vs. modular programming
It's not a good idea to duplicate source code.
It's better to have the code for the "program-logic", the creation of the random number, lines 16-17 and 19-21, only one time. And then start your for-loop with 1.
There is always a way to achieve that. Mostly a simple rearrange of lines would do. If that doesn't work, you can skip unwanted lines inside your loop as a last resort, by conditional testing the loopcounter for the initial value.
During my sourcecode-cleanups and error-search, I have detected this copy&paste-style in poor code regularly and most often it was the error source. That means one programmer writing it, the second programmer not understanding it fully and only changing the code inside the loop, then the tester not testing for the initial condition and then crashing in production!. I already have seen a code block with over 50 (!) lines on the top, bottom and inside a loop. After understanding that the program-logic was all the same of the slightly different blocks, I modified the block inside the loop slightly and deleted the ones on the top and bottom, and voila, the whole function's size shrank by half!
But let's get back to your example: let's say somebody after you must modify your code by exchanging the logic, to speed up your program: to get a random number not by "rand(49)+1", but by using a complex random number generator that doesn't repeat random numbers. He would probably focus only on the line with the rand() function and replace that with several lines of code. If the program-logic would be there only once, he could do that easily without understanding the surrounding. But now he finds this rand()-function 2 times and he must take a time-consuming analysis to understand that he needs to replace both, doing double work. And what if he only replaced it once? Then the error is hard to find. A tester and error-fixer must use 2 breakpoints, not just one. He must understand what the first programmer did, then what the second one did, to be able to correct the problem. And so on and so on. The code gets more and more complicated. Statistically a function is written once, but overlooked or changed by other programmers more than 10 times during its lifetime. That means the time spend for analysis of the code is much bigger than for writing it. So the best programming technique is to do it correctly from the beginning on (and put many comments inside!), because it pays off.

2.) avoid endless or random recursion
As you already mentioned it, your program will go to endless recursion. It's not a solution to let the Perl compiler handle it, just avoid it from the beginning on. Just imagine someone is using your function in a big project and suddenly there is a stack overflow error! So how do you know if your function or some other project function generated it? The stack overflow could be in another function but it could be the fault of your function! You could have used up the stack (which has space for 1000 recursions) by your function unless there is only space for 3 more recursions and then another function just needs to use only 5 recursions and crashes!
Also a big proroblem is that there is not a maximum recursion deepness of your function, because it depends on the random number generated. You only need to get unlucky. If you roll your dice 20 times and you never got a 3, you will think the 21th roll must do it now. But in reality the chances are always 1/6th, the same as in the first roll. It's extremely difficult to reproduce the error for analysis if the error occurs random and seldom.
If there are only one million different strings possible, and you set up the program to generate one million plus one strings, it will crash during its attempt to generate the last one. And even when it tries to generate the one-millionth string, it will have a recursion deep of average half a million. What a waste of time and memory!
So just write your algorithm by using a loop instead of recursion and the memory isn't wasted. By using a non-repeating random number sequence generator, you wouldn't waste so much time. Or use a shuffle-algorithm: a bucket with all numbers where you draw one every time by random. If the bucket is empty (this corresponds to the case of endless recursion), the program just breaks with an error.

3.) always check input/output
What if I call your function with $length=0? It generates a string with length=1 !!! But I was expecting the empty string. (I would also expect an error message if $max (the number of generated strings) would be != 1 in this case). This is an unwanted effect of your program-logic block duplication. So you have to make an "if length != 0" statement around your first block. If you would have had only one block inside the loop and initialized $serial with the empty string, it would skip the loop and return the right result without further coding.
Most errors happen by programmers not thinking about (and forgetting to test)the extreme values of their functions, like minimum and maximum. A good programming style is to test for illegal input values always at the beginning of the function. A function should work as a black box: the function itself should handle all possibilities instead of making assumptions. And it should be specified (and tested) for all possible input. There should be no need (and waste of time) for a programmer who wishes to use this function to analyze it! (And then maybe implementing various workarounds before calling it, testing for special values ahead of time that might crash the function or lead to unexpected results)
Mar 31 '10 #7

Sign in to post your reply or Sign up for a free account.

Similar topics

by: James Sleeman | last post by:
Hi all, i just spent an hour rifling through code to find the cause of a problem only to find it's an oddity with serialization and recursive objects, so figured I'd post for the next person who...
by: Nolan Martin | last post by:
is a static functions address constant? ie.. static void func(); write_to_file(&func); Restart program... static void func(); void (*funcPtr) ();
by: Bryan Parkoff | last post by:
I have noticed that C programmers put static keyword beside global variable and global functions in C source codes. I believe that it is not necessary and it is not the practice in C++. Static...
by: Datta Patil | last post by:
Hi , #include<stdio.h> func(static int k) /* point2 : why this is not giving error */ { int i = 10 ; // static int j = &i ; /* point 1: this will give compile time error */ return k; } /*...
by: yellow1912 | last post by:
Hi, I'm very new to C#, in fact I dont know anything about it. Im having problem with its static variables. I used static variables alot in C++, but seem like C# doesnt allow it inside a function...
by: pauldepstein | last post by:
I have tried to get help over debugging issues and everyone who's replied has been as helpful as possible. Unfortunately, I haven't found ways to simplify the issues enough to provide compilable...
by: Jesper Schmidt | last post by:
When does CLR performs initialization of static variables in a class library? (1) when the class library is loaded (2) when a static variable is first referenced (3) when... It seems that...
by: Zytan | last post by:
I see that static is more restricted in C# than in C++. It appears usable only on classes and methods, and data members, but cannot be created within a method itself. Surely this is possible in...
by: Why Tea | last post by:
I seem to remember that in ANSI C, all static functions should have their function prototypes listed at the beginning of the file for better consistency (or something like that). I googled and...
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.