Hashes, Static Variables, Recursive Functions, Serialization

4,059 Expert 2GB

This snippet of code provides several examples of programming techniques that can be applied to most programs.

using hashes to create unique results
static variable
recursive function
serialization

The code itself generates a list of strings comprised of random numbers. No number will be repeated within a string, and no string will be repeated within the list of strings. Following the code is a brief discussion of how the above programming techniques are used to achieve those requirements.

Expand|Select|Wrap|Line Numbers

 
use warnings;

use strict;
 
my $length = 7; # the amount of numbers per string

my $max = 1000; # the number of strings

my @strings; # array to store the strings

for (1..$max) {

   push @strings, rand_nums($length);

}

print "$_\n" for @strings;

{

   my %cache; # create a "static" variable

   sub rand_nums {

      my %local_cache; # create a new hash

      my ($length) = @_; # the length of the random number string passed to the sub

      my $serial = int(rand(49))+1; # get the first number for the serialized string

      $local_cache{$serial} = 1; # store the first number in the hash

      for (2..$length) {# start at 2 because we already have the first number

         my $num = int(rand(49))+1; # get a new random number

         redo if exists $local_cache{$num}; # redo the loop if the number already exists

         $local_cache{$num} = 1; # store the number in the hash

         $serial .= "-$num"; # append to the serialized string

      }

      rand_nums($length) if exists $cache{$serial}; # redo the function if the key already exists

      $cache{$serial}=1; # store the new key in the hash (%cache)

      return $serial;

   }

}

Whenever you think unique with perl a hash is invariably going to be used. This is because the keys of a hash must be unique. I used two hashes, one to keep track of each string of numbers (%local_cache) to insure there are no repeated numbers in any one string, and another one to insure that all the strings are unique (%cache).

%cache is wrapped in a block {} around the function (rand_nums) to create a perlish version of a C static variable. Declaring it with "my" inside the block makes it lexically scoped to the function but hides it from the rest of the code/file. Since the code snippet above is not doing anything else this is not really necessary, but its an example of how you can create static variables with perl. Perl 5.10 also has a new feature that allows you to create static variables called state.

%local_cache is declared with "my" inside the function because we want to declare a new hash each time the function is called. This insures that each string of numbers has no repeated values. %cache insures that there are no repeated strings of numbers.

$length and $max could easily be used to accept some input to the script making those variables dynamic each time the script is run

Inside the rand_nums() function you see a call back to the function:

rand_nums($length) if exists $cache{$key};

This just tells perl to run the function again if the string already exists. The function runs again, produces another string of numbers and checks again if it already exists or not. This is a simple example of a recursive perl function. If the string of numbers does not already exist the code continues to run and eventually returns a string back to be added to the array @strings.

Something important to keep in mind when using a recursive function is to not introduce a condition that could cause infinite (or deep) recursion. The "warnings" pragma will warn you if such a condition occurs in your perl scripts. In the code above if you set $length to a small number, say 3, you can see for yourself what will happen.

I mentioned serialization at the beginning. Serialization is the joining together of arbitrary strings to make a single string. $serial will be the serialized string, which is the joined list of random number. It starts at this line:

my $serial = int(rand(49))+1; # get the first number for the serialized string

The string is serialized by using the concatenation operator together with the assignment operator:

$serial .= "-$num";

We end up with a string something like this:

3-22-45-9-11-33-48

That string is used as the hash key to make sure there are no duplicated strings in the final list. The reason we don’t join the keys of %local_cache to make the serialized string is because while hashes have no guaranteed order, they don’t have a random order. The resultant strings of random numbers created by joining the hash keys together would no longer be random but would favor the probability of numbers occurring at certain positions in the strings. You will end up with strings that have the same numbers in the same positions over and over even though no two strings will be duplicated.

A list of pragmas and perl builtin functions used in the code:

Perl functions :

Pragmas :

strict - Perl pragma to restrict unsafe constructs
warnings - Perl pragma to control optional warnings

Jan 2 '09 #1

Subscribe Post Reply

9577

KevinADC

4,059

Expert 2GB

Comments, corrections, discussions are welcome.

Jan 2 '09 #2

Kelicula

176

Expert 100+

Documentation missing for "redo".
It's a very interesting function.

redo - perldoc.perl.org

Jan 17 '09 #3

KevinADC

4,059

Expert 2GB

@Kelicula
Good catch, thanks.

Jan 17 '09 #4

Niheel

2,460

Expert Mod 2GB

Nice article kevin. ++

Feb 22 '09 #5

jsos20

@Kelicula
does any one know how to index a file
i have it coming into a array in a different file
for example names.txt where letters are strings as in names
and a phone numbers file phoneNumbs.txt
these two file are stored in a third file eg codes.txt

Mar 30 '10 #6

chaarmann

785

Expert 512MB

Ok, here is some feedback as you wish. But if you do not like critical feedback, then don't read further on...

1.) copy&paste style vs. modular programming
It's not a good idea to duplicate source code.
It's better to have the code for the "program-logic", the creation of the random number, lines 16-17 and 19-21, only one time. And then start your for-loop with 1.
There is always a way to achieve that. Mostly a simple rearrange of lines would do. If that doesn't work, you can skip unwanted lines inside your loop as a last resort, by conditional testing the loopcounter for the initial value.
During my sourcecode-cleanups and error-search, I have detected this copy&paste-style in poor code regularly and most often it was the error source. That means one programmer writing it, the second programmer not understanding it fully and only changing the code inside the loop, then the tester not testing for the initial condition and then crashing in production!. I already have seen a code block with over 50 (!) lines on the top, bottom and inside a loop. After understanding that the program-logic was all the same of the slightly different blocks, I modified the block inside the loop slightly and deleted the ones on the top and bottom, and voila, the whole function's size shrank by half!
But let's get back to your example: let's say somebody after you must modify your code by exchanging the logic, to speed up your program: to get a random number not by "rand(49)+1", but by using a complex random number generator that doesn't repeat random numbers. He would probably focus only on the line with the rand() function and replace that with several lines of code. If the program-logic would be there only once, he could do that easily without understanding the surrounding. But now he finds this rand()-function 2 times and he must take a time-consuming analysis to understand that he needs to replace both, doing double work. And what if he only replaced it once? Then the error is hard to find. A tester and error-fixer must use 2 breakpoints, not just one. He must understand what the first programmer did, then what the second one did, to be able to correct the problem. And so on and so on. The code gets more and more complicated. Statistically a function is written once, but overlooked or changed by other programmers more than 10 times during its lifetime. That means the time spend for analysis of the code is much bigger than for writing it. So the best programming technique is to do it correctly from the beginning on (and put many comments inside!), because it pays off.

2.) avoid endless or random recursion
As you already mentioned it, your program will go to endless recursion. It's not a solution to let the Perl compiler handle it, just avoid it from the beginning on. Just imagine someone is using your function in a big project and suddenly there is a stack overflow error! So how do you know if your function or some other project function generated it? The stack overflow could be in another function but it could be the fault of your function! You could have used up the stack (which has space for 1000 recursions) by your function unless there is only space for 3 more recursions and then another function just needs to use only 5 recursions and crashes!
Also a big proroblem is that there is not a maximum recursion deepness of your function, because it depends on the random number generated. You only need to get unlucky. If you roll your dice 20 times and you never got a 3, you will think the 21th roll must do it now. But in reality the chances are always 1/6th, the same as in the first roll. It's extremely difficult to reproduce the error for analysis if the error occurs random and seldom.
If there are only one million different strings possible, and you set up the program to generate one million plus one strings, it will crash during its attempt to generate the last one. And even when it tries to generate the one-millionth string, it will have a recursion deep of average half a million. What a waste of time and memory!
So just write your algorithm by using a loop instead of recursion and the memory isn't wasted. By using a non-repeating random number sequence generator, you wouldn't waste so much time. Or use a shuffle-algorithm: a bucket with all numbers where you draw one every time by random. If the bucket is empty (this corresponds to the case of endless recursion), the program just breaks with an error.

3.) always check input/output
What if I call your function with $length=0? It generates a string with length=1 !!! But I was expecting the empty string. (I would also expect an error message if $max (the number of generated strings) would be != 1 in this case). This is an unwanted effect of your program-logic block duplication. So you have to make an "if length != 0" statement around your first block. If you would have had only one block inside the loop and initialized $serial with the empty string, it would skip the loop and return the right result without further coding.
Most errors happen by programmers not thinking about (and forgetting to test)the extreme values of their functions, like minimum and maximum. A good programming style is to test for illegal input values always at the beginning of the function. A function should work as a black box: the function itself should handle all possibilities instead of making assumptions. And it should be specified (and tested) for all possible input. There should be no need (and waste of time) for a programmer who wishes to use this function to analyze it! (And then maybe implementing various workarounds before calling it, testing for special values ahead of time that might crash the function or lead to unexpected results)

Mar 31 '10 #7

by: James Sleeman | last post by:

Hi all, i just spent an hour rifling through code to find the cause of a problem only to find it's an oddity with serialization and recursive objects, so figured I'd post for the next person who...

PHP

is a static functions address constant?

by: Nolan Martin | last post by:

is a static functions address constant? ie.. static void func(); write_to_file(&func); Restart program... static void func(); void (*funcPtr) ();

C / C++

Why STATIC keyword is needed?

by: Bryan Parkoff | last post by:

I have noticed that C programmers put static keyword beside global variable and global functions in C source codes. I believe that it is not necessary and it is not the practice in C++. Static...

C / C++

Static variable mapping !!

by: Datta Patil | last post by:

Hi , #include<stdio.h> func(static int k) /* point2 : why this is not giving error */ { int i = 10 ; // static int j = &i ; /* point 1: this will give compile time error */ return k; } /*...

C / C++

static variables?

by: yellow1912 | last post by:

Hi, I'm very new to C#, in fact I dont know anything about it. Im having problem with its static variables. I used static variables alot in C++, but seem like C# doesnt allow it inside a function...

C# / C Sharp

confusion over static variables

by: pauldepstein | last post by:

I have tried to get help over debugging issues and everyone who's replied has been as helpful as possible. Unfortunately, I haven't found ways to simplify the issues enough to provide compilable...

C / C++

Initialization of static variables in a class library

by: Jesper Schmidt | last post by:

When does CLR performs initialization of static variables in a class library? (1) when the class library is loaded (2) when a static variable is first referenced (3) when... It seems that...

.NET Framework

Does C# have static local variables like C++?

by: Zytan | last post by:

I see that static is more restricted in C# than in C++. It appears usable only on classes and methods, and data members, but cannot be created within a method itself. Surely this is possible in...

C# / C Sharp

Static function prototype

by: Why Tea | last post by:

I seem to remember that in ANSI C, all static functions should have their function prototypes listed at the beginning of the file for better consistency (or something like that). I googled and...

C / C++

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Hashes, Static Variables, Recursive Functions, Serialization

Similar topics