The Case Against Global Variables


This article explores the negative ramifications of using global variables. The use of global variables is such a problem that C++ architects have called it polluting the global namespace. This article explores what happens when the global namespace becomes polluted and how to avoid this condition.

The opinions expressed in this article are those of the author alone although many have appeared in various publications and developer forums.

What is a Global Variable?
A global variable is a variable that is defined outside any function or class. That is, it is outside any function or class braces. Because of this, the variable is not owned by any local scope and is therefore visible to any scope that follows its declaration.

Global variables come in two varieties, those with external linkage and those with internal linkage. Global variables with external linkage are sharable among multiple source files whereas global variables with internal linkage are not.

Why Use Global Variables?
A global variable is a convenient way to avoid function arguments for pieces of information that are common to many functions. A function can use the global variable without having to have it passed in as an argument.

Unfortunately, this feature is misused by many programmers who don't understand how to use function arguments. They simply make the variable global and then it does not have to be passed or returned. The function can have no arguments and thus the passing of arguments problem is avoided altogether.

Why a Case Against Global Variables?
In many years of experience, I have found no real case where a global variable is absolutely required. That is, a global variable is a convenience and it comes at a price in the form of potential disasters that can cause your program to fail to function as designed, if not at the outset, then perhaps in the future as the result of changes.

Disaster 1: A Local Variable Hides the Global Variable
In C++, the local scope is always the default scope. Therefore, should you declare a local variable of the same name as a global variable, then the local variable will be used instead of the global. Note that the local variable doesn't even have to be the same type as the global. It just has to have the same name.
    #include <iostream>

    int data = 10;

    int main()
    {
        double data = 25.0;
        std::cout << data << std::endl;   //You see 25
    }
To avoid this disaster, you must be careful to identify the scope of the variable being used. That is, you must use the scope resolution operator to indicate which variable you want to use.

    #include <iostream>

    int data = 10;

    int main()
    {
        double data = 25.0;
        std::cout <<   data << std::endl;   //You see 25
        std::cout << ::data << std::endl;   //You see 10
    }
The scope resolution operator (::) used with no scope name refers to the global namespace. Note that this is not the same thing as the unnamed (anonymous) namespace, which has internal linkage.

The weakness here is that using the scope resolution operator this way depends upon an infallible programmer who doesn't exist.

The infallible way to avoid this disaster is to not have a global variable in the first place.

Disaster 2: Variable Name Conflicts
Declaring a global variable makes the variable accessible from multiple source files. This opens the door to name conflicts when your code is merged with someone else's. Any duplication of variable names can cause an error at link time.

The C solution was to go through the code and either change the name of one of the conflicting variables, or failing that, to define one of the variables as static. A static global variable is accessible only from the file that declares it. If the static variable needed to be accessed from several source files, then the source files were merged together to form one giant source file.

The C++ solution is to use a namespace. This is a way of adding a scope name to the name of the global variable to make it distinguishable from the conflicting global. The combined name is called the fully qualified name.

    int data;

    namespace MyStuff
    {
        int data;
    }
The global variable outside the namespace is ::data whereas the name of the global variable inside the namespace is MyStuff::data. At least this removed the one name conflict.

However, since C++ namespaces are open-ended, there is no guarantee that some other source file will not later define:
    namespace MyStuff
    {
        int data;
    }
The name conflict again rears its head, only this time there are two MyStuff::data variables and you are back to the C solution. Alternatively, you could create a second namespace and put one of the variables in there. But then the variable could be re-declared in that second namespace in some other source file and the whole name conflict mess repeats itself.

The lesson here is that there is no protection against name conflicts regardless of whether namespaces are used or not. The most that can be said is that by using a namespace, the chances of a name conflict have been reduced but not eliminated.

Of course, you could use an anonymous namespace for your global but then it would only be accessible in the file where the anonymous namespace was defined. This is identical to the C static global variable and you are back to one giant source file.

The infallible way to avoid this disaster is to not have a global variable in the first place.

Disaster 3: Exposure of Implementation
Defining a global variable requires that you also define the name of the variable. This is the same as having a public data variable. As a result, the name of this variable is scattered throughout the user code. Should a redesign be required in the future, then you will find the effort to make the change will be much greater because this variable name will have to be changed or removed in all those places.

Consider the case of an application limit defined by a global variable named TheLimit. Suppose management decides that TheLimit has to be obtained by a function call at the moment it is needed in the program because TheLimit can now vary with time. Writing the function is easy but going into the user code and replacing all occurrences of TheLimit with a function call is not easy: a) the users won't let you change their code, b) if some code happens to be in a user-written DLL that is shipped worldwide, then all of the existing user DLLs have to be replaced with the new one.

The net effect is that you may be told that implementing this feature is not possible. The global variable has effectively frozen your design. So, in addition to breaking encapsulation, the global variable has also terminated improvements to your product.

A global variable acts like a hard-coded value like you might get using a #define. The use of hard-coded values in code is discouraged.

Maximizing the ripple of change is another way of looking at this. By having the global variable you have maximized the amount of work when it comes time to change the design.

The solution here is to never have had the global variable in the first place but instead have the limit obtained by calling a function. You can always rewrite the function and if you keep the same function prototype, then the users need only to re-compile and re-link.

Disaster 4: No Multithreading
A global variable is accessible from many functions. There is no protection against having the global variable accessed by functions running on separate threads. When this happens, each thread can read and write the variable without the other threads knowing about it. If two threads update the global at the same time, the updates can collide and leave an incorrect value in the variable.

This is called a race condition and can cause the program to operate incorrectly, or even crash. Therefore, any writable global variable is a sign that the program cannot safely be multithreaded. Considering that most modern software is multithreaded, code that uses global variables isn't modern and is hard to use with modern software.

Even protecting the variables in critical sections won't work since there is no guarantee that all functions will use a critical section when working with these variables. That would depend upon the infallible programmer again.

These global variables can be sharable, static, or defined in namespaces. It makes no difference. Their mere presence limits the program to single-threaded execution.

Disaster 5: Maximizes the Maintenance

When the value in a global variable gets screwed up, then all functions that can access the variable are suspects. Each of them will have to be examined to find the culprit (or culprits). With just a few functions maybe this isn't so bad but with a real application, there are potentially thousands of functions that would need examination. Now solving the problem is not so easy.

This is further complicated by the fact that you may not have access to all the code. It could be user functions that cause the problem and the users may not want you researching their code. They will simply put your problem on their priority list and will get around to it at some point. Meanwhile, the problems caused by the bad value will just need to be endured.

A really bad fallout from this is user-written code that compensates for the error. If you fix the error, the compensating code may now be what screws up the value. There was a case where a linker had calculated addresses that were off by 4 bits. Users wrote code to compensate and when the vendor fixed the linker, the user links now produced bad addresses due to the compensating code. Result: The users could not use any further upgrades of the linker.

Disaster 6: No Guarantee the Global Will Be Used
Having a global variable does not mean that the variable will be used. Programmers may not be aware of the global and create other variables instead.

Recall that a local variable with the same name as the global will hide the global. Besides, a programmer just may not believe in the global and refuse to use it.

Once again, it is better to provide a function that returns the value.

This disaster is avoided by not having global variables that can be ignored in the first place.

Disaster 7: Globals Expand the Memory Footprint
Global variables must be created and initialized before main() starts execution and they must remain in existence until main() completes execution. That is, they are present for the entire life of the program.

Each of these variables makes your memory footprint larger thereby consuming more machine resources than necessary.

Or worse, the global variables may be const variables defined in a header file. That means one set of these const global variables is created each time the header file is included in a source file. If your program has 500 source files, then you have 500 sets of these const global variables. Normally, this would cause a linker failure due to multiple definitions. However, to avoid this, const global variables in C++ have internal linkage by default, as if they had been declared static.

Static global variables can't be accessed outside the file that defines them so this avoids the multiple definition error. If these constant variables need to be used in many source files, then you declare the variables in the header file and define them in a source file.

Variables should never be defined in a header file.

    //Header file

    extern const double PI;
Then in a source file, you define the variable as sharable (that is, with external rather than static linkage).
    //Source file

    extern const double PI = 3.14159;
You can minimize your program footprint by not having global variables in the first place.

Disaster 8: Memory for Global Variables May be Limited
Some environments limit the amount of memory made available for global variables. Should your program exceed those limits, then it will not execute. For maximum portability do not make assumptions about how much global memory is acceptable.

Avoid this disaster by not using global memory.

Disaster 9: No Guarantee for the Order of Creation

The C++ language specification states that global variables in a single source file are created and initialized in the order in which they appear in the file.

Remember that C++ can make constructor calls to initialize variables so these global variables can be initialized by their constructors. This is the only case in C++ where code executes before main() starts.

If two global variables A and B are defined, and A appears before B, then A is guaranteed to be created and initialized before B. Therefore, B can be initialized using a constructor that has A for an argument.

The disaster occurs when A is defined in one source file and B is defined in another. There is no guarantee about the order in which globals from different source files are initialized. That means B could be initialized before A was created and the constructor of B would be using an argument that doesn't yet exist. The behavior is undefined and the program may crash immediately.

The popular name for this is the static initialization order fiasco.

Therefore, never define global variables that depend upon other global variables for their initialization.

Actually, your best course is to not have global variables in the first place.

Avoiding the Disasters

Solution 1:
The best course is to not have global variables in the first place.

Solution 2:
Next, you can use an anonymous namespace to hold your global variable and define a function in that source file to access it. This forces everyone to call the function to access the variable.

You can now redesign the global variable to, say, read its value from a file without breaking user code so long as the function prototype does not change. If the prototype must change, you can just add a new function that reads the file. Then new users can use the new function and old ones can continue to use the hard-coded variable. When all users use the new function, the old one can be retired and the global variable removed.

Solution 3:
Use a Singleton.

A Singleton is an object that can have only one instance and that instance represents the global variable. Often, singletons are created only when needed. The singleton is accessed using a globally available Instance() function much like the function in Solution 2 above.

Where there needs to be many singletons, these may be created, initialized and then stored in a database with a name for a key. You access these singletons by requesting them by name.

There is an article in the C/C++ Articles forum on the Singleton Design Pattern.

Further Information
Design Patterns, by Erich Gamma, et al., Addison-Wesley, 1994.
Effective C++, 3rd Edition, by Scott Meyers, Addison-Wesley, 2005.

Copyright 2007 Buchmiller Technical Associates North Bend WA USA
Nov 13 '07 #1
1 Comment

Hi weaknessforcats,
Firstly, thanks a lot for your concern. As you requested, I repost my question here after reading through your article

I have two questions:
1) About memory space in C++
2) About global and static variable

I read somewhere that after the compilation process, which translates C++ code into machine language, your application is given a certain amount of memory to use. That memory space is divided into 4 segments as follows:

a. Code segment, where all application code is stored
b. Data segment, where global data is stored
c. Stack segment, where local variables are stored
d. Heap segment, where dynamic memory is used

First question, is my statement correct?


If correct, the second question:
Are all global and static variables initialized when the program starts, and do they exist until the program is terminated? Where are these types of variables located? Is it the data segment?

If my first statement is wrong, my second question is:
How is memory space allocated for an application? Where are all types of variables located?

Anyway, thanks a lot for your concern and help
Dec 20 '07 #2