matthurne wrote:
> My next question would be, how?
The answer is that it depends on the specific compiler/linker.
Let's work through an example from start to finish, and maybe it will
help clarify things.
First, let's make a source file:
stuff.cpp
-----
int f(int x)
{
    return x ;
}
int g(int x)
{
    return x*x ;
}
Now, we compile this source file, using whatever method our compiler
likes. What gets generated? On most compilers, a file called
"stuff.o" or "stuff.obj" would be generated. Exactly what is in this
file depends on the compiler, but it contains, at a minimum, the
compiled code for the functions f and g, and probably some sort of
symbol table that says "this block of data represents the compiled code
for f", and "this block of data represents the compiled code for g".
Now, for the convenience of everybody who might use f and g, let's
create a header file:
stuff.h
------
int f(int x) ;
int g(int x) ;
It doesn't make any sense to "compile" a header file. There is no
executable code there. The first line simply says, "somewhere (we
aren't specifying where, but somewhere), there is a function in
existence called f, and this function takes one integer as a parameter,
and returns an integer." The second line says a similar thing about g.
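If you want to convince yourself that a declaration is all the compiler
needs at the point of a call, here is a small sketch in a single file.
The caller is a made-up function called compute (standing in for main,
so the sketch has no main of its own); it uses f and g after seeing only
their declarations, and the definitions don't appear until later in the
file. In the real example the definitions live in stuff.cpp instead, and
connecting the two is the linker's job.

```cpp
// Declarations only: this is all the compiler needs to check the calls below.
int f(int x);
int g(int x);

// Hypothetical caller standing in for main(). When this is compiled, the
// compiler has seen only the declarations above, not the definitions.
int compute()
{
    return f(3) + g(2);
}

// Definitions, which could just as well live in a separate file like stuff.cpp.
int f(int x)
{
    return x;
}

int g(int x)
{
    return x * x;
}
```

Here compute() returns 3 + 4, that is 7.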
Time for another source file:
main.cpp
--------
#include "stuff.h"
int main()
{
    int y ;
    y = f(3) + g(2) ;
    return 0 ;
}
Now, we compile main.cpp. The very first thing the compiler does is run
main.cpp through what is called the "preprocessor". The preprocessor
sees the #include directive and replaces it with the contents of stuff.h,
so by the time it gets to the more interesting phases of compiling, the
code looks like this:
preprocessed main.cpp
---------------------
int f(int x) ;
int g(int x) ;
int main()
{
    int y ;
    y = f(3) + g(2) ;
    return 0 ;
}
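The purely textual nature of #include is easy to model. What follows is
a toy sketch, not how any real preprocessor is written: a made-up
function called preprocess replaces each line of the form
#include "name" with the text stored for that name in a map, and copies
every other line through unchanged. Real preprocessors also handle
macros, conditional compilation, <...> includes, search paths and so on,
none of which this attempts.

```cpp
#include <map>
#include <sstream>
#include <string>

// Toy model of #include handling: a line of the form
//     #include "name"
// is replaced by headers[name]; every other line is copied unchanged.
// Unknown headers are simply passed through here (a real preprocessor
// would report an error instead).
std::string preprocess(const std::string& source,
                       const std::map<std::string, std::string>& headers)
{
    const std::string prefix = "#include \"";
    std::istringstream in(source);
    std::string out;
    std::string line;

    while (std::getline(in, line)) {
        bool is_include = line.compare(0, prefix.size(), prefix) == 0 &&
                          line.size() > prefix.size() && line.back() == '"';
        if (is_include) {
            // Extract the text between the quotes.
            std::string name =
                line.substr(prefix.size(), line.size() - prefix.size() - 1);
            auto it = headers.find(name);
            if (it != headers.end()) {
                out += it->second;  // paste the header text in place of the directive
                continue;
            }
        }
        out += line;
        out += '\n';
    }
    return out;
}
```

Feeding it the text of main.cpp, with the two declarations from stuff.h
stored under "stuff.h" in the map, produces the preprocessed listing
shown above.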
As before, the compiler is going to emit an object file called something
like "main.o" or "main.obj". This object file will contain the compiled
version of the function main, and probably a symbol table that has
'main' in it somewhere. An interesting question is how this line gets
compiled:
y = f(3) + g(2) ;
After all, the compiler, at this point, doesn't know very much about f
and g. It certainly doesn't know anything about the actual compiled
versions of f and g, as those were created during a completely different
run of the compiler, and it has no idea where to begin looking for them
now. It does know, however, because of the first two lines that
declare f and g, that f and g exist somewhere, that they each take an
integer as a parameter, and each return an int. That is enough
information to at least make sure that the functions are being used
correctly. Instead of actually putting all of the code necessary to
call f and g into the object file (which it can't do, because, if you
recall, it doesn't know where f and g are), it is content to put some
sort of marker that says "I need to call some function called f right
here", and similarly for g.
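One way to see that the declarations alone carry enough information to
check the calls: the sketch below declares f and g exactly as stuff.h
does, never defines them, and asks the compiler to verify their types
with static_assert (it assumes a C++17 compiler, for std::is_same_v).
Because decltype does not evaluate its operand, nothing here actually
calls f or g, so there is nothing for the linker to resolve.

```cpp
#include <type_traits>

// Only declarations, as in stuff.h; there are no definitions anywhere here.
int f(int x);
int g(int x);

// decltype does not evaluate its operand, so these checks use nothing but
// what the declarations say: each function takes an int and returns an int.
static_assert(std::is_same_v<decltype(f), int(int)>, "f takes an int and returns an int");
static_assert(std::is_same_v<decltype(g), int(int)>, "g takes an int and returns an int");
static_assert(std::is_same_v<decltype(f(3) + g(2)), int>, "f(3) + g(2) is an int");

// Misuse is also caught from the declaration alone; this would not compile,
// because there is no conversion from a string literal to int:
//     int bad = f("three");
```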
So, now we've compiled both our source files, and have two object files,
which for the sake of argument we'll say are "stuff.o" and "main.o". The
next step is to combine these together into a program that someone could
actually run. That is where the linker comes in. When we use the
linker, we give it ALL the object files at the same time, rather than
one at a time. So, here is a (highly simplified) dramatization of what
happens in the linker:
1. I know that programs need to start at a function called "main", so I
need to find that.
2. Here it is, in "main.o". So, I'll copy this compiled code into the
executable I'm creating.
3. But wait! This function called "main" needs to call some function
called "f", so I need to find that.
4. Here it is, in "stuff.o". So, I'll copy this compiled code for "f"
into the executable, and replace the marker in "main" with the actual
location of "f".
5. "main" needs another function called "g", so I need to find that.
6. Here it is, in "stuff.o". So, I'll copy this compiled code for "g"
into the executable, and replace the marker in "main" with the actual
location of "g".
7. It doesn't look like anything else is needed, so I'll write out this
executable in the proper format for this operating system.
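The dramatization above can be turned into a toy program. Everything in
this sketch is made up for illustration: ObjectFile, Executable and
link_objects are hypothetical names, an "object file" is reduced to a
table mapping each function name to the names of the functions it calls,
and an "address" is just a position in the executable. Real object files
contain machine code plus symbol and relocation tables in formats such as
ELF or COFF, and real linkers are far more involved. It assumes C++17.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, drastically simplified "object file": a symbol table mapping
// each function it defines to the names of the functions that function calls.
// Each of those calls is an unresolved marker until the linker patches it.
struct ObjectFile {
    std::string name;
    std::map<std::string, std::vector<std::string>> functions;
};

// Hypothetical "executable": functions in the order they were copied in,
// where each one ended up, and the patched call targets of each function.
struct Executable {
    std::vector<std::string> layout;
    std::map<std::string, std::size_t> address;
    std::map<std::string, std::vector<std::size_t>> call_targets;
};

// Sees ALL the object files at once, starts from "main", copies in each
// needed function exactly once, then replaces every marker with an address.
Executable link_objects(const std::vector<ObjectFile>& objects)
{
    Executable exe;
    std::map<std::string, std::vector<std::string>> markers;  // unresolved calls per copied function
    std::deque<std::string> needed{"main"};                   // programs start at main

    while (!needed.empty()) {
        std::string sym = needed.front();
        needed.pop_front();
        if (exe.address.count(sym) != 0)
            continue;  // already copied into the executable

        // Look for a definition of sym in any of the object files.
        const std::vector<std::string>* calls = nullptr;
        for (const ObjectFile& obj : objects) {
            auto it = obj.functions.find(sym);
            if (it != obj.functions.end()) {
                calls = &it->second;
                break;
            }
        }
        if (calls == nullptr)
            throw std::runtime_error("undefined reference to '" + sym + "'");

        // "Copy" the code in and remember where it went.
        exe.address[sym] = exe.layout.size();
        exe.layout.push_back(sym);
        markers[sym] = *calls;
        for (const std::string& callee : *calls)
            needed.push_back(callee);  // this function needs these too
    }

    // Every needed function now has an address, so each marker can be
    // replaced by the location of the function it names.
    for (const auto& [caller, callees] : markers)
        for (const std::string& callee : callees)
            exe.call_targets[caller].push_back(exe.address.at(callee));

    return exe;
}
```

With main.o defining main (which calls f and g) and stuff.o defining f
and g, link_objects places main, f and g in that order and patches main's
two markers to positions 1 and 2. Leaving stuff.o out produces the same
kind of "undefined reference" error a real linker gives when an object
file or library is missing.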
At this point, we have a real, working executable (which, admittedly,
doesn't do very much). There are a couple of other things worth
mentioning, because they often confuse people.
First, many object files are often stuffed together into one large file
that we call a library. It is quite likely that whatever compiler you
are using has the code for its C++ library functions stored like this.
Second, many compilers will, by default, automatically call the linker
after they are finished compiling, and in some cases may not bother to
actually write out the object files to disk. For example, consider
compiling this with g++ in the following manner:
$ g++ main.cpp stuff.cpp
Now, looking in the directory, we see:
$ ls
a.out main.cpp stuff.cpp stuff.h
If you are not familiar with g++, a.out is the name it gives the executable
if you do not specify one (with the -o option). But, as you can see, no object
files were created. At least, they weren't saved on the disk. You can
be pretty sure that g++ went through the process of creating them in
memory (or temporary files), and then invoked the linker to combine
them. Consider an alternative approach, in which we explicitly tell g++
to only compile (and not invoke the linker) by passing the -c option:
$ g++ -c main.cpp
$ g++ -c stuff.cpp
Now, let g++ invoke the linker on the two object files:
$ g++ main.o stuff.o
$ ls
a.out main.cpp main.o stuff.cpp stuff.h stuff.o
The executable got created just the same. So what is the advantage of
doing it this way? With large projects, compilation can take a long
time. So if we are careful (and there are tools to help), we can get
away with recompiling only the source files that have changed, and then
just relinking everything. Look up 'Makefile' or 'make' for more
information.
I hope this helps,
Alan