How do header files work?

matthurne

I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

Thanks!

Jul 22 '05 #1


Jeff Schwab

matthurne wrote:

I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

The #include directive just tells the preprocessor to go read a
different file, just as though the contents of that file had been
included inline. There's no other magic there. You can #include any
file you like, regardless of what its name is.

Jul 22 '05 #2

dpr

matthurne wrote:

I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

Thanks!

To make it simple, you could replace:

#include "myfile"

with the verbatim content of "myfile".

There is no link between a header file myfile.h and a source file
myfile.cpp, for instance.

The usual practice is to put type definitions and function declarations
(prototypes) in a header, so that they become "known" to every source file
that #includes it and uses those functions or types.

When you declare "void myfunc();" in a header file and include that header
in a source file, you only tell the compiler that a function myfunc() is
defined elsewhere.
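
To make that concrete, here is a minimal single-file sketch of what the compiler sees once the header text has been pasted in. The names are only for illustration: myfunc is given an int parameter and return value (unlike the "void myfunc();" above) so that the effect can be observed, and twice_myfunc is a made-up caller. In a real project the three parts would live in myfile.h, the including source file, and some other source file.

```cpp
// --- what a hypothetical header "myfile.h" contributes after #include ---
int myfunc(int x);   // declaration only: name, parameter type, return type

// --- the source file that included the header ---
int twice_myfunc(int x) {
    // This compiles with only the declaration in sight; the compiler
    // just records that a call to myfunc is needed here.
    return 2 * myfunc(x);
}

// --- what some other source file would normally provide ---
int myfunc(int x) {
    return x + 1;    // the actual definition
}
```

If the definition at the bottom were missing, the code above it would still compile; the failure would only show up later, when linking.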

Jul 22 '05 #3

JKop

matthurne posted:

I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

Thanks!

Your C++ program is made up of Source Code files, AKA modules, which usually
have a .cpp extension.

Each Source Code file is compiled SEPARATELY.
Here's a Header file, "Kangaroo.hpp", starting at the @ symbol, ending at
the @ symbol:

@

void Hop(void);

@

Now here's a Source Code file, "Chocolate.cpp":
@

#include "Kangaroo.hpp"

int main(void)
{
Hop();
}

@
What happens is that even before the Source File is compiled, a thing called
the Preprocessor copies and pastes the entire contents of "Kangaroo.hpp"
right into where you've written #include "Kangaroo.hpp" in the Source Code
file "Chocolate.cpp". Once all of the header files have been included,
you're left with what's called a translation unit. The translation unit for
"Chocolate.cpp" is as follows:

@

void Hop(void);

int main(void)
{
Hop();
}

@
Then, this translation unit gets compiled. When it's compiled, you're left
with an Object file, which may be "Chocolate.obj" or "Chocolate.o". That's
all the work finished for "Chocolate.cpp".

You're obviously going to have a Source Code file called "Kangaroo.cpp", in
which will be the actual definition of the function "void Hop(void)". Here's
"Kangaroo.cpp":

@

#include "Kangaroo.hpp"

void Hop(void)
{
int k = 4;

k+=6;
}

@

Out of this comes a Translation Unit, out of which comes an Object file.

So now you've got two Object files: "Chocolate.obj" and "Kangaroo.obj". Now,
the linker takes over. The linker links all your Object files into an
executable.

Now... one little thing, you *can* have an include statement within a header
file. Here's the contents of "blinds.hpp":

@

#include "crocodile.hpp"

int Cardiac(void);

@
Now, imagine that "crocodile.hpp" defines a struct called House, and that
you have a Source Code file that includes both "blinds.hpp" and
"crocodile.hpp". The contents of both will be copied into your Source Code
file, resulting in the contents of "crocodile.hpp" being copied in twice.
Therefore, you'll have something like the following in your Source Code
file:

struct House
{
int k;
};

struct House
{
int k;
};

Obviously, the first problem is that you've got a "Multiple Definition",
which will result in a compile error. Not only that, if you have loads and
loads of header files, and they all get included a couple of times, your
program would take much longer to compile. The solution is an include
guard. Here's the contents of "NiceHeader.hpp":

@

#ifndef INC_NICEHEADER_HPP
#define INC_NICEHEADER_HPP

int Relapse(void);
#endif

@
Now, the very first time you include "NiceHeader.hpp" into your Source Code
file, it will be copypasted in. But... subsequent includes won't do
anything, because INC_NICEHEADER_HPP is already defined.

Each #define applies only to the Source Code file being compiled.
Even though INC_NICEHEADER_HPP is defined in "monkey.cpp", that doesn't mean
it's defined in "resistance.cpp", because each Source Code file is a
separate entity, which gets compiled into a separate object file. Then the
linker links the object files into an executable. Therefore, you can have
"NiceHeader.hpp" included into 50 different Source Code files, but you're
guaranteed that it will only be included into each Source Code file ONCE.
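
As a rough illustration of the guard at work, the sketch below pastes the same guarded header text into one file twice, which is what the preprocessor effectively does when a header is reached by two different include paths. The header name house.hpp, the macro INC_HOUSE_HPP and the function rooms are invented for this example.

```cpp
// First "inclusion" of a hypothetical house.hpp: the macro is not yet
// defined, so the body is kept and the macro gets defined.
#ifndef INC_HOUSE_HPP
#define INC_HOUSE_HPP
struct House
{
    int k;
};
#endif

// Second "inclusion" of the same text: INC_HOUSE_HPP is already defined,
// so everything up to #endif is skipped and House is not defined again.
#ifndef INC_HOUSE_HPP
#define INC_HOUSE_HPP
struct House
{
    int k;
};
#endif

// Uses House, which exists exactly once in this translation unit.
int rooms(const House& h)
{
    return h.k;
}
```

Removing the #ifndef/#define/#endif lines from both copies should turn this into the redefinition error described above.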
Another thing I feel I must mention, take the following Source Code file,
"StarFish.cpp":

@

#include <iostream>

using namespace std;

int main(void)
{

cout << "Help!!";

return 0;
}

@
Here you'll see that you don't have to write "std::cout". This can be handy
in a Source Code file...
DON'T DO IT IN A HEADER FILE!! Why? Because the using-directive stays in
effect for the ENTIRE rest of every Source Code file into which the header
file has been included. Consequently, if you refer to anything in the
standard library from within a header file, you must write "std::" before it.
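
For instance, a header written along those lines might look like the sketch below. The file name join.hpp and the function join are made up for illustration; the point is only that every standard library name is spelled with "std::" and no using-directive appears.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Contents of a hypothetical header "join.hpp". Every standard name is
// fully qualified, so including this header never changes how names are
// looked up in the file that includes it.
inline std::string join(const std::vector<std::string>& parts,
                        const std::string& sep)
{
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            out += sep;      // separator goes between elements only
        }
        out += parts[i];
    }
    return out;
}
```

A Source Code file that includes this header is still free to write "using namespace std;" for itself; the choice stays with that file.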
Hope that helps.
-JKop

Jul 22 '05 #4

Karl Heinz Buchegger

matthurne wrote:

I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

Thanks!

Actually it's very simple:

The compiler does the equivalent of:

take the source file, look for an
#include "something"

then open 'something', take the content of
that file and replace the text

#include "something"

with the read text from the file 'something'

That's it: text substitution, no more, no less.

--
Karl Heinz Buchegger
kb******@gascad.at

Jul 22 '05 #5

matthurne

Jeff Schwab <je******@comcast.net> wrote in message news:<js********************@comcast.com>...

matthurne wrote:
I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

The #include directive just tells the preprocessor to go read a
different file, just as though the contents of that file had been
included inline. There's no other magic there. You can #include any
file you like, regardless of what its name is.

Ok, fair enough. That is helpful...however, header files will
generally only contain function declarations, not definitions,
correct? If this is the case, how does including just the declaration
allow the function to actually work when there is no #include of the
source file which contains the function's full-blown definition?

Jul 22 '05 #6

Michael Schutte

matthurne wrote:

Ok, fair enough. That is helpful...however, header files will
generally only contain function declarations, not definitions,
correct? If this is the case, how does including just the declaration
allow the function to actually work when there is no #include of the
source file which contains the function's full-blown definition?

That's the linker's work. The compiler (i.e. gcc) only knows the
declarations of the functions, not the definitions. The linker (i.e. ld)
binds object files together, which contain the definitions.

--
Michael Schutte
Remove the Xes from the eMail address to reply.

Jul 22 '05 #7

Howard

"matthurne" <ma***********@yahoo.com> wrote in message
news:4b**************************@posting.google.com...

Jeff Schwab <je******@comcast.net> wrote in message
news:<js********************@comcast.com>...

matthurne wrote:
I just started learning C++ on my own...I'm using Accelerated C++.
Something it hasn't explained and I keep wondering about is how header
files actually work. I suspect it doesn't get into it because it is,
as the authors love to say, "implementation specific". If that's the
case, how does the compiler commonly handle them? I use Linux and gcc
specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my
understanding I could name my header files something completely
unrelated to the source files which contain the definitions and
including the header files would still make the defined
functions/whatnot usable. How?

The #include directive just tells the preprocessor to go read a
different file, just as though the contents of that file had been
included inline. There's no other magic there. You can #include any
file you like, regardless of what its name is.

Ok, fair enough. That is helpful...however, header files will
generally only contain function declarations, not definitions,
correct? If this is the case, how does including just the declaration
allow the function to actually work when there is no #include of the
source file which contains the function's full-blown definition?

The compiler has to know which source files it's going to compile in the
first place, right? Usually, that means listing them in some kind of
project file. Those are the files that contain the implementation code.
You don't compile the headers, you compile the source files, and those
include the headers they specify.

In some cases, you don't have the source file, but only a header file and
some kind of library. In that case, your project has to specify the library
where the compiled sources reside.

So, you compile all the sources based on the list of source files (or
whatever), and those get the headers included which they specify, then these
compilation units are compiled into object code, and afterwards everything
gets linked up (as needed, depending upon what it is you're building).

-Howard

Jul 22 '05 #8

Luther Baker

ma***********@yahoo.com (matthurne) wrote in message news:<4b*************************@posting.google.com>...
....

specifically. Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file. It is my

....

It's a complicated process with which I do not have a lot of experience,
so I'm sure I will misspeak, but imagine the original implementers
writing the standard libraries. For simplicity, let's say they write
"something.h" that declares lots of things and "something.cpp" that
defines or implements lots of things.

Well, many commercial vendors don't want to give away their
implementation code, so they compile the implementation
"something.cpp" into binary, operating-system-specific object files or
libraries - and distribute the libraries or object files with their
associated header files - i.e. the implementation source code
"something.cpp" is not needed.

Now, as a developer, when you want to use functionality from these
files - you must include "something.h" in your source code - so that
during the first phase of compilation, amongst other things, your
compiler can look for syntax problems .. checking legal use of
functions, structures, classes, etc. against the declarations in the
header file.

Then you reach a second stage: linking. When you actually build your
executable, the linker (usually invoked for you by the compiler)
searches the library paths, finds the libraries and links them into
your code. At this point, if object files or libraries that are linked
in don't somehow match what their respective header files said - then
lots of things will go terribly wrong. Hopefully, this never happens.

Man is this ever simplistic, but:

1. The compiler looks through your code and checks your use of
functions and types against the declarations in the included header files
2. Then, in a different phase, the linker locates the libraries and
either links your target against them dynamically or copies their code
statically into your executable.

The names of the header files are irrelevant. As long as you've
included the proper names in your source code - and valid object files
or libraries implementing those header files are in the linker's
library path, all will build just fine.

As a side note, some libraries' header files are located in the
standard include area and are therefore easily seen via #include
<myheader.h>, but to build your code you are forced to add a flag or
some other indicator telling the compiler which library to use, and
possibly where to find it.

g++ -o myapp.exe myapp.cpp -lthirdpartylib

Hth,

-Luther

Jul 22 '05 #9

matthurne

Ok, I think I understand then. Accelerated C++ did a great job at
explaining the same things you all explained, including the importance
of the things JKop mentioned such as using #ifndef...#endif to avoid
multiple inclusion and using fully qualified names (std::cout, etc) in
header files. I think the answer to my ultimate question came from
Michael...the linker is what makes the connection between the function
declaration and definition, correct? My next question would be, how?
Something tells me that gets into issues I probably won't understand.
:-) If anyone can throw it into "just finished his sophomore year of
undergraduate school for computer science" language, go ahead and try!
I'm assuming the linker sees the declaration/use of the function and
then searches wherever it is configured to search for the definition
of the function? i.e. the same directory as the files being linked,
the standard library, etc?

Jul 22 '05 #10

Dan Moos

Here is what I think you understand so far. You have some class, called cat

class cat{};

that is what the .h file would have. In a separate .cpp file, you would have
all of cat's methods defined. This .cpp gets compiled into an .obj with the
rest of your code.
Now, you just include the cat.h file in any source file where a cat class
might be needed.

This much I think you understand. Now for the part I think you don't get.

By including cat.h, you're telling your source file about the cat class. That
way, every time the compiler sees a cat class in your source file, it knows
that it is a valid class. You are "exposing" cat classes to the source file.
At this point, the compiler doesn't know or care whether the definition of
cat's methods actually exists. That is the linker's job. The inclusion of
the header only makes cat classes available to that particular source file.
If the linker can't find a .obj file that defines cat's methods, then you get
an unresolved external link error of some sort. In this case, your source file
compiles fine because the .h file says a cat class is a valid type, but if
there isn't a corresponding .obj file, the link will fail. For what it's
worth, the cat.cpp file must include cat.h also, for the same reasons.

Here is a good experiment. Basically, try creating the scenario I just
described, and when it all works, I think you'll understand. Do this:

write an .h file that has the class definition for a cat, with its methods
declared but NOT DEFINED. Just the basic class.

write a .cpp file that defines cat's methods. Make sure to #include cat.h
in this .cpp file

write a mainprogram.cpp file that uses cats somehow. Make sure to #include
cat.h in this file also

compile and link the whole shebang
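
One possible shape for that experiment is sketched below, squeezed into a single listing so the three pieces can be read side by side. Everything beyond the names cat, cat.h and mainprogram.cpp is an assumption made for illustration: the constructor, the speak method and the greet_from_cat function (standing in for whatever main would do) are invented here.

```cpp
#include <string>

// --- cat.h (sketch): the class definition, methods declared only ---
class cat {
public:
    explicit cat(const std::string& name);
    std::string speak() const;
private:
    std::string name_;
};

// --- cat.cpp (sketch): would begin with #include "cat.h" ---
cat::cat(const std::string& name) : name_(name) {}

std::string cat::speak() const {
    return name_ + " says meow";
}

// --- mainprogram.cpp (sketch): would also begin with #include "cat.h" ---
// It only needs the class definition to compile; the method bodies are
// supplied by the object file built from cat.cpp at link time.
std::string greet_from_cat() {
    cat tom("Tom");
    return tom.speak();
}
```

Split into real files, leaving cat.cpp out of the link step is an easy way to provoke the unresolved external error described above.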

let us know how it goes!

Jul 22 '05 #11

Alan Johnson

matthurne wrote:

My next question would be, how?

The answer is that it depends on the specific compiler/linker.

Let's work through an example from start to finish, and maybe it will
help clarify things.

First, let's make a source file:

stuff.cpp
-----
int f(int x)
{
return x ;
}

int g(int x)
{
return x*x ;
}
Now, we compile this source file, using whatever method our compiler
likes. What gets generated? On most compilers, a file called
"stuff.o" or "stuff.obj" would be generated. Exactly what is in this
file depends on the compiler, but it contains, at a minimum, the
compiled code for the functions f and g, and probably some sort of
symbol table that says "this block of data represents the compiled code
for f", and "this block of data represents the compiled code for g".

Now, for the convenience of everybody who might use f and g, let's
create a header file:

stuff.h
------
int f(int x) ;
int g(int x) ;

It doesn't make any sense to "compile" a header file. There is no
executable code there. The first line simply says, "somewhere (we
aren't specifying where, but somewhere), there is a function in
existence called f, and this function takes one integer as a parameter,
and returns an integer." The second line says a similar thing about g.

Time for another source file:

main.cpp
--------
#include "stuff.h"

int main()
{
int y ;
y = f(3) + g(2) ;
return 0 ;
}

Now, we compile main.cpp. The very first thing the compiler does is run
main.cpp through what is called the "preprocessor". The preprocessor
sees the include statement, and replaces it with the contents of stuff.h,
so by the time it gets to the more interesting phases of compiling, it
looks like this:

preprocessed main.cpp
---------------------
int f(int x) ;
int g(int x) ;

int main()
{
int y ;
y = f(3) + g(2) ;
return 0 ;
}
As before, the compiler is going to emit an object file called something
like "main.o" or "main.obj". This object file will contain the compiled
version of the function main, and probably a symbol table that has
'main' in it somewhere. An interesting question is how this line gets
compiled:

y = f(3) + g(2) ;

After all, the compiler, at this point, doesn't know very much about f
and g. It certainly doesn't know anything about the actual compiled
versions of f and g, as that got created during a completely different
run of the compiler, and it doesn't have any idea where to begin looking
for it now. It does know, however, because of the first two lines that
declare f and g, that f and g exist somewhere, that they each take an
integer as a parameter, and each return an int. That is enough
information to at least make sure that the functions are being used
correctly. Instead of actually putting all of the code necessary to
call f and g into the object file (which it can't do, because, if you
recall, it doesn't know where f and g are), it is content to put some
sort of marker that says "I need to call some function called f right
here", and similarly for g.

So, now we've compiled both our source files, and have two object files,
which for sake of argument we'll say are "stuff.o" and "main.o". The
next step is to combine these together into a program that someone could
actually run. That is where the linker comes in. When we use the
linker, we give it ALL the object files at the same time, rather than
one at a time. So, here is a (highly simplified) dramatization of what
happens in the linker:

1. I know that programs need to start at a function called "main", so I
need to find that.
2. Here it is, in "main.o". So, I'll copy this compiled code into the
executable I'm creating.
3. But wait! This function called "main" needs to call some function
called "f", so I need to find that.
4. Here it is, in "stuff.o". So, I'll copy this compiled code for "f"
into the executable, and replace the marker in "main" with the actual
location of "f".
5. "main" needs another function called "g", so I need to find that.
6. Here it is, in "stuff.o". So, I'll copy this compiled code for "g"
into the executable, and replace the marker in "main" with the actual
location of "g".
7. It doesn't look like anything else is needed, so I'll write out this
executable in the proper format for this operating system.
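
The symbol-matching part of those steps can be modelled in a few lines of C++. This is a toy model only, not how any real linker is written: ObjectFile and resolve are invented names, and relocation, sections and the executable format are left out entirely.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for an object file: the symbols it defines and the
// symbols it merely refers to (the "markers" left by the compiler).
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> referenced;
};

// Builds a table from symbol name to the object file defining it, then
// checks that every reference can be satisfied from that table.
std::map<std::string, std::string>
resolve(const std::vector<ObjectFile>& objects) {
    std::map<std::string, std::string> where;
    for (const ObjectFile& obj : objects) {
        for (const std::string& sym : obj.defined) {
            // emplace reports failure if the symbol is already present
            if (!where.emplace(sym, obj.name).second) {
                throw std::runtime_error("multiple definition of " + sym);
            }
        }
    }
    for (const ObjectFile& obj : objects) {
        for (const std::string& sym : obj.referenced) {
            if (where.find(sym) == where.end()) {
                throw std::runtime_error("undefined reference to " + sym);
            }
        }
    }
    return where;
}
```

Feeding it a model of main.o (defines main, references f and g) and stuff.o (defines f and g) yields a table that maps f and g to stuff.o; dropping stuff.o from the list makes it throw, which mirrors the error a real linker reports for a missing definition.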
At this point, we have a real, working executable (which, admittedly,
doesn't do very much). There are a couple of other things that are
worth mentioning, that may often confuse people.

First, many object files are often stuffed together into one large file
that we call a library. It is quite likely that whatever compiler you
are using has the code for its C++ library functions stored like this.

Second, many compilers will, by default, automatically call the linker
after they are finished compiling, and in some cases may not bother to
actually write out the object files to disk. For example, consider
compiling this with g++ in the following manner:

$ g++ main.cpp stuff.cpp

Now, looking in the directory, we see:

$ ls
a.out main.cpp stuff.cpp stuff.h

If you are not familiar with g++, a.out is the name given to executables
if you do not explicitly tell it a name. But, as you can see, no object
files were created. At least, they weren't saved on the disk. You can
be pretty sure that g++ went through the process of creating them in
memory (or temporary files), and then invoked the linker to combine
them. Consider an alternative approach, in which we explicitly tell g++
to only compile (but not invoke the linker) :

$ g++ -c main.cpp

$ g++ -c stuff.cpp

Now, let g++ invoke the linker on the object files:

$ g++ main.o stuff.cpp

$ ls
a.out main.cpp main.o stuff.cpp stuff.h stuff.o
The executable got created just the same. So what is the advantage of
doing it this way? With large projects, compilation can take a long
time. So if we are careful (and there are tools to help) we can get
away with only recompiling the source files that we change, and then
just relink everything together. Look up 'Makefile' or 'make' for more
information.

I hope this helps,
Alan

Jul 22 '05 #12

David Harmon

On 19 May 2004 07:50:44 -0700 in comp.lang.c++, ma***********@yahoo.com
(matthurne) wrote,

Basically, I don't understand how a header file being
included makes a connection with the source file that has the
definitions for whatever is in the header file.

As you guessed, this is implementation specific.

A typical implementation might be, the definitions are all precompiled
and the resulting object code combined into "library" files that are
indexed in such a way as to make them easy to search. Then when you
call some library function, there is a step that searches the library
file(s) for the matching definitions and links them with your code into
an executable program.

Jul 22 '05 #13

JKop

matthurne posted:

I think the answer to my ultimate question came from Michael...the
linker is what makes the connection between the function declaration
and definition, correct? My next question would be, how?
Take the two following Object files:

chocolate.obj
icecream.obj
"chocolate.obj" will have a sort of Contents at the start of it. It will say
what functions it has and what global variables it has. Maybe it'll look
like this:

void MakeChocolateChips(void) //A function
unsigned int DiameterOfChocolateChip //A global variable
void HeatChocolate(unsigned int) //A function
"icecream.obj"'s contents:

int PickFlavour(void) //A function
double GetVolume(void) //A function
double DensityOfIcecream //A global variable

So now the linker takes over. It throws all the object files together,
resulting in:
void MakeChocolateChips(void) //A function
unsigned int DiameterOfChocolateChip //A global variable
void HeatChocolate(unsigned int) //A function
int PickFlavour(void) //A function
double GetVolume(void) //A function
double DensityOfIcecream //A global variable
Now, Let's say "int PickFlavour(void)" calls "void MakeChocolateChips
(void)", it'll find it because the linker has introduced all the object
files to each other and made one entire entity out of them.
Moving on...
Let's say you have a function in "chocolate.obj" as so:

void Hello(void);

And you also have a function in "icecream.obj" as so:

void Hello(void);
When the linker throws everything together, what do you think is going to
happen?... Exactly, Multiple Definition. Link Error.

Let's say that the function "void MakeChocolateChips(void)" in
"chocolate.obj" calls the function "void Hello(void)" in "chocolate.obj".
But if you try to link, you'll get a Multiple Definition error. What you want
to do is remove "void Hello(void)" from the Contents of the "chocolate.obj"
file, thus giving it Internal Linkage, so the Linker won't throw it in with
everything else. To achieve this, you do the following in "chocolate.cpp":

static void Hello(void);
Now "void Hello(void)" can only be seen from within "chocolate.obj". It has
internal linkage. Now the prog will link without error. You can do the same
with the global variables:

static int monkey;
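
Put together, a "chocolate.cpp" along these lines might look like the sketch below. The function bodies are invented, and HelloCount is a hypothetical helper added only so that the effect of the calls can be observed from outside.

```cpp
// chocolate.cpp (sketch): names declared static have internal linkage
// and are left out of the object file's "Contents".

static int monkey = 0;        // visible only inside this source file

static void Hello() {         // icecream.cpp may define its own Hello()
    ++monkey;                 // without any clash at link time
}

void MakeChocolateChips() {   // external linkage: other files may call it
    Hello();                  // resolved within this file
}

int HelloCount() {            // hypothetical helper, not from the post
    return monkey;
}
```

A second source file with its own "static void Hello()" would then link without complaint, because neither Hello is offered to the linker.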

Something tells me that gets into issues I probably won't understand.
:-) If anyone can throw it into "just finished his sophomore year of
undergraduate school for computer science" language, go ahead and try!

I left school when I was 16. Learned C++ from a book or two. I'm now 17.

Hope that helps
-JKop

Jul 22 '05 #14

matthurne

Ok, well I think between all of you my question was answered! At
least for now... :-) My main curiosity had to do with the linker and
how it works, though when I first started this thread I didn't know
it. Alan's post was especially helpful...btw, I'm using Slackware and
g++ exclusively since that's what I run on my desktop...I did figure
out on my own that if I just compiled all the files together, it "just
worked"...now I better understand that g++ did all the steps for me.
g++ -c will compile only to object files. g++ -v was a nice way for
me to see all the commands g++ was running, including the linking.

I'm satisfied, but if you aren't, feel free to post some more. Thanks
everyone.

P.S. JKop, I hope your programming abilities with leaving school at
16 doesn't mean my $27,000 a year is a waste! Ahhh! ;-)

Jul 22 '05 #15

JKop

matthurne posted:

P.S. JKop, I hope your programming abilities with leaving school at
16 doesn't mean my $27,000 a year is a waste! Ahhh! ;-)

Stereotypes are there to be broken.

Education, as with beauty, is in the eye of the beholder.
-JKop

Jul 22 '05 #16

Alan Johnson

Alan Johnson wrote:

$ g++ main.o stuff.cpp

One correction ... this should have been, obviously:

$ g++ main.o stuff.o
Sorry for any confusion.

Alan

Jul 22 '05 #17
