Bytes IT Community

Removing dead code and unused functions

Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.
Nov 14 '05 #1
39 Replies




Geronimo W. Christ Esq wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.


There is in fact such a tool, it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."

Greg

Nov 14 '05 #2

Greg wrote:
LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.


There is in fact such a tool, it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."


Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.
Nov 14 '05 #3


On 19/06/2005 17:49, in 11****************@sabbath.news.uk.clara.net,
« Geronimo W. Christ Esq » <th**************@hotmail.com> wrote:
Greg wrote:
LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.


There is in fact such a tool, it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."


Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.


I'm not sure but "nm" could be useful here.

Nov 14 '05 #4

In article <BE******************************@laposte.net>,
Jean-Claude Arbaut <je****************@laposte.net> wrote:
Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.
I'm not sure but "nm" could be useful here.


Linkers typically do not exclude functions in the user program that are
unused. They only do that with libraries.

More useful would be one of the many tools that generate call graphs.

-- Richard
Nov 14 '05 #5

Jean-Claude Arbaut wrote:
Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.


I'm not sure but "nm" could be useful here.


This problem can't be appropriately solved with a linker, particularly
not the GNU linker. GNU ld can only throw out sections, not unused
functions or global variables; so if you've got a file containing 10
functions, 9 of which are unused, all ten will still get linked.

Parsing the source code is the answer; it's just surprising that no one
seems to have done this yet.
Nov 14 '05 #6

Richard Tobin wrote:
Linkers typically do not exclude functions in the user program that are
unused. They only do that with libraries.

More useful would be one of the many tools that generate call graphs.


Can you think of any examples ?

I know of runtime tools which do this (could even use gcov at a pinch)
but in that case you need to come up with a set of test cases which
exercise the code fully.

Nov 14 '05 #7

Geronimo W. Christ Esq wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?


Write unit tests for every feature. Pass all tests between every 1~10 edits.

Constantly try to remove parameters, variables, lines, methods, classes, and
modules. If any test fails, hit Undo.

This process is great for growing code to maximize features and minimize
lines.

--
Phlip
http://www.c2.com/cgi/wiki?ZeekLand
Nov 14 '05 #8

Phlip wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?


Write unit tests for every feature. Pass all tests between every 1~10 edits.


That's what I would do if I had a group of developers and six months to
do it in. Unfortunately many of us in this post-dotcom age do not work
to near-infinite budgets.
Nov 14 '05 #9

Geronimo W. Christ Esq wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.


The lcc-win32 IDE will do that. Select Object file cross-reference in
the analysis menu, then look for symbols that are not referenced anywhere.

A problem with this approach is that the IDE doesn't recognize functions
that are referenced in the same file. For instance:

int foo(void)
{
// ...
}

int main(void)
{
foo();
}

The function will appear as not referenced. Besides, the IDE only handles
C programs (it is a C IDE).

http://www.cs.virginia.edu/~lcc-win32.
Nov 14 '05 #10

Geronimo W. Christ Esq wrote:
Phlip wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?
Write unit tests for every feature. Pass all tests between every 1~10

edits.
That's what I would do if I had a group of developers and six months to
do it in. Unfortunately many of us in this post-dotcom age do not work
to near-infinite budgets.


Do you have time and resources to debug?

You can leverage tests, like that, to replace many long hours of debugging
for a few short minutes writing tests.

The idea that automated testing requires an "infinite budget" is a myth.

(And if you indeed have a short deadline, why bother removing harmless but
unused code?)

--
Phlip
http://www.c2.com/cgi/wiki?ZeekLand

Nov 14 '05 #11

Jean-Claude Arbaut wrote:


On 19/06/2005 17:49, in 11****************@sabbath.news.uk.clara.net,
« Geronimo W. Christ Esq » <th**************@hotmail.com> wrote:

Greg wrote:

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.

There is in fact such a tool, it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."


Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.

I'm not sure but "nm" could be useful here.

In times gone by, the lorder and tsort tools showed which .o files were
not used, as well as finding a single pass link order, if one exists.
Now that no one cares about the single pass link, we don't find these
tools installed automatically.
Nov 14 '05 #12

In article <yg*************@newssvr17.news.prodigy.com>,
Phlip <ph*******@yahoo.com> wrote:
Do you have time and resources to debug?

You can leverage tests, like that, to replace many long hours of debugging
for a few short minutes writing tests.

The idea that automated testing requires an "infinite budget" is a myth.

(And if you indeed have a short deadline, why bother removing harmless but
unused code?)
If you are handed a large program and told to "make it work",
then the first thing you need to do is bring it under control. Machines
are a lot faster and more accurate about matters such as which functions
are potentially callable, so it makes sense to mechanically
pre-process the code instead of going in and writing tests for
each section under the assumption that the code will be used.
One can spend endless hours trying to "fix" a routine that
isn't even needed. Overview first, -then- ensure that each
function performs its proper role in the design.

A program such as 'cscope' can assist in finding unused functions
and in finding locations from which functions are called.

The idea that automated testing requires an "infinite budget" is a myth.


Well, sure it is: there are only a finite number of states that
a program can be in on a given system, so the amount of testing
one has to do has a finite upper bound, not an infinite bound.

There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflop).

But you are absolutely right that that won't require an infinite budget --
it only requires a budget larger than is likely to be available at
any time before Homo Sapiens Sapiens die off or evolve into something
else.
--
Would you buy a used bit from this man??
Nov 14 '05 #13

jacob navia wrote:
Geronimo W. Christ Esq wrote:
Are there any scripts or tools out there that could look
recursively through a group of C/C++ source files, and allow
unreferenced function calls or values to be easily identified ?

LXR is handy for indexing source code, and for a given function
or global variable it can show you all the places where it is
referenced. It would be really nice to have a tool that would
simply list all of the unreferenced functions, so that you could
go through and remove them.


The lcc-win32 IDE will do that. Select Object file
cross-reference in the analysis menu, then look for symbols that
are not referenced anywhere.

A problem with this approach is that the IDE doesn't recognize
functions that are referenced in the same file. For instance:


Those shouldn't appear in the first place. They should have been
declared static and omitted from the .h file.

--
Some informative links:
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
Nov 14 '05 #14


On 19/06/2005 21:15, Tim Prince wrote:
Jean-Claude Arbaut wrote:


On 19/06/2005 17:49, in 11****************@sabbath.news.uk.clara.net,
« Geronimo W. Christ Esq » <th**************@hotmail.com> wrote:

Greg wrote:
> LXR is handy for indexing source code, and for a given function or
> global variable it can show you all the places where it is referenced.
> It would be really nice to have a tool that would simply list all of the
> unreferenced functions, so that you could go through and remove them.

There is in fact such a tool, it's commonly called a "linker." And the
list of unreferenced code and data that it strips from a build is
usually cataloged in a file it can be directed to create. This file is
commonly called a "link map."

Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.

I'm not sure but "nm" could be useful here.

In times gone by, the lorder and tsort tools showed which .o files were
not used, as well as finding a single pass link order, if one exists.
Now that no one cares about the single pass link, we don't find these
tools installed automatically.


I didn't know they showed this information, but it's true that they are not
very useful nowadays. I think they are still part of the binutils package.

Nov 14 '05 #15

Walter Roberson wrote:
In article <yg*************@newssvr17.news.prodigy.com>,
Phlip wrote:
The idea that automated testing requires an "infinite budget" is a myth.

Well, sure it is: there are only a finite number of states that
a program can be in on a given system, so the amount of testing
one has to do has a finite upper bound, not an infinite bound.

There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflop).


Then it is clear that you do not understand unit testing.

Ben
--
A7N8X FAQ: www.ben.pope.name/a7n8x_faq.html
Questions by email will likely be ignored, please use the newsgroups.
I'm not just a number. To many, I'm known as a String...
Nov 14 '05 #16

Walter Roberson wrote:
If you are handed a large program and told to "make it work",
then the first thing you need to do is bring it under control.
Read /Working Effectively with Legacy Code/ by Mike Feathers. He's a
consultant who routinely guides teams thru that exact situation.

A boss has spent a lot of money to build a codebase, with very little
return. Then a team must make the code valuable, without wasting more time
and effort.
Machines
are a lot faster and more accurate about matters such as which functions
are potentially callable, so it makes sense to mechanically
pre-process the code instead of going in and writing tests for
each section under the assumption that the code will be used.
One can spend endless hours trying to "fix" a routine that
isn't even needed. Overview first, -then- ensure that each
function performs its proper role in the design.

A program such as 'cscope' can assist in finding unused functions
and in finding locations from which functions are called.
Yes, automated tools that scan code and interpret it will help. But I don't
see the relation between "Where the bugs are" and "Where control flow is
not". The principle "Ain't broke don't fix it" applies here. Dead code ain't
broke. Bugs will lead to investigation of the live code causing them.
The idea that automated testing requires an "infinite budget" is a myth.


Well, sure it is: there are only a finite number of states that
a program can be in on a given system, so the amount of testing
one has to do has a finite upper bound, not an infinite bound.


The idea that developer tests should be like quality assurance tests is also
a myth. Developer tests are little more than the scaffolding used to support
a building while you build it. Earthquake-proofing the building is an
orthogonal concern.
There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflop).


That's hardly an excuse not to try. The goal is _not_ "prove there are no
bugs". A full mathematical proof is, indeed, computationally intractable.
Tests can get within 99.9% of a proof with trivial effort. The last 0.1% is
what costs so much.

The goal is "prevent 99.9% of bugs". You can get there by running tests
frequently, and hitting Undo if any test breaks, to back out the most recent
edit. That's infinitely preferable to debugging.

--
Phlip
http://www.c2.com/cgi/wiki?ZeekLand
Nov 14 '05 #17

>> A problem with this approach is that the IDE doesn't recognize
functions that are referenced in the same file. For instance:


Those shouldn't appear in the first place. They should have been
declared static and omitted from the .h file.

And if you do so, most decent compilers (at least GCC does) with the
appropriate warnings enabled will find unreferenced static functions for
you.

--
Martijn
http://www.sereneconcepts.nl
Nov 14 '05 #18

Phlip wrote:
That's what I would do if I had a group of developers and six months to
do it in. Unfortunately many of us in this post-dotcom age do not work
to near-infinite budgets.
Do you have time and resources to debug?

You can leverage tests, like that, to replace many long hours of debugging
for a few short minutes writing tests.


I've got just under a million lines of code here that have just come
into my possession. I'd love to believe that a few minutes would allow
me to create a suite of tests proving that the program generated from
that codebase worked the same before and after any changes, but I remain
somewhat cynical.
The idea that automated testing requires an "infinite budget" is a myth.
Timescales and budgets do not presently permit me to sit down and write
tests for a huge body of code which I am not completely familiar with. I
have no doubts about the wisdom or long term benefits of doing it, but I
don't possess the resources at the moment.
(And if you indeed have a short deadline, why bother removing harmless but
unused code?)


I don't believe I've mentioned anything about a deadline. What I do have
is a limited resource to work with. I can leverage that resource better
if I can grasp the code more easily. The code can be grasped more easily
if the redundant bits of it are removed.

Nov 14 '05 #19

Walter Roberson wrote:

<snip> thank you for that articulate contribution, Walter.
A program such as 'cscope' can assist in finding unused functions
and in finding locations from which functions are called.


cscope is very handy (as is LXR as I mentioned before). I can indeed go
through each function manually and determine whether it is needed or
not. But I figure that the computer should be able to do that for me,
automatically. Cscope's (or LXR's) generated database contains all the
information that would be required to do that. It's just odd that no-one
has attempted to do the kind of source code profiling that I am talking
about yet, using those databases to generate lists of redundant
functions (or duplicate code).

The reason why it has to be automated is because you have to make
several passes. For example, you could come to function bar() and not
remove that because it is needed by function foo(). However, only later
would you find that function foo() is also unused. You would have to
make a second pass to remove bar(). Take that trivial example and scale
it up to a source base that has a few tens or hundreds of thousands of
functions defined within it and you can see the scale of the issue.
Nov 14 '05 #20

Phlip wrote:
Yes, automated tools that scan code and interpret it will help. But I don't
see the relation between "Where the bugs are" and "Where control flow is
not". The principle "Ain't broke don't fix it" applies here. Dead code ain't
broke. Bugs will lead to investigation of the live code causing them.


It is a perfectly sound analysis, but if you have taken over maintenance
of a large code base it is harder to see the wood for the trees.

Nov 14 '05 #21

Martijn wrote:
A problem with this approach is that the IDE doesn't recognize
functions that are referenced in the same file. For instance:


Those shouldn't appear in the first place. They should have been
declared static and omitted from the .h file.


And if you do so, most decent compilers (at least GCC does) with the
appropriate warnings enabled will find unreferenced static functions for
you.

Well of course, lcc-win32 will warn you about unreferenced statics.
The problem is when you have a non-static function or variable
in some file that is not referenced in any other file.

The algorithm I used in the IDE is to look at the object files
then look at the public symbols, then make a cross reference of them.

This has the advantage of getting beyond superficial similarities
like:

file 1:
int foo(void)
{
}

file 2:
static int foo(void)
{
//...
}

int main(void)
{
foo();
}

I take advantage of the work done by the compiler and the method is
(more or less) compiler independent since the object files are
completely standardized.
Nov 14 '05 #22

Geronimo W. Christ Esq wrote:
Yes, automated tools that scan code and interpret it will help. But I don't
see the relation between "Where the bugs are" and "Where control flow is
not". The principle "Ain't broke don't fix it" applies here. Dead code ain't
broke. Bugs will lead to investigation of the live code causing them.
It is a perfectly sound analysis, but if you have taken over maintenance
of a large code base it is harder to see the wood for the trees.


Each time I did, it came with a list of bugs, and a boss to prioritize them.
I've got just under a million lines of code here that have just come
into my possession. I'd love to believe that a few minutes would allow
me to create a suite of tests proving that the program generated from
that codebase worked the same before and after any changes, but I remain
somewhat cynical.
Writing a test for each bug taught me a lot about the structure. I'm aware
there are other ways to learn, but if there's a lot of code (there was) and
a small set of bug loci (there was), then spotting the dead code rapidly
became trivial.

Next, you can write a characterization test for a large codebase very
easily - but it's not really a test, just a thing to help refactoring.
Increase the logging until the program is repulsively verbose. Write 3 or 4
test cases that feed high-level data in, collect the results, collect the
logs, and compare them to golden copies. Trivial byte comparisons are fine.
Now run these test-like things after every 1~10 edits. If they fail, don't
inspect why. (The byte comparison will be _very_ hard to track back to a
failing unit.) If they fail, use Undo to return to the passing state.

If you have a big legacy codebase, and you seek to remove some code to see
what's left, you won't get far, and every mistake will add a bug. Yes, the
code looks crappy, but most of it is used, and some of it "might" be used.
Suppose your automated tool helped you remove 10% (and suppose the tool was
somehow magically better at catching mistakes than a test). You still have
90% left to deal with. I don't see the gain. You still have to climb the
mountain.
Timescales and budgets do not presently permit me to sit down and write
tests for a huge body of code which I am not completely familiar with. I
have no doubts about the wisdom or long term benefits of doing it, but I
don't possess the resources at the moment.


Don't automate writing tests for everything, or manually write tests for
everything.

Write tests for each bug. They will not prevent bugs, and they won't spot
dead code. They will, however, force test attention on the areas of the code
that have the bugs. They will document and service that area. Long term,
development will get faster and faster. Without tests, no matter how clean
the code, over time development will get slower. The benefits will appear
very soon.

--
Phlip
http://www.c2.com/cgi/wiki?ZeekLand
Nov 14 '05 #23

Geronimo W. Christ Esq wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.


PC-Lint will list unused functions, variables and headers. A free Lint
may do the same, but I do not know whether that is the case.

Kevin.
Nov 14 '05 #24

On Sun, 19 Jun 2005 18:59:42 GMT, "Phlip" <ph*******@yahoo.com> wrote:
Geronimo W. Christ Esq wrote:
Phlip wrote:
>>Are there any scripts or tools out there that could look recursively
>>through a group of C/C++ source files, and allow unreferenced function
>>calls or values to be easily identified ?
>
> Write unit tests for every feature. Pass all tests between every 1~10

edits.

That's what I would do if I had a group of developers and six months to
do it in. Unfortunately many of us in this post-dotcom age do not work
to near-infinite budgets.


Do you have time and resources to debug?

You can leverage tests, like that, to replace many long hours of debugging
for a few short minutes writing tests.

The idea that automated testing requires an "infinite budget" is a myth.

(And if you indeed have a short deadline, why bother removing harmless but
unused code?)


For one thing, so you don't have to write unit tests for it ;-)

--
Al Balmer
Balmer Consulting
re************************@att.net
Nov 14 '05 #25

Kevin Bagust wrote:
LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of
the unreferenced functions, so that you could go through and remove them.


PC-Lint will list unused functions, variables and headers. A free Lint
may do the same, but I do not know whether that is the case.


Finally, an answer that I can use :) I'm very appreciative Kevin. An
initial check confirms that PC-Lint does indeed appear to do exactly
what I'm looking for. I will make some enquiries.
Nov 14 '05 #26

Kevin Bagust <ke**********@ntlworld.com> wrote:
Geronimo W. Christ Esq wrote:
Are there any scripts or tools out there that could look recursively
through a group of C/C++ source files, and allow unreferenced function
calls or values to be easily identified ?

LXR is handy for indexing source code, and for a given function or
global variable it can show you all the places where it is referenced.
It would be really nice to have a tool that would simply list all of the
unreferenced functions, so that you could go through and remove them.


PC-Lint will list unused functions, variables and headers. A free Lint
may do the same, but I do not know whether that is the case.


My input regarding "A free Lint"...

I have grown accustomed to my PC-Lint doing this and when a client
hesitated to purchase PC-Lint at my recommendation, I tried Splint --
a freebie. FWIW, as of my attempt ~1 year ago, it would not announce
unreferenced functions. My client purchased a LAN license for PC-Lint
and everyone is now happy.

Gimpel's FlexeLint presumably has the same features.

--
Dan Henry
Nov 14 '05 #27

Richard Tobin wrote:
In article <BE******************************@laposte.net>,
Jean-Claude Arbaut <je****************@laposte.net> wrote:
Got a link ? The GNU linker at least only puts symbols that are included
into the link map. No mention of it cataloging symbols it excludes.

I'm not sure but "nm" could be useful here.


Linkers typically do not exclude functions in the user program that are
unused. They only do that with libraries.

More useful would be one of the many tools that generate call graphs.

-- Richard


The Metrowerks linker, as an example, strips all unreferenced,
unexported functions from a build by default, and does so no matter
where such functions are found. What would be the point of a linker
leaving unreachable code and inaccessible data in a binary? And why
would programmers want to perform this tedious chore by hand themselves
rather than let the linker do it in a few seconds?

The algorithm to strip unused code is well understood. All the linker
has to do is calculate the "transitive closure" of the set of
functions reachable from main() in the object code being linked. In
fact calculating the transitive closure is no doubt how Apple was able
to add the "-dead_strip" switch to GNU's ld linker on OS X; and the
reason they did so is clear: many developers are understandably
reluctant to use a linker that bloats their final builds.

Greg

Nov 14 '05 #28

Greg wrote:
The Metrowerks linker, as an example, strips all unreferenced,
unexported functions from a build by default, and does so no matter
where such functions are found. What would be the point of a linker
leaving unreachable code and inaccessible data in a binary?
They just do it because the linker authors don't put sufficient priority
on dealing with the matter properly. The GNU linker for example only
garbage collects sections rather than individual functions, so if you
use one function in a large object file, the whole object will get linked.
And why
would programmers want to perform this tedious chore by hand themselves
rather than let the linker do it in a few seconds?


If the codebase is full of cruft it is harder to maintain.
Nov 15 '05 #29

>> The Metrowerks linker, as an example, strips all unreferenced,
unexported functions from a build by default, and does so no matter
where such functions are found. What would be the point of a linker
leaving unreachable code and inaccessible data in a binary?


They just do it because the linker authors don't put sufficient priority
on dealing with the matter properly. The GNU linker for example only
garbage collects sections rather than individual functions, so if you
use one function in a large object file, the whole object will get linked.


There may be insufficient information to TELL whether a particular
piece of a compilation is used or not. For example, no law says that
a particular machine instruction generated by the compiler can be
identified as being part of exactly one function. Functions might
share code. And the linker might not be able to TELL that functions
are sharing code.

int check1arg(char **argv)
{
int i;

i = validate(argv[1]);
/* common */
if (i == OK)
return 1;
else if (i == MAYBE)
return 0;
else
return -1;
}
int check2arg(char **argv)
{
int i;

i = validate(argv[2]);
/* common */
if (i == OK)
return 1;
else if (i == MAYBE)
return 0;
else
return -1;
}

For example, the same copy of the code below /* common */ may be
shared between check1arg() and check2arg(). And possibly, check2arg()
is unused. Can the code below /* common */ be omitted? No.
But how does the linker know this? Decompiling compiler output?
Possibly, but that seems to be a lot of extra effort.

Gordon L. Burditt
Nov 15 '05 #30

In article <1119212336.d15d507227f9e20746d10f76bca5f7df@teranews>,
Ben Pope <benpope81@_REMOVE_gmail.com> wrote:
Walter Roberson wrote:
There's the small issue that current scientific thought suggests
that the Universe will not last long enough to test even fairly
trivial programs (e.g., it takes 1E21 years to test a program
with merely two 64-bit floating point numbers if the tests can be
done at 10 gigaflop).
Then it is clear that you do not understand unit testing.


Perhaps you could explain how you would "unit test" a function
that applies a chaotic formula to a pair of doubles?
How do you know if it is the -right- formula? If it is
used as part of an authentication process, how do you know
that there aren't any "back doors" in it that would allow
greatly reduced cost to break in?
There's a simple transform function that has been studied a
fair bit; I don't recall its proper name. It goes like this:
If the input is even, divide it by 2.
If the input is odd, multiply it by 3 and add 1.
Loop back and repeat using the previous output as input.

As far as I know, it is an open question as to whether every
starting integer will eventually get drawn to the loop
1 -> 4 -> 2 -> 1 . Suppose, though, you had a hypothesis
that there were some values that did not get caught in that
loop, and suppose you further hypothesized that such numbers
would have some particular set of properties. In order to
"unit test" the section that tests the properties, you need
a valid input to feed the section -- but you don't know
yet what the valid inputs *are* because you haven't found
a non-looping number yet. How do you proceed?
In order to unit test without exhaustive search, you have to know some
valid inputs. Not just one either -- you need different inputs
that together cover all branch conditions. If you test small sections
in isolation without sufficient context, you might miss a
combination of conditions that is important, such as "sleeper" code
that only activates under particular combinations of circumstances.
You might have to back-solve a cryptographic puzzle in order to
determine whether or not a particular combination of circumstances
*can* occur.
You are, I would suggest, too closely focused on situations in
which the "right answer" is known and testable. If you are
working on scientific or mathematical problems, then you
don't always know, and the only way to test might be to
execute the code.
--
Feep if you love VT-52's.
Nov 15 '05 #31

P: n/a
Gordon Burditt wrote:
The Metrowerks linker, as an example, strips all unreferenced,
unexported functions from a build by default, and does so no matter
where such functions are found. What would be the point of a linker
leaving unreachable code and inaccessible data in a binary?


They just do it because the linker authors don't put sufficient priority
on dealing with the matter properly. The GNU linker for example only
garbage collects sections rather than individual functions, so if you
use one function in a large object file, the whole object will get linked.

There may be insufficient information to TELL whether a particular
piece of a compilation is used or not. For example, no law says that
a particular machine instruction generated by the compiler can be
identified as being part of exactly one function. Functions might
share code. And the linker might not be able to TELL that functions
are sharing code.


<snip>

The example you gave has nothing to do with linking, because a linker
never examines *within* functions to determine whether they are
redundant or not. On the other hand, the GCC compiler does (when the
optimizer is turned on) look for similar pieces of generated machine
code and "compress" them by replacing them with one copy and some pointers.

It would be very useful if the GNU linker would remove unused functions,
but at the moment it doesn't.
Nov 15 '05 #32

P: n/a
>>>>The Metrowerks linker, as an example, strips all unreferenced,
unexported functions from a build by default, and does so no matter
where such functions are found. What would be the point of a linker
leaving unreachable code and inaccessible data in a binary?

They just do it because the linker authors don't put sufficient priority
on dealing with the matter properly. The GNU linker for example only
garbage collects sections rather than individual functions, so if you
use one function in a large object file, the whole object will get linked.

There may be insufficient information to TELL whether a particular
piece of a compilation is used or not. For example, no law says that
a particular machine instruction generated by the compiler can be
identified as being part of exactly one function. Functions might
share code. And the linker might not be able to TELL that functions
are sharing code.


<snip>

The example you gave has nothing to do with linking, because a linker
never examines *within* functions to determine whether they are
redundant or not.


I didn't say it did. I said that if you have two functions compiled
in a (object) file, and one of them isn't needed, there's no guarantee that
the linker can determine what is part of the needed function (and possibly
the other one also) to keep, and what is NOT part of the needed function
(to delete).

You don't get to conclude that function A starts here, and function
B starts here, so everything between those two addresses is function
A, and none of what's between those two is also part of function
B or C, even if I'm only talking about the so-called code segment of
both functions.
On the other hand, the GCC compiler does (when the
optimizer is turned on) look for similar pieces of generated machine
code and "compress" them by replacing them with one copy and some pointers.
So in that situation, you can have functions that share code, and
object code where there is no contiguous block of code where the
linker can determine "this is function A, and all of function A, and
none of any other function".
It would be very useful if the GNU linker would remove unused functions,
but at the moment it doesn't.


The point here is that it may not have the information required to
remove unused functions even if they can be determined to be unused.
The object format may not even PERMIT passing the information required
to determine what code is part of what function(s).

Gordon L. Burditt
Nov 15 '05 #33

P: n/a
Gordon Burditt wrote:
You don't get to conclude that function A starts here, and function
B starts here, so everything between those two addresses is function
A,


I've difficulty picturing how any of the code inside a function can ever
be used in any way if the function is never invoked. I don't see how a
linker would be making an unsafe decision by removing a function that is
never invoked.

I imagine that when a compiler spots repetitive sections of code it
takes them out of the function's object code into a common area of the
object, and has the function point to them. That way redundant functions
could be safely removed.
Nov 15 '05 #34

P: n/a
[]
On the other hand, the GCC compiler does (when the
optimizer is turned on) look for similar pieces of generated machine
code and "compress" them by replacing them with one copy and some pointers.

So in that situation, you can have functions that share code, and
object code where there is no contiguous block of code where the
linker can determine "this is function A, and all of function A, and
none of any other function".

It would be very useful if the GNU linker would remove unused functions,
but at the moment it doesn't.

The point here is that it may not have the information required to
remove unused functions even if they can be determined to be unused.
The object format may not even PERMIT passing the information required
to determine what code is part of what function(s).

Gordon L. Burditt


In that case the object format should be changed :-)
Nov 15 '05 #35

P: n/a
>> You don't get to conclude that function A starts here, and function
B starts here, so everything between those two addresses is function
A,
I've difficulty picturing how any of the code inside a function can ever
be used in any way if the function is never invoked. I don't see how a
linker would be making an unsafe decision by removing a function that is
never invoked.


Given an object file containing two functions, one used and one
not, resulting from a single compilation, what makes you think that
the linker can remove anything and be sure that it has not removed
a piece of the function that *IS* used? Object file formats that
I have seen do not have labels that say this byte is part of function
a, this byte is part of function b and q, and this byte is part of
functions a, b, j, n, and z.
I imagine that when a compiler spots repetitive sections of code it
takes them out of the function's object code into a common area of the
object, and has the function point to them. That way redundant functions
could be safely removed.


And what makes you think that function1, function2, and "common area"
are labelled in a way that the linker can identify them? Sure,
the entry points are labelled. That's likely to be all the info
available.

Gordon L. Burditt
Nov 15 '05 #36

P: n/a
>> The point here is that it may not have the information required to
remove unused functions even if they can be determined to be unused.
The object format may not even PERMIT passing the information required
to determine what code is part of what function(s).

Gordon L. Burditt


In that case the object format should be changed :-)


Using that standard, can you name any object format that should
NOT be changed? One in actual use, with an actual compiler that
generates it?

Gordon L. Burditt
Nov 15 '05 #37

P: n/a
In article <11*************@corp.supernews.com>,
Gordon Burditt <go***********@burditt.org> wrote:
I didn't say it did. I said that if you have two functions compiled
in a (object) file, and one of them isn't needed, there's no guarantee that
the linker can determine what is part of the needed function (and possibly
the other one also) to keep, and what is NOT part of the needed function
(to delete).


How hard can it be?

I mean, all you have to do is solve the halting problem...
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca

[T]he program's running time will be reduced by ONE WHOLE MILLISECOND! WOW!
--Eric Sosman in comp.lang.c
Nov 15 '05 #38

P: n/a
In article <ci***************@newssvr17.news.prodigy.com>,
Phlip <ph*******@yahoo.com> wrote:
Geronimo W. Christ Esq wrote:
It is a perfectly sound analysis, but if you have taken over maintenance
of a large code base it is harder to see the wood for the trees.

Each time I did, it came with a list of bugs, and a boss to prioritize them.


Each time I have taken over a large code base, the original authors
have no longer been available; there has been no list of bugs;
it has been up to me to figure out how the program is -intended-
to work and how it -really- works; it has been up to me to
create the list of bugs; and for the most part it has been up to
me to do the talking to the users to figure out what the
priorities of the various bugs (and missing features) are; it
has been up to me to do any necessary mathematical analysis to
figure out whether the formulae are correct (especially near the
boundary conditions); and it has been up to me to do any necessary
rewriting and restructuring and optimization.

If that sounds like, "Here's a big project: Fix it!", then
yeah, there's a fair bit of truth to that.

Most people don't seem to have the knack of tearing apart a fair-sized
program and rebuilding it, so such projects get left for me. There are
a lot of good programmers who can do very nice work on constructing
-new- code or debugging something they wrote (or which there is good
documentation for); it's a different skill-set to "reengineer" large
poorly-documented programs.
--
Entropy is the logarithm of probability -- Boltzmann
Nov 15 '05 #39

P: n/a
On Thu, 23 Jun 2005 21:57:17 -0000, go***********@burditt.org (Gordon
Burditt) wrote:
The point here is that it may not have the information required to
remove unused functions even if they can be determined to be unused.
The object format may not even PERMIT passing the information required
to determine what code is part of what function(s).

Gordon L. Burditt


In that case the object format should be changed :-)


Using that standard, can you name any object format that should
NOT be changed? One in actual use, with an actual compiler that
generates it?

The object file format used on Tandem^WCompaq^WHP NonStop in TNS
(legacy) mode has completely disjoint code blocks (also data blocks),
with a copy of interroutine references sorted by target, so you need
only look at a single field to determine a routine is unreferenced.
There are still supported and used compilers (and runtimes) for at
least C, Fortran, and COBOL, and a cfront-based (less than Standard)
C++; there used to be Pascal, but I don't think it's still supported.
(The newer 'native' RISC tools are ELF, and full C++. The newest
Itanium ones I haven't seen yet.)

- David.Thompson1 at worldnet.att.net
Nov 15 '05 #40
