Timbo wrote: Also, I've never seen anything to indicate that random tests are any more likely to uncover a fault than properly selected test cases.
"Properly selected" is fine. If you miss some of those (there may
be MANY remember), the random cases *may* catch them.
That's it. You are not supposed to replace any of the good stuff
you are already doing. It's just a simple tool for making the whole
package even better.
On Mon, 27 Mar 2006 22:12:41 +0200, Jacob <ja***@yahoo.com> wrote,
quoted or indirectly quoted someone who said : My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
the other way to get coverage is to get some tests written by
people unfamiliar with the inner workings. They will test things that
"don't need" testing.
--
Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Jacob wrote: Adam Maass wrote:
In unit testing, you want to select several typical inputs, as well as boundary and out-of-range inputs. This is sufficient to obtain a general sense that the code is correct for the general case. It also requires the test-writer to /think/ about what the boundary conditions are. There may be several of these, at many points in the domain. You describe an ideal world where the unit test writer thinks of every possible scenario beforehand. In such a regime you don't need unit testing in the first place.
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
This is where TDD comes in.
We write one test at a time.
Write Just Enough Code to make the test pass.
Refactor to improve the current state of the design
We are only writing code for tests we already have. The next test is
only needed if we need to code something or to strengthen the
corner-case tests of the code that we have just made.
This way - there is no forgetting.
To make this achievable, each test case (method) should:
1) only test one aspect of the code
2) Have as few asserts as possible (1 being the best)
3) Be small (like any method) ~ 10(or what ever your favourite number
is) lines of code.
4) Be fast - the faster they run, the more we run them continuously,
the sooner we find problems.
5) Not use/touch: files, networks, DBs - these are slow compared to
in-memory fake data/objects.
My objection to random inputs is that unit-tests must be 100% repeatable for every run of the test suite. I don't ever want to see a failure of a unit test that doesn't reappear on the next run of the suite unless something significant -- either the test case or the code under test -- has changed.
If I have a flaw in my code I'd be more happy with a test that indicates this *sometime* rather than *never*. Of course *always* is even better, but then we're back to Utopia.
BTW: You can achieve repeatability by specifying the random seed in the test setup. My personal approach is of course to seed with a maximum of randomness (using current time millis :-)
you might want to google 'seeding with time' to see why it's not a great
idea.... especially where unit tests are concerned. Note too that unit-testing is not black-box testing. Good unit tests usually have pretty good knowledge of the underlying algorithm under test.
Again you add definition to unit testing without further reference. Unit testing is *in practice* white-box testing since the tests are normally written by the target code developer, but it is actually beneficial to treat it as a black-box test: Look at the class from the public API, consider the requirements, and then try to tear it apart without thinking too much about the code internals. This is at least my personal approach when writing unit tests for my own code.
white box/black box.... all the same really from a testing PoV... the
only difference is how tolerant the test case is of the code design
changing. White box: not terribly tolerant. Black box: tolerant.
With TDD, it's better to consider the unit tests to be 'Behavior
Specification Tests'. They are validating that the specified Behavior
exists within the code under test. But each specification test is
specifying a small part of the code under test, as we have multiple
small test cases, not a few large test cases.
For example, we have Calculator class that can Add, Subtract, Multiply &
Divide Integers.
So we'd have the following tests...
testAddingZeros()
testAddingPositiveNumbers()
testAddingNegativeNumbers()
testAddingNegativeWithPositiveNumbers()
testAddingPositiveWithNegativeNumbers();
testDividingByZero()
testDividingPositiveNumberByNegative()
.....
I don't need to have tests for different values within the Integer range
within each test case, as I have separate test cases for the different
boundaries. One benefit of having separate named test cases rather than
lumping them all in a single testAdd() method, is that I can write Just
Enough code to make each test pass. However, the biggest benefit comes
later when I or someone else modifies the code and one or two named
test cases fail rather than a single test case. Immediately - without
having to debug - I can see what has broken.
"typing.... run all tests ... bang!
...
testAddingNegativeWithPositiveNumbers() failed (expected -10, got -30)
"
I know I've broken the negative with Positive code somehow, but I also
know I Have Not broken any other conditions (testcases).
if all of those asserts were in one testAdd() method, then any asserts
after the one testing -10 + 20 would NOT be run, so I would NOT know if
I've broken anything else.
This might seem like a small thing, but when your application has 1700
unit tests, it's so much easier to see what's happening quickly with this
approach.
Now each of these test cases may end up being the same apart from the
values passed to the Calc object and the expected output.
In that case I'd do one of two things:
1) refactor the tests to use a private helper method
private void testWith(Integer num1, Integer num2, Integer expected)..
2) Apply the 'Parameterised Testcase' pattern.
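Option 1 might be sketched roughly as follows; the Calculator class and its add() method are hypothetical stand-ins invented for illustration, not code from this thread:

```java
// Sketch of option 1: named test cases delegating to one private helper.
// Calculator and add() are assumed here purely for illustration.
class Calculator {
    int add(int a, int b) { return a + b; }
}

class CalculatorAddTest {
    // One shared helper keeps each named case down to a single assert.
    static void testWith(int num1, int num2, int expected) {
        int actual = new Calculator().add(num1, num2);
        if (actual != expected) {
            throw new AssertionError("add(" + num1 + ", " + num2
                    + "): expected " + expected + ", got " + actual);
        }
    }

    static void testAddingZeros()                       { testWith(0, 0, 0); }
    static void testAddingPositiveNumbers()             { testWith(10, 20, 30); }
    static void testAddingNegativeNumbers()             { testWith(-10, -20, -30); }
    static void testAddingNegativeWithPositiveNumbers() { testWith(-10, 20, 10); }

    public static void main(String[] args) {
        testAddingZeros();
        testAddingPositiveNumbers();
        testAddingNegativeNumbers();
        testAddingNegativeWithPositiveNumbers();
        System.out.println("all add tests passed");
    }
}
```

A failing case still reports its own name and values, which is the point of keeping the cases separately named rather than merged into one testAdd().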
Andrew
Noah Roberts wrote:
.... Random inputs are difficult to regenerate.
Whether or not pseudo-random inputs are difficult to regenerate depends
on the design of the test framework.
I suggest the following requirements:
1. Each pseudo-random test must support both an externally supplied seed
and a system time based seed.
2. The seed is part of the output on any pseudo-random test failure.
Given those properties, I think one can set up a test regime that gets
the benefits of random testing without the costs.
All tests in the regression test suite that is run for each code change
must be effectively non-random. That includes random tests bound to a
fixed seed. This is important, because any failure in this context
should be due to the most recent code change.
Running with system time seeds is an additional test activity. If it
finds an error, the first step towards a fix is to add the failing
test/seed combination to the regression test suite, so that it fails.
Whether the system time seed testing is considered "unit test" is a
matter of how "unit test" is defined.
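Those two requirements might be sketched like this; the "test.seed" system property name and the trivial property under test are assumptions for illustration:

```java
import java.util.Random;

// Sketch of the two requirements: the seed comes either from an external
// source (a hypothetical "test.seed" system property) or from the clock,
// and it is always reported on failure so the run can be replayed exactly.
class SeededRandomTest {
    public static void main(String[] args) {
        String supplied = System.getProperty("test.seed");
        long seed = (supplied != null) ? Long.parseLong(supplied)
                                       : System.currentTimeMillis();
        Random rnd = new Random(seed);

        for (int i = 0; i < 1000; i++) {
            int v = rnd.nextInt(10000) - 5000;
            // Example property under test: a square is never negative.
            if (v * v < 0) {
                // Requirement 2: print the seed, making the failure reproducible.
                throw new AssertionError("failed with seed " + seed
                        + " at input " + v);
            }
        }
        System.out.println("passed (seed " + seed + ")");
    }
}
```

In the regression suite this would be run with a fixed seed (e.g. -Dtest.seed=42) so it is effectively non-random; clock-seeded runs are the separate, additional activity.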
Patricia
On Mon, 27 Mar 2006 20:46:04 GMT, Patricia Shanahan <pa**@acm.org>
wrote, quoted or indirectly quoted someone who said : Running with system time seeds is an additional test activity. If it finds an error, the first step towards a fix is to add the failing test/seed combination to the regression test suite, so that it fails.
Good thinking. It would be so frustrating to discover an error you
can't reproduce.
"Jacob" <ja***@yahoo.com> wrote: Adam Maass wrote:
In unit testing, you want to select several typical inputs, as well as boundary and out-of-range inputs. This is sufficient to obtain a general sense that the code is correct for the general case. It also requires the test-writer to /think/ about what the boundary conditions are. There may be several of these, at many points in the domain. You describe an ideal world where the unit test writer thinks of every possible scenario beforehand. In such a regime you don't need unit testing in the first place.
Well, no. You still need the unit tests for regression testing purposes.
(Make a change; does the code still obey the contract on it as expressed by
its test regime? If a unit test fails, it means that the code no longer
meets it contract.)
Unit tests are also a really good /development/ aide, if you write the test
cases first. Express your preconditions and postconditions, then write the
code to make the pre- and post- conditions hold true. The test cases are
often easier to write than the code that implements the logic required by
them.
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
Which is why no test regime is complete if it relies solely on unit-testing.
You want to expend some effort exposing the code to novel inputs -- just to
see what happens. My argument is that these novel inputs do not belong in
/unit/ testing. My objection to random inputs is that unit-tests must be 100% repeatable for every run of the test suite. I don't ever want to see a failure of a unit test that doesn't reappear on the next run of the suite unless something significant -- either the test case or the code under test -- has changed.
If I have a flaw in my code I'd be more happy with a test that indicates this *sometime* rather than *never*. Of course *always* is even better, but then we're back to Utopia.
See above. No testing regime is complete if it relies solely on unit tests.
By all means, run your code through random inputs if you think it will
discover failures. But do not make it a main feature of your unit test
suite, because a unit test must be 100% repeatable from run to run. (Else
how do you know that you've really fixed any failure you've discovered?)
If other kinds of testing show a failure, by all means add that case to your
unit test suite [when it makes sense] so that it doesn't happen again.
BTW: You can achieve repeatability by specifying the random seed in the test setup. My personal approach is of course to seed with a maximum of randomness (using current time millis :-)
[Unimpressed.] Yes, you *could* do that. But another important feature of a
unit-test suite should be that it is easy to run, not requiring any special
setup. In short, it shouldn't require any parameters, and yet still be 100%
repeatable from run to run. That means hard-coded inputs. Note too that unit-testing is not black-box testing. Good unit tests usually have pretty good knowledge of the underlying algorithm under test.
Again you add definition to unit testing without further reference. Unit testing is *in practice* white-box testing since the tests are normally written by the target code developer, but it is actually beneficial to treat it as a black-box test: Look at the class from the public API, consider the requirements, and then try to tear it apart without thinking too much about the code internals. This is at least my personal approach when writing unit tests for my own code.
My experience in many different organizations is that the QA teams expect
code to be unit-tested by the developers before being turned over to QA.
Developers writing unit tests means that the unit tests are white-box, of
necessity.
Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something
to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a
change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the
failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1.
I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't
actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without
error."
These are harried, busy developers working on a codebase that has thousands
of classes, and they're under the gun to get code out the door... they cut
corners here (bad developers!) but I think we can all relate to them.
Random inputs in a unit-test case can:
1. Mislead developers when a failure suddenly appears on novel inputs. If
they aren't working on the piece of code that the random inputs test, they
have to switch gears to understand what's going on;
2. Mislead developers into believing the code is actually fixed, when in
fact it is not, when the failure disappears on the next run of the test
suite.
3. Can create an air of suspicion around the unit-test suite. (To make
errors go away, just run the suite multiple times until you get a run
without errors.)
-- Adam Maass
Jacob wrote: My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise.
An observation; not written in stone; a subjective view.
Ignoring TDD, no unit test ever has and no unit test ever will verify a
requirement or testify to completeness of behaviour. You seem to think
that unit testing is to help find all possible inputs for a given
behaviour; I don't think this is true.
Unit tests are regression tests.
When you introduce new feature X in an iteration 5, you write unit tests
to show some confidence that the feature works; you're not guaranteeing
it works for any subset, or for the entire range, of input
possibilities. You could easily have a flaw in the program that gives
the correct output for a given input, but for entirely the wrong reason,
as would be apparent if you used input+1; but you didn't. The unit tests
you write in iteration 5 are, in fact, a cost without a return*.
When you introduce feature Y in iteration 6 is when you see the returns
for your iteration 5 unit tests. As when you run these again, and they
all pass, then you know that whatever you did in iteration 6 didn't
break those parts of iteration 5 that seen to run before. But they still
don't guarantee that feature X is fully tested. If you missed a test in
iteration 5, then re-running the tests in iteration 6 won't help. And
you could still have that bug iteration 5. Unit testing will never
uncover it. All they do is show that whatever you did in iteration 6
didn't change much.
Think of iteration tests like a camera. Before you go on holiday, you
take a snapshot of your treasury (you do have a treasury, don't you?) so
that you can quickly identify anything that's stolen. When you come back
from your holiday, the police are there saying that there's been a break-in.
You take another snapshot of your treasury and compare the two photos:
damn it, they got the Ark of the Covenant. Again.
This does not, however, show you any objects that were stolen before you
took that first photograph.
By comparison, manual testing can be seen as taking an inventory before
you go and when you come back, based on the list of items (the
requirements) that have been updated ever since you had the treasury
installed.
[*] Actually, regression testing is useful even during feature X's
design phase, so there is some benefit accrued.
-- www.EdmundKirwan.com - Home of The Fractal Class Composition.
Download Fractality, free Java code analyzer: http://www.EdmundKirwan.com/servlet/...c-page130.html
Jacob wrote: My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
This discussion about whether or not to use random inputs in tests makes
me curious: is it that important at all? The code I am working with now
uses almost no primitive types, except the occasional naming string and
perhaps an int or two. In other words, it is impossible to use random
input.
Is this so unusual? Is so much code working on ints and doubles that
it is possible to use random inputs?
Curious,
H.
--
Hendrik Maryns
================== www.lieverleven.be http://aouw.org
Hendrik Maryns wrote:
Jacob wrote:
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
This discussion about whether or not to use random inputs in tests makes me curious: is it that important at all? The code I am working with now uses almost no primitive types, except the occasional naming string and perhaps an int or two. In other words, it is impossible to use random input.
Is this so unusual? Is so much code working on ints and doubles that it is possible to use random inputs?
I don't believe so. Very little of what I write can be tested
randomly.
Another problem is -- how does one determine the expected output
of a randomly generated test case? This requires the
implementation of a test oracle that reproduces the behaviour of
the code under test. If the code under test has some complex data
types that are used for efficiency, and can be replicated using
something similar, this may be useful, but more often than not,
this isn't the case.
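When an oracle is feasible, the pattern looks roughly like this; fastPow and slowPow are invented examples under the stated assumption that a slow but obviously correct implementation exists, not code from the thread:

```java
import java.util.Random;

// Sketch of the test-oracle idea: a fast implementation is checked on
// pseudo-random inputs against a slow, obviously-correct reference.
class OracleTest {
    // Code under test: exponentiation by squaring.
    static long fastPow(long base, int exp) {
        long result = 1;
        while (exp > 0) {
            if ((exp & 1) == 1) result *= base;
            base *= base;
            exp >>= 1;
        }
        return result;
    }

    // Oracle: trivially correct repeated multiplication.
    static long slowPow(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) result *= base;
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed: repeatable in the suite
        for (int i = 0; i < 1000; i++) {
            long base = rnd.nextInt(20) - 10;   // small ranges avoid overflow
            int exp = rnd.nextInt(10);
            if (fastPow(base, exp) != slowPow(base, exp)) {
                throw new AssertionError("mismatch at " + base + "^" + exp);
            }
        }
        System.out.println("fastPow agrees with oracle on 1000 random cases");
    }
}
```

The catch the poster describes is exactly this: when no slowPow-style reference exists, there is nothing to compute the expected output from.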
Hendrik Maryns wrote: Is this such unusual? Is so much code working on ints and doubles that it is possible to use random inputs?
A lot depends on what you are doing. (As an aside, I think a lot of
programmers underestimate how much variety there is in /other/ programmers'
typical tasks.) For some people working with, say, double[] arrays is the
norm, others would hardly ever see a primitive type except that the language
forces us to use them.
Regarding random testing, it seems to me to be a compromise forced on us by the
fact that machines have limited speed. If computers were infinitely fast then
no one would ever consider random testing -- we'd use a brute-force exploration
of the /entire/ problem space instead. Random testing is one way (only one
way) of trying for a computationally feasible approximation to that ideal. But
I don't think the idea of "exhaustive" testing even makes sense in many
contexts, so random testing doesn't make sense in those contexts either.
For instance, I have some code for manipulating string data in variety of byte
encoding (not written in Java). At one level everything's wrapped up in nice
objects, and exhaustive testing makes no sense (all possible strings ? All
possible operations on strings ??). OTOH, I need to handle byte-encodings too,
such as Java's weird not-entirely-unlike-UTF-8 byte encoding, and that is
happening (in a sense) below the level of objects. I would dearly love to be
able to run some tests on every possible sequence of Unicode characters.
Obviously that's out, but in practical terms, it would almost certainly suffice
to test all sequences up to, say, 8 characters long (in order to avoid edge
effects). But even that isn't feasible. So I plan to do exhaustive testing of
all possible sequences of 1 Unicode character, and random testing of /lots/ of
somewhat longer sequences. There will be other tests too, of course, but I
wouldn't even consider going live with code of this nature without some attempt
to test the /entire/ problem domain.
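That plan might be sketched like this, using a plain UTF-8 round trip as a stand-in for the actual byte-encoding code under discussion (the real property under test would differ):

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Sketch of the plan: exhaustive testing of every 1-character sequence,
// plus random testing of lots of longer sequences from a fixed seed.
class RoundTripTest {
    static void check(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);
        if (!back.equals(s)) {
            throw new AssertionError("round trip failed for " + s);
        }
    }

    public static void main(String[] args) {
        // Exhaustive: every single Unicode code point (skipping the
        // surrogate range, which is not a valid character on its own).
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) continue;
            check(new String(Character.toChars(cp)));
        }

        // Random: many longer sequences, repeatable via the fixed seed.
        Random rnd = new Random(1234);
        for (int i = 0; i < 10000; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 8; j++) {
                int cp = rnd.nextInt(Character.MAX_CODE_POINT + 1);
                if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) cp = 'a';
                sb.appendCodePoint(cp);
            }
            check(sb.toString());
        }
        System.out.println("round trips passed");
    }
}
```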
-- chris
Chris Uppal wrote: Hendrik Maryns wrote:
Is this so unusual? Is so much code working on ints and doubles that it is possible to use random inputs? A lot depends on what you are doing. (As an aside, I think a lot of programmers underestimate how much variety there is in /other/ programmers' typical tasks.) For some people working with, say, double[] arrays is the norm, others would hardly ever see a primitive type except that the language forces us to use them.
That is what I thought.
Regarding random testing, it seems to me to be a compromise forced on us by the fact that machines have limited speed. If computers were infinitely fast then no one would ever consider random testing -- we'd use a brute-force exploration of the /entire/ problem space instead. Random testing is one way (only one way) of trying for a computationally feasible approximation to that ideal. But I don't think the idea of "exhaustive" testing even makes sense in many contexts, so random testing doesn't make sense in those contexts either.
Precisely. I work with mathematical formulae and automata, and there
are countably many of either, and no obvious method of creating all of
them, I would say. Hm, ok, one could just keep on adding elements here
and there, yes, but there's no point.
For instance, I have some code for manipulating string data in variety of byte encoding (not written in Java). At one level everything's wrapped up in nice objects, and exhaustive testing makes no sense (all possible strings ? All possible operations on strings ??). OTOH, I need to handle byte-encodings too, such as Java's weird not-entirely-unlike-UTF-8 byte encoding, and that is happening (in a sense) below the level of objects. I would dearly love to be able to run some tests on every possible sequence of Unicode characters. Obviously that's out, but in practical terms, it would almost certainly suffice to test all sequences up to, say, 8 characters long (in order to avoid edge effects). But even that isn't feasible. So I plan to do exhaustive testing of all possible sequences of 1 Unicode character, and random testing of /lots/ of somewhat longer sequences. There will be other tests too, of course, but I wouldn't even consider going live with code of this nature without some attempt to test the /entire/ problem domain.
ACK.
H.
Jacob wrote: Tom Leylan wrote:
Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen.
The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0, 1000);
A.setName(name);
assertEquals(A.getName(), name);
They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
Unless of course you pass in an invalid string; too long, too short,
not unique, etc, and your setter silently fixes/fails, then because of
that your getter fails, and you get a false failure on your assertion. > Show me your assertEquals for IsPrime() for instance.
Not the best example I could come up with, but it indicates the principle:
for (int i = 0; i < 1000; i++) {
    int v1 = getRandomInt();
    if (isPrime(v1)) {
        for (int j = 0; j < 1000; j++) {
            int v2 = getRandomInt();
            if (isPrime(v2)) {
                assertNotEquals(v2 % v1, 0);
                assertNotEquals(v1 % v2, 0);
            }
        }
    }
}
Again: It doesn't prove that isPrime() is correct, but it may be able to prove that it is wrong.
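A runnable variant of that sketch might look like the following; isPrime and the input range are assumptions, and note the added v1 != v2 guard, since a prime does divide itself, so the original asserts would falsely fail whenever the two random values happened to be the same prime:

```java
import java.util.Random;

// Runnable variant of the isPrime property test: distinct primes never
// divide each other. isPrime and the ranges are invented for illustration.
class PrimePropertyTest {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);  // fixed seed keeps the suite repeatable
        for (int i = 0; i < 1000; i++) {
            int v1 = 2 + rnd.nextInt(10000);
            int v2 = 2 + rnd.nextInt(10000);
            // Guard v1 != v2: a prime divides itself.
            if (isPrime(v1) && isPrime(v2) && v1 != v2) {
                if (v1 % v2 == 0 || v2 % v1 == 0) {
                    throw new AssertionError(v1 + " and " + v2 + " divide");
                }
            }
        }
        System.out.println("prime divisibility property held");
    }
}
```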
It doesn't prove either. You cannot prove that it was wrong based upon
a random input, as the input might be wrong.
I have long stopped using terms like "Unit", "Black box", "System" when
referring to tests, as there are too many definitions out there.
Instead describe tests by purpose and context, and leave names out. So,
for your random test your purpose would be to test a variety of inputs,
and the context would be on a method with unknown results. By doing
that instead of pre-placing a term like "Unit" and all the
prejudice/preconceptions that come with that term, you will better get
your point across as to why you are doing a test.
Adam Maass wrote: Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1. I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without error."
Given there is an error in the baseline I'd rather have a team
of developers tracing it for hours than having a test suite that
tells me that everything is OK.
Hendrik Maryns wrote: This discussion about whether or not to use random inputs in tests makes me curious: is it that important at all?
Not at all.
It was included as an issue in the guidelines of the original
post, but it has been taken out of context in a way that seem
to leave many with the impression that I think random testing
is *the* way to perform unit testing.
It is explicitly (and I will consider emphasizing this) suggested
as an add-on to the conventional "typical" cases and "border"
cases to improve test coverage further.
As I have done unit testing for many years, and this simple
practice actually has helped me discover many errors, it was
included in the guidelines.
(And the discussion has been quite interesting. :-)
On Tue, 28 Mar 2006 10:50:25 +0200, Hendrik Maryns
<he************@despammed.com> wrote, quoted or indirectly quoted
someone who said : Is this so unusual? Is so much code working on ints and doubles that it is possible to use random inputs?
You can generate random strings.
see http://mindprod.com/jgloss/pseudorandom.html#STRINGS
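A minimal sketch of the idea, with an arbitrary printable-ASCII alphabet and a fixed seed so the same strings come out on every run (the helper name and ranges are invented for illustration):

```java
import java.util.Random;

// Generating pseudo-random test strings from a seeded Random,
// so runs are repeatable.
class RandomStrings {
    static String randomString(Random rnd, int minLen, int maxLen) {
        int len = minLen + rnd.nextInt(maxLen - minLen + 1);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            // printable ASCII range 32..126
            sb.append((char) (32 + rnd.nextInt(95)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed: same strings every run
        for (int i = 0; i < 3; i++) {
            System.out.println(randomString(rnd, 0, 20));
        }
    }
}
```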
Sc***********@gmail.com wrote: Jacob wrote: Tom Leylan wrote:
Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen. The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0, 1000);
A.setName(name);
assertEquals(A.getName(), name);
They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
Unless of course you pass in an invalid string; too long, too short, not unique, etc, and your setter silently fixes/fails, then because of that your getter fails, and you get a false failure on your assertion.
Then you should have preconditions or postconditions for your setter
method which take care of that, and integrate them in the test.
H.
Hendrik Maryns wrote:
Sc***********@gmail.com wrote: Jacob wrote: Tom Leylan wrote:
Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen. The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0, 1000);
A.setName(name);
assertEquals(A.getName(), name);
They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
Unless of course you pass in an invalid string; too long, too short, not unique, etc, and your setter silently fixes/fails, then because of that your getter fails, and you get a false failure on your assertion.
Then you should have preconditions or postconditions for your setter method which take care of that, and integrate them in the test.
And what if every one of your random choices fails those conditions,
and the test is never run?
The point I was trying to make is that this type of random testing is
actually a form of another type of test, often referred to as monkey
testing, and by dropping the label of "unit" or "monkey", and instead
stating the purpose and context you eliminate this whole argument.
"Jacob" <ja***@yahoo.com> wrote: Adam Maass wrote:
Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1. I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without error."
Given there is an error in the baseline I'd rather have a team of developers tracing it for hours than having a test suite that tells me that everything is OK.
One has to wonder about the failure in this scenario -- it is a novel input
generated by a randomness generator. If the failure were critical to the
operation of the system, (one hopes that) it would have been noted, and
probably fixed, in other, earlier test cycles. (Perhaps not a unit test...
maybe a system test run by a QA.) Since this is a new failure that has not
been fixed in earlier cycles, the behavior of the system on these novel
inputs must not be that critical. If this is the case, I'd rather have my
developers finish the work they were doing on FooBar.java than trace the
failure in testBamBazzAdd1. (Of course, in a Utopian world, they would have
the time to do both.)
Ultimately, I'd like developers to be able to use a heuristic to determine
where to look for errors when a unit-test fails. That heuristic is "The
error is almost certainly caused by some delta in the code since the last
time you ran the test suite." (Note that controlling the size of the deltas
is an issue, which is why we get recommendations to make the test suite easy
and fast to run -- so that developers aren't afraid to run the suite very
frequently.)
If the unit-test suite also contains some randomly generated inputs, then
there are two heuristics that the developers must apply to determine where
the failure is:
1. "The error could be caused by a delta in the code since the last time you
ran the test suite"; or
2. "The error could be caused by an input value the test suite has generated
that we've never seen before."
Deciding which of these cases applies complicates the task of the developer
when faced with a failure.
-- Adam Maass
Jacob wrote: Ben Pope wrote:
Randomness just doesn't cut it, and I don't understand how you can check the output is correct, without knowing the input. You *do* know the input!
Consider testing this method:
double square(double v) { return v * v; }
Below is a typical unit test that verifies that the method behaves correctly on typical input:
double v = 2.0;
double v2 = square(v);  // You know the input: It is 2.0!
assertEquals(v2, 4.0);
This is fine.
The same test using random input:
double v = getRandomDouble();
double v2 = square(v);  // You know the input: It is v!
assertEquals(v2, v*v);
This is completely broken. You can't test an implementation of 'square'
with an identical implementation. You need a separate representation
for your expected result. Otherwise, you are not testing anything.
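One way to get that separate representation, sketched here purely as an illustration (BigDecimal as an arbitrary-precision oracle is my choice, not anything proposed in the thread), is to compute the expected value through a different code path than the implementation under test:

```java
import java.math.BigDecimal;

public class SquareOracleTest {
    static double square(double v) { return v * v; }

    public static void main(String[] args) {
        double v = 3.5;
        // Independent oracle: arbitrary-precision multiplication,
        // not a copy of the implementation under test.
        double expected = new BigDecimal(v).multiply(new BigDecimal(v)).doubleValue();
        if (square(v) != expected)
            throw new AssertionError("square(" + v + ") != " + expected);
        System.out.println("ok");
    }
}
```

The point of the oracle is that a shared bug is unlikely: BigDecimal multiplies exactly and rounds once, so a defect in `square` would not be mirrored in the expected value.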
If the test fails, all the details will be in the error report.
And this method actually *does* fail for a majority of all possible inputs (abs of v exceeding sqrt(maxDouble)). This will be revealed instantly using the random approach.
This may not ever be revealed using random inputs, but in the case of
'square' this is a moot point. The contract of 'square' must stipulate
that the input (v) is invalid unless
'v * v < "max double"'. Since such inputs are invalid by the contract,
there is no point in testing them.
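The overflow boundary being argued about can be checked directly; this small sketch (mine, not from the thread) shows that `square` returns Infinity just past sqrt(Double.MAX_VALUE) and a finite value just below it:

```java
public class SquareOverflow {
    static double square(double v) { return v * v; }

    public static void main(String[] args) {
        double limit = Math.sqrt(Double.MAX_VALUE); // about 1.34e154
        if (!Double.isInfinite(square(limit * 2)))
            throw new AssertionError("expected overflow to Infinity");
        if (Double.isInfinite(square(limit / 2)))
            throw new AssertionError("expected a finite result");
        System.out.println("boundary behaves as described");
    }
}
```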
For an experienced programmer the limitation of square() might be obvious, so border cases are probably covered sufficiently in both the code and the test. But for more complex logic this might not be so apparent, and throwing in random input (in ADDITION to the typical cases and all obvious border cases) has proven quite helpful, at least to me.
This is also wrong. The boundaries of the input are stated in the
function's contract. They are not something determined by the user's level
of experience. Your test cases must cover the boundary conditions
stipulated by the function's documented contract *as* *well* *as*
boundary conditions based on white-box knowledge of the function's
implementation. If you cover these cases, plus a small assortment of
well-chosen "sanity" values, you don't need to waste time with large
amounts of random data.
If you can't test your function in this way, it is probably not
factored correctly. da********@warpmail.net wrote: This is completely broken. You can't test an implementation of 'square' with an identical implementation. You need a separate representation for your expected result. Otherwise, you are not testing anything.
I've already answered this in a different posting: The unit test
reflects the requirements. The requirement for square() is to
return the square of the input: v*v. From a black-box perspective
I don't know the implementation of square(). It can be anything.
This is also wrong. The boundaries of the input are stated in the function's contract. They are not something determined by the user's level of experience. Your test cases must cover the boundary conditions stipulated by the function's documented contract *as* *well* *as* boundary conditions based on white-box knowledge of the function's implementation. If you cover these cases, plus a small assortment of well-chosen "sanity" values, you don't need to waste time with large amounts of random data.
This is all correct given you are able to identify the boundary
cases up front. In some cases you are, but for more complex ones
you easily forget some in the same way you forget to handle these
cases in the original code (that's why there are bugs, after all).
Imagine implementing a tree container. In order to test correct
removal of nodes, some of the boundary cases might be:
remove root
remove intermediate node
remove leaf node
remove root when this is the only node
remove root with exactly one leaf
remove root with exactly one intermediate node
remove intermediate node with one child
remove intermediate node with many children
remove leaf node without siblings
remove leaf node with siblings
remove intermediate node with root parent
remove intermediate node with only leaf nodes
remove intermediate node with leaf nodes and other intermediate nodes
remove intermediate node with only other intermediate node children
remove non-existing node
remove null
remove node with unique name
remove node with non-unique name
etc.
The above might or might not be boundary cases; that actually depends
on the implementation: a good implementation has few! From experience
you "know" which cases are more likely to contain bugs, even
without knowing the implementation.
I don't say you shouldn't cover the boundary cases explicitly,
of course you should (see #13 in the guidelines).
But when that is in place I would have built a tree at random, containing
a random number of nodes (0 to 1,000,000 perhaps), and then picked nodes at
random and performed a random operation (add, remove, move, copy, whatever)
on those, a random number of times (0 to 10,000 perhaps), and verified that the
operation behaves as expected and that the tree is always in a consistent state
afterwards. This would leave me with the confidence that if there are
cases I've forgotten (or that appear during code refactoring) they might
be trapped by this additional test.
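The randomized strategy described above can be sketched against a reference implementation. Here java.util.TreeSet stands in for the hypothetical tree under test and a HashSet serves as the trivially-correct model; the names, bounds, and fixed seed are illustrative choices, not from the original post:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.TreeSet;

public class RandomTreeTest {
    public static void main(String[] args) {
        long seed = 12345L;                         // fixed seed: failures are reproducible
        Random rnd = new Random(seed);
        TreeSet<Integer> subject = new TreeSet<>(); // stand-in for the tree under test
        HashSet<Integer> model = new HashSet<>();   // trivially-correct model

        for (int i = 0; i < 10_000; i++) {
            int key = rnd.nextInt(1000);
            if (rnd.nextBoolean()) {
                if (subject.add(key) != model.add(key))
                    throw new AssertionError("add disagreed at step " + i + ", seed " + seed);
            } else {
                if (subject.remove(key) != model.remove(key))
                    throw new AssertionError("remove disagreed at step " + i + ", seed " + seed);
            }
            // Invariant: both structures always hold the same number of elements.
            if (subject.size() != model.size())
                throw new AssertionError("size mismatch at step " + i + ", seed " + seed);
        }
        System.out.println("10000 random operations, invariants held");
    }
}
```

Any disagreement reports the step and seed, so the exact failing sequence can be replayed deterministically.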
Adam Maass wrote: 1. "The error could be caused by a delta in the code since the last time you ran the test suite"; or 2. "The error could be caused by an input value the test suite has generated that we've never seen before."
Deciding which of these cases applies complicates the task of the developer when faced with a failure.
If I add a test to your test suite that is able to reveal a flaw in your code,
you still don't want it because when it fails your developers will be confused
about what happened?
I am not sure I get it. You should all be happy you identified an error, shouldn't
you? A failing unit test should be pretty clear about what went wrong anyway.
Jacob wrote: da********@warpmail.net wrote:
This is completely broken. You can't test an implementation of 'square' with an identical implementation. You need a separate representation for your expected result. Otherwise, you are not testing anything. I've already answered this in a different posting: The unit test reflects the requirements. The requirement for square() is to return the square of the input: v*v. From a black-box perspective I don't know the implementation of square(). It can be anything.
This is why black-box tests are not entirely sufficient. You must
(especially for unit tests) use some white-box knowledge to test the
boundary conditions of both the contract and the implementation.
[snip - tree stuff] But when that is in place I would have built a tree at random, containing a random number of nodes (0 to 1,000,000 perhaps), and then picked nodes at random and performed a random operation (add, remove, move, copy, whatever) on those, a random number of times (0 to 10,000 perhaps), and verified that the operation behaves as expected and that the tree is always in a consistent state afterwards. This would leave me with the confidence that if there are cases I've forgotten (or that appear during code refactoring) they might be trapped by this additional test.
I went to Brian Kernighan's site at Princeton a while back. One of his
assignments was to implement associative arrays similar to those in
awk. Then, he provided a script generator that produces random output
(add, remove, lookup, etc). You are supposed to run this script against
both awk and your own implementation, and compare the results. So, I
think you would probably appreciate this.
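Kernighan's approach of generating a random op script and feeding it to two implementations can be sketched as follows; the tiny "put/get/del" script language and both map choices are my illustration, not the actual assignment:

```java
import java.util.*;

public class ScriptDiffTest {
    // Interpret a tiny op script ("put k v" / "get k" / "del k") against a map,
    // returning the concatenated output of the get operations.
    static String run(List<String> script, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (String line : script) {
            String[] t = line.split(" ");
            switch (t[0]) {
                case "put": map.put(t[1], t[2]); break;
                case "del": map.remove(t[1]); break;
                case "get": out.append(map.getOrDefault(t[1], "-")).append("\n"); break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(7); // fixed seed: the script is reproducible
        List<String> script = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            String k = "k" + rnd.nextInt(50);
            switch (rnd.nextInt(3)) {
                case 0: script.add("put " + k + " v" + rnd.nextInt(50)); break;
                case 1: script.add("del " + k); break;
                default: script.add("get " + k); break;
            }
        }
        // Two independent implementations must agree on every script.
        String a = run(script, new HashMap<>());
        String b = run(script, new TreeMap<>());
        if (!a.equals(b)) throw new AssertionError("implementations diverged");
        System.out.println("outputs identical");
    }
}
```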
Also, John Lakos' new book is due to be published later this year. In
it, he promises to address the issue of component-level testing in
great detail, including a section on random testing, which I think you
will find very interesting.
"Jacob" <ja***@yahoo.com> wrote: Adam Maass wrote:
1. "The error could be caused by a delta in the code since the last time you ran the test suite"; or 2. "The error could be caused by an input value the test suite has generated that we've never seen before."
Deciding which of these cases applies complicates the task of the developer when faced with a failure. If I add a test to your test suite that is able to reveal a flaw in your code, you still don't want it because when it fails your developers will be confused about what happened?
Let me clarify. I don't want it in the /unit/ test suite if it relies on
generation of random inputs, due to this confusion issue. If however, the
inputs are hard-coded, then the confusion issue does not apply, and I'd be
perfectly happy to have it in the unit test suite.
If there's a level of testing during which we generate random inputs to
improve the quality of the code, then that is where it belongs. If there
isn't this kind of testing already in the project, perhaps we ought to
start. It just doesn't belong in the /unit/ test suite. I am not sure I get it? You should all be happy you identified an error shouldn't you? The unit test failing should be pretty clear on what went wrong anyway.
Finding and fixing failures is, in general, a good thing, however it
happens. But a /unit/ test suite should give developers a really good idea
of where any failure originates from, and having to decide whether a failure
is due to a delta in the code under test or a novel input just overly
complicates a /unit/ test suite. The confusion issue is especially of
concern if a failure on one run of the suite simply disappears on the next
run because it didn't generate a set of inputs that causes the code to fail.
[If I saw a unit test suite with this behavior, I wouldn't have much
confidence in the value of passing all the tests -- because the next run
could just as easily produce a failure as a pass.]
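One widely used mitigation for the disappearing-failure problem (my addition, not something proposed in the thread) is to pick a fresh seed per run but record it, so any failing sequence of random inputs can be replayed exactly:

```java
import java.util.Random;

public class SeededRandomTest {
    static double square(double v) { return v * v; }

    public static void main(String[] args) {
        // A fresh seed each run, but printed: paste it back in to replay a failure.
        long seed = System.currentTimeMillis();
        System.out.println("random test seed: " + seed);
        Random rnd = new Random(seed);
        for (int i = 0; i < 1000; i++) {
            double v = rnd.nextDouble() * 100 - 50;
            if (square(v) < 0)
                throw new AssertionError("square(" + v + ") negative; replay with seed " + seed);
        }
        System.out.println("passed");
    }
}
```

With the seed in the failure message, a random failure becomes as reproducible as a hard-coded input.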
Note too that there are some failures that are acceptable to tolerate, even
in shipping product. (Perhaps: It's an obscure corner case that no-one ever
actually encounters in production. It's in some subsystem that hardly anyone
uses. Or a variety of other justifications...) The critical cases should be
covered by hard-coded inputs. That leaves the non-critical cases -- and if
something non-critical fails, then it should be fixed but perhaps there are
more important things to do before it gets fixed.
-- Adam Maass