Timbo wrote: Also, I've never seen anything to indicate that random tests are any more likely to uncover a fault than properly selected test cases.
"Properly selected" is fine. If you miss some of those (there may
be MANY remember), the random cases *may* catch them.
That's it. You are not supposed to replace any of the good stuff
you are already doing. It's just a simple tool for making the whole
package even better.
On Mon, 27 Mar 2006 22:12:41 +0200, Jacob <ja***@yahoo.com> wrote,
quoted or indirectly quoted someone who said : My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
the other way to get coverage is to get some tests written by
people unfamiliar with the inner workings. They will test things that
"don't need" testing.
--
Canadian Mind Products, Roedy Green. http://mindprod.com Java custom programming, consulting and coaching.
Jacob wrote: Adam Maass wrote:
In unit testing, you want to select several typical inputs, as well as boundary and out-of-range inputs. This is sufficient to obtain a general sense that the code is correct for the general case. It also requires the test-writer to /think/ about what the boundary conditions are. There may be several of these, at many points in the domain. You describe an ideal world where the unit test writer thinks of every possible scenario beforehand. In such a regime you don't need unit testing in the first place.
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
This is where TDD comes in.
We write one test at a time.
Write Just Enough Code to make the test pass.
Refactor to improve the current state of the design
We are only writing code for tests we already have. The next test is
only needed if we need to code something or to strengthen the
corner-case tests of the code that we have just made.
This way - there is no forgetting.
To make this achievable, each test case (method) should:
1) only test one aspect of the code
2) Have as few asserts as possible (1 being the best)
3) Be small (like any method) ~ 10(or what ever your favourite number
is) lines of code.
4) Be fast - the faster they run, the more we run them continuously,
the sooner we find problems.
5) Not use/touch: files, networks, DBs - these are slow compared to
in-memory fake data/objects.
My objection to random inputs is that unit-tests must be 100% repeatable for every run of the test suite. I don't ever want to see a failure of a unit test that doesn't reappear on the next run of the suite unless something significant -- either the test case or the code under test -- has changed.
If I have a flaw in my code I'd be more happy with a test that indicates this *sometime* rather than *never*. Of course *always* is even better, but then we're back to Utopia.
BTW: You can achieve repeatability by specifying the random seed in the test setup. My personal approach is of course to seed with a maximum of randomness (using current time millis :-)
you might want to google 'seeding with time' to see why it's not a great
idea.... especially where unit tests are concerned. Note too that unit-testing is not black-box testing. Good unit tests usually have pretty good knowledge of the underlying algorithm under test.
Again you add definition to unit testing without further reference. Unit testing is *in practice* white-box testing since the tests are normally written by the target code developer, but it is actually beneficial to treat it as a black-box test: Look at the class from the public API, consider the requirements, and then try to tear it apart without thinking too much about the code internals. This is at least my personal approach when writing unit tests for my own code.
white box/black box.... all the same really from a testing PoV... the
only difference is how tolerant the test case is of the code design
changing. White box: not terribly tolerant. Black box: tolerant.
With TDD, it's better to consider the unit tests to be 'Behavior
Specification Tests'. They are validating that the specified Behavior
exists within the code under test. But each specification test is
specifying a small part of the code under test, as we have multiple
small test cases, not a few large test cases.
For example, we have Calculator class that can Add, Subtract, Multiply &
Divide Integers.
So we'd have the following tests...
testAddingZeros()
testAddingPositiveNumbers()
testAddingNegativeNumbers()
testAddingNegativeWithPositiveNumbers()
testAddingPositiveWithNegativeNumbers();
testDividingByZero()
testDividingPositiveNumberByNegative()
.....
I don't need to have tests for different values within the Integer range
within each test case, as I have separate test cases for the different
boundaries. One benefit of having separate named test cases rather than
lumping them all in a single testAdd() method, is that I can write Just
Enough code to make each test pass. However, the biggest benefit comes
later when I or someone else modifies the code and one or two named
test cases fail rather than a single test case. Immediately - without
having to debug - I can see what has broken.
"typing.... run all tests ... bang!
...
testAddingNegativeWithPositiveNumbers() failed (expected -10, got -30)
"
I know I've broken the negative with Positive code somehow, but I also
know I Have Not broken any other conditions (testcases).
if all of those asserts were in one testAdd() method, then any asserts
after the one testing -10 + 20 would NOT be run, so I would NOT know if
I've broken anything else.
This might seem like a small thing, but when your application has 1700
unit tests, it's so much easier to see what's happening quickly with this
approach.
Now each of these test cases may end up being the same apart from the
values passed to the Calc object and the expected output.
In that case I'd do one of two things:
1) refactor the tests to use a private helper method
private void testWith(Integer num1, Integer num2, Integer expected)..
2) Apply the 'Parameterised Testcase' pattern.
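Option 1 might be sketched roughly as follows; the Calculator class and its add() method are hypothetical stand-ins invented for illustration, not code from this thread:

```java
// Sketch of option 1: named test cases delegating to one private helper.
// Calculator and add() are assumed here purely for illustration.
class Calculator {
    int add(int a, int b) { return a + b; }
}

class CalculatorAddTest {
    // One shared helper keeps each named case down to a single assert.
    static void testWith(int num1, int num2, int expected) {
        int actual = new Calculator().add(num1, num2);
        if (actual != expected) {
            throw new AssertionError("add(" + num1 + ", " + num2
                    + "): expected " + expected + ", got " + actual);
        }
    }

    static void testAddingZeros()                       { testWith(0, 0, 0); }
    static void testAddingPositiveNumbers()             { testWith(10, 20, 30); }
    static void testAddingNegativeNumbers()             { testWith(-10, -20, -30); }
    static void testAddingNegativeWithPositiveNumbers() { testWith(-10, 20, 10); }

    public static void main(String[] args) {
        testAddingZeros();
        testAddingPositiveNumbers();
        testAddingNegativeNumbers();
        testAddingNegativeWithPositiveNumbers();
        System.out.println("all add tests passed");
    }
}
```

A failing case still reports its own name and values, which is the point of keeping the cases separately named rather than merged into one testAdd().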
Andrew
Noah Roberts wrote:
.... Random inputs are difficult to regenerate.
Whether or not pseudo-random inputs are difficult to regenerate depends
on the design of the test framework.
I suggest the following requirements:
1. Each pseudo-random test must support both an externally supplied seed
and a system time based seed.
2. The seed is part of the output on any pseudo-random test failure.
Given those properties, I think one can set up a test regime that gets
the benefits of random testing without the costs.
All tests in the regression test suite that is run for each code change
must be effectively non-random. That includes random tests bound to a
fixed seed. This is important, because any failure in this context
should be due to the most recent code change.
Running with system time seeds is an additional test activity. If it
finds an error, the first step towards a fix is to add the failing
test/seed combination to the regression test suite, so that it fails.
Whether the system time seed testing is considered "unit test" is a
matter of how "unit test" is defined.
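Those two requirements might be sketched like this; the "test.seed" system property name and the trivial property under test are assumptions for illustration:

```java
import java.util.Random;

// Sketch of the two requirements: the seed comes either from an external
// source (a hypothetical "test.seed" system property) or from the clock,
// and it is always reported on failure so the run can be replayed exactly.
class SeededRandomTest {
    public static void main(String[] args) {
        String supplied = System.getProperty("test.seed");
        long seed = (supplied != null) ? Long.parseLong(supplied)
                                       : System.currentTimeMillis();
        Random rnd = new Random(seed);

        for (int i = 0; i < 1000; i++) {
            int v = rnd.nextInt(10000) - 5000;
            // Example property under test: a square is never negative.
            if (v * v < 0) {
                // Requirement 2: print the seed, making the failure reproducible.
                throw new AssertionError("failed with seed " + seed
                        + " at input " + v);
            }
        }
        System.out.println("passed (seed " + seed + ")");
    }
}
```

In the regression suite this would be run with a fixed seed (e.g. -Dtest.seed=42) so it is effectively non-random; clock-seeded runs are the separate, additional activity.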
Patricia
On Mon, 27 Mar 2006 20:46:04 GMT, Patricia Shanahan <pa**@acm.org>
wrote, quoted or indirectly quoted someone who said : Running with system time seeds is an additional test activity. If it finds an error, the first step towards a fix is to add the failing test/seed combination to the regression test suite, so that it fails.
Good thinking. It would be so frustrating to discover an error you
can't reproduce.
"Jacob" <ja***@yahoo.com> wrote: Adam Maass wrote:
In unit testing, you want to select several typical inputs, as well as boundary and out-of-range inputs. This is sufficient to obtain a general sense that the code is correct for the general case. It also requires the test-writer to /think/ about what the boundary conditions are. There may be several of these, at many points in the domain. You describe an ideal world where the unit test writer thinks of every possible scenario beforehand. In such a regime you don't need unit testing in the first place.
Well, no. You still need the unit tests for regression testing purposes.
(Make a change; does the code still obey the contract on it as expressed by
its test regime? If a unit test fails, it means that the code no longer
meets it contract.)
Unit tests are also a really good /development/ aide, if you write the test
cases first. Express your preconditions and postconditions, then write the
code to make the pre- and post- conditions hold true. The test cases are
often easier to write than the code that implements the logic required by
them.
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
Which is why no test regime is complete if it relies solely on unit-testing.
You want to expend some effort exposing the code to novel inputs -- just to
see what happens. My argument is that these novel inputs do not belong in
/unit/ testing. My objection to random inputs is that unit-tests must be 100% repeatable for every run of the test suite. I don't ever want to see a failure of a unit test that doesn't reappear on the next run of the suite unless something significant -- either the test case or the code under test -- has changed.
If I have a flaw in my code I'd be more happy with a test that indicates this *sometime* rather than *never*. Of course *always* is even better, but then we're back to Utopia.
See above. No testing regime is complete if it relies solely on unit tests.
By all means, run your code through random inputs if you think it will
discover failures. But do not make it a main feature of your unit test
suite, because a unit test must be 100% repeatable from run to run. (Else
how do you know that you've really fixed any failure you've discovered?)
If other kinds of testing show a failure, by all means add that case to your
unit test suite [when it makes sense] so that it doesn't happen again.
BTW: You can achieve repeatability by specifying the random seed in the test setup. My personal approach is of course to seed with a maximum of randomness (using current time millis :-)
[Unimpressed.] Yes, you *could* do that. But another important feature of a
unit-test suite should be that it is easy to run, not requiring any special
setup. In short, it shouldn't require any parameters, and yet still be 100%
repeatable from run to run. That means hard-coded inputs. Note too that unit-testing is not black-box testing. Good unit tests usually have pretty good knowledge of the underlying algorithm under test.
Again you add definition to unit testing without further reference. Unit testing is *in practice* white-box testing since the tests are normally written by the target code developer, but it is actually beneficial to treat it as a black-box test: Look at the class from the public API, consider the requirements, and then try to tear it apart without thinking too much about the code internals. This is at least my personal approach when writing unit tests for my own code.
My experience in many different organizations is that the QA teams expect
code to be unit-tested by the developers before being turned over to QA.
Developers writing unit tests means that the unit tests are white-box, of
necessity.
Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something
to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a
change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the
failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1.
I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't
actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without
error."
These are harried, busy developers working on a codebase that has thousands
of classes, and they're under the gun to get code out the door... they cut
corners here (bad developers!) but I think we can all relate to them.
Random inputs in a unit-test case can:
1. Mislead developers when a failure suddenly appears on novel inputs. If
they aren't working on the piece of code that the random inputs test, they
have to switch gears to understand what's going on;
2. Mislead developers into believing the code is actually fixed, when in
fact it is not, when the failure disappears on the next run of the test
suite.
3. Can create an air of suspicion around the unit-test suite. (To make
errors go away, just run the suite multiple times until you get a run
without errors.)
-- Adam Maass
Jacob wrote: My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise.
An observation; not written in stone; a subjective view.
Ignoring TDD, no unit test ever has and no unit test ever will verify a
requirement or testify to completeness of behaviour. You seem to think
that unit testing is to help find all possible inputs for a given
behaviour; I don't think this is true.
Unit tests are regression tests.
When you introduce new feature X in an iteration 5, you write unit tests
to show some confidence that the feature works; you're not guaranteeing
it works for any subset, or for the entire range, of input
possibilities. You could easily have a flaw in the program that gives
the correct output for a given input, but for entirely the wrong reason,
as would be apparent if you used input+1; but you didn't. The unit tests
you write in iteration 5 are, in fact, a cost without a return*.
When you introduce feature Y in iteration 6 is when you see the returns
for your iteration 5 unit tests. As when you run these again, and they
all pass, then you know that whatever you did in iteration 6 didn't
break those parts of iteration 5 that seen to run before. But they still
don't guarantee that feature X is fully tested. If you missed a test in
iteration 5, then re-running the tests in iteration 6 won't help. And
you could still have that bug iteration 5. Unit testing will never
uncover it. All they do is show that whatever you did in iteration 6
didn't change much.
Think of iteration tests like a camera. Before you go on holiday, you
take a snapshot of your treasury (you do have a treasury, don't you?) so
that you can quickly identify anything that's stolen. When you come back
from your holiday, the police are there saying that there's been a break-in.
You take another snapshot of your treasury and compare the two photos:
damn it, they got the Ark of the Covenant. Again.
This does not, however, show you any objects that were stolen before you
took that first photograph.
By comparison, manual testing can be seen as taking an inventory before
you go and when you come back, based on the list of items (the
requirements) that have been updated ever since you had the treasury
installed.
[*] Actually, regression testing is useful even during feature X's
design phase, so there is some benefit accrued.
-- www.EdmundKirwan.com - Home of The Fractal Class Composition.
Download Fractality, free Java code analyzer: http://www.EdmundKirwan.com/servlet/...c-page130.html
Jacob wrote: My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
This discussion about whether or not to use random inputs in tests makes
me curious: is it that important at all? The code I am working with now
uses almost no primitive types, except the occasional naming string and
perhaps an int or two. In other words, it is impossible to use random
input.
Is this so unusual? Is so much code working on ints and doubles that
it is possible to use random inputs?
Curious,
H.
--
Hendrik Maryns
================== www.lieverleven.be http://aouw.org
Hendrik Maryns wrote:
Jacob wrote:
My experience is that you tend to "forget" certain scenarios when you write the code, and then "forget" the exact same cases in the test. The result is a test that works fine in normal cases, but fails to reveal the flaw in the code for the not-so-normal cases. This is a useless and costly exercise. Random inputs may cover some of the cases that were forgotten in this process.
This discussion about whether or not to use random inputs in tests makes me curious: is it that important at all? The code I am working with now uses almost no primitive types, except the occasional naming string and perhaps an int or two. In other words, it is impossible to use random input.
Is this so unusual? Is so much code working on ints and doubles that it is possible to use random inputs?
I don't believe so. Very little of what I write can be tested
randomly.
Another problem is -- how does one determine the expected output
of a randomly generated test case? This requires the
implementation of a test oracle that reproduces the behaviour of
the code under test. If the code under test has some complex data
types that are used for efficiency, and can be replicated using
something similar, this may be useful, but more often than not,
this isn't the case.
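When an oracle is feasible, the pattern looks roughly like this; fastPow and slowPow are invented examples under the stated assumption that a slow but obviously correct implementation exists, not code from the thread:

```java
import java.util.Random;

// Sketch of the test-oracle idea: a fast implementation is checked on
// pseudo-random inputs against a slow, obviously-correct reference.
class OracleTest {
    // Code under test: exponentiation by squaring.
    static long fastPow(long base, int exp) {
        long result = 1;
        while (exp > 0) {
            if ((exp & 1) == 1) result *= base;
            base *= base;
            exp >>= 1;
        }
        return result;
    }

    // Oracle: trivially correct repeated multiplication.
    static long slowPow(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) result *= base;
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed: repeatable in the suite
        for (int i = 0; i < 1000; i++) {
            long base = rnd.nextInt(20) - 10;   // small ranges avoid overflow
            int exp = rnd.nextInt(10);
            if (fastPow(base, exp) != slowPow(base, exp)) {
                throw new AssertionError("mismatch at " + base + "^" + exp);
            }
        }
        System.out.println("fastPow agrees with oracle on 1000 random cases");
    }
}
```

The catch the poster describes is exactly this: when no slowPow-style reference exists, there is nothing to compute the expected output from.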
Hendrik Maryns wrote: Is this such unusual? Is so much code working on ints and doubles that it is possible to use random inputs?
A lot depends on what you are doing. (As an aside, I think a lot of
programmers underestimate how much variety there is in /other/ programmers'
typical tasks.) For some people working with, say, double[] arrays is the
norm, others would hardly ever see a primitive type except that the language
forces us to use them.
Regarding random testing, it seems to me to be a compromise forced on us by the
fact that machines have limited speed. If computers were infinitely fast then
no one would ever consider random testing -- we'd use a brute-force exploration
of the /entire/ problem space instead. Random testing is one way (only one
way) of trying for a computationally feasible approximation to that ideal. But
I don't think the idea of "exhaustive" testing even makes sense in many
contexts, so random testing doesn't make sense in those contexts either.
For instance, I have some code for manipulating string data in variety of byte
encoding (not written in Java). At one level everything's wrapped up in nice
objects, and exhaustive testing makes no sense (all possible strings ? All
possible operations on strings ??). OTOH, I need to handle byte-encodings too,
such as Java's weird not-entirely-unlike-UTF-8 byte encoding, and that is
happening (in a sense) below the level of objects. I would dearly love to be
able to run some tests on every possible sequence of Unicode characters.
Obviously that's out, but in practical terms, it would almost certainly suffice
to test all sequences up to, say, 8 characters long (in order to avoid edge
effects). But even that isn't feasible. So I plan to do exhaustive testing of
all possible sequences of 1 Unicode character, and random testing of /lots/ of
somewhat longer sequences. There will be other tests too, of course, but I
wouldn't even consider going live with code of this nature without some attempt
to test the /entire/ problem domain.
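That plan might be sketched like this, using a plain UTF-8 round trip as a stand-in for the actual byte-encoding code under discussion (the real property under test would differ):

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Sketch of the plan: exhaustive testing of every 1-character sequence,
// plus random testing of lots of longer sequences from a fixed seed.
class RoundTripTest {
    static void check(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);
        if (!back.equals(s)) {
            throw new AssertionError("round trip failed for " + s);
        }
    }

    public static void main(String[] args) {
        // Exhaustive: every single Unicode code point (skipping the
        // surrogate range, which is not a valid character on its own).
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) continue;
            check(new String(Character.toChars(cp)));
        }

        // Random: many longer sequences, repeatable via the fixed seed.
        Random rnd = new Random(1234);
        for (int i = 0; i < 10000; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 8; j++) {
                int cp = rnd.nextInt(Character.MAX_CODE_POINT + 1);
                if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) cp = 'a';
                sb.appendCodePoint(cp);
            }
            check(sb.toString());
        }
        System.out.println("round trips passed");
    }
}
```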
-- chris
Chris Uppal wrote: Hendrik Maryns wrote:
Is this so unusual? Is so much code working on ints and doubles that it is possible to use random inputs? A lot depends on what you are doing. (As an aside, I think a lot of programmers underestimate how much variety there is in /other/ programmers' typical tasks.) For some people working with, say, double[] arrays is the norm, others would hardly ever see a primitive type except that the language forces us to use them.
That is what I thought.
Regarding random testing, it seems to me to be a compromise forced on us by the fact that machines have limited speed. If computers were infinitely fast then no one would ever consider random testing -- we'd use a brute-force exploration of the /entire/ problem space instead. Random testing is one way (only one way) of trying for a computationally feasible approximation to that ideal. But I don't think the idea of "exhaustive" testing even makes sense in many contexts, so random testing doesn't make sense in those contexts either.
Precisely. I work with mathematical formulae and automata, and there
are countably many of either, and no obvious method of creating all of
them, I would say. Hm, ok, one could just keep on adding elements here
and there, yes, but there's no point.
For instance, I have some code for manipulating string data in variety of byte encoding (not written in Java). At one level everything's wrapped up in nice objects, and exhaustive testing makes no sense (all possible strings ? All possible operations on strings ??). OTOH, I need to handle byte-encodings too, such as Java's weird not-entirely-unlike-UTF-8 byte encoding, and that is happening (in a sense) below the level of objects. I would dearly love to be able to run some tests on every possible sequence of Unicode characters. Obviously that's out, but in practical terms, it would almost certainly suffice to test all sequences up to, say, 8 characters long (in order to avoid edge effects). But even that isn't feasible. So I plan to do exhaustive testing of all possible sequences of 1 Unicode character, and random testing of /lots/ of somewhat longer sequences. There will be other tests too, of course, but I wouldn't even consider going live with code of this nature without some attempt to test the /entire/ problem domain.
ACK.
H.
Jacob wrote: Tom Leylan wrote:
Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen.
The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0, 1000);
A.setName(name);
assertEquals(A.getName(), name);
They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
Unless of course you pass in an invalid string; too long, too short,
not unique, etc, and your setter silently fixes/fails, then because of
that your getter fails, and you get a false failure on your assertion. > Show me your assertEquals for IsPrime() for instance.
Not the best example I could come up with, but it indicates the principle:
for (int i = 0; i < 1000; i++) {
    int v1 = getRandomInt();
    if (isPrime(v1)) {
        for (int j = 0; j < 1000; j++) {
            int v2 = getRandomInt();
            if (isPrime(v2)) {
                assertNotEquals(v2 % v1, 0);
                assertNotEquals(v1 % v2, 0);
            }
        }
    }
}
Again: It doesn't prove that isPrime() is correct, but it may be able to prove that it is wrong.
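A runnable variant of that sketch might look like the following; isPrime and the input range are assumptions, and note the added v1 != v2 guard, since a prime does divide itself, so the original asserts would falsely fail whenever the two random values happened to be the same prime:

```java
import java.util.Random;

// Runnable variant of the isPrime property test: distinct primes never
// divide each other. isPrime and the ranges are invented for illustration.
class PrimePropertyTest {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);  // fixed seed keeps the suite repeatable
        for (int i = 0; i < 1000; i++) {
            int v1 = 2 + rnd.nextInt(10000);
            int v2 = 2 + rnd.nextInt(10000);
            // Guard v1 != v2: a prime divides itself.
            if (isPrime(v1) && isPrime(v2) && v1 != v2) {
                if (v1 % v2 == 0 || v2 % v1 == 0) {
                    throw new AssertionError(v1 + " and " + v2 + " divide");
                }
            }
        }
        System.out.println("prime divisibility property held");
    }
}
```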
It doesn't prove either. You cannot prove that it was wrong based upon
a random input, as the input might be wrong.
I have long stopped using terms like "Unit", "Black box", "System" when
referring to tests, as there are too many definitions out there.
Instead describe tests by purpose and context, and leave names out. So,
for your random test your purpose would be to test a variety of inputs,
and the context would be on a method with unknown results. By doing
that instead of pre-placing a term like "Unit" and all the
prejudice/preconceptions that come with that term, you will better get
your point across as to why you are doing a test.
Adam Maass wrote: Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1. I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without error."
Given there is an error in the baseline I'd rather have a team
of developers tracing it for hours than having a test suite that
tells me that everything is OK.
Hendrik Maryns wrote: This discussion about whether or not to use random inputs in tests makes me curious: is it that important at all?
Not at all.
It was included as an issue in the guidelines of the original
post, but it has been taken out of context in a way that seem
to leave many with the impression that I think random testing
is *the* way to perform unit testing.
It is explicitly (and I will consider emphasizing this) suggested
as an add-on to the conventional "typical" cases and "border"
cases to improve test coverage further.
As I have done unit testing for many years, and this simple
practice actually has helped me discover many errors, it was
included in the guidelines.
(And the discussion has been quite interesting. :-)
On Tue, 28 Mar 2006 10:50:25 +0200, Hendrik Maryns
<he************@despammed.com> wrote, quoted or indirectly quoted
someone who said : Is this so unusual? Is so much code working on ints and doubles that it is possible to use random inputs?
You can generate random strings.
see http://mindprod.com/jgloss/pseudorandom.html#STRINGS
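A minimal sketch of the idea, with an arbitrary printable-ASCII alphabet and a fixed seed so the same strings come out on every run (the helper name and ranges are invented for illustration):

```java
import java.util.Random;

// Generating pseudo-random test strings from a seeded Random,
// so runs are repeatable.
class RandomStrings {
    static String randomString(Random rnd, int minLen, int maxLen) {
        int len = minLen + rnd.nextInt(maxLen - minLen + 1);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            // printable ASCII range 32..126
            sb.append((char) (32 + rnd.nextInt(95)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);  // fixed seed: same strings every run
        for (int i = 0; i < 3; i++) {
            System.out.println(randomString(rnd, 0, 20));
        }
    }
}
```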
Sc***********@gmail.com wrote: Jacob wrote: Tom Leylan wrote:
Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen. The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0, 1000);
A.setName(name);
assertEquals(A.getName(), name);
They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
Unless of course you pass in an invalid string; too long, too short, not unique, etc, and your setter silently fixes/fails, then because of that your getter fails, and you get a false failure on your assertion.
Then you should have preconditions or postconditions for your setter
method which take care of that, and integrate them in the test.
H.
Hendrik Maryns wrote:
Sc***********@gmail.com wrote: Jacob wrote: Tom Leylan wrote:
Forgive me but you are terming it "fairly typical" and it isn't typical of anything I have seen. The most typical methods around are getters and setters which are even less complex than the square example I used previously:
String name = getRandomString(0, 1000);
A.setName(name);
assertEquals(A.getName(), name);
They are not the most interesting ones to test, but they should still be tested, and using random input increases the test coverage.
Unless of course you pass in an invalid string; too long, too short, not unique, etc, and your setter silently fixes/fails, then because of that your getter fails, and you get a false failure on your assertion.
Then you should have preconditions or postconditions for your setter method which take care of that, and integrate them in the test.
And what if every one of your random choices fails those conditions,
and the test is never run?
The point I was trying to make is that this type of random testing is
actually a form of another type of test, often referred to as monkey
testing, and by dropping the label of "unit" or "monkey", and instead
stating the purpose and context you eliminate this whole argument.
"Jacob" <ja***@yahoo.com> wrote: Adam Maass wrote:
Story time! Consider your reaction to a failing test case.
"Gee, that's odd. The tests passed last time..."
"What's different this time?"
"Well, I just modified the file FooBar.java. The failure must have something to do with the change I just made there."
"But the test case that is failing is called 'testBamBazzAdd1'. How could a change to FooBar.java cause that case to fail?"
[Many hours later...]
"There is no possible way that FooBar.java has anything to do with the failing test case."
"Ohhhh.... you know, we saw a novel input in the test case testBamBazzAdd1. I wonder how that happened?"
"Well, let's fix the code to account for the novel input..."
[Make some changes, but do not add a new test case. The change doesn't actually fix the error.]
"Well, that's a relief... the test suite now runs to completion without error."
Given there is an error in the baseline I'd rather have a team of developers tracing it for hours than having a test suite that tells me that everything is OK.
One has to wonder about the failure in this scenario -- it is a novel input
generated by a randomness generator. If the failure were critical to the
operation of the system, (one hopes that) it would have been noted, and
probably fixed, in other, earlier test cycles. (Perhaps not a unit test...
maybe a system test run by a QA.) Since this is a new failure that has not
been fixed in earlier cycles, the behavior of the system on these novel
inputs must not be that critical. If this is the case, I'd rather have my
developers finish the work they were doing on FooBar.java than trace the
failure in testBamBazzAdd1. (Of course, in a Utopian world, they would have
the time to do both.)
Ultimately, I'd like developers to be able to use a heuristic to determine
where to look for errors when a unit-test fails. That heuristic is "The
error is almost certainly caused by some delta in the code since the last
time you ran the test suite." (Note that controlling the size of the deltas
is an issue, which is why we get recommendations to make the test suite easy
and fast to run -- so that developers aren't afraid to run the suite very
frequently.)
If the unit-test suite also contains some randomly generated inputs, then
there are two heuristics that the developers must apply to determine where
the failure is:
1. "The error could be caused by a delta in the code since the last time you
ran the test suite"; or
2. "The error could be caused by an input value the test suite has generated
that we've never seen before."
Deciding which of these cases applies complicates the task of the developer
when faced with a failure.
-- Adam Maass
Jacob wrote: Ben Pope wrote:
Randomness just doesn't cut it, and I don't understand how you can check the output is correct, without knowing the input. You *do* know the input!
Consider testing this method:
double square(double v) { return v * v; }
Below is a typical unit test that verifies that the method behaves correctly on typical input:
double v = 2.0;
double v2 = square(v);  // You know the input: It is 2.0!
assertEquals(v2, 4.0);
This is fine.
The same test using random input:
double v = getRandomDouble();
double v2 = square(v);  // You know the input: It is v!
assertEquals(v2, v*v);
This is completely broken. You can't test an implementation of 'square'
with an identical implementation. You need a separate representation
for your expected result. Otherwise, you are not testing anything.
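One way to get that separate representation, sketched here purely as an illustration (BigDecimal as an arbitrary-precision oracle is my choice, not anything proposed in the thread), is to compute the expected value through a different code path than the implementation under test:

```java
import java.math.BigDecimal;

public class SquareOracleTest {
    static double square(double v) { return v * v; }

    public static void main(String[] args) {
        double v = 3.5;
        // Independent oracle: arbitrary-precision multiplication,
        // not a copy of the implementation under test.
        double expected = new BigDecimal(v).multiply(new BigDecimal(v)).doubleValue();
        if (square(v) != expected)
            throw new AssertionError("square(" + v + ") != " + expected);
        System.out.println("ok");
    }
}
```

The point of the oracle is that a shared bug is unlikely: BigDecimal multiplies exactly and rounds once, so a defect in `square` would not be mirrored in the expected value.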
If the test fails, all the details will be in the error report.
And this method actually *does* fail for a majority of all possible inputs (abs of v exceeding sqrt(maxDouble)). This will be revealed instantly using the random approach.
This may not ever be revealed using random inputs, but in the case of
'square' this is a moot point. The contract of 'square' must stipulate
that the input (v) is invalid unless
'v * v < "max double"'. Since such inputs are invalid by the contract,
there is no point in testing them.
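The overflow boundary being argued about can be checked directly; this small sketch (mine, not from the thread) shows that `square` returns Infinity just past sqrt(Double.MAX_VALUE) and a finite value just below it:

```java
public class SquareOverflow {
    static double square(double v) { return v * v; }

    public static void main(String[] args) {
        double limit = Math.sqrt(Double.MAX_VALUE); // about 1.34e154
        if (!Double.isInfinite(square(limit * 2)))
            throw new AssertionError("expected overflow to Infinity");
        if (Double.isInfinite(square(limit / 2)))
            throw new AssertionError("expected a finite result");
        System.out.println("boundary behaves as described");
    }
}
```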
For an experienced programmer the limitation of square() might be obvious, so border cases are probably covered sufficiently in both the code and the test. But for more complex logic this might not be so apparent, and throwing in random input (in ADDITION to the typical cases and all obvious border cases) has proven quite helpful, at least to me.
This is also wrong. The boundaries of the input are stated in the
function's contract. They are not something determined by the user's level
of experience. Your test cases must cover the boundary conditions
stipulated by the function's documented contract *as* *well* *as*
boundary conditions based on white-box knowledge of the function's
implementation. If you cover these cases, plus a small assortment of
well-chosen "sanity" values, you don't need to waste time with large
amounts of random data.
If you can't test your function in this way, it is probably not
factored correctly. da********@warpmail.net wrote: This is completely broken. You can't test an implementation of 'square' with an identical implementation. You need a separate representation for your expected result. Otherwise, you are not testing anything.
I've already answered this in a different posting: The unit test
reflects the requirements. The requirement for square() is to
return the square of the input: v*v. From a black-box perspective
I don't know the implementation of square(). It can be anything.
This is also wrong. The boundaries of the input are stated in the function's contract. They are not something determined by the user's level of experience. Your test cases must cover the boundary conditions stipulated by the function's documented contract *as* *well* *as* boundary conditions based on white-box knowledge of the function's implementation. If you cover these cases, plus a small assortment of well-chosen "sanity" values, you don't need to waste time with large amounts of random data.
This is all correct given you are able to identify the boundary
cases up front. In some cases you are, but for more complex ones
you easily forget some in the same way you forget to handle these
cases in the original code (that's why there are bugs, after all).
Imagine implementing a tree container. In order to test correct
removal of nodes, some of the boundary cases might be:
remove root
remove intermediate node
remove leaf node
remove root when this is the only node
remove root with exactly one leaf
remove root with exactly one intermediate node
remove intermediate node with one child
remove intermediate node with many children
remove leaf node without siblings
remove leaf node with siblings
remove intermediate node with root parent
remove intermediate node with only leaf nodes
remove intermediate node with leaf nodes and other intermediate nodes
remove intermediate node with only other intermediate node children
remove non-existing node
remove null
remove node with unique name
remove node with non-unique name
etc.
The above might or might not be boundary cases; that actually depends
on the implementation: a good implementation has few! From experience
you "know" which cases are more likely to contain bugs, even
without knowing the implementation.
I don't say you shouldn't cover the boundary cases explicitly,
of course you should (see #13 in the guidelines).
But when that is in place I would have built a tree at random, containing
a random number of nodes (0 to 1,000,000 perhaps), and then picked nodes at
random and performed a random operation (add, remove, move, copy, whatever)
on those, a random number of times (0 to 10,000 perhaps), and verified that the
operation behaves as expected and that the tree is always in a consistent state
afterwards. This would leave me with the confidence that if there are
cases I've forgotten (or that appear during code refactoring) they might
be trapped by this additional test.
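The randomized strategy described above can be sketched against a reference implementation. Here java.util.TreeSet stands in for the hypothetical tree under test and a HashSet serves as the trivially-correct model; the names, bounds, and fixed seed are illustrative choices, not from the original post:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.TreeSet;

public class RandomTreeTest {
    public static void main(String[] args) {
        long seed = 12345L;                         // fixed seed: failures are reproducible
        Random rnd = new Random(seed);
        TreeSet<Integer> subject = new TreeSet<>(); // stand-in for the tree under test
        HashSet<Integer> model = new HashSet<>();   // trivially-correct model

        for (int i = 0; i < 10_000; i++) {
            int key = rnd.nextInt(1000);
            if (rnd.nextBoolean()) {
                if (subject.add(key) != model.add(key))
                    throw new AssertionError("add disagreed at step " + i + ", seed " + seed);
            } else {
                if (subject.remove(key) != model.remove(key))
                    throw new AssertionError("remove disagreed at step " + i + ", seed " + seed);
            }
            // Invariant: both structures always hold the same number of elements.
            if (subject.size() != model.size())
                throw new AssertionError("size mismatch at step " + i + ", seed " + seed);
        }
        System.out.println("10000 random operations, invariants held");
    }
}
```

Any disagreement reports the step and seed, so the exact failing sequence can be replayed deterministically.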
Adam Maass wrote: 1. "The error could be caused by a delta in the code since the last time you ran the test suite"; or 2. "The error could be caused by an input value the test suite has generated that we've never seen before."
Deciding which of these cases applies complicates the task of the developer when faced with a failure.
If I add a test to your test suite that is able to reveal a flaw in your code,
you still don't want it because when it fails your developers will be confused
about what happened?
I am not sure I get it. You should all be happy you identified an error, shouldn't
you? A failing unit test should be pretty clear about what went wrong anyway.
Jacob wrote: da********@warpmail.net wrote:
This is completely broken. You can't test an implementation of 'square' with an identical implementation. You need a separate representation for your expected result. Otherwise, you are not testing anything. I've already answered this in a different posting: The unit test reflects the requirements. The requirement for square() is to return the square of the input: v*v. From a black-box perspective I don't know the implementation of square(). It can be anything.
This is why black-box tests are not entirely sufficient. You must
(especially for unit tests) use some white-box knowledge to test the
boundary conditions of both the contract and the implementation.
[snip - tree stuff] But when that is in place I would have built a tree at random, containing a random number of nodes (0 to 1,000,000 perhaps), and then picked nodes at random and performed a random operation (add, remove, move, copy, whatever) on those, a random number of times (0 to 10,000 perhaps), and verified that the operation behaves as expected and that the tree is always in a consistent state afterwards. This would leave me with the confidence that if there are cases I've forgotten (or that appear during code refactoring) they might be trapped by this additional test.
I went to Brian Kernighan's site at Princeton a while back. One of his
assignments was to implement associative arrays similar to those in
awk. Then, he provided a script generator that produces random output
(add, remove, lookup, etc). You are supposed to run this script against
both awk and your own implementation, and compare the results. So, I
think you would probably appreciate this.
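Kernighan's approach of generating a random op script and feeding it to two implementations can be sketched as follows; the tiny "put/get/del" script language and both map choices are my illustration, not the actual assignment:

```java
import java.util.*;

public class ScriptDiffTest {
    // Interpret a tiny op script ("put k v" / "get k" / "del k") against a map,
    // returning the concatenated output of the get operations.
    static String run(List<String> script, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (String line : script) {
            String[] t = line.split(" ");
            switch (t[0]) {
                case "put": map.put(t[1], t[2]); break;
                case "del": map.remove(t[1]); break;
                case "get": out.append(map.getOrDefault(t[1], "-")).append("\n"); break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(7); // fixed seed: the script is reproducible
        List<String> script = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            String k = "k" + rnd.nextInt(50);
            switch (rnd.nextInt(3)) {
                case 0: script.add("put " + k + " v" + rnd.nextInt(50)); break;
                case 1: script.add("del " + k); break;
                default: script.add("get " + k); break;
            }
        }
        // Two independent implementations must agree on every script.
        String a = run(script, new HashMap<>());
        String b = run(script, new TreeMap<>());
        if (!a.equals(b)) throw new AssertionError("implementations diverged");
        System.out.println("outputs identical");
    }
}
```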
Also, John Lakos' new book is due to be published later this year. In
it, he promises to address the issue of component-level testing in
great detail, including a section on random testing, which I think you
will find very interesting.
"Jacob" <ja***@yahoo.com> wrote: Adam Maass wrote:
1. "The error could be caused by a delta in the code since the last time you ran the test suite"; or 2. "The error could be caused by an input value the test suite has generated that we've never seen before."
Deciding which of these cases applies complicates the task of the developer when faced with a failure. If I add a test to your test suite that is able to reveal a flaw in your code, you still don't want it because when it fails your developers will be confused about what happened?
Let me clarify. I don't want it in the /unit/ test suite if it relies on
generation of random inputs, due to this confusion issue. If however, the
inputs are hard-coded, then the confusion issue does not apply, and I'd be
perfectly happy to have it in the unit test suite.
If there's a level of testing during which we generate random inputs to
improve the quality of the code, then that is where it belongs. If there
isn't this kind of testing already in the project, perhaps we ought to
start. It just doesn't belong in the /unit/ test suite. I am not sure I get it? You should all be happy you identified an error shouldn't you? The unit test failing should be pretty clear on what went wrong anyway.
Finding and fixing failures is, in general, a good thing, however it
happens. But a /unit/ test suite should give developers a really good idea
of where any failure originates from, and having to decide whether a failure
is due to a delta in the code under test or a novel input just overly
complicates a /unit/ test suite. The confusion issue is especially of
concern if a failure on one run of the suite simply disappears on the next
run because it didn't generate a set of inputs that causes the code to fail.
[If I saw a unit test suite with this behavior, I wouldn't have much
confidence in the value of passing all the tests -- because the next run
could just as easily produce a failure as a pass.]
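One widely used mitigation for the disappearing-failure problem (my addition, not something proposed in the thread) is to pick a fresh seed per run but record it, so any failing sequence of random inputs can be replayed exactly:

```java
import java.util.Random;

public class SeededRandomTest {
    static double square(double v) { return v * v; }

    public static void main(String[] args) {
        // A fresh seed each run, but printed: paste it back in to replay a failure.
        long seed = System.currentTimeMillis();
        System.out.println("random test seed: " + seed);
        Random rnd = new Random(seed);
        for (int i = 0; i < 1000; i++) {
            double v = rnd.nextDouble() * 100 - 50;
            if (square(v) < 0)
                throw new AssertionError("square(" + v + ") negative; replay with seed " + seed);
        }
        System.out.println("passed");
    }
}
```

With the seed in the failure message, a random failure becomes as reproducible as a hard-coded input.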
Note too that there are some failures that are acceptable to tolerate, even
in shipping product. (Perhaps: It's an obscure corner case that no-one ever
actually encounters in production. It's in some subsystem that hardly anyone
uses. Or a variety of other justifications...) The critical cases should be
covered by hard-coded inputs. That leaves the non-critical cases -- and if
something non-critical fails, then it should be fixed but perhaps there are
more important things to do before it gets fixed.
-- Adam Maass