Practice: Test Guidelines


Good JUnit tests verify externally observable behavior, protect meaningful rules, and remain stable under refactoring. Bad tests add volume without value: they test implementation details, trivial accessors, or mere object existence, or they repeat the code under test. When reviewing AI‑generated tests, prefer cases that catch real bugs, use descriptive names, and remain valid after reasonable refactoring.

Core Rules

  • Test observable behavior and outcomes, not implementation details.

  • Don’t recompute or mirror the implementation in tests (a test that copies the logic can be wrong in the same way as the code).

  • Avoid tests for trivial getters/setters or plain constructors unless they enforce invariants.

  • Each test must fail for a named, meaningful bug — know what defect it protects against.

  • Prefer fewer, stronger tests (one test per behavior) over many near‑duplicate tests.

  • Mock only external or expensive dependencies; assert results, not call choreography.

  • Focus on boundary and high‑risk cases, not random variations of the same scenario.

  • Keep tests simple: minimal setup, clear Arrange / Act / Assert, and readable names.

Quick Checklist

Use this short checklist to approve or reject individual tests (keep if all true):

  • I can explain the bug or behavior this test protects.

  • The test checks observable behavior and would fail for a meaningful reason.

  • It would still pass after reasonable refactoring of internals.

  • It does not merely assert getters/setters or that a value is not null.

  • The test name describes the behavior or rule.

  • Mocks are limited to external/expensive dependencies.

  • Setup is minimal and focused on the behavior.

  • The test covers a relevant edge case or rule, not an arbitrary value.

If a test fails any of the above, revise it to focus on behavior or delete it.

Detailed Test Guidelines

When looking at AI-generated tests, consider these guidelines to distinguish the gems from the rubble.


Table 1 Test behavior and outcomes, not implementation

| Good Tests | ❌ Bad Tests |
|---|---|
| Verify observable behavior | Assert private fields |
| Remain valid after refactoring | Break when implementation details change |
| Assert what the system guarantees | Assert how the system is built |

Good tests verify what the system does, not how it does it.
Litmus test: “If I rewrote the method but kept the contract, would this test still pass?” If the answer is no, the test is fragile.
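
The litmus test can be illustrated with a minimal sketch. The `Cart` class and its contract ("`total()` is the sum of the added prices") are hypothetical and not part of the guideline. JUnit is not imported so that the sketch compiles on a bare JDK: the local `assertEquals` stands in for `org.junit.jupiter.api.Assertions.assertEquals`, and in a real suite the test method would carry `@Test`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class under test. Contract: total() is the sum of added prices.
class Cart {
    private final List<Integer> pricesInCents = new ArrayList<>(); // internal detail

    void add(int priceInCents) {
        pricesInCents.add(priceInCents);
    }

    int total() {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }
}

class CartTest {
    // Stand-in for JUnit's assertEquals so the sketch runs on a bare JDK.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Asserts the guaranteed outcome only. The test keeps passing if Cart
    // later stores a running total instead of a list.
    static void total_isSumOfAddedPrices() {
        Cart cart = new Cart();
        cart.add(250);
        cart.add(199);

        assertEquals(449, cart.total());
    }

    // Fragile alternative (deliberately not implemented): reading the private
    // pricesInCents field via reflection and asserting its size would break
    // when the storage changes, even though the contract still holds.
}
```

The test touches only `add` and `total`, so rewriting the internals of `Cart` cannot break it unless the contract itself changes.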


Table 2 Don’t restate the implementation in the test

| Good Tests | ❌ Bad Tests |
|---|---|
| Assert results, rules, or constraints | Recompute the same logic as the code |
| Use realistic values that exercise logic | Mirror the algorithm step‑by‑step |
| Can catch logic bugs | Can be wrong in the same way as the code |

AI loves writing tests that simply mirror the implementation line‑by‑line.
Example smell:

    assertEquals(a + b, calculator.add(a, b));

If the test can be wrong in the same way as the code, it adds little value.
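
As a contrast to the smell, here is a minimal sketch that states expected values worked out by hand from a business rule. The `Discounts` class and its rule (orders of 100.00 or more get 10% off) are assumptions made for illustration. The local `assertEquals` stands in for JUnit's so the sketch runs without the JUnit dependency; in a real suite the test methods would be annotated with `@Test`.

```java
// Hypothetical class under test.
// Assumed rule: orders of 100.00 or more (10_000 cents) get 10% off.
class Discounts {
    static int priceAfterDiscount(int totalInCents) {
        if (totalInCents >= 10_000) {
            return totalInCents - totalInCents / 10;
        }
        return totalInCents;
    }
}

class DiscountsTest {
    // Stand-in for JUnit's assertEquals so the sketch runs on a bare JDK.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Smell (not used): the expected value repeats the formula under test.
    //   assertEquals(total - total / 10, Discounts.priceAfterDiscount(total));

    // Better: literal expected values derived from the rule, not from the code.
    static void appliesTenPercentDiscount_atThreshold() {
        assertEquals(9_000, Discounts.priceAfterDiscount(10_000));
    }

    static void chargesFullPrice_justBelowThreshold() {
        assertEquals(9_999, Discounts.priceAfterDiscount(9_999));
    }
}
```

If someone changes `>=` to `>` or miscalculates the percentage, the literal values expose it, whereas a recomputed expectation would drift along with the bug.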

Table 3 Don’t test getters, setters, or trivial constructors

| Good Tests | ❌ Bad Tests |
|---|---|
| A constructor enforces invariants | obj.setX(5); assertEquals(5, obj.getX()); |
| A setter rejects invalid input | Only check object existence |
| A getter computes derived state | Prove only that Java works |

Tests like those in the Bad Tests column are almost always low signal and high maintenance.
The cases in the Good Tests column can be worth testing because they contain real logic. Rule of thumb: If a failure would only prove that Java works as documented, don’t test it.
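
The three worthwhile cases (a constructor that enforces invariants, a setter that rejects invalid input, a getter that computes derived state) might look like the following sketch. `Rectangle` and its "dimensions must be positive" rule are hypothetical. The local `assertEquals` and `assertThrows` are simplified stand-ins for the JUnit 5 methods of the same names, used here so the sketch runs without the JUnit dependency.

```java
// Hypothetical class under test. Invariant: dimensions are always positive.
final class Rectangle {
    private int width;
    private final int height;

    Rectangle(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.width = width;
        this.height = height;
    }

    void setWidth(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        this.width = width;
    }

    // Derived state: computed on demand, not stored.
    int area() {
        return width * height;
    }
}

class RectangleTest {
    // Simplified stand-ins for JUnit's assertEquals and assertThrows.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void assertThrows(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
        } catch (Throwable thrown) {
            if (expected.isInstance(thrown)) {
                return;
            }
            throw new AssertionError("unexpected exception: " + thrown);
        }
        throw new AssertionError("expected " + expected.getSimpleName());
    }

    static void constructor_rejectsNonPositiveDimensions() {
        assertThrows(IllegalArgumentException.class, () -> new Rectangle(0, 5));
        assertThrows(IllegalArgumentException.class, () -> new Rectangle(5, -1));
    }

    static void setWidth_rejectsInvalidInput_andKeepsPreviousWidth() {
        Rectangle rectangle = new Rectangle(2, 3);
        assertThrows(IllegalArgumentException.class, () -> rectangle.setWidth(0));
        assertEquals(6, rectangle.area());
    }

    static void area_reflectsUpdatedWidth() {
        Rectangle rectangle = new Rectangle(2, 3);
        rectangle.setWidth(4);
        assertEquals(12, rectangle.area());
    }
}
```

None of these tests would pass if the validation or the area computation were removed or altered, which is what separates them from a plain set-then-get check.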


Table 4 Every test should fail for a meaningful reason

| Good Tests | ❌ Bad Tests |
|---|---|
| Identify a real class of bug | Fail only if code is deleted |
| Protect important behavior | Fail only due to formatting, timing, or structure change |

Every test should answer the question: “What defect would this test catch?”
If you can’t name the bug the test guards against, it’s probably noise.


Table 5 Prefer fewer, stronger tests over many weak ones

| Good Tests | ❌ Bad Tests |
|---|---|
| One test per behavior or rule | One test per method call |
| Multiple assertions about one scenario | Many nearly identical tests |
| Focus on coverage of behavior | Maximize test count only |

AI tends to maximize count, not coverage.
Guideline: One test per behavior, not per method.
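
A sketch of "one test per behavior" with several assertions about a single scenario. The `Account` class and its `transfer` operation are hypothetical. The local `assertEquals` and `assertThrows` are simplified stand-ins for their JUnit 5 counterparts so the sketch runs on a bare JDK.

```java
// Hypothetical class under test.
final class Account {
    private int balance;

    Account(int openingBalance) {
        this.balance = openingBalance;
    }

    int balance() {
        return balance;
    }

    static void transfer(Account from, Account to, int amount) {
        if (amount > from.balance) {
            throw new IllegalStateException("insufficient funds");
        }
        from.balance -= amount;
        to.balance += amount;
    }
}

class TransferTest {
    // Simplified stand-ins for JUnit's assertEquals and assertThrows.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void assertThrows(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
        } catch (Throwable thrown) {
            if (expected.isInstance(thrown)) {
                return;
            }
            throw new AssertionError("unexpected exception: " + thrown);
        }
        throw new AssertionError("expected " + expected.getSimpleName());
    }

    // One behavior, one test: both balances belong to the same scenario,
    // so they are asserted together rather than in two near-duplicate tests.
    static void transfer_movesAmountFromSourceToTarget() {
        Account source = new Account(100);
        Account target = new Account(20);

        Account.transfer(source, target, 30);

        assertEquals(70, source.balance());
        assertEquals(50, target.balance());
    }

    // A second, distinct behavior gets its own test.
    static void transfer_changesNothing_whenFundsAreInsufficient() {
        Account source = new Account(10);
        Account target = new Account(20);

        assertThrows(IllegalStateException.class,
                () -> Account.transfer(source, target, 30));

        assertEquals(10, source.balance());
        assertEquals(20, target.balance());
    }
}
```

Two tests cover the two rules of `transfer`; splitting each assertion into its own test would raise the count without covering any additional behavior.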


Table 6 Tests should read like specifications

| Good Test names | ❌ Bad Test names |
|---|---|
| testWithdraw_throwsException_whenBalanceIsInsufficient | testWithdraw1 |
| testPasswordMeetsComplexityRequirements | shouldWork |

A strong test explains why the behavior exists.
If the test name doesn’t describe a rule or guarantee, the test probably isn’t asserting one.
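
The sketch below shows test names that each state one rule. `PasswordPolicy` and its two rules (at least eight characters, at least one digit) are assumptions chosen for illustration. The local `assertEquals` stands in for JUnit's so that the sketch compiles without the JUnit dependency; with JUnit each method would carry `@Test`.

```java
// Hypothetical class under test.
// Assumed rules: at least 8 characters and at least one digit.
final class PasswordPolicy {
    static boolean isValid(String password) {
        return password.length() >= 8
                && password.chars().anyMatch(Character::isDigit);
    }
}

class PasswordPolicyTest {
    // Stand-in for JUnit's assertEquals so the sketch runs on a bare JDK.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Each name reads as a rule: method, outcome, condition.
    static void isValid_rejectsPassword_shorterThanEightCharacters() {
        assertEquals(false, PasswordPolicy.isValid("abcdef1"));
    }

    static void isValid_rejectsPassword_withoutDigit() {
        assertEquals(false, PasswordPolicy.isValid("abcdefgh"));
    }

    static void isValid_acceptsPassword_meetingAllRules() {
        assertEquals(true, PasswordPolicy.isValid("abcdefg1"));
    }
}
```

A failure report listing `isValid_rejectsPassword_withoutDigit` tells the reader which rule broke before they open the test, which a name like `testPassword2` cannot do.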


Table 7 Avoid over‑mocking and interaction obsession

| Good Tests | ❌ Bad Tests |
|---|---|
| Mock only expensive or external dependencies | Verify internal method calls |
| Assert outcomes, not call choreography | Assert “method X was called once” without caring about the result |

AI often overuses mocks, especially verifying call counts.
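
A minimal sketch of replacing only the external dependency and asserting the outcome. `TaxRates` (imagined as a remote service) and `InvoiceCalculator` are hypothetical names. A hand-written lambda stub is used instead of a mocking library to keep the sketch free of dependencies; the local `assertEquals` stands in for JUnit's.

```java
// Hypothetical external dependency, e.g. a remote tax service.
interface TaxRates {
    int percentFor(String region);
}

// Hypothetical class under test.
final class InvoiceCalculator {
    private final TaxRates rates;

    InvoiceCalculator(TaxRates rates) {
        this.rates = rates;
    }

    int grossInCents(int netInCents, String region) {
        return netInCents + netInCents * rates.percentFor(region) / 100;
    }
}

class InvoiceCalculatorTest {
    // Stand-in for JUnit's assertEquals so the sketch runs on a bare JDK.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void grossInCents_addsRegionalTax() {
        // Only the external service is replaced, with fixed test data.
        TaxRates fixedRates = region -> region.equals("DE") ? 19 : 0;
        InvoiceCalculator calculator = new InvoiceCalculator(fixedRates);

        int gross = calculator.grossInCents(1_000, "DE");

        // Outcome assertion: it can only hold if the right region was looked
        // up, so no separate "percentFor was called once" check is needed.
        assertEquals(1_190, gross);
    }
}
```

If the calculator later caches rates or looks them up twice, this test still passes, whereas a call-count verification would fail without any change in behavior.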


Table 8 Prefer boundary cases over random variation

| Good Boundaries | ❌ Bad Tests |
|---|---|
| Empty vs non‑empty | Use arbitrary or random values |
| Focus on high‑risk scenarios | Duplicate logic with different numbers |
| First, last, and beyond limits | Large datasets with no semantic meaning |

AI often generates many similar tests with “different numbers.”
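
Boundary-focused tests might look like this sketch, which assumes a hypothetical `Paging.pageCount` helper with non-negative item counts and a positive page size. Each test sits on an edge of the rule (empty, exactly at the limit, one beyond the limit) rather than on an arbitrary number. The local `assertEquals` stands in for JUnit's.

```java
// Hypothetical class under test.
// Assumes itemCount >= 0 and pageSize > 0.
final class Paging {
    static int pageCount(int itemCount, int pageSize) {
        return (itemCount + pageSize - 1) / pageSize; // ceiling division
    }
}

class PagingTest {
    // Stand-in for JUnit's assertEquals so the sketch runs on a bare JDK.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Empty vs non-empty.
    static void pageCount_isZero_whenThereAreNoItems() {
        assertEquals(0, Paging.pageCount(0, 10));
    }

    // Exactly at the limit.
    static void pageCount_isOne_whenItemsExactlyFillOnePage() {
        assertEquals(1, Paging.pageCount(10, 10));
    }

    // One beyond the limit: the classic off-by-one location.
    static void pageCount_addsPage_forFirstItemBeyondFullPage() {
        assertEquals(2, Paging.pageCount(11, 10));
    }
}
```

Adding further tests for 37, 52, or 83 items would exercise the same path as these three and catch no additional defect.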


Table 9 Tests should be cheap to understand and maintain

| Good Tests | ❌ Bad Tests |
|---|---|
| Minimal setup | Excessive setup |
| Clear Arrange / Act / Assert | Dense or tangled structure |
| Only the necessary assertions | Looks “scary” |

Complex tests rot quickly—even if they are “correct.”
If a developer hesitates to update a test, it’s already too complex.


Table 10 Avoid tests that only test “existence”

| Good Tests | ❌ Bad Tests |
|---|---|
| Verify object is created correctly | Assert not null |
| Test meaningful state | Test that something was created |
| Test visible behavior after creation | Test nothing of value |

AI loves existence tests.
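
As an alternative to an existence check, the following sketch asserts meaningful state and behavior right after creation. The `Order` class and its rules (a new order is open and empty, and an empty order cannot be submitted) are hypothetical. The local `assertEquals` and `assertThrows` are simplified stand-ins for the JUnit 5 methods so the sketch runs on a bare JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class under test.
final class Order {
    enum Status { OPEN, SUBMITTED }

    private final List<String> items = new ArrayList<>();
    private Status status = Status.OPEN;

    void addItem(String sku) {
        items.add(sku);
    }

    void submit() {
        if (items.isEmpty()) {
            throw new IllegalStateException("cannot submit an empty order");
        }
        status = Status.SUBMITTED;
    }

    Status status() {
        return status;
    }

    int itemCount() {
        return items.size();
    }
}

class OrderTest {
    // Simplified stand-ins for JUnit's assertEquals and assertThrows.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void assertThrows(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
        } catch (Throwable thrown) {
            if (expected.isInstance(thrown)) {
                return;
            }
            throw new AssertionError("unexpected exception: " + thrown);
        }
        throw new AssertionError("expected " + expected.getSimpleName());
    }

    // Existence test (not used): assertNotNull(new Order()) can never fail,
    // because "new" either returns an object or throws.

    static void newOrder_isOpenAndEmpty() {
        Order order = new Order();

        assertEquals(Order.Status.OPEN, order.status());
        assertEquals(0, order.itemCount());
    }

    static void submit_rejectsOrderWithoutItems() {
        Order order = new Order();

        assertThrows(IllegalStateException.class, order::submit);
        assertEquals(Order.Status.OPEN, order.status());
    }
}
```

Both tests would fail if the initial state or the submit rule changed, which is the kind of defect a not-null assertion cannot detect.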