There's a particular kind of engineer who argues about test-driven development at conferences, then goes home and writes no tests. This is the worst mix of dogma and practice — defending red-green-refactor purity while shipping untested code. Somehow, this dominates TDD discussions in our industry.

Let's be honest.

Strict TDD — write the failing test first, write the minimum code to pass, refactor, repeat, never skip the order — has real merits. It forces you to think about interfaces before implementation, produces high test coverage, provides a constant feedback loop, and results in smaller, decomposable code. These merits are worth wanting.

But the downsides are real. Strict TDD is slow during exploration phases when you don't know what you're building. Tests written before design stabilizes often test the wrong things, get discarded, and cost you twice. Writing failing tests first works better in well-understood domains (parsers, business logic, math-heavy code) and worse in emergent ones (UI work, exploratory data pipelines, external system integration).

Almost nobody who claims to do TDD actually does strict TDD. Good engineers write tests close to writing code, not necessarily before, and prioritize tests where being wrong is costly. This "tests-as-you-go" or "test-first-ish" approach captures most of strict TDD's value without the dogma.

Here's my take on test discipline.

What to test heavily. Anything where being wrong is expensive. Business logic calculating money. State machines that must be correct. Data transformations with uncontrolled inputs. Edge cases in date/time handling, character encoding, currency conversion. Boundary conditions (pagination, rate limits, retries). Code behind an API contract observable to customers. Authorization handling. Cryptographic code (which you shouldn't write, but if you do, test it).

For these, write tests. Before, after, during — order matters less than existence. Aim for branch coverage, not line coverage; interesting bugs hide in untaken branches.
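To make the branch-versus-line distinction concrete, here is a minimal sketch. The `shipping_fee` function and its pricing rule are hypothetical, invented only for illustration. The first test alone executes every line of the function, so a line-coverage report would read 100%, yet the "condition is false" path and the threshold boundary are never exercised until the other tests exist.

```python
from decimal import Decimal


def shipping_fee(subtotal: Decimal, is_member: bool) -> Decimal:
    """Hypothetical rule: flat fee, waived for members on orders of 50 or more."""
    fee = Decimal("5.00")
    if is_member and subtotal >= Decimal("50"):
        fee = Decimal("0.00")
    return fee


def test_member_large_order_ships_free():
    # On its own, this test runs every line of shipping_fee.
    assert shipping_fee(Decimal("80"), is_member=True) == Decimal("0.00")


def test_non_member_pays_fee():
    # The branch where the condition is false: untouched by the test above.
    assert shipping_fee(Decimal("80"), is_member=False) == Decimal("5.00")


def test_member_just_below_threshold_pays_fee():
    # Boundary condition: 49.99 must not qualify.
    assert shipping_fee(Decimal("49.99"), is_member=True) == Decimal("5.00")


def test_member_exactly_at_threshold_ships_free():
    # Boundary condition: exactly 50 must qualify.
    assert shipping_fee(Decimal("50"), is_member=True) == Decimal("0.00")
```

If someone later changes `>=` to `>`, only the last test notices. That is the kind of bug a line-coverage number hides.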

What to test lightly. Integration glue. UI rendering (a snapshot test or two suffices). Code where tests re-assert obvious behavior. Code to be discarded in the next release. Configuration and wiring code, where a successful production deploy tells you more than a unit test would.

What to barely test at all. Throwaway prototypes. Spikes. Code you'll delete next week. The Bubble version of your MVP. Anything where being wrong means "we throw it away and learn." Test these less than your conscience suggests; you're trading test-writing time against learning velocity, and at the prototype stage learning is the asset.

Integration tests are underrated. Bugs that hurt in production often occur at seams: between services, between modules, between the app and the database. Unit tests rarely catch these, because each side of the seam is tested against its own assumptions about the other side. In my experience, teams that invest in integration tests (a real database, real network calls to test doubles of external services, real request paths through their own code) have dramatically fewer production incidents than those that don't, regardless of unit test coverage. Integration tests cost more per test but deliver more value per test, and that trade usually favors the investment.
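As a rough illustration of testing at the app-to-database seam, the sketch below runs against an actual SQLite engine using Python's standard `sqlite3` module. The schema, `register_user`, and `DuplicateEmail` are hypothetical names made up for this example, and an in-memory SQLite database is an assumption chosen to keep it self-contained; a real suite would ideally use the same database engine as production.

```python
import sqlite3

# Hypothetical schema: the uniqueness rule lives in the database, not in Python.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL COLLATE NOCASE UNIQUE
);
"""


class DuplicateEmail(Exception):
    """Domain-level error raised when the email is already registered."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register_user(conn: sqlite3.Connection, email: str) -> int:
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            return cur.lastrowid
    except sqlite3.IntegrityError as exc:
        raise DuplicateEmail(email) from exc


def test_duplicate_email_is_rejected_case_insensitively():
    conn = make_db()
    register_user(conn, "ada@example.com")
    try:
        register_user(conn, "ADA@example.com")
    except DuplicateEmail:
        pass
    else:
        raise AssertionError("expected DuplicateEmail")
    # Observable state: still exactly one row after the failed insert.
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert count == 1
```

A unit test with a mocked connection could not tell you whether the constraint is case-insensitive or whether the database error is translated into the domain error, because both behaviors only exist where the code meets the real engine.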

End-to-end tests are usually overrated. Selenium suites, Playwright suites exercising the whole UI, big tests taking 20 minutes to run — they're seductive because they seem comprehensive, but they're slow, flaky, expensive to maintain, and produce ambiguous failures. ("The login button is broken" — is it the JavaScript, auth service, database, CSS?) Have a few true E2E tests for critical paths (sign-up, checkout, main user flow). Don't aim for full E2E coverage. The math doesn't work.

Property-based testing is the secret weapon nobody uses. Tools like Hypothesis (Python), QuickCheck (Haskell, ported to many languages), or fast-check (JavaScript) generate many inputs from a description of the valid input space, check that a property you state holds for every one of them, and report a counter-example when it doesn't. For algorithmic code, this catches bugs that example-based tests rarely catch. The investment to learn it is about one afternoon. The payoff is enormous on the right code. It remains mostly unused in our industry. Try it.
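To show the shape of the idea without requiring any third-party install, here is a hand-rolled sketch using only the standard library. It is not Hypothesis or QuickCheck: it has no shrinking and only a naive generator. The run-length encoder and all function names are hypothetical, chosen because "decoding an encoding returns the original" is an easy property to state.

```python
import random
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Run-length encode: 'aaab' becomes [('a', 3), ('b', 1)]."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in pairs)


def random_text(rng: random.Random, max_len: int = 30) -> str:
    # A tiny alphabet on purpose: it makes long runs and repeats likely.
    length = rng.randint(0, max_len)
    return "".join(rng.choice("ab ") for _ in range(length))


def check_round_trip(trials: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        text = random_text(rng)
        encoded = rle_encode(text)
        # Property 1: decoding the encoding returns the original input.
        assert rle_decode(encoded) == text, f"round trip failed for {text!r}"
        # Property 2: neighbouring runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))
        # Property 3: every run has a positive length.
        assert all(count >= 1 for _, count in encoded)
```

With Hypothesis, the generator and loop would be replaced by a `@given(st.text())` decorator on the test function, and a failing input would be shrunk automatically to a minimal counter-example, which is most of what makes the real tools worth learning.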

Test brittleness is a real cost. Tests that break with every refactor hold your code back rather than protect it. Tests should assert behavior, not implementation. A test that mocks the database, cache, logger, and queue mostly verifies that the code calls the mocks as expected. That's not behavior testing. It's testing one possible implementation. Refactor the implementation, and you must rewrite the test. After a few cycles, the team stops refactoring because it breaks tests, and the codebase ossifies. This is one of the most insidious failure modes in a tested codebase.

Fix this by asserting observable behavior — output, state, external side effects. Avoid mocking inside your own bounded context; mock only at the boundary (third-party API, queue, email service). Use real instances of your classes wherever possible.
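Here is one way that advice might look in practice. Everything in this sketch is hypothetical (`OrderService`, `InMemoryInventory`, `FakeEmailGateway`); the point is only the shape: a real instance of our own class, a hand-written fake at the one external boundary, and assertions on results, state, and side effects.

```python
class InMemoryInventory:
    """A real collaborator from our own bounded context (hypothetical)."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def remove(self, sku: str, quantity: int) -> None:
        if quantity > self.available(sku):
            raise ValueError(f"insufficient stock for {sku}")
        self._stock[sku] -= quantity


class FakeEmailGateway:
    """Test double for the only external boundary here: the email service."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


class OrderService:
    def __init__(self, inventory: InMemoryInventory, email) -> None:
        self._inventory = inventory
        self._email = email

    def place_order(self, customer_email: str, sku: str, quantity: int) -> bool:
        if quantity <= 0 or self._inventory.available(sku) < quantity:
            return False
        self._inventory.remove(sku, quantity)
        self._email.send(customer_email, f"Order confirmed: {quantity} x {sku}")
        return True


def test_successful_order_reduces_stock_and_notifies_customer():
    inventory = InMemoryInventory({"widget": 5})
    email = FakeEmailGateway()
    service = OrderService(inventory, email)

    assert service.place_order("ada@example.com", "widget", 2) is True
    assert inventory.available("widget") == 3  # observable state
    assert email.sent == [("ada@example.com", "Order confirmed: 2 x widget")]


def test_rejected_order_changes_nothing():
    inventory = InMemoryInventory({"widget": 1})
    email = FakeEmailGateway()
    service = OrderService(inventory, email)

    assert service.place_order("ada@example.com", "widget", 2) is False
    assert inventory.available("widget") == 1
    assert email.sent == []
```

Neither test cares which inventory methods `place_order` calls or in what order, so the internals can be rewritten freely as long as the outcome stays the same.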

Speed matters more than people admit. A test suite taking 20 minutes to run is one people skip. Engineers will run "just the relevant tests" before pushing. CI will run the full suite, but a 20-minute CI loop means PR reviews happen in batches and engineering velocity craters. Aim for a unit suite under 30 seconds, an integration suite under three minutes, an E2E suite under ten. If you're over these numbers, fixing test suite speed is one of the highest-leverage things an engineering team can do.

Flaky tests are bugs. Every time a test fails intermittently and someone re-runs CI to make it pass, the team trains itself to ignore test failures. Eventually, a real test failure gets re-run and ignored too. Treat flaky tests like production incidents — investigate, root-cause, fix. If you can't fix it, delete the test. A deleted test is better than a flaky test, because a deleted test forces you to acknowledge what you don't have, while a flaky test gives you false confidence.
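One common root cause of flakiness is a hidden dependency on the wall clock. The sketch below shows one possible fix, assuming a hypothetical `is_expired` helper: pass the clock in as a parameter so the test controls time instead of racing it. Other causes (shared state, test ordering, network timing) need different fixes; this is only an example of the investigate-and-fix loop.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def is_expired(issued_at: datetime, ttl: timedelta, now: Clock = utc_now) -> bool:
    """Hypothetical helper: a token expires once its TTL has fully elapsed."""
    return now() >= issued_at + ttl


# A flaky version of the test below would issue a token with a tiny TTL,
# sleep for roughly that long, and hope the scheduler cooperates. The
# deterministic version pins the clock to exact instants instead.
def test_token_expires_exactly_at_ttl():
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    ttl = timedelta(minutes=15)

    def just_before() -> datetime:
        return issued + ttl - timedelta(microseconds=1)

    def exactly_at() -> datetime:
        return issued + ttl

    assert not is_expired(issued, ttl, now=just_before)
    assert is_expired(issued, ttl, now=exactly_at)
```

Production code keeps calling `is_expired(issued_at, ttl)` with the default clock, so the extra parameter costs callers nothing.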

Coverage numbers are vanity. A team with 90% line coverage and bad tests has worse quality than a team with 40% coverage and good tests. Don't set coverage targets. Set "we test the things that matter, and we know which things matter" as the bar. This is harder to measure and far more useful.

AI-generated tests are uneven. Claude and GPT can write tests now, and the results are often pretty good, especially for the obvious cases. The risk is that tests are now too cheap to generate, so teams accumulate thousands that assert nothing meaningful and dilute the signal. Use AI tools to draft test scaffolding, then seriously review what they wrote. A test that exists is not the same as a test that checks something useful.
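A quick review heuristic, sketched here with hypothetical names: run the test against a deliberately broken implementation and see whether it notices. A test that passes against both the correct and the broken version is asserting nothing useful, however it was written.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: integer cents, rounds down."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents * (100 - percent) // 100


def broken_apply_discount(price_cents: int, percent: int) -> int:
    """Deliberately wrong: ignores the discount entirely."""
    return price_cents


def vacuous_test(fn) -> None:
    # Looks like a test, but only checks that something came back.
    result = fn(1000, 25)
    assert result is not None
    assert isinstance(result, int)


def meaningful_test(fn) -> None:
    assert fn(1000, 25) == 750
    assert fn(999, 10) == 899  # rounding down
    assert fn(1000, 0) == 1000
    assert fn(1000, 100) == 0
    try:
        fn(1000, 101)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")


def passes(test, fn) -> bool:
    """Return True if the test raises no AssertionError for this implementation."""
    try:
        test(fn)
    except AssertionError:
        return False
    return True
```

Here `passes(vacuous_test, broken_apply_discount)` is true, while `passes(meaningful_test, broken_apply_discount)` is false. Mutation-testing tools automate this same check across a whole codebase.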

Finally: the cultural part. Engineers who care about testing tend to ship more reliable software, and engineers who don't tend to ship more bugs. This is true on average. But it is not true that the most rigorous testers are the best engineers. The best engineers I've worked with have nuanced views about what to test, when to test, and what testing costs. The worst-tested codebases I've seen come from one of two places: engineers who don't write tests at all because they "know their code works" (this is fragile arrogance), or engineers who test everything with full mock pyramids and end up with codebases that are change-resistant (this is fragile rigor). The right disposition is somewhere in the middle: tests are a tool, the tool has costs, use the tool when the cost is worth it.

Write tests. Write fewer than the dogma demands, more than your instincts suggest, and target them at the places where being wrong is expensive. That's TDD in practice. The theory is fine. The theory will not ship the product.