Rediscovering the Joy of Writing Tests
TL;DR: Reduce the number of mocks in your tests and start writing tests that are robust and that humans can read.
Some Background
Recently my team took over ownership of an existing service that had been more or less in maintenance mode for a long time and had accumulated a lot of technical debt.
For various reasons this service is suddenly getting a lot of attention and work is ramping up again. The problem is that the domain of the service is quite complicated, with strict legal requirements. On top of that, the codebase is not in a great state and nobody on the team has worked on this service before. All of this makes it scary to change anything, especially as bugs have a high chance of causing real-life consequences for customers.
So what do you do when you’re afraid to change a codebase?
You refactor the entire thing, rewrite it from Java to Kotlin and convert it to Hexagonal Architecture.
Obviously.
This probably sounds crazy, but besides the fact that I personally detest Java (let’s not get into it...), it made sense for us: all new services in the company use Kotlin, and Kotlin often makes the code much easier to read (e.g. data classes instead of Lombok) and safer (nullability guarantees, ...).
With Claude Code the refactoring was also a breeze, because this is a straightforward task for a coding agent. It managed to do it within 2 weeks without any hiccups in production. The biggest bottleneck was the human factor, as multiple PRs with 100-200 changed files needed to be reviewed. Unfortunately, some changes propagated through the entire codebase (thanks, Lombok ...) and prevented us from splitting the work into smaller PRs.
Apart from rewriting the service in Kotlin, our approach to being able to change the core logic of the service with confidence was:
- Reducing the complexity of the service by getting rid of deprecated and unused features, using Kotlin and improving the package structure.
- Adding a comprehensive test suite to validate our assumptions about how the service works and to protect us from future regressions.
For obvious reasons, at a bank you need to make sure that what you (or an AI) do actually works, and we do that by writing tests. Coming back to our service: there was already a comprehensive test suite in place, but for someone who hadn’t written it themselves it was not easy to deduce the core domain logic of the service from the tests, or to tell whether they covered it well enough to make changes with confidence.
I think this is a very common problem and not specific to this service, and the most prominent reason for it often boils down to excessive mocking.
Consider a test like this:
@Test
fun `simple test`() {
    every { repositoryA.fetchUser() } returns user1
    every { serviceB.importantLogic() } returns baz
    systemUnderTest()
    verify { serviceC.baz() }
}
Even with this very simple example (real tests are often much more complicated) my eyes already start to glaze over. Why do I even need to know which specific functions systemUnderTest() calls? What I’m most interested in can usually be expressed by adhering to a certain structure:
- Given: Set up the initial state.
- When: The input/action that triggers some change in the system.
- Then: Make sure the output of the system is what you expected.
A test should basically answer the question:
If I feed this input into my system, does the output match my expectations?
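As a rough illustration of that question, here is a minimal sketch of a Given/When/Then test that uses a hand-written in-memory fake instead of mocks. All names in it (User, UserRepository, InMemoryUserRepository, RegistrationService) are hypothetical and invented for this example; they are not taken from the service described here. The check is written as a plain function so the sketch runs without any test framework.

```kotlin
// Hypothetical domain types, invented for this sketch.
data class User(val id: String, val email: String)

interface UserRepository {
    fun save(user: User)
    fun findByEmail(email: String): User?
}

// In-memory fake used instead of a mock: it behaves like a tiny real repository.
class InMemoryUserRepository : UserRepository {
    private val users = mutableMapOf<String, User>()
    override fun save(user: User) { users[user.email] = user }
    override fun findByEmail(email: String): User? = users[email]
}

class RegistrationService(private val repository: UserRepository) {
    // Returns true when a new user was registered, false when the email is taken.
    fun register(id: String, email: String): Boolean {
        if (repository.findByEmail(email) != null) return false
        repository.save(User(id, email))
        return true
    }
}

fun registeringTakenEmailIsRejected(): Boolean {
    // Given: a user already exists
    val repository = InMemoryUserRepository()
    val service = RegistrationService(repository)
    service.register("1", "a@example.com")

    // When: someone else registers with the same email
    val accepted = service.register("2", "a@example.com")

    // Then: only the output and the resulting state are checked,
    // not which repository functions were called along the way
    return !accepted && repository.findByEmail("a@example.com")?.id == "1"
}
```

Because the test never mentions which functions RegistrationService calls internally, the implementation can be restructured freely as long as the input and output pair stays the same.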
Note that this is not a hard and fast rule and it always depends. Sometimes it’s hard to avoid verifying mocks when something is not immediately observable from outside your system (e.g. caching). But I’m confident that the majority of tests shouldn’t care too much about what a class’s structure and dependencies look like.
Unfortunately the quality of tests often varies widely, and whenever you change a service there’s always this trade-off (or even unconscious decision) between adapting to how things are done in that codebase and doing something different.
If every developer wrote code or tests the way they prefer, a codebase would become unreadable really soon, as there would be no common ground and each file could look wildly different. This is often a strong argument for adapting to the existing style even though you might not prefer it.
In other cases when things are slowly getting out of control you do need to rethink how things are done to improve the overall health of the codebase. Here I think it was needed and since we were rewriting the service anyway it was the perfect opportunity to change it up a little.
Properties of good tests
Before I showcase how I think we can write better tests, let’s first define some properties that I personally think good tests should have:
- Provide Confidence
Tests should validate that your service works as expected, not just individual components in isolation. This is especially important in a microservices world where a service is the sum of all its parts. What matters is that the whole system works, not just each individual component.
- Easy to Understand and Write
Not only should it be easy to understand what the test does and asserts, but it should also help you understand potential test failures. Good tests serve as living documentation of your system’s behavior.
- Robust
Good tests should:
- Only break when the underlying business logic actually changes
- Trigger use cases as they would in production (e.g., through REST endpoints)
- Only assert on outputs of the system
- Avoid nondeterministic behavior (e.g., if scheduled jobs are involved, trigger them manually to avoid retries)
- Fun to Write!
Yes, writing tests can actually be enjoyable! When tests are well-structured and expressive, they become a pleasure to write rather than a chore.
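To make the "Robust" point about nondeterminism more concrete, here is a minimal sketch of a scheduled job that a test triggers by hand with a fixed clock instead of waiting for a scheduler. ReminderJob and its functions are hypothetical names made up for this example; only java.time.Clock and Instant come from the JDK.

```kotlin
import java.time.Clock
import java.time.Instant
import java.time.ZoneOffset

// Hypothetical job, invented for this sketch: sends reminders that are due.
class ReminderJob(private val clock: Clock) {
    private val dueTimes = mutableMapOf<String, Instant>()
    val sent = mutableListOf<String>()

    fun schedule(id: String, dueAt: Instant) { dueTimes[id] = dueAt }

    // In production a scheduler would call this periodically; a test calls it directly.
    fun runOnce() {
        val now = clock.instant()
        // Everything due at or before "now" is sent, in a stable (sorted) order.
        val due = dueTimes.filterValues { !it.isAfter(now) }.keys.sorted()
        due.forEach { dueTimes.remove(it) }
        sent += due
    }
}

fun remindersSentAfterOneRun(): List<String> {
    // Given: a fixed point in time and two reminders around it
    val now = Instant.parse("2024-01-01T12:00:00Z")
    val job = ReminderJob(Clock.fixed(now, ZoneOffset.UTC))
    job.schedule("early", now.minusSeconds(60))
    job.schedule("late", now.plusSeconds(60))

    // When: the job is triggered manually, exactly once
    job.runOnce()

    // Then: the result depends only on the inputs, not on wall-clock timing
    return job.sent
}
```

Passing the clock in as a dependency is one possible design; the point is only that the test decides when the job runs and what time it is.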
Implementation
Example of a bad test: a lot of the existing tests rely heavily on mocking dependencies.
How can we improve it?
The structure is immediately readable and follows the natural flow of the feature: setup (given), action (when), verification (then).
@Test
fun `user can successfully create an account`() {
    given {
        user exists with email "[email protected]"
    }
    `when` {
        user creates account with name "Test Account"
    }
    then {
        account should be created
        user should receive confirmation email
    }
}
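The test above reads like a specification rather than compilable Kotlin. As a rough sketch of how such given/when/then blocks could be built with plain Kotlin and no framework, the following uses lambdas with receivers. The Scenario class, its functions and the scenario helper are assumptions invented for this example, and the steps are ordinary function calls instead of the prose-like phrases above.

```kotlin
// Hypothetical test "world" holding the state shared by the three phases.
class Scenario {
    val accounts = mutableListOf<String>()
    val sentEmails = mutableListOf<String>()
    private var currentUser: String? = null

    fun userExistsWithEmail(email: String) {
        currentUser = email
    }

    fun userCreatesAccount(name: String) {
        val owner = checkNotNull(currentUser) { "no user was set up in given" }
        accounts += name
        sentEmails += owner // stands in for sending a confirmation email
    }
}

// Runs the whole scenario against a fresh Scenario and returns it.
fun scenario(block: Scenario.() -> Unit): Scenario = Scenario().apply(block)

// The phase functions only add structure; they simply run their block.
fun Scenario.given(block: Scenario.() -> Unit) = this.block()
fun Scenario.`when`(block: Scenario.() -> Unit) = this.block()
fun Scenario.then(block: Scenario.() -> Unit) = this.block()

fun userCanCreateAccount(): Scenario = scenario {
    given {
        userExistsWithEmail("user@example.com")
    }
    `when` {
        userCreatesAccount("Test Account")
    }
    then {
        check("Test Account" in accounts) { "account should be created" }
        check("user@example.com" in sentEmails) { "user should receive confirmation email" }
    }
}
```

In a real test suite the Scenario functions would call the service through its actual entry points instead of mutating lists. Note that `when` is a keyword in Kotlin, which is why it needs backticks.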
Conclusion
BDD tests aren’t a new concept, but they’re definitely under-appreciated. They’re essentially "component tests lite", giving you many of the benefits of integration tests without the complexity and maintenance overhead.
The best part? They make testing enjoyable again. When your tests read like specifications and actually help you understand your system, writing them becomes less of a chore and more of a useful exercise in clarifying requirements and behavior.
Give BDD-style tests a try in your next project. You might be surprised at how much they improve both your test suite and your understanding of what your service actually does.
TODO:
- Mention how BDD helps AI
- Kotlin language features that make these tests easier to write (string literal functions, function blocks, ...). I haven’t had the need for frameworks
- Easy to understand, even for non-developers
- what’s important is the input & output pairs
- briefly introduce hexagonal architecture + use cases (with a diagram) and how it works.
- no hard and fast rules. be pragmatic about what you eventually end up mocking, the less the better in most cases
- You can bend the rules to avoid flakiness
- We tend to immediately disable any flaky tests to avoid eroding trust in tests and then regularly try to fix them all at once.
- Show how tests can help you figure out problems (colors, debug dump, ...)
- Aggressively reduce the amount of logs in tests, or even hide them, to get rid of information overload. You can still turn them back on and fine-tune when you notice that you’re missing something
- often we’re just testing whether we set up the mocks correctly and not the code itself
- use a real db and if possible create random entity ids so tests can run in parallel