Top-notch stateful testing in Android

Broadly speaking, in an app, state is any value that can change over time. All Android apps display state to the user, e.g. an enabled/disabled button, the text displayed in a View, etc. We expect users to modify the state by performing actions through the UI.

These actions may occur in almost any order, so the number of reachable states grows very quickly, and that's what makes testing them so complicated.

But what if we could have many sequences of these actions automatically generated for each test?

That is what stateful testing is for.
Stateful testing is a more sophisticated version of property-based testing where we verify that after each predefined action, the state fulfils a given requirement or property, which might depend on its previous state.

A stateful test generates thousands of those actions, semi-randomly, trying to find a combination of them that fails.

So let's see how we can apply stateful testing to verify state logic in an Android app. For that, we'll take as an example a text editor that accepts text input, as well as undo and redo actions.

Define the subject under test: a Text Editor

Define its state

States are usually represented in a plain Java/Kotlin class that can be unit tested. So, we need to understand what state variables the Text Editor must hold to implement undo & redo features. For instance:

TextState. Displayed text & cursor position
Undo & Redo TextStates. The next TextStates to be rendered after a user's undo/redo action. To prevent memory issues, the maximum number of undo & redo states is limited by a buffer size.

We could model it like this:

data class TextEditorModelState(
    val bufferSize: Int,
    val textState: TextState = TextState(),
    val undoTextStates: Stack<TextState> =
        CircularBuffer(bufferSize),
    val redoTextStates: Stack<TextState> =
        CircularBuffer(bufferSize),
) {

    data class TextState(
        val displayedText: String = "",
        val cursorPosition: Int = 0,
    )
    ...
}

A CircularBuffer is a LIFO structure (a stack) with a fixed capacity: on overflow, it evicts the first element (i.e. the oldest) before pushing a new one.
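To make the eviction rule concrete, here is a minimal sketch of such a bounded stack. The name BoundedStack and its API are assumptions made for illustration; it is not the CircularBuffer from the sample repo, which may be implemented differently.

```kotlin
// Hypothetical bounded LIFO stack, for illustration only.
class BoundedStack<T>(private val capacity: Int) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    // Oldest element sits at the front, newest at the back.
    private val items = ArrayDeque<T>()

    val size: Int get() = items.size

    fun isEmpty(): Boolean = items.isEmpty()

    /** Pushes [item]; when the stack is full, the oldest element is evicted first. */
    fun push(item: T) {
        if (items.size == capacity) items.removeFirst()
        items.addLast(item)
    }

    /** Removes and returns the most recently pushed element. */
    fun pop(): T = items.removeLast()

    /** Returns the most recently pushed element without removing it. */
    fun peek(): T = items.last()

    /** Snapshot of the content, oldest element first. */
    fun toList(): List<T> = items.toList()
}

fun main() {
    val stack = BoundedStack<Int>(capacity = 3)
    (1..4).forEach(stack::push) // pushing 4 evicts 1
    println(stack.toList())     // [2, 3, 4]
    println(stack.pop())        // 4
}
```

Elements still come out in LIFO order; only the oldest entries are lost once the capacity is exceeded, which is the behaviour we want for an undo history.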

Define the actions

After specifying a state class, we still need to define:

the action types the user can perform to modify it.
when each action type can be executed (i.e. preconditions), if not always.
the expected state after every action type (i.e. postconditions). These define the actual requirements.

Let's do it.

TextChange: Add/delete any text via keyboard input or paste/cut options.

State variable	Preconditions	Postconditions
Current `TextState`	–	reflects the new text and cursor position
`UndoTextStates` size	–	increases by one, up to buffer size
`RedoTextStates` size	–	empty afterwards (it is cleared)

Undo: Click on the undo action, if enabled

State variable	Preconditions	Postconditions
Current `TextState`	–	last value pushed into `undoTextStates`
`UndoTextStates` size	greater than zero -> undo is enabled	size decreased by one
`RedoTextStates` size	–	increased by one, up to buffer size (it lets redo this "undo action")

Redo: Click on the redo action, if enabled. We'll omit it here since it is analogous to undo.

So those are the requirements of our Text Editor! Its skeleton would look like this in code:

class TextEditor(val bufferSize: Int = DEFAULT_BUFFER_SIZE) {

    private var modelState = TextEditorModelState(bufferSize)

    fun undo() { modelState = modelState.copy(...) }

    fun redo() { modelState = modelState.copy(...) }

    fun textStateChange( 
        newText: String, 
        cursorPosition: Int, 
    ) {         
        modelState = modelState.copy(...) 
    }

    fun getModelState(): TextEditorModelState = modelState.copy() 
}

You can find all the implementation details here.
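As a rough illustration of how the pre- and postconditions from the tables translate into state transitions, here is a simplified, self-contained sketch. The names Text, EditorState, undoStates, redoStates and pushBounded are assumptions made for this example, and plain lists stand in for the buffers; the real implementation in the repo may differ.

```kotlin
// Hypothetical, simplified model of the editor state; for illustration only.
data class Text(val displayedText: String = "", val cursorPosition: Int = 0)

data class EditorState(
    val bufferSize: Int,
    val current: Text = Text(),
    val undoStates: List<Text> = emptyList(), // most recent entry is last
    val redoStates: List<Text> = emptyList(), // most recent entry is last
)

// Appends to a history and drops the oldest entries beyond the buffer size.
private fun List<Text>.pushBounded(item: Text, max: Int): List<Text> =
    (this + item).takeLast(max)

fun EditorState.textChange(newText: String, cursor: Int): EditorState = copy(
    current = Text(newText, cursor),
    undoStates = undoStates.pushBounded(current, bufferSize),
    redoStates = emptyList(), // a new edit clears everything that could be redone
)

fun EditorState.undo(): EditorState =
    if (undoStates.isEmpty()) this // precondition not met: nothing to undo
    else copy(
        current = undoStates.last(),
        undoStates = undoStates.dropLast(1),
        redoStates = redoStates.pushBounded(current, bufferSize),
    )

fun EditorState.redo(): EditorState =
    if (redoStates.isEmpty()) this // precondition not met: nothing to redo
    else copy(
        current = redoStates.last(),
        redoStates = redoStates.dropLast(1),
        undoStates = undoStates.pushBounded(current, bufferSize),
    )

fun main() {
    val state = EditorState(bufferSize = 2)
        .textChange("a", 1)
        .textChange("ab", 2)
        .undo()
    println(state.current.displayedText)               // a
    println(state.redoStates.size)                     // 1
    println(state.textChange("ac", 2).redoStates.size) // 0
}
```

Each function returns a new state instead of mutating the old one, which keeps every postcondition easy to assert against the previous state.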

How to use this TextEditor in our Android app? Easy. Make the ViewModel delegate the textStateChange(), undo() and redo() actions to the TextEditor. In doing so, its state becomes lifecycle aware. For more details, see the code here

Unit testing the Text Editor

In most cases, I strongly recommend that you write example unit tests before any stateful test. Stateful tests do not replace them, but complement them.

For instance, we could have a unit test for each of the following sequences

Action sequence	Expected result
Text change 21 times (buffer size is 20)	only 20 undo actions possible
Text change twice, undo twice, redo once	redo enabled, 1 redo action possible
Text change thrice, undo once, text change once	redo disabled

Such unit tests would be readable and easy to understand codewise. However, those tests have the following drawbacks:

They only verify that the code is correct for those exact sequences. That may be sufficient in most cases, though.
It is hard to ensure that they cover every relevant scenario. Users can perform text changes, undos and redos in nearly any order.

We can mitigate these problems and gain confidence in our code by adding stateful tests to our already existing unit tests.

Unfortunately, neither JUnit 4 nor JUnit 5 supports stateful testing out of the box. Thus, we'll use the Jqwik testing library for stateful testing in the next examples.

Enabling Jqwik in Gradle for an Android project requires configuring kotlinOptions and testOptions. Check this out for a configuration that enables running all Jqwik, JUnit 4 & JUnit 5 tests together.

So let's see how to write some stateful tests!

As of 31st August 2022, Jqwik is the best testing library for Java/Kotlin that supports stateful testing, and its syntax is very similar to JUnit 5. Kotest, which is a popular Kotlin multi-platform testing library, also has plans to support stateful testing in the future!

Stateful testing the Text Editor with Jqwik

Implement the actions

At this point, we've already defined the action types and the expected state after executing them. The first step to write a stateful test is to translate that into code. For that, Jqwik requires us to do the following for each action type:

Implement Jqwik's Action<StateHolder> interface.
Override precondition(state: StateHolder). It defines when the action may be generated, if any constraint applies.
Override run(state: StateHolder). Here we:
1. Save the state before the action, i.e. the previous state
2. Perform the action that changes the state
3. Assert the new state, based on the previous state (i.e. assert the postcondition)

For the TextChangeAction, it looks like this:

import net.jqwik.api.stateful.Action

class TextChangeAction(
    private val newText: String,
    private val cursorPosition: Int,
) : Action<TextEditor> {
    override fun run(state: TextEditor): TextEditor {
        // 1. save previous state
        val previousUndoTexts =
            state.getModelState().copy().undoTextStates

        // 2. perform action
        state.textStateChange(newText, cursorPosition)

        // 3. assert new state
        expectThat(state.getModelState()) {
            displayedTextEquals(newText)
            undoActionsSizeIncreasedByOneUpToMax(
                previousActionsSize = previousUndoTexts.size,
                maxUndoActionsSize = state.bufferSize,
            )
            redoActionsSizeEquals(0)
        }

        return state
    }

    // override for more readable logs in the Jqwik report
    override fun toString(): String =
        "TextChangeAction($newText)"
}

Observe that the assertions happen in the Action classes, and not in the test itself, which we'll write later.

UndoAction & RedoAction are pretty similar, with the peculiarity that they hold a precondition: we don't want to generate an undo/redo action if there is nothing to undo/redo. For that to happen, we also have to override the precondition(state: StateHolder) method, e.g. for the UndoAction:

import net.jqwik.api.stateful.Action

class UndoAction : Action<TextEditor> {
    // only generate an undo when there is something to undo
    override fun precondition(state: TextEditor): Boolean =
        state.getModelState().undoTextStates.isNotEmpty()

    override fun run(state: TextEditor): TextEditor {
        // 1. save previous state
        val previousRedoTexts =
            state.getModelState().copy().redoTextStates

        val previousUndoTexts =
            state.getModelState().copy().undoTextStates

        // 2. perform action
        state.undo()

        // 3. assert new state
        expectThat(state.getModelState()) {
            displayedTextEquals(
                previousUndoTexts.peek().displayedText
            )
            undoActionsSizeEquals(previousUndoTexts.size - 1)
            redoActionsSizeIncreasedByOneUpToMax(
                previousActionsSize = previousRedoTexts.size,
                maxRedoActionsSize = state.bufferSize,
            )
        }
        return state
    }

    // override for more readable logs in the Jqwik report
    override fun toString(): String = "UndoAction"
}

RedoAction is analogous, so we left it out for brevity. However, you can find the code here.

Implement the random action generator

Now that we've defined the actions, we'll generate a semi-random sequence of them. This involves:

Generating Arbitraries for each action.
Generating a sequence of those arbitrary actions.

Regarding the generation of Arbitraries for each action, the hardest one to implement is the Arbitrary<TextChangeAction>. A TextChangeAction contains an arbitrary text and cursor position, i.e. a TextState, so we need to generate an Arbitrary<TextState> first. Notice that the cursor position must be within the text length. We achieve that like this:

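import net.jqwik.api.Arbitraries
import net.jqwik.api.Arbitrary
import net.jqwik.api.Combinators
import net.jqwik.kotlin.api.ofLength

private fun arbitraryTextState(): Arbitrary<TextState> {
    // low end of range to reduce the generation time
    val textLengthRange = IntRange(1, 20)
    val arbText = Arbitraries.strings().ofLength(textLengthRange)
    // flatMap ties the cursor position to the very text it is generated for;
    // combining arbText with an independent cursor Arbitrary would pair
    // a cursor position with an unrelated text
    return arbText.flatMap { text ->
        Combinators.combine(
            Arbitraries.just(text),
            Arbitraries.integers().between(0, text.length),
        ).`as` { sameText, cursorPosition ->
            TextState(sameText, cursorPosition)
        }
    }
}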

Use flatMap & map to create Arbitraries that depend on other Arbitraries.
Use combine to create Arbitrary objects composed of other Arbitrary objects
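The idea behind flatMap can be shown without Jqwik. The sketch below is a library-free illustration, assuming a hypothetical Gen type (a function from a random source to a value) as a stand-in for an Arbitrary; it is not how Jqwik implements its generators.

```kotlin
import kotlin.random.Random

data class GeneratedTextState(val text: String, val cursorPosition: Int)

// Hypothetical stand-in for an Arbitrary: a function from a random source to a value.
typealias Gen<T> = (Random) -> T

// map transforms the generated value.
fun <T, R> Gen<T>.map(transform: (T) -> R): Gen<R> =
    { rnd -> transform(this.invoke(rnd)) }

// flatMap lets the next generator depend on the value produced by the first one.
fun <T, R> Gen<T>.flatMap(next: (T) -> Gen<R>): Gen<R> =
    { rnd -> next(this.invoke(rnd)).invoke(rnd) }

// Generates lowercase texts of 1 to 20 characters.
val lowercaseText: Gen<String> = { rnd ->
    val length = rnd.nextInt(1, 21)
    CharArray(length) { 'a' + rnd.nextInt(26) }.concatToString()
}

// The cursor range is derived from the text that was actually generated.
val textState: Gen<GeneratedTextState> = lowercaseText.flatMap { text ->
    val cursor: Gen<Int> = { rnd -> rnd.nextInt(0, text.length + 1) }
    cursor.map { position -> GeneratedTextState(text, position) }
}

fun main() {
    val rnd = Random(42)
    repeat(1_000) {
        val sample = textState(rnd)
        check(sample.cursorPosition in 0..sample.text.length)
    }
    println("all samples had a valid cursor position")
}
```

Because the cursor generator is created inside the flatMap lambda, it can never produce a position outside the text it belongs to.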

And now we use arbitraryTextState() to generate Arbitrary<TextChangeAction>

private fun arbitraryTextChangeAction()
: Arbitrary<TextChangeAction> = 
    arbitraryTextState().map { 
        TextChangeAction(it.displayedText, it.cursorPosition) 
    }

Finally, the method to generate a semi-random sequence of all actions we've defined previously

@Provide
fun arbitraryTextEditorActionSequence() =
    Arbitraries.sequences(
        Arbitraries.oneOf(
            arbitraryTextChangeAction(),
            Arbitraries.of(RedoAction()),
            Arbitraries.of(UndoAction()),
        )
    )

@Provide tells the Jqwik test engine that this method generates Arbitrary values used as arguments in tests annotated with @Property

Writing the stateful test itself

Now comes the simplest step: writing the test itself. This requires just a few lines.

@Property 
fun executeRandomSequenceOfActions_textEditorModelStateIsCorrect(
    @ForAll("arbitraryTextEditorActionSequence") 
    actionSequence: ActionSequence<TextEditor>
){ 
    actionSequence.run(TextEditor()) 
}

@ForAll indicates the method used to generate Arbitraries of the corresponding type. Such methods must be annotated with @Provide
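To get an intuition for what running such an action sequence involves, here is a hand-rolled, library-free sketch. History, HistoryAction and runRandomSequence are hypothetical names, and the model only tracks the sizes of the undo and redo histories. It illustrates the principle (pick an action whose precondition holds, run it, assert its postcondition) and is not a description of Jqwik's internals.

```kotlin
import kotlin.random.Random

// Hypothetical system under test: only tracks how many undo/redo steps exist.
class History(val bufferSize: Int) {
    var undoSize = 0
        private set
    var redoSize = 0
        private set

    fun change() { undoSize = minOf(undoSize + 1, bufferSize); redoSize = 0 }
    fun undo() { undoSize--; redoSize = minOf(redoSize + 1, bufferSize) }
    fun redo() { redoSize--; undoSize = minOf(undoSize + 1, bufferSize) }
}

// An action is a precondition plus a run step that asserts its postcondition.
interface HistoryAction {
    fun precondition(state: History): Boolean = true
    fun run(state: History)
}

object Change : HistoryAction {
    override fun run(state: History) {
        val previousUndo = state.undoSize
        state.change()
        check(state.undoSize == minOf(previousUndo + 1, state.bufferSize))
        check(state.redoSize == 0)
    }
}

object Undo : HistoryAction {
    override fun precondition(state: History): Boolean = state.undoSize > 0
    override fun run(state: History) {
        val previousUndo = state.undoSize
        val previousRedo = state.redoSize
        state.undo()
        check(state.undoSize == previousUndo - 1)
        check(state.redoSize == minOf(previousRedo + 1, state.bufferSize))
    }
}

object Redo : HistoryAction {
    override fun precondition(state: History): Boolean = state.redoSize > 0
    override fun run(state: History) {
        val previousUndo = state.undoSize
        val previousRedo = state.redoSize
        state.redo()
        check(state.redoSize == previousRedo - 1)
        check(state.undoSize == minOf(previousUndo + 1, state.bufferSize))
    }
}

// Runs `length` randomly chosen actions, skipping those whose precondition fails.
fun runRandomSequence(rnd: Random, length: Int, bufferSize: Int) {
    val state = History(bufferSize)
    val allActions = listOf<HistoryAction>(Change, Undo, Redo)
    repeat(length) {
        val enabled = allActions.filter { it.precondition(state) }
        enabled.random(rnd).run(state)
    }
}

fun main() {
    val rnd = Random(1)
    repeat(1_000) { runRandomSequence(rnd, length = 30, bufferSize = 5) }
    println("1000 random sequences passed")
}
```

A library such as Jqwik adds what this sketch lacks: configurable generators, reporting and, above all, shrinking of failing sequences.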

Analysing errors: the importance of shrinking

Imagine that there is a bug in our production code. For instance, we forgot to clear the redoTextStates after every text change in the TextEditor.

If we run our test, the error would be detected. The test report would show us the original and the shrunk sample generated by Jqwik. Something like this

[Image: Jqwik test report showing the original sample and the shrunk sample]

The original sample sequence looks overcomplicated. That's because the number of actions as well as the input parameters of each action (like in TextChangeAction, which contains non-ASCII strings) are generated randomly! In the original example, 9 actions were generated, but 30 or more could have been generated.

That's why shrinking is such a useful feature in stateful tests: The shrunk sample is the simplest sample that causes the test to fail for the same reason. In this case, it is the sequence TextChangeAction, UndoAction, and TextChangeAction.

This means we could add the shrunk sample as an example-based test (i.e. a standard unit test), and that test would fail as long as the bug exists. This approach is very useful for bug fixing & avoiding regressions. The stateful test generates new values on every run, but the example-based tests we've added for each failing sample ensure that the bugs we've fixed do not reappear unnoticed.
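The following sketch illustrates the principle of shrinking on a deliberately buggy, simplified model. Step, fails and shrink are hypothetical names, and the greedy strategy shown here (drop one step at a time while the sequence still fails) is only one simple approach; Jqwik's own shrinking is more sophisticated and also simplifies the parameters of each action.

```kotlin
enum class Step { CHANGE, UNDO, REDO }

// Hypothetical buggy model: CHANGE forgets to clear the redo history.
// Returns true when the sequence violates "redo is empty after every CHANGE".
fun fails(steps: List<Step>): Boolean {
    var undo = 0
    var redo = 0
    for (step in steps) {
        when (step) {
            Step.CHANGE -> {
                undo++ // bug: redo should be reset to 0 here
                if (redo != 0) return true
            }
            Step.UNDO -> if (undo > 0) { undo--; redo++ }
            Step.REDO -> if (redo > 0) { redo--; undo++ }
        }
    }
    return false
}

// Greedy shrinking: keep removing single steps while the sequence still fails.
fun shrink(failing: List<Step>): List<Step> {
    var current = failing
    var shrunk = true
    while (shrunk) {
        shrunk = false
        for (i in current.indices) {
            val candidate = current.filterIndexed { index, _ -> index != i }
            if (fails(candidate)) {
                current = candidate
                shrunk = true
                break
            }
        }
    }
    return current
}

fun main() {
    val original = listOf(
        Step.CHANGE, Step.CHANGE, Step.UNDO, Step.REDO,
        Step.UNDO, Step.CHANGE, Step.REDO,
    )
    println(fails(original))  // true
    println(shrink(original)) // [CHANGE, UNDO, CHANGE]
}
```

The shrunk result can be copied almost verbatim into an example-based regression test.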

So we can fix the bug, run the tests again and we're done!

But are we actually done?

Cover the uncovered: statistics on generated values

We know that we've tested our Text Editor state under a semi-random sequence of actions, but we do not know how many actions and of which type it generated.
This is important because we want to make sure, for example, that the Text Editor respects its buffer size, which means getting the Text Editor into a situation where the size of undoTextStates would exceed the buffer size if it were not limited.

Jqwik can collect statistics about the semi-randomly generated values. We can use them to check how often a use case has been tested.
What makes them especially valuable is that we can make the test fail if a use case is not covered.

Back to our Text Editor, there are some scenarios we want to make sure we cover with the stateful tests. For the undoTextStates, that would be

It reached the buffer size, "undo at max"
Any value under the buffer size, "undo in between".

So we need to collect the corresponding statistics. Here is how to do it with Jqwik

fun collectStatsUndo(textEditorModelState: TextEditorModelState) {
    val bufferSize = textEditorModelState.bufferSize
    val undoStatesSize =
        textEditorModelState.undoTextStates.size
    val reachedBufferSize = undoStatesSize == bufferSize
    // label under which this sample is counted in the statistics
    val statistics =
        if (reachedBufferSize) "undo at max" else "undo in between"

    Statistics.collect(statistics)
}

And then collect those statistics in our test, making it fail if any case was not covered

import java.util.function.Predicate
import net.jqwik.api.ForAll
import net.jqwik.api.Property
import net.jqwik.api.stateful.ActionSequence
import net.jqwik.api.statistics.Statistics

@Property
fun executeRandomSequenceOfActions_textEditorModelStateIsCorrect(
    @ForAll("arbitraryTextEditorActionSequence")
    actionSequence: ActionSequence<TextEditor>,
) {
    // "peek" accesses the internal state
    // after each successful execution of an action's run(..)
    actionSequence.peek { textEditor ->
        collectStatsUndo(textEditor.getModelState())
    }.run(TextEditor())

    Statistics.coverage { checker -> // if predicate not met, fail!
        checker
            .check("undo in between")
            .count(Predicate { times -> times > 0 })

        checker
            .check("undo at max")
            .count(Predicate { times -> times > 0 })
    }
}

After running this test a few times (or just once), we will likely see it fail because the "undo at max" case was never covered.

In other words, the previous test verifies the correctness of our text editor for a random sequence of add/delete text, undo & redo actions, but it is uncertain whether it also covers the use case in which the undo buffer reaches its maximum, and a new TextChangeAction is executed.

Keep in mind that the higher the buffer size, the less likely it is that stateful tests cover that scenario.
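A small simulation can make this effect tangible. The sketch below is library-free and rests on assumptions: actions are drawn uniformly among those whose precondition holds, every sequence has 30 actions, and reachesUndoMax is a hypothetical helper. Jqwik's real generation strategy differs, so the numbers it prints only indicate the trend.

```kotlin
import kotlin.random.Random

// Returns true if a random sequence of `length` actions ever fills the undo buffer.
fun reachesUndoMax(rnd: Random, length: Int, bufferSize: Int): Boolean {
    var undo = 0
    var redo = 0
    for (step in 0 until length) {
        // 0 = text change, 1 = undo, 2 = redo; redraw until the precondition holds
        var action = rnd.nextInt(3)
        while ((action == 1 && undo == 0) || (action == 2 && redo == 0)) {
            action = rnd.nextInt(3)
        }
        when (action) {
            0 -> { undo = minOf(undo + 1, bufferSize); redo = 0 }
            1 -> { undo--; redo = minOf(redo + 1, bufferSize) }
            else -> { redo--; undo = minOf(undo + 1, bufferSize) }
        }
        if (undo == bufferSize) return true
    }
    return false
}

fun main() {
    for (bufferSize in listOf(2, 5, 10, 20)) {
        val rnd = Random(7)
        val hits = (1..1_000).count {
            reachesUndoMax(rnd, length = 30, bufferSize = bufferSize)
        }
        println("bufferSize=$bufferSize: buffer filled in $hits of 1000 sequences")
    }
}
```

Sequences shorter than the buffer size can never fill the buffer, and even longer ones need a long run of text changes and redos with few undos in between.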

So, how do we solve this issue? We have two options:

Create an additional action that leads directly to that case, a TextChangeSequenceOverBufferSizeAction which executes at least bufferSize + 1 text changes in a row, e.g. 21 text changes if the buffer size is 20. Check out its implementation here
Add a separate property test that covers that special case, or even an example-based test. The property test has the advantage that we can use statistics to make it fail if this case is not covered, and the disadvantage of running more slowly.

I leave this as a task for the reader, but you can check the code for both solutions in this link, under @Group inner class ForceBufferAtMax

Conclusions

Stateful tests are property-based tests applied to state holder classes. They cover hundreds of sequences of actions on every test run. Some of those sequences might include relevant cases that we've forgotten to validate in our unit tests. As a result, they give us much more confidence in the correctness of our state logic.

On the other hand, they are more generic. This makes stateful tests less readable than standard unit tests. And since they generate hundreds of sequences on every run, the test execution takes longer.

My advice is to complement the unit tests of such components with stateful tests if any of the following applies

Its functioning is critical for the business.
It's going to be reused in several parts of the app (e.g. belongs in a shared module).

And last but not least, some extra recommendations when writing stateful tests:

whenever a stateful test fails, write a unit test for the shrunk sample. This helps avoid regression bugs.
use statistics to ensure that hard-to-generate state transitions are covered.

Repo with samples and further reading

Multiplying the quality of your unit tests: The repo showcasing an android project containing all these examples and much more
Tech-talk "Writing bulletproof code with property-based testing" at Droidcon Lisbon 2022 & Droidcon Berlin 2022.
- Video at Droidcon Berlin 2022.
- Slides of the tech-talk.
Blog post on stateful testing with Jqwik by Jqwik's creator, Johannes Link.

Other blog posts of this series on unit testing and property-based testing:

Interested in Screenshot testing?
Check out my series "The road to effective snapshot testing on Android"

