E2E Testing for Gamedev

By “Automated Testing”, we mean a set of techniques that includes TDD (Test-Driven Development) and BDD (Behavior-Driven Development). In this case study, I’ll focus on E2E (End-to-End) testing in game development.

Sometimes, things go wrong.

It’s not unusual, while working on a game, to have to introduce a new global mechanic, modify a component that is already used in several places, or even just update the editor or part of it. And it’s just as common to have the game shatter over that single change.

When that happens, of course, you can’t just let it catch you with your pants down, so the game has to be tested, and tested, and tested, especially when several different platforms each need their own version. That does pose a problem, though: how do you check that every single line of script fits within the boundaries of the interface? How do you make sure that a legitimate, solid change on one platform doesn’t result in a catastrophic crash on another?

An Automated Testing Approach: E2E Testing

In larger, more structured teams, this is all categorized as beta testing, either in-house or outsourced; smaller teams usually have to test by hand, which gets taxing fast. In other fields of programming, though, there is an established way to check easily and automatically whether the working flow of the code has been broken: so-called “end-to-end testing”.
(There are also many other techniques that game industry experts use to approach this problem; here are some tips.)

Unlike other automated testing approaches, end-to-end testing doesn’t really care about implementation, and only tries to ensure that the software works in the loosest sense possible. It does so by imitating the exact behavior of a user, be it a person or another program, working through the software. For a web application, for example, the test will simulate a user’s actions, clicking buttons and filling in forms.

Going more into detail, the programmer will have to conceptually define a usage flow for their software that they consider critical, and then code a series of steps to complete it, as close as possible to real use of the program. These steps need to be as small and as simple as possible – press this button, fill that field, stuff like that.

As a short example linked to the world of the Internet, let’s assume we want to check a login procedure. The steps could then be:

Preconditions: a user with a valid username and password must exist

  • Navigate to the homepage
  • Click on the “login” button on the top-right
  • Fill the “username” field with the username
  • Fill the “password” field with the password
  • Click on the “Enter” button

Postconditions: I am logged in; I can verify there is “Welcome back, USERNAME” on the top bar; I can verify that the session has memorized the user.

At this point, through specific frameworks, we can code a simple script that will execute these steps, by starting a browser, clicking on buttons, filling fields, and then checking the postconditions.
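As a rough illustration of what such a script boils down to, here is a minimal Python sketch. It does not drive a real browser: `FakeSite` is a hypothetical in-memory stand-in for the web application, and all of its method names are assumptions made for this example. A real test would issue the same steps through a browser automation framework.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeSite:
    """Hypothetical in-memory stand-in for the web app under test."""
    users: dict
    page: str = "blank"
    fields: dict = field(default_factory=dict)
    session_user: Optional[str] = None

    def navigate(self, page):
        self.page = page

    def fill(self, name, value):
        self.fields[name] = value

    def click(self, button):
        if button == "login" and self.page == "home":
            self.page = "login_form"
        elif button == "Enter" and self.page == "login_form":
            name = self.fields.get("username")
            if name in self.users and self.users[name] == self.fields.get("password"):
                self.session_user = name
            self.page = "home"

    def top_bar(self):
        if self.session_user:
            return f"Welcome back, {self.session_user}"
        return "login"


def run_login_test(site, username, password):
    # Precondition: a user with valid credentials must exist.
    assert username in site.users
    # Steps: each one is a single, small user action.
    site.navigate("home")
    site.click("login")
    site.fill("username", username)
    site.fill("password", password)
    site.click("Enter")
    # Postconditions: the greeting is shown and the session remembers the user.
    return (site.top_bar() == f"Welcome back, {username}"
            and site.session_user == username)
```

Calling `run_login_test(FakeSite(users={"rose": "thorns"}), "rose", "thorns")` walks through the five steps and returns whether both postconditions hold.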

Figure: An example of E2E automated testing with Protractor.

Is automated testing viable in game development?

This is all well and good, but a game is not a single webpage with well-defined flows that can hardly stray from their preset tracks. Could this approach be used anyway? And if so, what would it take? These are the questions that popped up while developing Little Briar Rose, and the answer is…

sorta kinda. But that needs a longer look.

The issues start when considering what to look for. Ensuring preconditions is not too large a problem: it’s enough to give the player the stats, skills, items, or levels needed to go through with a given action, and then place them into the correct game stage. No, where things start to go awry is with postconditions. A video game is not as simple and linear as a webpage: too many variables, too many moving parts, especially on the player’s screen, all of them to be checked for any possible error or bug.

And it goes even further: checking data can be relatively easy, but making sure that the screen is showing what it should really is not, because the interface, too, is more complex than it would be on a webpage. The scene is seldom standardized enough to be sure where each element should be, and player interaction can – and usually does – complicate the matter further. Too many elements, then. Too many possible commands, too many different decisions, and often a hefty helping of RNG, which makes the game less predictable and more enjoyable. Unless, of course, you’re trying to obtain a fixed and predictable output for testing purposes, in which case it’s not enjoyable at all.

Luckily, the situation is at least partly salvageable. While the entire game might not be fit for automated testing, some specific mechanics can manage to square the circle. Quest systems, for example, come to mind for their very precise structure, whose steps parallel the easier route of the aforementioned webpages. It’s entirely possible to check preconditions, steps, and postconditions.

Something else that is simple enough to be checked automatically is localization – or, to be more precise, how nice the translated text plays with text boxes. Is there any instance of words going out of bounds, or letters failing to be rendered? Are there any untranslated lines?
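A minimal Python sketch of such a check follows. It rests on loud assumptions: strings are stored as key/text dictionaries, each text box has a known capacity in characters (a crude proxy for rendered width), and the font’s supported glyphs are available as a set. The function name and data layout are invented for this example.

```python
def check_localization(source, translated, max_chars, glyphs=None):
    """Return (key, problem) pairs found in one translated language.

    source, translated: dicts mapping string keys to text.
    max_chars: dict mapping keys to text box capacity in characters
               (an assumed stand-in for measuring rendered width).
    glyphs: optional set of characters the font is assumed to support.
    """
    problems = []
    for key, original in source.items():
        text = translated.get(key)
        if not text:
            problems.append((key, "missing"))
            continue
        # Identical text is only a hint: words like "OK" may be legitimately equal.
        if text == original:
            problems.append((key, "untranslated"))
        if len(text) > max_chars.get(key, float("inf")):
            problems.append((key, "overflow"))
        if glyphs is not None and any(ch not in glyphs for ch in text):
            problems.append((key, "missing glyph"))
    return problems
```

In a real project the overflow test would more likely ask the UI toolkit for the rendered size of the text, since character counts ignore font metrics and line wrapping.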

The catch is that these issues are worth checking and fixing, but they are still not the largest ones QA might find (and has to find). For those, we would need some way to keep an eye on real-time input, as in action games, and on anything that is unlikely, if not impossible, to be repeated in another playthrough. That would be less than easy, to put it mildly.

Not quite impossible, to be sure. One could simulate a playthrough, checking its preconditions and postconditions, by executing an exact, step-by-step set of precise and optimized actions, in order to verify whether the ideal playthrough runs into any bugs or issues. This could be done by recording the sequence of inputs from a real player and then setting up the engine to repeat those same inputs, on a schedule bound to the game’s time. This would show whether everything works as intended when the player does everything as intended; sadly, it fails to check for actions that are not ideal, which limits its use considerably. Alternatively, an AI might be developed to control the character and let it play “on its own” until a bug is found.
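The record-and-replay idea can be sketched in a few lines of Python. Everything here is a toy: the “game” is a made-up loop, inputs are keyed by frame number as a stand-in for game time, and a seeded generator plays the role of the game’s RNG so that a replay sees the same random events as the recorded session.

```python
import random


def record(live_events):
    """Turn (frame, command) events captured from a real session into a replay table."""
    return dict(live_events)


def simulate(inputs_by_frame, frames, seed):
    """Tiny deterministic 'game': returns the final (x, score) state.

    inputs_by_frame maps a frame number to a command ("left" or "right").
    The seeded generator stands in for the game's RNG.
    """
    rng = random.Random(seed)
    x, score = 0, 0
    for frame in range(frames):
        command = inputs_by_frame.get(frame)
        if command == "right":
            x += 1
        elif command == "left":
            x -= 1
        # A pickup appears at a random position each frame.
        if rng.randint(-2, 2) == x:
            score += 1
    return x, score


# Replaying the same recording with the same seed must give the same end state.
recording = record([(1, "right"), (3, "right"), (6, "left")])
first = simulate(recording, frames=10, seed=42)
second = simulate(recording, frames=10, seed=42)
```

Here `first == second` holds by construction; in a real engine, getting there usually means pinning down every source of nondeterminism (RNG seeds, frame timing, physics steps), which is where most of the work lies.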

The Case Study

This is all quite interesting, but it also falls well out of the bounds of this article. Let’s look at a more related case study, instead, which was in fact applied to our game, saving us considerable time for every release. Little Briar Rose, being a graphic adventure with quests and items and mostly based on puzzles, worked particularly well with this testing system.

WARNING: this still can’t stand in for full-blown manual testing. Still, it’s some good solid support: it helps to find bugs, crashes, and other instances of clunky functions introduced by plugins or tools being updated, or new features being added, and it’s good to rapidly check different platforms or languages.

The general approach is really quite simple. We created an XML file in which we listed all the actions a player should undertake to complete a certain quest. Saves are generated to position the character in a precise game moment and the steps defined in the XML are then reproduced. The file looks something like this:

As can be seen, steps are as short and simple as possible: not “talk to the gnome”, but rather “click on given coordinates”, “click on the gnome”, “click to proceed with the dialogues”, and so on, to reproduce a player’s actions as faithfully as possible. We then wrote an interpreter that simulates these actions, as described by the file, as closely as possible.

Finally, we had to implement something that required a bit of forethought: we created a class to manage “generic” player inputs, implementing an interface we designed ourselves. This is not unusual when one wants to remap inputs with some ease (e.g. for different controllers). In our case, instead, we used it to “control” key presses externally, simulating the input of a given key at a given moment, following the commands given by the interpreter for the various testing steps.
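To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not the game’s actual file format or code: the XML element and attribute names, the `FakeGame` class, the save name, and the `rusty_key` item are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical quest file: every name below is made up for this sketch.
QUEST_XML = """
<quest name="gnome_riddle" save="before_gnome">
  <step action="click" x="120" y="340"/>
  <step action="click_object" target="gnome"/>
  <step action="advance_dialogue" times="3"/>
  <expect item="rusty_key"/>
</quest>
"""


class FakeGame:
    """Stand-in for the real game, just enough to run the steps."""
    def __init__(self):
        self.loaded_save = None
        self.log = []
        self.inventory = set()

    def load_save(self, name):
        self.loaded_save = name

    def click(self, x, y):
        self.log.append(("click", x, y))

    def click_object(self, target):
        self.log.append(("click_object", target))

    def advance_dialogue(self):
        self.log.append(("advance",))
        # In this toy game, the gnome hands over the key after three lines.
        if self.log.count(("advance",)) == 3:
            self.inventory.add("rusty_key")


def run_quest(xml_text, game):
    quest = ET.fromstring(xml_text.strip())
    game.load_save(quest.get("save"))  # precondition: start from a known save
    for step in quest.findall("step"):
        action = step.get("action")
        if action == "click":
            game.click(int(step.get("x")), int(step.get("y")))
        elif action == "click_object":
            game.click_object(step.get("target"))
        elif action == "advance_dialogue":
            for _ in range(int(step.get("times", "1"))):
                game.advance_dialogue()
        else:
            raise ValueError(f"unknown action: {action}")
    # Postcondition: every expected item ended up in the inventory.
    return all(e.get("item") in game.inventory for e in quest.findall("expect"))
```

Running `run_quest(QUEST_XML, FakeGame())` loads the save, replays the three steps, and returns whether the expected item was obtained. Keeping the steps in a data file means a new quest test needs no new code, only a new file.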

This is an example:
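As an illustration of the pattern (in Python rather than Unity C#, and not the game’s actual class), the sketch below shows game code that reads input only through an interface, so that a scripted provider can replace the real device during tests. `InputSource`, `ScriptedInput`, and `Player` are hypothetical names.

```python
from typing import Protocol


class InputSource(Protocol):
    """What the game asks of any input provider (hypothetical interface)."""
    def is_pressed(self, key: str) -> bool: ...


class ScriptedInput:
    """Input provider driven by the test interpreter instead of a device."""
    def __init__(self):
        self._held = set()

    def press(self, key):
        self._held.add(key)

    def release(self, key):
        self._held.discard(key)

    def is_pressed(self, key):
        return key in self._held


class Player:
    """Game-side code talks only to the interface, never to the hardware."""
    def __init__(self, input_source: InputSource):
        self.input = input_source
        self.x = 0

    def update(self):
        if self.input.is_pressed("right"):
            self.x += 1
        if self.input.is_pressed("left"):
            self.x -= 1


# The test runner "holds" a key for two updates, then lets go.
scripted = ScriptedInput()
player = Player(scripted)
scripted.press("right")
player.update()
player.update()
scripted.release("right")
player.update()
```

After these calls `player.x` is 2. A keyboard-backed or gamepad-backed class with the same `is_pressed` method would plug into `Player` unchanged, which is the same property that makes input remapping easy.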

And then… it’s done! It’s enough to call the test runner, and voila! The game plays itself!

We don’t even need a player for our game: It plays itself!

Just to conclude this article, I’m leaving a couple of tricks to speed up and optimize testing:

  1. Speed up the time scale in Unity, so as to reduce the total game time: `Time.timeScale = 3;`. But take care! Running the game sped up like this can trigger unusual bugs that would not show up at normal speed, so if you use it, give it some thought beforehand;
  2. You can activate the `Run In Background` option in Unity’s settings. It’s also possible to do so by code, using `UnityEditor.PlayerSettings.runInBackground = true;`;
  3. With this method, it is possible to measure the duration of a full playtest, which is useful to get an estimate of the game’s length.
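For the last tip, the measurement itself can be as simple as timing the test runner. The Python sketch below is a generic illustration with hypothetical names; it assumes that everything in the run scales with the time scale, so that a run at 3x covers roughly three times as much game time as wall-clock time.

```python
import time


def timed_run(run_test, time_scale=1.0):
    """Run a playtest callable; return (result, wall_seconds, estimated_game_seconds)."""
    start = time.perf_counter()
    result = run_test()
    wall = time.perf_counter() - start
    # Assumption: game time advances time_scale times faster than wall-clock time.
    return result, wall, wall * time_scale
```

Any part of the run that ignores the time scale (loading screens, unscaled animations) will make the estimate less accurate.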

I hope you found this useful! It surely was for us: it saved us a lot of testing time, especially when talking about ports.

Have you ever tried similar automated testing approaches?
How did you go about doing it?
It would be nice to hear about other experiences. Let us know!