Unit Testing Your Software: Getting it Right the First Time

Author: Guest Published: December 28, 2020

Categories: analysis, programming, software

Tags: astropy, programming, python

Daniel Evans is a former PhD student at Keele University, where he worked on the detection of stellar companions to exoplanet host stars using high resolution imaging. He has spent the last two years as a Python software developer at JBA Risk Management, where he has worked on developing the first flood catastrophe model with global coverage. Daniel still occasionally hangs around in astronomy-related social media groups, such as the Python Users in Astronomy Facebook group.

In the world of software, unit testing is a technique aimed at ensuring that each individual unit of software (e.g., function or class) works correctly. Performing unit testing helps both in the initial creation of a piece of software by making sure that it works as designed, and also in the later maintenance of software by making sure that modifications to the code do not introduce new bugs into the existing behaviour.

The philosophy behind testing will feel quite familiar to those from a scientific background. We formulate a set of hypotheses about the behaviour of a correctly functioning piece of code, and then run a series of tests on the code to assert that the code being tested satisfies all of the hypotheses. Tests are commonly written using the Arrange-Act-Assert pattern. The three stages of this pattern are:

Arrange: An initial system state is created, usually by defining the input data.
Act: An action is performed, such as calling a function with specific inputs.
Assert: The system state is compared to an expected state.
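As a minimal sketch of how the three stages might look in a unittest test case, here is a test of Python's built-in sorted function. The class and method names are made up purely for illustration.

```python
import unittest


class TestSorted(unittest.TestCase):
    """Illustrative test case; the names here are hypothetical."""

    def test_sorts_ascending(self):
        # Arrange: create the initial state by defining the input data.
        values = [3, 1, 2]
        # Act: perform the action, here a call to the code under test.
        result = sorted(values)
        # Assert: compare the resulting state to the expected state.
        self.assertEqual(result, [1, 2, 3])
```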

A group of unit tests will focus on a particular unit of code – typically a function or a class – checking the behaviour both for normal situations and for edge cases of which the developer is aware.

The example below shows a short piece of Python code designed to convert the declination of a celestial object from sexagesimal format (e.g., +06 25 46.3) to decimal format (e.g., +6.429528). However, by designing some tests for the code, we will soon find out that this piece of code does not work as intended.
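A minimal sketch of what such a conversion function might look like is given here. The function name dec_to_decimal and its three-argument signature are assumptions made for illustration, and the implementation is deliberately naive.

```python
def dec_to_decimal(degrees, minutes, seconds):
    """Convert a sexagesimal declination to decimal degrees.

    Hypothetical, deliberately naive sketch: it simply adds the three
    components together, which is not correct for every input.
    """
    return degrees + minutes / 60 + seconds / 3600


# The declination +06 25 46.3 should come out at roughly +6.429528.
print(dec_to_decimal(6, 25, 46.3))
```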

To test the code, we will use the Python Standard Library’s unittest module. Alternative Python test frameworks are available (for example, the Astropy Project uses pytest). The concepts of testing translate to other programming languages, and many have support for tests, either as part of the language’s standard libraries, or instead provided via third parties.

We start with the most basic test of all — a declination of 00:00:00. We don’t even need our calculator to know that we expect the output to be 0.0 degrees, and it’s difficult to see how this code could possibly get this wrong. However, future modifications could unintentionally introduce a bug, and having the test ensures that we will be able to detect that. This first test also serves to prove that our code runs without raising any exceptions.
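A first test along these lines might be sketched as follows, assuming a hypothetical dec_to_decimal function that takes degrees, minutes, and seconds as separate numbers.

```python
import unittest


def dec_to_decimal(degrees, minutes, seconds):
    # Naive conversion under test (hypothetical name and signature).
    return degrees + minutes / 60 + seconds / 3600


class TestDecToDecimal(unittest.TestCase):
    def test_zero_declination(self):
        # Arrange: a declination of 00:00:00.
        degrees, minutes, seconds = 0, 0, 0.0
        # Act: run the conversion.
        result = dec_to_decimal(degrees, minutes, seconds)
        # Assert: the result should be exactly zero degrees.
        self.assertEqual(result, 0.0)
```

If this were saved in a file such as test_dec.py (a made-up name), it could be run with python -m unittest test_dec.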

Running unittest from the command line, we are informed that one test was run and passed successfully, as expected.

The second test uses the declination of a bright, well-known star, for which we were able to find a source with both the sexagesimal and decimal representations of its position.

On running the tests again, we get a message informing us that the new test has failed.

This has occurred because our expected value is rounded to six decimal places, and hence is not exactly equal to the output. Rather than comparing values exactly, the unittest module allows us to instead assert that two values are almost equal. After changing the code to use that assertion, and setting the precision of the comparison to six decimal places (the default is seven), the test now passes.
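A sketch of the adjusted test, using unittest's assertAlmostEqual with places=6, might look like this. The input values are the article's earlier example declination, and the function name is hypothetical.

```python
import unittest


def dec_to_decimal(degrees, minutes, seconds):
    # Naive conversion under test (hypothetical name and signature).
    return degrees + minutes / 60 + seconds / 3600


class TestDecToDecimalAlmost(unittest.TestCase):
    def test_positive_declination(self):
        result = dec_to_decimal(6, 25, 46.3)
        # Passes when the difference between the two values, rounded
        # to six decimal places, is zero.
        self.assertAlmostEqual(result, 6.429528, places=6)
```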

Next, we move to a star at negative declination. This completes a set of three categories of coordinate — equatorial, northern hemisphere, and southern hemisphere — which should encompass every celestial coordinate. With that, we will at least know that the code isn’t incorrect for a whole half of the sky.

On running the test, we are once again greeted by a failure.

However, this time, the failure isn’t trivial — the code is clearly wrong, having given an answer that is almost 2 degrees offset from the expected value (23.965 vs 22.064)! This third test has revealed our first bug: the code does not account for minutes and seconds values correctly when the degrees value is negative.

We correct the code by adding a check for a negative value in the degrees, after which the test passes again.
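One way such a check might be written is sketched here: the magnitude is built from the absolute value of the degrees, and the sign is then restored if the degrees were negative. The names and the illustrative test values (-23 30 00) are assumptions.

```python
import unittest


def dec_to_decimal(degrees, minutes, seconds):
    # Hypothetical fix: compute the magnitude first, then apply the
    # sign taken from a comparison on the degrees value.
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    if degrees < 0:
        return -magnitude
    return magnitude


class TestSignCheck(unittest.TestCase):
    def test_negative_declination(self):
        # Illustrative input: -23 30 00 should convert to -23.5.
        self.assertAlmostEqual(dec_to_decimal(-23, 30, 0.0), -23.5, places=6)

    def test_positive_declination(self):
        self.assertAlmostEqual(dec_to_decimal(6, 25, 46.3), 6.429528, places=6)
```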

The next test covers a corner case that could easily have been overlooked. Having fixed the negative declination bug, we might have been confident that negative declinations no longer posed an issue. If we were unlucky, we could even have published results from the code before finding the second bug (I’ve seen corrections to published papers resulting from this bug) or, even worse, the automated telescope we wrote the software for could have started operations before we found it (I’ve heard a conference speaker admit that this happened to their team’s new survey telescopes).

Let’s write another test just to be sure we’re converting negative declinations correctly. This time, for -0:30:30.

This test fails with a similar problem to the first bug — the minutes and seconds are being applied as if degrees were positive, not negative. But surely that makes no sense; we just fixed that, by explicitly checking the sign on the degrees!

Looking back at the code, some may spot the issue: our code is checking whether negative zero is less than zero, which is False, and hence the degrees value is treated as positive. The problem is surprisingly difficult to solve: modern computers cannot differentiate between positive and negative zeros for integer data, so an integer -0 is simply 0, whereas floating point numbers store the sign separately and can tell the two apart. Even then a comparison is not enough, because -0.0 < 0 is also False. One possible solution is to explicitly copy the sign from the degrees value, which requires that it is input as a floating point value.
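A minimal sketch of copying the sign, using math.copysign from the Python Standard Library, might look like this. The function and test names are hypothetical, and the sketch assumes the caller passes the degrees as a float whenever the sign of zero matters.

```python
import math
import unittest


def dec_to_decimal(degrees, minutes, seconds):
    # Hypothetical fix: build the magnitude, then copy the sign from
    # the degrees value. copysign reads the sign of a float, so it
    # treats -0.0 as negative, unlike the comparison -0.0 < 0.
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    return math.copysign(magnitude, degrees)


class TestCopySign(unittest.TestCase):
    def test_negative_zero_degrees(self):
        # The degrees must be the float -0.0; the integer -0 is just 0.
        result = dec_to_decimal(-0.0, 30, 30.0)
        self.assertAlmostEqual(result, -0.508333, places=6)

    def test_positive_zero_degrees(self):
        result = dec_to_decimal(0.0, 30, 30.0)
        self.assertAlmostEqual(result, 0.508333, places=6)
```

The cost of this design is that the requirement for a floating point input is easy to forget: passing the integer -0 still gives a positive result, so the requirement is worth documenting and testing.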

Writing a couple of unit tests has helped uncover this unanticipated complexity, and saved us from a potentially embarrassing issue. At this point, if this weren’t an example made up just to demonstrate testing, it would be a good idea to abandon this function and use the coordinate conversion tools in Astropy. That code has already been written and tested, and has benefited from years of community use to identify and fix bugs.

If you encounter a bug like this “in the wild,” and hence need to fix your code, it is good practice to add a new test before fixing the bug. This was the case when I first found this bug in my code — it was long after the code was originally written. Adding the new test helps to ensure that you have fully understood the bug, and that the bug does not unintentionally reappear after a subsequent code change.

Unit testing is not the only tool for ensuring the correctness of code. Integration tests look to make sure that the whole program works together correctly. Another common practice is the use of linting tools, such as flake8, which enforce good coding practices and highlight potentially problematic code, and can help to prevent many bugs. Raymond Hettinger’s talk, Preventing, Finding, and Fixing Bugs On a Time Budget, provides an excellent introduction to a variety of such tools, and gives helpful hints and tips on how to use them as part of your software development workflow.

Have you ever found a bug in your code that a unit test could have found earlier? Do you have any questions on how to design or run tests for your software? Leave a comment below to get the discussion started.