Callisto – Selecting Effective Mutation Operators for Mutation Testing

Mutation Testing

Mutation testing is a software testing technique that deliberately introduces bugs into the source code, creating mutant versions of the program. Mutants are generated by mutation operators, each of which changes a syntax token in a specific way. Every mutant is then run against the existing test suite. If at least one test fails, the mutant has been detected and is said to be killed. If all tests pass, the mutant has gone undetected and is said to have survived. The more mutants are killed, the better the test suite is at detecting bugs in the source code. The mutation score is the percentage of mutants that are killed, and indicates the effectiveness of the test suite. Mutation testing therefore does not test the software directly, but rather the tests.

Performance Problems

The main reason mutation testing is not widely applied in industry is its high performance cost. Every mutant requires up to a full run of the test suite, and a program can yield hundreds to thousands of mutants, depending on its size.

Several techniques have been developed to speed up mutation testing. These include more efficient ways to generate and test mutants, for example by using coverage information, or by compiling multiple mutants into one program and activating them one at a time using code flags. This project focuses on selective mutation, which starts from the observation that more mutants are generated than necessary and tries to select a subset of mutants such that mutation testing is still effective for judging a test suite.

Mutation Levels

This project introduces mutation levels as a technique to speed up mutation testing. A mutation level is a subset of the available mutation operators, chosen so that fewer mutants are generated and thus fewer executions of the test suite are needed during testing. Mutation operators are selected based on their resolution and performance impact. Resolution describes the ability of an operator to generate hard-to-kill mutants, which require more specific test cases to kill and thus promote the creation of a high-quality test suite. Performance impact relates to the relative number of mutants an operator generates.
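
The following sketch shows how a mutation level reduces the number of mutants, and how performance impact can be read as each operator's share of the generated mutants. The operator names and mutant counts are made up for illustration and are not measurements from this project.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# A mutant is represented as (operator name, mutant id).
Mutant = Tuple[str, int]

# Hypothetical mutants for some program; names and counts are illustrative.
ALL_MUTANTS: List[Mutant] = (
    [("arithmetic", i) for i in range(40)]
    + [("relational", i) for i in range(25)]
    + [("string_literal", i) for i in range(35)]
)


def performance_impact(mutants: List[Mutant]) -> Dict[str, float]:
    """Fraction of all mutants that each operator generates."""
    counts = Counter(operator for operator, _ in mutants)
    total = sum(counts.values())
    return {operator: n / total for operator, n in counts.items()}


def apply_level(mutants: List[Mutant], level: FrozenSet[str]) -> List[Mutant]:
    """Keep only the mutants generated by operators in the mutation level."""
    return [mutant for mutant in mutants if mutant[0] in level]


# A mutation level is simply a subset of the available operators.
level = frozenset({"arithmetic", "relational"})

selected = apply_level(ALL_MUTANTS, level)
impact = performance_impact(ALL_MUTANTS)
reduction = 1 - len(selected) / len(ALL_MUTANTS)
```

With these made-up numbers, excluding the string_literal operator removes 35 of the 100 mutants, and therefore up to 35 runs of the test suite. Whether that exclusion is worthwhile depends on the resolution of the excluded operator, which this sketch does not model.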

An ideal mutation level consists of mutation operators with a high resolution and a low performance impact, so that a high-quality test suite is encouraged at a low performance cost. The difficulty of designing mutation levels therefore lies in deciding which mutation operators are worth keeping and which can be excluded, based on their resolution and performance impact. For this purpose the tool Callisto was developed, which analyses mutation operators in an empirical setting to quantify their resolution and performance impact. This is done using an existing quality metric from the literature.