Mutant dataset standard
We use RDF to define and store the metadata. This also satisfies the FAIR data principles F1, F3, A1, I1, and I3. RDF is not designed to force users to supply metadata and therefore does not enforce principle F2. To compensate, MutantBench lets the user add missing metadata during conversion.
This section describes the standard in detail and how each part of it is defined.
Program
The program (also called the original program) is the piece of software to which no mutants have been applied. In this standard, it is a subclass of the SoftwareSourceCode class defined by Schema, or schema:SoftwareSourceCode for short (https://schema.org/SoftwareSourceCode). This is done to integrate with the existing and widely used vocabulary defined by Schema. Each program that we provide contains at least the following metadata (also called properties):
- URI: A globally unique and persistent identifier, equal to [name].[extension]. As a result, programs with the same name and programming language cannot exist more than once in the dataset.
- name: The name of the file without the extension.
- extension: The file extension.
- fileName: The name of the file including the extension.
- programmingLanguage: The programming language the program is written in.
- codeRepository: Link to the source code of the program.
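The relation between these properties can be sketched in Python. This is a minimal illustration, assuming programs are kept in a plain dictionary keyed by URI; the class `Program` and the helpers `program_from_path` and `add_program` are hypothetical names, not part of MutantBench. It shows why the URI [name].[extension] rules out a second program with the same name and extension.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """Minimal Program record with the properties listed above (sketch)."""

    name: str                  # file name without the extension
    extension: str             # file extension without the leading dot
    programming_language: str
    code_repository: str

    @property
    def file_name(self) -> str:
        # fileName: the name of the file including the extension
        return f"{self.name}.{self.extension}"

    @property
    def uri(self) -> str:
        # URI: [name].[extension], i.e. the same string as fileName
        return self.file_name


def program_from_path(path: str, programming_language: str, code_repository: str) -> Program:
    """Build a Program from a source-file path (hypothetical helper)."""
    name, ext = os.path.splitext(os.path.basename(path))
    if not ext:
        raise ValueError(f"{path!r} has no file extension")
    return Program(name, ext[1:], programming_language, code_repository)


def add_program(dataset: dict, program: Program) -> None:
    """Add a program keyed by its URI, rejecting a second program with the same URI."""
    if program.uri in dataset:
        raise ValueError(f"program {program.uri!r} already exists in the dataset")
    dataset[program.uri] = program
```

With this sketch, adding both `Triangle.java` and `Triangle.c` succeeds because their extensions differ, while adding a second `Triangle.java` from another directory raises a `ValueError`.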
Operator
Each operator is defined using the class mb:Operator, which is a subclass of schema:Thing (https://schema.org/Thing). The operator standard includes the following metadata:
- URI: A globally unique and persistent identifier. The abbreviation of the operator name is used for this, e.g., ABSI. For the full list, see Table~\ref{tab:mutantbench_operators}.
- name: A descriptive name, e.g., "Absolute insertion" for the ABSI operator.
- operatorAbbreviation: The short name, or abbreviation, of the mutant operator.
- operatorDescription: A LaTeX representation of the mathematical operation.
- primitiveOperator: The primitive action of the operator, as defined by Y. Ma in "Description of muJava's Method-level Mutation Operators".
- operatorClass: The class of the operator, as defined by King and Offutt in "A Fortran language system for mutation-based software testing".
- operatorAction: The type of operation the operator performs.
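To illustrate how an operator could be written down in RDF, the following Python sketch serializes one operator as Turtle. It is an assumption-laden example, not MutantBench's serializer: the namespace IRI behind the `mb:` prefix is a placeholder, placing the operator properties under `mb:` is assumed, and `Operator`, `turtle_literal`, and `PREAMBLE` are hypothetical names. Only `schema:name` and the `rdfs:subClassOf` link to `schema:Thing` come from existing vocabularies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical IRI for the mb: prefix; the real MutantBench namespace may differ.
MB_NS = "https://example.org/mutantbench/"

# Prefix declarations plus the class hierarchy described above.
PREAMBLE = (
    f"@prefix mb: <{MB_NS}> .\n"
    "@prefix schema: <https://schema.org/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "\n"
    "mb:Operator rdfs:subClassOf schema:Thing .\n"
)


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


@dataclass
class Operator:
    abbreviation: str                         # also used as the URI, e.g. ABSI
    name: str                                 # descriptive name
    description: Optional[str] = None         # LaTeX representation of the operation
    primitive_operator: Optional[str] = None
    operator_class: Optional[str] = None
    operator_action: Optional[str] = None

    @property
    def uri(self) -> str:
        # Assumes abbreviations are plain alphanumeric, so they form valid prefixed names.
        return f"mb:{self.abbreviation}"

    def to_turtle(self) -> str:
        """Serialize this operator as one Turtle subject; unset properties are omitted."""
        properties = [
            ("schema:name", self.name),
            ("mb:operatorAbbreviation", self.abbreviation),
            ("mb:operatorDescription", self.description),
            ("mb:primitiveOperator", self.primitive_operator),
            ("mb:operatorClass", self.operator_class),
            ("mb:operatorAction", self.operator_action),
        ]
        body = " ;\n    ".join(
            f"{predicate} {turtle_literal(value)}"
            for predicate, value in properties
            if value is not None
        )
        return f"{self.uri} a mb:Operator ;\n    {body} ."
```

For example, `PREAMBLE + "\n" + Operator("ABSI", "Absolute insertion").to_turtle()` produces a Turtle document in which `mb:ABSI` is typed as `mb:Operator` and carries only its name and abbreviation, since the other properties are unset. Escaping matters for operatorDescription, because LaTeX strings contain backslashes.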
Mutant
Each mutant is defined using the class mb:Mutant, which is a subclass of schema:SoftwareSourceCode (https://schema.org/SoftwareSourceCode). It adds the following properties:
- URI: A globally unique and persistent identifier: the hexadecimal SHA-1 hash (https://en.wikipedia.org/wiki/SHA-1) of the program name (without extension) concatenated with the mutant difference. This scheme was chosen for its ease of implementation. Python example:
  import hashlib
  hashlib.sha1((file_name + diff).encode()).hexdigest()
- citation: The original paper this mutant came from.
- program: The program that the mutant originates from.
- difference: The difference between the mutant and the program, generated using the following bash command: diff -u0 --ignore-all-space --ignore-blank-lines [SoftwareSourceCode location] [mutant location]
- equivalence: true if the mutant is equivalent to the program, false if it is not, and empty (unknown/none/null) if this has not been determined.
- operator: The operator(s) applied to the program to generate the mutant.
- RIP: The RIP (reachability, infection, propagation) class of the mutant.
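The URI and equivalence rules above can be sketched in Python. This is a minimal illustration, not MutantBench's code: `Mutant`, `mutant_uri`, and `parse_equivalence` are hypothetical names, and the diff text is assumed to have been produced beforehand by the diff command given for the difference property. Here `program_name` is the program name without extension, as the URI description states. Because the hash covers the exact diff string, any change in how the diff is generated (options, header lines) yields a different URI.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Values the standard treats as "equivalence not known".
_UNKNOWN = {"", "unknown", "none", "null"}


def mutant_uri(program_name: str, diff: str) -> str:
    """Hex SHA-1 of the program name (without extension) concatenated with the diff."""
    return hashlib.sha1((program_name + diff).encode()).hexdigest()


def parse_equivalence(value: Union[bool, str, None]) -> Optional[bool]:
    """Map an equivalence value to True, False, or None (unknown)."""
    if value is None or isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    if text in _UNKNOWN:
        return None
    raise ValueError(f"unrecognized equivalence value: {value!r}")


@dataclass
class Mutant:
    program_name: str                   # program name without extension
    difference: str                     # output of the diff command from the standard
    operators: List[str] = field(default_factory=list)  # operator URIs, e.g. ["ABSI"]
    equivalence: Optional[bool] = None  # None means unknown
    citation: Optional[str] = None
    rip: Optional[str] = None

    @property
    def uri(self) -> str:
        # Derived, not stored, so it always matches the current difference.
        return mutant_uri(self.program_name, self.difference)
```

Deriving the URI from the content means the same program and diff always map to the same identifier, so a mutant imported twice from different sources collapses to one entry.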