What Bugs Do Prolog Students Write? An Empirical Taxonomy and Data-Driven Mutation Framework

May 5, 2026

Ricardo Brancas

Pedro Orvalho

Carolina Carreira

Vasco Manquinho

Ruben Martins

Abstract

Automated feedback tools for logic programming education depend on realistic bug datasets that reflect the mistakes students actually make. However, existing mutation testing frameworks for Prolog treat all mutations as equally likely, producing synthetic faults that diverge from classroom reality. We present an empirical study of 7,201 Prolog submissions from 265 undergraduate students, from which we derive a fine-grained taxonomy of student bugs through manual classification of 200 bug-fixing submissions. Guided by this taxonomy, we develop LogMorph, a data-driven mutation tool whose 17 operators are weighted according to the observed error distribution. LogMorph enumerates valid mutation sites on the abstract syntax tree, samples operators in proportion to their weights, injects faults (delegating to an SMT-based synthesizer when new code fragments are needed), and validates each mutant against a reference test suite. An evaluation of 16,000 generated mutants shows that the synthetic error distribution closely matches the student distribution, with most bug categories agreeing to within two percentage points. We identify cut-related mutations and synthesizer-generated code as the main sources of residual divergence, and outline how combining the SMT back-end with a language model fine-tuned on student code can further improve realism.
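
The weighted sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not LogMorph's implementation: the operator names and weights below are hypothetical placeholders rather than figures from the study, and the helper names `sample_operators` and `max_divergence_pp` are introduced here only for illustration. The sketch draws operators in proportion to their weights and then measures, in percentage points, how far the sampled distribution drifts from the target one.

```python
import random
from collections import Counter

# Hypothetical operator names and weights, for illustration only.
# They are NOT the 17 operators or the error distribution from the paper.
OPERATOR_WEIGHTS = {
    "swap_arguments": 0.40,
    "change_constant": 0.30,
    "drop_goal": 0.20,
    "remove_cut": 0.10,
}


def sample_operators(weights, n, seed=None):
    """Draw n operator names, each with probability proportional to its weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=n)


def max_divergence_pp(weights, samples):
    """Largest absolute gap, in percentage points, between target and sampled shares."""
    total_weight = sum(weights.values())
    counts = Counter(samples)
    return max(
        abs(100.0 * weights[name] / total_weight - 100.0 * counts[name] / len(samples))
        for name in weights
    )


if __name__ == "__main__":
    drawn = sample_operators(OPERATOR_WEIGHTS, 16000, seed=0)
    print(f"max divergence: {max_divergence_pp(OPERATOR_WEIGHTS, drawn):.2f} pp")
```

In this toy setting the only source of divergence is sampling noise. In the pipeline the abstract describes, mutants are additionally filtered by validation against a reference test suite, which is one reason the generated distribution may differ from the sampling weights.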

Type

Conference paper

Publication

In the 42nd International Conference on Logic Programming (ICLP) [CORE B Conference]. [Accepted for Publication]