Towards Assessing and Repairing LLM-Generated Code via Model Checking and MaxSAT-Based Fault Localisation

May 5, 2026
Pedro Orvalho
Image credit: Schloss Dagstuhl
Abstract
Large Language Models (LLMs) are increasingly used for programming tasks, yet their ability to truly understand program semantics remains unclear. In this talk, we investigate the robustness and reliability of LLMs in reasoning about code. We first present an empirical study evaluating whether LLMs’ output predictions are grounded in sound reasoning or arise from superficial pattern matching. By introducing a set of semantics-preserving code mutations—such as variable renaming, control-flow transformations, and loop modifications—we show that even state-of-the-art models frequently change their predictions, revealing limited robustness to syntactic variations. Motivated by these findings, we explore how to improve the correctness of LLM-generated code by integrating formal methods. We present an approach that combines model checking with MaxSAT-based fault localisation to guide LLMs toward more accurate program repairs. Our approach leverages counterexamples and minimal diagnoses to identify faulty code regions, which are then used to refine LLM prompts. Finally, we introduce a novel approach for verifying Python programs by leveraging LLM-based transpilation to C and bounded model checking. This hybrid pipeline enables precise fault localisation and verification of non-trivial Python programs, bridging the gap between the flexibility of LLMs and the rigour of formal verification tools. Overall, this work highlights fundamental limitations in current code models and demonstrates how neurosymbolic approaches can enhance their reliability in Software Engineering tasks.
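
To make the mutation study concrete, here is a minimal Python sketch of two of the semantics-preserving transformations mentioned above: variable renaming and a for-to-while loop rewrite. The function and test inputs are illustrative assumptions, not taken from the study's benchmark.

```python
# A minimal sketch (not the study's tooling) of two semantics-preserving
# mutations: variable renaming and a for-to-while loop rewrite.

def sum_even(numbers):
    """Original program: sums the even elements of a list."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n
    return total

def sum_even_mutated(xs):
    """Same semantics after renaming variables and rewriting the loop."""
    acc = 0
    i = 0
    while i < len(xs):
        if xs[i] % 2 == 0:
            acc += xs[i]
        i += 1
    return acc

# Both variants agree on every input, so an LLM asked to predict their
# output should answer identically for each; the study measures how often
# model predictions diverge across such pairs.
for test in ([1, 2, 3, 4], [], [7, 7], list(range(10))):
    assert sum_even(test) == sum_even_mutated(test)
print("mutations are semantics-preserving on the sample inputs")
```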
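The MaxSAT-based fault-localisation step can be illustrated with a toy encoding, sketched below using the `python-sat` package. The clause-level encoding here is deliberately simplified and is not the encoding used in the talk: it only captures, per counterexample, which statements cannot all be correct at once.

```python
# A toy MaxSAT formulation of fault localisation (assumes the `python-sat`
# package; an illustrative encoding, not the talk's actual one).
# Boolean variable i means "statement i is healthy". Each counterexample
# from the model checker yields a hard clause saying the statements it
# exercises cannot all be healthy; soft unit clauses prefer blaming as few
# statements as possible, so the falsified softs form a minimal diagnosis.
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

NUM_STMTS = 3
wcnf = WCNF()

# Hypothetical counterexamples: the first exercises statements 1-3,
# the second exercises statements 2-3.
wcnf.append([-1, -2, -3])   # hard: not all of 1, 2, 3 are healthy
wcnf.append([-2, -3])       # hard: not both 2 and 3 are healthy

# Soft clauses: assume each statement is healthy, at unit cost to retract.
for s in range(1, NUM_STMTS + 1):
    wcnf.append([s], weight=1)

with RC2(wcnf) as solver:
    model = solver.compute()
    suspects = [s for s in range(1, NUM_STMTS + 1) if -s in model]
    print(f"minimal diagnosis (cost {solver.cost}): statements {suspects}")
    # These suspect statements are what would be highlighted in the
    # refined repair prompt handed back to the LLM.
```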
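Finally, a rough sketch of the hybrid Python-verification pipeline: an LLM transpiles the Python program to C, and a bounded model checker (CBMC here, as an assumed backend) checks the result within a loop-unwinding bound. The helper names are placeholders, and the talk's actual pipeline and solver options may differ.

```python
# Sketch of the hybrid pipeline: LLM transpilation to C, then bounded
# model checking with CBMC. `transpile_with_llm` is a placeholder for
# whatever model/API is used; `--unwind` and `--trace` are standard
# CBMC options for setting the bound and printing counterexample traces.
import subprocess
import tempfile

def transpile_with_llm(python_src: str) -> str:
    """Placeholder: prompt an LLM to produce semantically equivalent C."""
    raise NotImplementedError("plug in your model/API of choice")

def bounded_check(c_src: str, unwind: int = 10) -> tuple[bool, str]:
    """Run CBMC on the transpiled C; return (verified?, solver output)."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".c", delete=False) as f:
        f.write(c_src)
        path = f.name
    result = subprocess.run(
        ["cbmc", path, "--unwind", str(unwind), "--trace"],
        capture_output=True, text=True,
    )
    # CBMC exits 0 when all assertions hold within the bound; a nonzero
    # exit plus a trace gives the counterexample that drives the
    # MaxSAT-based fault localisation above.
    return result.returncode == 0, result.stdout

# ok, report = bounded_check(transpile_with_llm(python_program))
```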
Date
May 5, 2026 4:30 PM – 5:00 PM
Location

Dagstuhl, Wadern, Saarland, Germany