MultIPAs: Applying Program Transformations to Introductory Programming Assignments for Data Augmentation, ESEC/FSE 2022

Abstract

Over the last few years, there has been growing interest in automated program repair applied to introductory programming assignments (IPAs). However, publicly available IPA datasets tend to be small and lack useful annotations about the defects in each program. Small datasets are of limited use to program repair tools that rely on machine learning models. Furthermore, a large diversity of correct implementations makes it possible to compute a smaller set of repairs for a given incorrect program, instead of always relying on the same set of correct implementations for a given IPA. For these reasons, there has been an increasing demand for augmenting IPA benchmarks. This paper presents MultIPAs, a program transformation tool that can augment IPA benchmarks by (1) applying six syntactic mutations that preserve the program’s semantics and (2) applying three semantic mutilations that introduce faults in the IPAs. Moreover, we demonstrate the usefulness of MultIPAs by augmenting two publicly available benchmarks of programs written in the C language with millions of programs, and by generating an extensive benchmark of semantically incorrect programs.
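To make the two kinds of transformation concrete, the following is a minimal sketch in C. The abstract does not list the individual transformations, so the ones used here (rewriting a for loop as a while loop, mirroring a comparison, and replacing a comparison operator) are illustrative assumptions, and the function names are hypothetical rather than taken from MultIPAs or its benchmarks.

```c
#include <stddef.h>

/* Hypothetical student submission: largest element of a non-empty array. */
int max_original(const int *v, size_t n) {
    int best = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}

/* Assumed semantics-preserving (syntactic) mutation:
 * the for loop becomes a while loop and both comparisons are mirrored
 * (i < n  ->  n > i,  v[i] > best  ->  best < v[i]).
 * The program text differs, but the result is the same for every input. */
int max_mutated(const int *v, size_t n) {
    int best = v[0];
    size_t i = 1;
    while (n > i) {
        if (best < v[i]) {
            best = v[i];
        }
        i++;
    }
    return best;
}

/* Assumed fault-introducing (semantic) transformation:
 * the comparison operator is replaced (> becomes <), so the function
 * now returns the smallest element instead of the largest. */
int max_faulty(const int *v, size_t n) {
    int best = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < best) {
            best = v[i];
        }
    }
    return best;
}
```

On the array {3, 9, -2, 7}, `max_original` and `max_mutated` both return 9, while `max_faulty` returns -2. A variant like `max_mutated` adds a new correct implementation to a benchmark. A variant like `max_faulty` adds an incorrect program whose defect is known by construction, which is the kind of annotation the abstract says existing datasets lack.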

Publication
In the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
Pedro Orvalho
Computer Science Ph.D. Student

My research interests include Automated Reasoning, Program Repair and Program Synthesis.

Related