TY - CONF
T1 - A systematic study of automated program repair
T2 - 34th International Conference on Software Engineering, ICSE 2012
AU - Le Goues, Claire
AU - Dewey-Vogt, Michael
AU - Forrest, Stephanie
AU - Weimer, Westley
PY - 2012/7/30
Y1 - 2012/7/30
N2 - There are more bugs in real-world programs than human programmers can realistically address. This paper evaluates two research questions: "What fraction of bugs can be repaired automatically?" and "How much does it cost to repair a bug automatically?" In previous work, we presented GenProg, which uses genetic programming to repair defects in off-the-shelf C programs. To answer these questions, we: (1) propose novel algorithmic improvements to GenProg that allow it to scale to large programs and find repairs 68% more often, (2) exploit GenProg's inherent parallelism using cloud computing resources to provide grounded, human-competitive cost measurements, and (3) generate a large, indicative benchmark set to use for systematic evaluations. We evaluate GenProg on 105 defects from 8 open-source programs totaling 5.1 million lines of code and involving 10,193 test cases. GenProg automatically repairs 55 of those 105 defects. To our knowledge, this evaluation is the largest available of its kind, and is often two orders of magnitude larger than previous work in terms of code or test suite size or defect count. Public cloud computing prices allow our 105 runs to be reproduced for $403; a successful repair completes in 96 minutes and costs $7.32, on average.
KW - automated program repair
KW - cloud computing
KW - genetic programming
UR - http://www.scopus.com/inward/record.url?scp=84864264923&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864264923&partnerID=8YFLogxK
U2 - 10.1109/ICSE.2012.6227211
DO - 10.1109/ICSE.2012.6227211
M3 - Conference contribution
AN - SCOPUS:84864264923
SN - 9781467310673
T3 - Proceedings - International Conference on Software Engineering
SP - 3
EP - 13
BT - Proceedings - 34th International Conference on Software Engineering, ICSE 2012
Y2 - 2 June 2012 through 9 June 2012
ER -