The repo for this project can be found here.
Professional developers spend around 70% of their time debugging. Of the roughly 70 buggy lines that developers write for every 1,000 lines of code, about 15 make it into production. Compilers and interpreters have been designed with this in mind and commonly offer error-fixing tips, but they often fail to provide insightful clues on how to fix the underlying bug.
Neural networks have emerged as a candidate for bug patching. In this work, we use transformer models to translate buggy one-line Java statements into bug-free ones.
We mined about 10,000 Java GitHub repositories and extracted one-line changes between the pre-commit and post-commit versions of files. Since these changes occur between successive commits, we reason that they represent logical bugs in addition to syntax or code-style errors. In the end, we obtained about 90,000 one-line changes.
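The exact mining pipeline lives in the repository; the sketch below only illustrates the idea under assumptions. It uses the pydriller library (an assumption, not necessarily what the repository uses) with placeholder paths, and it skips the tokenization and string-literal abstraction (the "str" placeholders visible in the examples below) that the extracted pairs evidently go through.

```python
# Illustrative sketch: collect single-line fixes from one repository's history.
# Assumes the `pydriller` package; the repository path and output file are placeholders.
import json
from pydriller import Repository

def mine_one_line_changes(repo_path: str, out_path: str) -> None:
    pairs = []
    for commit in Repository(repo_path).traverse_commits():
        for mod in commit.modified_files:
            if not (mod.filename or "").endswith(".java"):
                continue
            # diff_parsed: {"added": [(line_no, text)], "deleted": [(line_no, text)]}
            diff = mod.diff_parsed
            # Keep commits that change exactly one line of the file:
            # one deleted (buggy) line replaced by one added (fixed) line.
            if len(diff["deleted"]) == 1 and len(diff["added"]) == 1:
                buggy = diff["deleted"][0][1].strip()
                fixed = diff["added"][0][1].strip()
                if buggy and fixed and buggy != fixed:
                    pairs.append({"buggy": buggy, "fixed": fixed})
    with open(out_path, "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")

mine_one_line_changes("path/to/some-java-repo", "one_line_changes.jsonl")
```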
We treated this problem as a translation task: translating a buggy line into a bug-free line. Unlike natural language, source code is highly structured and coherent, and most buggy lines need only small changes to become bug-free. For these reasons, we expected transformer models to be well suited to the task. For this work, we experimented with the following five configurations (a sketch of the decoding setup follows the table):
Model | Beam Size | BLEU | Fix Accuracy (%) |
---|---|---|---|
Base Model | 4 | 77.9 | 53.3 |
Base Model, Beam Size 10 | 10 | 77.9 | 53.3 |
Base Model, with BPE (2k vocab) | 4 | 82.3 | 54.6 |
Base Model, 1 line context | 4 | 84.3 | 35.5 |
Base Model, 2 line context | 4 | 84.8 | 33.5 |
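This excerpt does not name the training framework, so the following is only a minimal decoding sketch under assumptions: a trained encoder-decoder checkpoint loaded through the Hugging Face transformers API (a stand-in, not necessarily the repository's setup) with a placeholder model path. It shows how a buggy line would be decoded with beam size 4; the BPE configuration would swap in a 2k-subword tokenizer, and the 1- and 2-line-context configurations would presumably prepend the surrounding source lines to the encoder input.

```python
# Illustrative sketch: propose a fix for a buggy line with beam-search decoding.
# The checkpoint path is a placeholder; the Hugging Face API is an assumption.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "path/to/trained-bugfix-model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

buggy_line = "public static String getDefaultAlias ( String aSourceName )"
inputs = tokenizer(buggy_line, return_tensors="pt")

# Beam search with beam size 4 (10 in the second configuration above);
# keep only the top-scoring hypothesis as the proposed fix.
outputs = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=1,
    max_length=128,
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The examples that follow are input/reference/model-output triples in which the model reproduces the reference fix.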
Input: @Exported ( name = "str" )
Reference: @Exported ( name = "str" , inline = true )
Model Output: @Exported ( name = "str" , inline = true )
Input: String txt = yytext ( ) ;
Reference: }
Model Output: }
Input: public static String getDefaultAlias ( String aSourceName )
Reference: public static String getDefaultAlias ( String sourceName )
Model Output: public static String getDefaultAlias ( String sourceName )
Input: Map <String , DetectorNode >nodeMap = new HashMap <String , DetectorNode >( ) ;
Reference: Map <String , DetectorNode >nodeMap = new HashMap <>( ) ;
Model Output: Map <String , DetectorNode >nodeMap = new HashMap <>( ) ;
Input: ArrayList
Reference: ArrayList
Model Output: ArrayList
Input: Setting . byteSizeSetting ( "str" , new ByteSizeValue ( 32 , ByteSizeUnit . KB ) , Property . NodeScope ) ;
Reference: Setting . byteSizeSetting ( "str" , new ByteSizeValue ( 64 , ByteSizeUnit . KB ) , Property . NodeScope ) ;
Model Output: Setting . byteSizeSetting ( "str" , new ByteSizeValue ( 64 , ByteSizeUnit . KB ) , Property . NodeScope ) ;
The error fixes presented above are by no means the full extent of the model; a full file of translations is available in the repository. These examples do indicate, however, that the model is able to learn syntactic, code-style, and simple logical fixes.
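Since the table above reports BLEU and fix accuracy, here is a sketch of how such a translations file could be scored. It assumes one whitespace-tokenized line per example in each file, that fix accuracy means an exact match with the reference line, and that sacrebleu is an acceptable stand-in for whichever BLEU implementation produced the numbers above; all paths are placeholders.

```python
# Illustrative sketch: score predicted fixes against reference fixes.
# Assumes the `sacrebleu` package; file paths are placeholders.
import sacrebleu

def load_lines(path: str) -> list[str]:
    with open(path) as f:
        return [line.strip() for line in f]

references = load_lines("data/test.fixed")         # one reference fix per line
hypotheses = load_lines("outputs/test.predicted")  # one model prediction per line

bleu = sacrebleu.corpus_bleu(hypotheses, [references]).score
fix_accuracy = 100.0 * sum(
    hyp == ref for hyp, ref in zip(hypotheses, references)
) / len(references)

print(f"BLEU: {bleu:.1f}  Fix accuracy: {fix_accuracy:.1f}%")
```

The failure cases below show where the model still falls short.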
Input: private void checkIfClosed ( ) throws IOException {
Reference: private void checkIfClosed ( ) {
Model Output: private void checkIfClosed ( ) throws IOException {
Input: addMessage ( new Message ( MessageKind . INFORMATION , model . getModelUID ( ) + "str" + ( genOK ? "str" : "str" ) ) ) ;
Reference: addMessage ( new Message ( genOK ? MessageKind . INFORMATION : MessageKind . WARNING , model . getModelUID ( ) + "str" + ( genOK ? "str" : "str" ) ) ) ;
Model Output: addMessage ( new Message ( MessageKind . INFORMATION , model . getModelUID ( ) + "str" + ( genOK ? "str" : "str" ) ) ) ;
Input: return 1 ;
Reference: return curSelection == null ? 0 : 1 ;
Model Output: return 2 ;
These three errors are not a comprehensive list of the issues with the models presented, but they highlight some key failure modes: the model sometimes leaves the buggy line unchanged, it struggles with fixes that require restructuring a longer expression, and it cannot recover fixes that depend on context it never sees.
This work shows that Transformer models are well suited to code patching. Their ability to model long-distance dependencies seemed promising in theory, and these experiments show that it holds up in practice. Beyond purely syntactic fixes, these models can repair some code-style and even basic logical errors. We hope that future work will refine these models further to solve increasingly complex bugs.