Accurate and Efficient Refactoring Detection in Commit History


Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies about the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which poses threats to the reliability of their application. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios.

To reinvigorate a previously fruitful line of research that has stalled, we designed, implemented, and evaluated RefactoringMiner, a technique that overcomes the above limitations. At the heart of RefactoringMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RefactoringMiner, we created the most comprehensive oracle to date that uses triangulation to create a dataset with considerably reduced bias, representing 3,187 refactorings from 185 open-source projects. Using this oracle, we found that RefactoringMiner has a precision of 98% and recall of 87%, which is a significant improvement over the previous state-of-the-art. Moreover, RefactoringMiner's speed warrants novel applications, such as online refactoring detection.
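For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how precision and recall are conventionally computed when a tool's output is compared against an oracle. It is not taken from RefactoringMiner or from the paper's evaluation scripts: the class name `PrecisionRecallSketch`, the string encoding of a refactoring instance, and the choice to return 0.0 for an empty set are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of RefactoringMiner. */
final class PrecisionRecallSketch {

    /** True positives: reported instances that also appear in the oracle. */
    static <T> int truePositives(Set<T> detected, Set<T> oracle) {
        Set<T> common = new HashSet<>(detected);
        common.retainAll(oracle);
        return common.size();
    }

    /** Precision = TP / (TP + FP), i.e. TP divided by everything the tool reported. */
    static <T> double precision(Set<T> detected, Set<T> oracle) {
        if (detected.isEmpty()) {
            return 0.0; // assumed convention for an empty report
        }
        return (double) truePositives(detected, oracle) / detected.size();
    }

    /** Recall = TP / (TP + FN), i.e. TP divided by everything in the oracle. */
    static <T> double recall(Set<T> detected, Set<T> oracle) {
        if (oracle.isEmpty()) {
            return 0.0; // assumed convention for an empty oracle
        }
        return (double) truePositives(detected, oracle) / oracle.size();
    }

    public static void main(String[] args) {
        // Made-up instance labels; a real oracle entry would identify a commit and a refactoring.
        Set<String> oracle = new HashSet<>(Arrays.asList("r1", "r2", "r4", "r5"));
        Set<String> detected = new HashSet<>(Arrays.asList("r1", "r2", "r3"));
        System.out.println("precision = " + precision(detected, oracle));
        System.out.println("recall    = " + recall(detected, oracle));
    }
}
```

In this toy input, two of the three reported instances are in the oracle and two of the four oracle instances are found, so precision is 2/3 and recall is 1/2.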

Mining Software Repositories, Refactoring Detection, ICSE
  • The paper can be downloaded from here.
  • The presentation slides can be viewed here.
  • RefactoringMiner can be downloaded from GitHub.
  • The refactoring oracle can be accessed here.


    author = {Tsantalis, Nikolaos and Mansouri, Matin and Eshkevari, Laleh and Mazinanian, Davood and Dig, Danny},
    title = {Accurate and Efficient Refactoring Detection in Commit History},
    booktitle = {Proceedings of the 40th International Conference on Software Engineering},
    series = {ICSE 2018},
    location = {Gothenburg, Sweden},
    numpages = {12},
    year = {2018},