The presence of duplicated code in software systems is significant and several studies have shown that clones can be potentially harmful with respect to the maintainability and evolution of the source code. Despite the significance of the problem, there is still limited support for eliminating software clones through refactoring, because the unification and merging of duplicated code is a very challenging problem, especially when software clones have gone through several modifications after their initial introduction. In this work, we propose an approach for automatically assessing whether a pair of clones can be safely refactored without changing the behavior of the program. In particular, our approach examines if the differences present between the clones can be safely parameterized without causing any side-effects. The evaluation results have shown that the clones assessed as refactorable by our approach can be indeed refactored without causing any compile errors or test failures. Additionally, the computational cost of the proposed approach is negligible (less than a second) in the vast majority of the examined cases. Finally, we perform a large-scale empirical study on over a million clone pairs detected by four different clone detection tools in nine open-source projects to investigate how refactorability is affected by different clone properties and tool configuration options. Among the highlights of our conclusions, we found that (a) clones in production code tend to be more refactorable than clones in test code, (b) clones with a close relative location (i.e., same method, type, or file) tend to be more refactorable than clones in distant locations (i.e., same hierarchy, or unrelated types), (c) Type-1 clones tend to be more refactorable than the other clone types, and (d) clones with a small size tend to be more refactorable than clones with a larger size.
You can find the data for replication and read more about the paper here.