Code Clone Detection: An Empirical Study of Techniques for Software Engineering Practice

Harshita Kaushik, K D Gupta


Software cloning, in which code fragments are duplicated and reused throughout a codebase, has been a prevalent practice in software development for several decades. While cloning can boost productivity and code maintainability, it can also introduce new problems such as bugs, inconsistencies, and code smells. One of the key issues in dealing with software cloning is the detection of code clones, that is, the identification of code fragments that are structurally similar or identical. Code clones are commonly classified into four types: Type 1 clones (fragments that are identical apart from whitespace, layout, and comments), Type 2 clones (syntactically identical fragments with renamed identifiers or changed literals), Type 3 clones (copied fragments with statements added, modified, or removed), and Type 4 clones (fragments that are semantically similar but syntactically different). Textual, token-based, and tree-based analyses are among the methods explored for identifying code clones. Another method that has shown promise in clone identification is probabilistic software modeling, in which code is modeled as a probabilistic network and clones are found by analyzing the graph structure. Herein, we survey the state of the art in software cloning and code clone detection. The paper also covers the various kinds of code clones along with their benefits and drawbacks. We then explore and evaluate several methods for identifying code clones, including probabilistic software modeling. Finally, we investigate how probabilistic software modeling may be used for other software engineering purposes, such as predictive and generative tasks.
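To make the distinction between clone types and the idea of token-based analysis concrete, the following is a minimal sketch for Python fragments only, built on the standard `tokenize` and `difflib` modules. It is not the method evaluated in the paper: the helper names `tokens` and `classify`, the `ID`/`LIT` placeholders, and the 0.7 similarity threshold are illustrative assumptions. Comments and layout spelling are ignored to detect Type 1 clones, identifiers and literals are abstracted to detect Type 2 clones, and a sequence-similarity ratio flags Type 3 candidates. Type 4 (semantic) clones cannot be found by token comparison and are out of scope here.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Layout tokens become fixed markers: indentation is syntax in Python, so we
# ignore how it is spelled (2 vs 4 spaces) but keep where it occurs.
_MARKERS = {tokenize.NEWLINE: "<NEWLINE>", tokenize.INDENT: "<INDENT>",
            tokenize.DEDENT: "<DEDENT>"}
# Tokens with no bearing on program structure (comments, blank lines, EOF).
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def tokens(source: str, normalize: bool) -> list:
    """Lex Python source into a comparable token list.

    With normalize=True, non-keyword names become "ID" and number/string
    literals become "LIT" (hypothetical placeholders), so renamed copies match.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _MARKERS:
            out.append(_MARKERS[tok.type])
        elif normalize and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif normalize and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a: str, b: str, threshold: float = 0.7) -> str:
    """Assign a simplified clone type to a pair of fragments (assumed threshold)."""
    if tokens(a, normalize=False) == tokens(b, normalize=False):
        return "Type 1"  # differs only in comments/whitespace
    na, nb = tokens(a, normalize=True), tokens(b, normalize=True)
    if na == nb:
        return "Type 2"  # differs only in identifiers/literals
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "Type 3 (candidate)"  # near-miss: statements added/removed
    return "not a syntactic clone"


ORIGINAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
# Same code with a comment, a blank line, and 2-space indentation.
TYPE1 = "def total(xs):\n  # running sum\n  s = 0\n\n  for x in xs:\n    s += x\n  return s\n"
# Renamed identifiers and a changed literal.
TYPE2 = "def acc(values):\n    r = 1\n    for v in values:\n        r += v\n    return r\n"
# One statement added (a guard inside the loop).
TYPE3 = ("def total(xs):\n    s = 0\n    for x in xs:\n"
         "        if x > 0:\n            s += x\n    return s\n")
UNRELATED = "x = [1, 2, 3]\n"

if __name__ == "__main__":
    for name, frag in [("TYPE1", TYPE1), ("TYPE2", TYPE2),
                       ("TYPE3", TYPE3), ("UNRELATED", UNRELATED)]:
        print(name, "->", classify(ORIGINAL, frag))
```

Keeping indentation as position-only markers is a deliberate choice for Python: dropping it entirely could make fragments with different nesting (and different behavior) compare as identical. Real token-based detectors also index fragments (for example, by hashing token windows) instead of comparing every pair, which this sketch omits for brevity.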