
cregit: Token-level blame information in git version control repositories

Daniel M. German
2019

Description

The blame feature of version control systems is widely used, by both practitioners and researchers, to determine who last modified a given line of code and the commit in which that change was made. The main disadvantage of blame is that, when a line is modified several times, it shows only the last commit that modified it, hiding earlier changes to other parts of the same line. In this paper, we develop a method that increases the granularity of blame in git: instead of tracking lines of code, it tracks tokens in source code. We evaluate its effectiveness with an empirical study that compares the accuracy of blame in git (per line) with our proposed blame-per-token method. We demonstrate that, in 5 large open source systems, blame-per-token identifies the commit that introduced a token with an accuracy between 94.5% and 99.2%, while blame-per-line achieves an accuracy of only 75% to 91% (with a margin of error of +/-5% at a confidence level of 95%). We also classify the reasons why either blame method fails, highlighting each method's weaknesses. The blame-per-token method has been implemented in an open source tool called cregit, which is currently in use by the Linux Foundation to identify the persons who have contributed to the source code of the Linux kernel.
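The difference between per-line and per-token blame can be illustrated with a small Python sketch. This is not cregit's implementation: the regular-expression tokenizer, the `blame_tokens` helper, and the commit identifiers `c1` to `c3` are all hypothetical and exist only for illustration. The sketch follows one line through three commits and carries each token's originating commit forward whenever the token survives unchanged.

```python
import difflib
import re

# Hypothetical, simplified tokenizer: identifiers, integer literals,
# and any other single non-space character. Real languages need a real lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split one line of source code into a list of token strings."""
    return TOKEN_RE.findall(line)


def blame_tokens(versions):
    """Attribute each token of the newest version to a commit.

    `versions` is a list of (commit_id, line_text) pairs, oldest first.
    Returns a list of (token, commit_id) pairs for the newest version.
    """
    commit, text = versions[0]
    tokens = tokenize(text)
    origin = [commit] * len(tokens)

    for commit, text in versions[1:]:
        new_tokens = tokenize(text)
        # By default every token is attributed to the commit that wrote this version.
        new_origin = [commit] * len(new_tokens)
        matcher = difflib.SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        # Tokens that survive unchanged keep the commit that introduced them.
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_origin[block.b + k] = origin[block.a + k]
        tokens, origin = new_tokens, new_origin

    return list(zip(tokens, origin))


# Illustrative history of a single line (made-up commits and code).
history = [
    ("c1", "int total = count(items);"),
    ("c2", "int total = count(items, 0);"),
    ("c3", "long total = count(items, 0);"),
]

result = blame_tokens(history)
line_blame = history[-1][0]  # per-line blame reports only the last commit to touch the line

for token, commit in result:
    print(f"{commit}  {token}")
```

In this toy history, per-line blame attributes the whole line to `c3`, while the per-token view still attributes `total` and `count` to `c1` and the added argument `, 0` to `c2`; only `long` belongs to `c3`.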