
A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments

Mehdi Golzadeh
2021

Description

An automated tool to identify bots in GitHub repositories by analysing pull request and issue comments. The tool takes the name of a GitHub repository as input and requires a GitHub API key. It computes its output in three steps. The first step downloads all pull request comments and issue comments from the specified repository using the GitHub GraphQL API, which yields a list of commenters together with their pull request and issue comments. The second step computes, for each commenter, the features needed by the classification model: the number of comments, the number of empty comments, the number of comment patterns, and the inequality between the number of comments within patterns. The third step applies the classification model to these features and outputs the model's bot prediction for each commenter.
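The first step can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the tool's own code. The query shape, the page size of 50, and the helper names (`build_request`, `group_comments_by_author`, `fetch_page`, `collect_comments`) are hypothetical, and pagination of the nested `comments` connection (beyond the first 100 comments per issue or pull request) is omitted for brevity. The endpoint `https://api.github.com/graphql` and the `Authorization: bearer <token>` header follow GitHub's public GraphQL API.

```python
import json
import urllib.request
from collections import defaultdict

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Hypothetical query: one page of issues (or pull requests) with up to 100
# comments each. Paging through nested comments is left out of this sketch.
QUERY_TEMPLATE = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    %s(first: 50, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        comments(first: 100) {
          nodes { author { login } body }
        }
      }
    }
  }
}
"""


def build_request(token, owner, name, connection="issues", cursor=None):
    """Build (but do not send) a POST request for one page of comments."""
    payload = {
        "query": QUERY_TEMPLATE % connection,
        "variables": {"owner": owner, "name": name, "cursor": cursor},
    }
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def group_comments_by_author(response, connection="issues"):
    """Map each commenter's login to the comment bodies in one response page."""
    by_author = defaultdict(list)
    for item in response["data"]["repository"][connection]["nodes"]:
        for comment in item["comments"]["nodes"]:
            author = comment["author"]
            # Assumption: comments whose author is null (deleted accounts) are skipped.
            if author is None:
                continue
            by_author[author["login"]].append(comment["body"])
    return dict(by_author)


def fetch_page(token, owner, name, connection="issues", cursor=None):
    """Send one request and return the decoded JSON (requires network access)."""
    request = build_request(token, owner, name, connection, cursor)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def collect_comments(token, owner, name):
    """Walk all pages of issues and pull requests, merging comments per login."""
    all_comments = defaultdict(list)
    for connection in ("issues", "pullRequests"):
        cursor = None
        while True:
            page = fetch_page(token, owner, name, connection, cursor)
            for login, bodies in group_comments_by_author(page, connection).items():
                all_comments[login].extend(bodies)
            info = page["data"]["repository"][connection]["pageInfo"]
            if not info["hasNextPage"]:
                break
            cursor = info["endCursor"]
    return dict(all_comments)
```

Calling `collect_comments(token, "owner", "repo")` would return a dictionary mapping each login to its comment bodies, which corresponds to the list of commenters the first step produces. Only `collect_comments` and `fetch_page` touch the network; the request building and response parsing can be exercised offline.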
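The four per-commenter features of the second step can be illustrated with a small sketch. The grouping of comments into patterns here is a simplified stand-in chosen for illustration, not necessarily the tool's method: a greedy pass places each comment in the first pattern whose representative comment has a word-token Jaccard similarity of at least 0.5 with it. The Gini coefficient is used as one possible measure of inequality between pattern sizes. The threshold and all function names are hypothetical.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"\w+")


def tokens(text):
    """Lower-cased set of word tokens in a comment."""
    return frozenset(TOKEN_RE.findall(text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_patterns(comments, threshold=0.5):
    """Greedy grouping (hypothetical stand-in); returns the size of each pattern."""
    reps = []   # token set of the first comment in each pattern
    sizes = []  # number of comments assigned to each pattern
    for body in comments:
        t = tokens(body)
        for i, rep in enumerate(reps):
            if jaccard(t, rep) >= threshold:
                sizes[i] += 1
                break
        else:
            reps.append(t)
            sizes.append(1)
    return sizes


def gini(values):
    """Gini coefficient: 0 when all values are equal, larger when skewed."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


@dataclass
class CommenterFeatures:
    comments: int
    empty_comments: int
    patterns: int
    inequality: float


def compute_features(comments, threshold=0.5):
    """Compute the four features for one commenter's list of comment bodies."""
    sizes = group_into_patterns(comments, threshold)
    return CommenterFeatures(
        comments=len(comments),
        empty_comments=sum(1 for c in comments if not c.strip()),
        patterns=len(sizes),
        inequality=gini(sizes),
    )
```

For example, `compute_features(["LGTM", "LGTM", "LGTM", "Please rebase onto main."])` gives 4 comments, 0 empty comments, 2 patterns (of sizes 3 and 1) and an inequality of 0.25.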