Querying Pull Request Data from GitHub
In this article we will use MergeStat and Arctype to monitor contributions to our projects.
Join the DZone community and get the full member experience.
Join For FreeTo make sure that bugs do not reach the end-user of your product, you need to do code reviews. In code review, metrics of pull requests matter a lot because they provide data on how well you are shipping. Software developers can use pull requests metrics to understand the team dynamics and act appropriately to correct behaviors before things get out of hand.
What Is Code Review?
Code review is when a peer or a senior developer examines a developer's code or a pull request. Code reviews help developers discover common bugs faster and reduce the work required to optimize code later.
Some of the benefits of code review include:
- Consistency in design and implementation.
- Optimizing code for better performance.
- Saving time and money.
- Monitoring project quality and requirements.
- Streamlining the development process.
- Collaboration and sharing of new techniques.
This article will help you learn how to use MergeStat to get pull request data from a GitHub repo.
We will start by fetching data from a GitHub repo using MergeStat. Then we will connect to SQLite using Arctype. Finally, we will explore the pull requests data by running some SQL queries on Arctype.
Arctype is a database management and visualization tool. Using Arctype, you will learn how to execute or run SQL queries to come up with metrics such as:
- How many unanswered issues does your project have?
- How many pull requests were merged in the last month?
- How many unapproved merged pull requests have we had?
Before querying GitHub repo data in Arctype, we will get the repo data using the MergeStat web app by querying GitHub API. Let's get started.
Getting GitHub Repo Data by Querying the GitHub API Using MergeStat
In this section, you will learn how to use the MergeStat web app to query a GitHub API and get all the pull requests in a GitHub repository. To query the GitHub API, you must provide an authentication token. You can create a personal access token by following these instructions.
Before we proceed, there is something you need to know. All the API requests to a GitHub API are rate limited. The SQL query we will be running uses the GitHub GraphQL API (v4), which rate limits based on the "complexity" of GraphQL queries. The more fields/relations in your GraphQL, the higher the "cost" of a single API request, and the faster you may reach the rate limit.
Unfortunately, it is hard to know ahead of time what an excellent client-side rate limit is. By default, each SQL query will fetch 100 items per page and permit 2 API requests per second.
With that in mind, navigate to the MergeStat web app.
To feel the magic of the MergeStat web app, click the blue "Execute" button at the bottom on the right side of the screenshot above. After the execution finishes running, scroll down a bit, and you should see the results below.
Let's now run a SQL query that returns all the React Codebase repository pull requests on GitHub. On the MergeStat web app, click the 'Query Settings' button at the bottom on the left side of this screenshot.
After clicking the 'Query Settings', make sure you are logged in to the MergeStat web app to add the GitHub repo you want to query. Follow these instructions to get GitHub Access Token and paste it in its field as shown below. Enter a "GitHub Per Page" limit of 10 to avoid running to errors when the query is executing. You can leave the GitHub Rate Limit field blank. Don't forget to save your inputs.
After saving your inputs, execute the following SQL query on the MergeStat web app.
SELECT * FROM GitHub_repo_prs('facebook/react');
You should get results once the execution is done, as shown below.
Let us now repeat what we did above, but in this case, after installing the MergeStat app on our computers. We will then use the installed MergeStat app to query a GitHub API and get all the pull request data in React Codebase by Facebook. After that, we will use Arctype to visualize the pull request data using charts.
How to Install the MergeStat Desktop App
One of the easiest ways to install MergeStat on your system is by using Homebrew. First, make sure Homebrew is installed in your system, and then run the following commands on your command line.
brew tap mergestat/mergestat
brew install mergestat
After MergeStat is installed, you must add your GitHub token and GitHub per page limit as environment variables. If you are on Linux, you can follow these instructions on adding environment variables.
The environment variables should look as shown below.
GitHub_TOKEN=’your_GitHub_token’
GitHub_PER_PAGE=10
MergeStat will look for a GitHub_TOKEN
environment variable when executing and use it for authentication. Additionally, GitHub_PER_PAGE
expects an integer between 1 and 100 to set how many items are fetched per page in API calls that paginate results.
Let us now query the GitHub API using the MergeStat app to get the pull request data of the React codebase by Facebook.
On your command line, run the command shown below.
mergestat export pr-file.db -e GitHub_repo_prs \
-e "SELECT * FROM GitHub_repo_prs('facebook/react')"
The command above will create a file called pr-file.db
in your current directory.
Exporting the React Codebase Pull Requests Data From GitHub
To query the React codebase pull requests data, we will use Arctype. You can install the Arctype app on your system here.
Once you have Arctype installed, you will be redirected to a page to add your database credentials. In this case, select SQLite as your connection.
Once you have selected the SQLite database as your connection, you will be redirected to a page to name your SQLite connection and add an SQLite file to Arctype.
The SQLite file that you need to add is the pr-file.db
file, which was downloaded to your directory when you ran the command below on your command line.
mergestat export pr-file.db -e GitHub_repo_prs \
-e "SELECT * FROM GitHub_repo_prs('facebook/react')"
Click the "Choose SQLite File" button and navigate to the folder that has the pr-file.db
file. In my case, the file is in a folder called mysql-server-main
, as shown below.
Once you have selected the file, click "Open" to add the file to Arctype. Then test the connection, and don't forget to click the save button. If everything is okay, you will be redirected to your workspace. On your workspace, click the table to explore it.
Once you click on the table name, you should see a table containing all pull requests data from the React codebase repo on GitHub.
Let us now run some queries in Arctype to check out some different metrics from the data.
Identifying Unanswered Issues and Pull Requests
Unanswered issues and pull requests on GitHub repos are no fun if you are the one asking for help. However, identifying unanswered issues can also be valuable to maintainers to ensure that they are addressing user concerns.
In this step, we will run a query on Arctype to identify stale issues or pull requests in React Codebase repository on GitHub. In Arctype, run the following SQL query that will return the oldest 25 issues created more than 30 days ago, with no comments, and remain open.
SELECT
title, author_login, comment_count, created_at, url
FROM GitHub_repo_prs
WHERE
created_at < date('now', '-30 days')
AND (merged = 0 OR closed = 0)
AND comment_count = 0
ORDER BY created_at ASC
LIMIT 25
Once the SQL query is done running, you should see a table like the one shown below, showing the oldest unanswered pull requests. "Add explanation if rendered a nonvalid component" was created in 2013 and never received any comments. It can probably be removed!
Finding Unapproved Merged Pull Requests
Allowing repository administrators to merge pull requests without a review can be a helpful shortcut for "emergency" hotfixes that avoid a potentially time-consuming code review cycle. However, too many unapproved merges are probably a sign of something wrong.
In this step, we will run a SQL query on Arctype that will list all unapproved pull requests merged into the main branch of React Codebase repository on GitHub.
SELECT
number,
title,
date(created_at) AS created_at,
date(merged_at) AS merged_at,
author_login,
review_decision,
merged
FROM GitHub_repo_prs -- set to your own repo
WHERE
merged = true
AND review_decision <> 'APPROVED'
AND base_ref_name = 'main' -- set to your own branch
ORDER BY created_at DESC
Once the SQL query is done running, you should have a table as shown below. These issues both seem like behind-the-scenes internal stuff. To be sure, we can pop the full text of the title open with cell inspector.
Pull Requests Merged in the Past 30 Days
When a project has more than a handful of regular contributors, it can be challenging to see all the changes.
In this step, we will run a query on Arctype that will show all pull requests that have been merged in the past 30 days on React codebase repository on GitHub.
SELECT
base_repository_name,
title,
number,
url,
author_login,
created_at,
merged_at
FROM GitHub_repo_prs
WHERE
merged_at > date('now', '-30 days') -- replace with the time period you care about
AND merged = true
Once the SQL query is done running, you should have a table as shown below. Notice the 62 rows count at the very bottom. This is a very active repo, averaging 2 merges a day!
Conclusion
Every time someone writes code and wants to merge it into the main branch, they should request that another developer take a look and offer feedback. A code review helps developers to maintain consistency between design and implementation styles across man team members.
However, as a code reviewer, you might miss some pull requests. Using tools like Mergestat and Arctype can be a great way to review pull requests in depth and take action on your findings.
Published at DZone with permission of Bonnie !. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments