Explore Deep in 4.6 Billion GitHub Events
Understand any GitHub project or quickly compare any two projects by digging deep into 4.6 billion GitHub events in real-time. Here are some ways you can play with it.
4.6 billion is literally an astronomical figure. The richest star map of our galaxy, produced by the Gaia space observatory, includes just under 2 billion stars. What does a view of 4.6 billion GitHub events really look like? What secrets and value can be discovered in such an enormous amount of data?
Here you go: OSSInsight.io can help you find the answer. It’s a useful insight tool that gives you the most up-to-date open source intelligence and helps you deeply understand any single GitHub project or quickly compare any two projects by digging deep into 4.6 billion GitHub events in real time. Here are some ways you can play with it.
Compare Any Two GitHub Projects
Do you wonder how different projects have performed and developed over time? Which project is worthy of more attention? OSSInsight.io can answer your questions via the Compare Projects page.
Let’s take the Kubernetes repository (K8s) and Docker’s Moby repository as examples and compare them in terms of popularity and coding vitality.
Popularity
To compare the popularity of two repositories, we use multiple metrics including the number of stars, the growth trend of stars over time, and stargazers’ geographic and employment distribution.
Number of Stars
The line chart below shows the accumulated number of stars of K8s and Moby each year. According to the chart, Moby was ahead of K8s until late 2019. The star growth of Moby slowed after 2017 while K8s has kept a steady growth pace.
The star history of K8s and Moby
Geographical Distribution of Stargazers
The map below shows the stargazers’ geographical distribution of Moby and K8s. As you can see, their stargazers are scattered around the world with the majority coming from the US, Europe, and China.
The geographical distribution of K8s and Moby stargazers
Employment Distribution of Stargazers
The chart below shows the employers of K8s stargazers (red) and Moby stargazers (dark blue). Stargazers of both projects work in a wide range of industries, and most come from leading dot-com companies such as Google, Tencent, and Microsoft. The difference is that the top two companies among K8s’ stargazers are Google and Microsoft from the US, while the top two among Moby’s stargazers are Tencent and Alibaba from China.
The employment distribution of K8s and Moby stargazers
Coding Vitality
To compare the coding vitality of two GitHub projects, we use several metrics, including the growth trend of pull requests (PRs), the monthly number of PRs, commits, and pushes, and the heat map of developers’ contribution time.
Number of Commits and Pushes
The bar chart below shows the number of commits and pushes submitted to K8s (top) and Moby (bottom) each month since their inception. Generally speaking, K8s has more pushes and commits than Moby; their numbers grew steadily until 2020 and slowed afterward. Moby’s monthly pushes and commits grew slightly between 2015 and 2017 and have barely increased since 2018.
The monthly pushes and commits of K8s (top) and Moby (bottom)
Number of PRs
The charts below show the monthly and accumulated number of PRs of the two repositories. As you can see, K8s has received stable and consistent PR contributions ever since its inception, and its accumulated number of PRs has grown steadily. Moby had vibrant PR submissions before late 2017, but they started to drop afterward. Its accumulated number of PRs reached a plateau in 2017 and has stayed there ever since.
The monthly and accumulated PR number of K8s (top) and Moby (bottom)
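The relationship between the two charts can be sketched on toy data: the accumulated series is a running total of the monthly counts, so a plateau simply means months with few or no new PRs. The snippet below is a minimal illustration, not OSSInsight.io's implementation. The table name `pr_opened`, its columns, and the sample rows are all hypothetical, and it uses Python's built-in `sqlite3` module (SQLite 3.25 or later for window functions) rather than TiDB.

```python
import sqlite3

# Hypothetical toy rows (repo, date a PR was opened); not real GitHub data.
ROWS = [
    ("repo_a", "2021-01-05"),
    ("repo_a", "2021-01-20"),
    ("repo_a", "2021-02-11"),
    ("repo_a", "2021-04-02"),
    ("repo_b", "2021-01-09"),
    ("repo_b", "2021-03-15"),
]


def monthly_and_accumulated(rows):
    """Return (repo, month, monthly_prs, accumulated_prs) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pr_opened (repo TEXT, created_at TEXT)")
    conn.executemany("INSERT INTO pr_opened VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH monthly AS (
            -- One row per repo and calendar month (YYYY-MM).
            SELECT repo,
                   substr(created_at, 1, 7) AS month,
                   COUNT(*) AS monthly_prs
            FROM pr_opened
            GROUP BY repo, month
        )
        -- Running total per repo, ordered by month.
        SELECT repo,
               month,
               monthly_prs,
               SUM(monthly_prs) OVER (PARTITION BY repo ORDER BY month) AS accumulated_prs
        FROM monthly
        ORDER BY repo, month
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_and_accumulated(ROWS):
        print(row)
```

Months with no PRs do not appear in the output at all, so a charting layer would need to fill those gaps to draw a flat line.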
Developers’ Contribution Time
The following heat map shows developers’ contribution time for K8s (left) and Moby (right). Each square represents one hour in a day. The darker the color, the more contributions occur during that time. K8s has many more dark squares than Moby, and K8s’ contributions occur almost 24 hours a day, 7 days a week. K8s definitely has more dynamic coding activity than Moby.
Heat map of developers’ contribution time of K8s (left) and Moby (right)
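A heat map like this can be built by bucketing event timestamps into cells. The sketch below assumes one cell per (weekday, hour) pair and timestamps in UTC; the article does not state how OSSInsight.io defines its cells or time zone, so both are assumptions, and the function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def contribution_heatmap(timestamps):
    """Count events per (weekday, hour) cell, assuming UTC timestamps.

    weekday follows datetime.weekday(): 0 = Monday ... 6 = Sunday.
    """
    cells = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        cells[(dt.weekday(), dt.hour)] += 1
    return cells


# Hypothetical sample events; not real GitHub data.
SAMPLE = [
    "2022-05-02T09:15:00Z",  # Monday, 09:00 hour
    "2022-05-02T09:45:00Z",  # Monday, 09:00 hour
    "2022-05-07T23:05:00Z",  # Saturday, 23:00 hour
    "2022-05-08T03:30:00Z",  # Sunday, 03:00 hour
]

if __name__ == "__main__":
    for (weekday, hour), count in sorted(contribution_heatmap(SAMPLE).items()):
        print(weekday, hour, count)
```

The darker squares in the chart correspond to cells with higher counts; a project with contributors spread across time zones fills more of the 7 × 24 grid.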
Taken together, these metrics show that while both K8s and Moby are popular across industries worldwide, K8s has more vibrant coding activity than Moby. K8s is continuously gaining popularity and coding vitality while Moby is declining in both over time.
Popularity and coding vitality are just two dimensions to compare repositories. If you want to discover more insights or compare other projects you are interested in, feel free to visit the Compare page and explore it for yourself.
Of course, you can use this same page to deeply explore any single GitHub project and gain the most up-to-date insights about it. The key metrics and the corresponding changes are presented in a panoramic view. More in-depth analytics such as code changes by PR size groups and PR lines are also available. Explore it for yourself and you may be surprised. Have fun.
Panoramic view of key GitHub metrics (K8s as an example)
Total PR number each month/PR groups (K8s as an example)
The number of lines of code change each month (K8s as an example)
Key Open Source Insights
OSSInsight.io does more than explore or compare repositories. It gives you historical, real-time, and custom open source insights. In this section, we’ll share some key insights into open source databases and programming languages. If you want to gain insights into other areas, you can explore the Insights page for yourself.
Note: If you want to get those analytical results by yourself, you can execute the SQL commands above each chart on TiDB Cloud with ease following this 10-minute tutorial.
Rust: the Most Active Programming Language
Rust was first released in 2012 and has been among the leading programming languages for 10 years. It has the most active repository with a total of 103,047 PRs at the time of writing.
Here are the SQL commands:
SELECT
/*+ read_from_storage(tiflash[github_events]), MAX_EXECUTION_TIME(120000) */
programming_language_repos.name AS repo_name,
COUNT(*) AS num
FROM github_events
JOIN programming_language_repos ON programming_language_repos.id = github_events.repo_id
WHERE type = 'PullRequestEvent'
AND action = 'opened'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
PR numbers of the leading programming languages
Go: the New Favorite and the Fastest Growing Programming Language
According to OSSInsight.io, 10 programming languages dominate the open source community. Go is the most popular with 108,317 stars, followed by Node and TypeScript. Go is also the fastest-growing language in popularity.
Here are the SQL commands:
WITH repo_stars AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        repo_id,
        ANY_VALUE(repos.name) AS repo_name,
        COUNT(DISTINCT actor_login) AS stars
    FROM github_events
    JOIN programming_language_repos repos ON repos.id = github_events.repo_id
    WHERE type = 'WatchEvent'
    GROUP BY 1
), top_10_repos AS (
    SELECT repo_id, repo_name, stars
    FROM repo_stars rs
    ORDER BY stars DESC
    LIMIT 10
), tmp AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        event_year,
        tr.repo_name AS repo_name,
        COUNT(*) AS year_stars
    FROM github_events
    JOIN top_10_repos tr ON tr.repo_id = github_events.repo_id
    WHERE type = 'WatchEvent' AND event_year <= 2021
    GROUP BY 2, 1
    ORDER BY 1 ASC, 2
), tmp1 AS (
    SELECT
        event_year,
        repo_name,
        SUM(year_stars) OVER (PARTITION BY repo_name ORDER BY event_year ASC) AS stars
    FROM tmp
    ORDER BY event_year ASC, repo_name
)
SELECT event_year, repo_name, stars FROM tmp1
The star growth trends of leading programming languages
Microsoft and Google: the Top Two Programming Language Contributors
As world-renowned high-tech companies, Microsoft and Google take the lead in open source language contributions with a total of 1,443 and 947 contributors respectively at the time of writing.
Here are the SQL commands:
SELECT
    /*+ read_from_storage(tiflash[github_events]), MAX_EXECUTION_TIME(120000) */
    TRIM(LOWER(REPLACE(u.company, '@', ''))) AS company,
    COUNT(DISTINCT actor_id) AS num
FROM github_events
JOIN programming_language_repos db ON db.id = github_events.repo_id
JOIN users u ON u.login = github_events.actor_login
WHERE github_events.type IN (
    'IssuesEvent', 'PullRequestEvent', 'IssueCommentEvent',
    'PullRequestReviewCommentEvent', 'CommitCommentEvent', 'PullRequestReviewEvent'
)
AND u.company IS NOT NULL
AND u.company != ''
AND u.company != 'none'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;
Companies who contribute the most to programming languages
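The query groups contributors by a cleaned-up company name: it strips `@`, lowercases, and trims spaces, so that `@Microsoft` and `microsoft ` count as the same employer. The sketch below mirrors that normalization and the distinct-contributor count in plain Python on hypothetical sample data. The function names are made up for illustration, and string comparison here is exact, which may differ from the database's collation.

```python
from collections import defaultdict


def normalize_company(raw):
    """Mirror TRIM(LOWER(REPLACE(company, '@', ''))) from the query."""
    return raw.replace("@", "").lower().strip(" ")


def contributors_by_company(events):
    """Count distinct actors per normalized company.

    events: iterable of (actor_id, company) pairs; company may be None.
    Returns (company, distinct_actor_count) pairs, largest count first.
    """
    actors = defaultdict(set)
    for actor_id, company in events:
        # Same exclusions as the query's WHERE clause, applied to the raw value.
        if company is None or company == "" or company == "none":
            continue
        actors[normalize_company(company)].add(actor_id)
    return sorted(
        ((name, len(ids)) for name, ids in actors.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical sample events; not real GitHub data.
SAMPLE = [
    (1, "@Microsoft"),
    (2, "microsoft "),
    (2, "Microsoft"),   # same actor again: counted once
    (3, "Google"),
    (4, None),          # no company on profile: skipped
    (5, "none"),        # placeholder value: skipped
    (6, "@google"),
    (7, " Microsoft"),
]

if __name__ == "__main__":
    for name, count in contributors_by_company(SAMPLE):
        print(name, count)
```

This normalization is deliberately shallow. Variants such as "Microsoft Corp" would still land in a separate bucket, so counts for large companies are likely to be underestimates.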
Elasticsearch Draws the Most Attention
Elasticsearch was one of the first open source databases. It is the most liked database with 64,554 stars, followed by Redis and Prometheus. From 2011 to 2016, Elasticsearch and Redis shared the top spot until Elasticsearch broke away in 2017.
Here are the SQL commands:
WITH repo_stars AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        repo_id,
        ANY_VALUE(repos.name) AS repo_name,
        COUNT(DISTINCT actor_login) AS stars
    FROM github_events
    JOIN db_repos repos ON repos.id = github_events.repo_id
    WHERE type = 'WatchEvent'
    GROUP BY 1
), top_10_repos AS (
    SELECT repo_id, repo_name, stars
    FROM repo_stars rs
    ORDER BY stars DESC
    LIMIT 10
), tmp AS (
    SELECT
        /*+ read_from_storage(tiflash[github_events]) */
        event_year,
        tr.repo_name AS repo_name,
        COUNT(*) AS year_stars
    FROM github_events
    JOIN top_10_repos tr ON tr.repo_id = github_events.repo_id
    WHERE type = 'WatchEvent' AND event_year <= 2021
    GROUP BY 2, 1
    ORDER BY 1 ASC, 2
), tmp1 AS (
    SELECT
        event_year,
        repo_name,
        SUM(year_stars) OVER (PARTITION BY repo_name ORDER BY event_year ASC) AS stars
    FROM tmp
    ORDER BY event_year ASC, repo_name
)
SELECT event_year, repo_name, stars FROM tmp1
The star growth trend of leading databases
China: the Number One Fan of Open Source Databases
China has the most open source database followers with 11,171 stargazers of database repositories, followed by the US and Europe.
Here are the SQL commands
select upper(u.country_code) as country_or_area, count(*) as count, count(*) / max(s.total) as percentage
from github_events
use index(index_github_events_on_repo_id)
left join users u ON github_events.actor_login = u.login
join (
-- Count the star events whose user has a country code (the denominator for the percentage).
select count(*) as total
from github_events
use index(index_github_events_on_repo_id)
left join users u ON github_events.actor_login = u.login
where repo_id in (507775, 60246359, 17165658, 41986369, 16563587, 6838921, 108110, 166515022, 48833910, 156018, 50229487, 20089857, 5349565, 6934395, 6358188, 11008207, 19961085, 206444, 30753733, 105944401, 31006158, 99919302, 50874442, 84240850, 28738447, 44781140, 372536760, 13124802, 146459443, 28449431, 23418517, 206417, 9342529, 19257422, 196353673, 172104891, 402945349, 11225014, 2649214, 41349039, 114187903, 20587599, 19816070, 69400326, 927442, 24494032) and github_events.type = 'WatchEvent' and u.country_code is not null
) s
where repo_id in (507775, 60246359, 17165658, 41986369, 16563587, 6838921, 108110, 166515022, 48833910, 156018, 50229487, 20089857, 5349565, 6934395, 6358188, 11008207, 19961085, 206444, 30753733, 105944401, 31006158, 99919302, 50874442, 84240850, 28738447, 44781140, 372536760, 13124802, 146459443, 28449431, 23418517, 206417, 9342529, 19257422, 196353673, 172104891, 402945349, 11225014, 2649214, 41349039, 114187903, 20587599, 19816070, 69400326, 927442, 24494032) and github_events.type = 'WatchEvent' and u.country_code is not null
group by 1
order by 2 desc;
The geographical distribution of open source database stargazers
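The percentage column in the query is each country's count divided by the total number of star events whose user has a known country code. A minimal sketch of that arithmetic on hypothetical data follows; the function name and sample values are made up, and, like the query, it counts events rather than distinct users.

```python
from collections import Counter


def country_share(country_codes):
    """Return (country_code, count, share) tuples, largest count first.

    country_codes: one entry per star event; None means the user's
    country is unknown and the event is left out of the total.
    """
    known = [code.upper() for code in country_codes if code is not None]
    total = len(known)
    counts = Counter(known)
    # counts is empty when total is 0, so no division by zero occurs.
    return [(code, n, n / total) for code, n in counts.most_common()]


# Hypothetical sample; not real GitHub data.
SAMPLE = ["cn", "us", "CN", None, "de", "cn"]

if __name__ == "__main__":
    for code, n, share in country_share(SAMPLE):
        print(code, n, round(share, 2))
```

Because unknown locations are dropped from the denominator, the shares describe only users who list a recognizable location, which is worth keeping in mind when reading the map.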
OSSInsight.io also allows you to create your own custom insights into any GitHub repository created after 2011.
Published at DZone with permission of Max Liu. See the original article here.