Understanding Status Page Aggregation: Inside the Technology of a Typical Status Page Aggregator
Learn how to leverage status page aggregation to transform monitoring by amalgamating uptime data from diverse cloud services' status pages.
To explore status page aggregation, we’ll draw on our experience building StatusGator, a status page aggregator that has been available for eight years. We’ll share our technical insights and explain how you can build your own aggregator.
What Is Status Page Aggregation?
Most IT infrastructure relies heavily on hosted services, cloud applications, and third-party software. Teams might use popular cloud services like AWS, Azure, or Google Cloud, customer service tools, or marketing platforms like HubSpot or Salesforce. When any of these services experiences downtime, it can take a while for the tech team to realize that a third-party outage is causing the problem.
Visiting each service's status page individually can be cumbersome. Surprisingly, some IT helpdesk teams maintain a list of these pages in a Google Sheets file and visit each link manually to check the status. A status page aggregator can help by consolidating all this information into a single, more manageable internal page.
How Is Status Page Aggregation Different From Existing Monitoring Solutions?
Status page aggregation isn't a replacement for traditional monitoring software, but it can certainly complement it. With status aggregation, the data is gathered from official status pages, so it reflects what the service providers officially report. Depending on the transparency of the provider, it includes scheduled maintenance, ongoing outages, and performance issues.
This data might not always be accurate because it really depends on how well each provider manages and reports incidents. In our research, we found that CircleCI is good at this, but some other cloud providers aren’t as careful or regular with their reporting.
Even though some data may not be accurate, aggregation still saves the tech team from visiting each status page when an issue arises.
Overview of the Technology Behind the Status Page Aggregation
Not all status page aggregators are the same or have the same features, but the underlying technology is broadly consistent. In our experience, it is built on three components:
Automated Checkers
Numerous automated checkers or bots are at the heart of status page aggregation technology. These checkers routinely visit the status pages of various services, gathering and relaying data back to the central system.
These checkers adapt to changes in status page data to collect up-to-date information. They can typically work with the official APIs of major services, use undocumented APIs, or even resort to HTML scraping.
There are many different status page providers out there, each with its own format for presenting data. Automated checkers therefore need to interpret and process information not only from widely used formats but also from custom status pages that providers build themselves.
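The checker behaviour described above can be sketched roughly as follows. This is a minimal illustration, not how any particular aggregator works: the function names are hypothetical, the JSON shape (a top-level `status` object with a `description` field) is an assumption modelled on the format used by Atlassian Statuspage-hosted pages, and the HTML fallback is a deliberately crude keyword scan rather than real scraping.

```python
import json
import urllib.request

# Phrases the fallback scan looks for, most severe first (an assumed list).
FALLBACK_PHRASES = (
    "major outage",
    "partial outage",
    "degraded performance",
    "all systems operational",
)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a status page or status API response as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-checker-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_status(body: str) -> str:
    """Return the provider's raw status text from a response body."""
    # First try the structured route: a JSON document with status.description.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if isinstance(payload, dict):
        status = payload.get("status")
        if isinstance(status, dict) and "description" in status:
            return str(status["description"])

    # Fall back to scanning the raw text (for example HTML) for known phrases.
    lowered = body.lower()
    for phrase in FALLBACK_PHRASES:
        if phrase in lowered:
            return phrase
    return "unknown"
```

Keeping `fetch` separate from `parse_status` means each provider format can get its own parser while the download logic stays shared, and the parsers can be exercised against saved responses without touching the network.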
Normalization of Data
Status aggregators play a crucial role in standardizing data due to the wide range of formats and terms used across different service status pages. They convert various status formats into a uniform display, making it easier to compare information from different services.
The process of data normalization involves sorting service statuses into several clear categories. While this may require some judgment to interpret different terms, it ensures comprehensive coverage of all events. For instance, a status described as "degraded" might be grouped under a warning category, whereas "severe" could indicate a complete service outage.
This categorization helps in sending users notifications that are both relevant and specific to their needs, ensuring they stay informed about crucial service disruptions.
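As a rough sketch of this normalization step, the snippet below maps free-form provider wording onto a small fixed set of categories. The category names and keyword lists are assumptions chosen for illustration; a real aggregator would need per-provider rules and far more vocabulary.

```python
from enum import Enum


class Status(Enum):
    """Assumed normalized categories; real systems may use different ones."""
    UP = "up"
    WARN = "warn"
    DOWN = "down"
    UNKNOWN = "unknown"


# Checked in order, most severe first, so that a term like "unavailable"
# is classified as DOWN before the substring "available" can match UP.
KEYWORDS = [
    (Status.DOWN, ("severe", "major outage", "unavailable")),
    (Status.WARN, ("degraded", "partial", "minor", "maintenance")),
    (Status.UP, ("operational", "available", "normal")),
]


def normalize(raw: str) -> Status:
    """Map a provider's raw status text to one normalized category."""
    text = raw.strip().lower()
    for status, keywords in KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return status
    return Status.UNKNOWN
```

Substring matching is crude, which is why the text above notes that normalization requires some judgment: anything the rules do not recognize falls through to `UNKNOWN` rather than being silently treated as healthy.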
Display and Notifications
Once the data is gathered and standardized, all the service statuses are shown together on one page for easy viewing. The system checks each status page for updates every few minutes and offers both passive and active updates. Passive updates appear on the aggregated page whenever a source status page changes, while active updates are notifications sent proactively to users based on their chosen settings, such as alerts for specific service disruptions.
It's common for teams to display an internal status page on a TV in their office, providing a constant, real-time overview of service statuses. Additionally, proactive updates can be integrated with other monitoring systems. This is often achieved through APIs or by sending messages directly to communication platforms like Slack, MS Teams, Discord, etc. These integrations ensure that the relevant teams are swiftly alerted in their preferred environments, allowing for immediate response and action when service disruptions occur.
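The active-update flow can be sketched as: compare the latest statuses with the previous ones, keep only the changes a given channel has subscribed to, and post a message. Everything here is a hypothetical illustration; the function names and the subscription structure are assumptions, and `post_webhook` assumes an incoming-webhook endpoint that accepts a JSON body with a `text` field, as Slack incoming webhooks do.

```python
import json
import urllib.request


def detect_changes(
    previous: dict[str, str], current: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (service, old_status, new_status) for every status that changed."""
    changes = []
    for service, new_status in current.items():
        old_status = previous.get(service, "unknown")
        if old_status != new_status:
            changes.append((service, old_status, new_status))
    return changes


def build_alerts(
    changes: list[tuple[str, str, str]], subscriptions: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Group alert messages by channel, honoring each channel's subscriptions."""
    alerts: dict[str, list[str]] = {}
    for service, old_status, new_status in changes:
        for channel, services in subscriptions.items():
            if service in services:
                message = f"{service}: {old_status} -> {new_status}"
                alerts.setdefault(channel, []).append(message)
    return alerts


def post_webhook(url: str, text: str, timeout: float = 10.0) -> None:
    """Send one message to a webhook URL as a JSON body with a 'text' field."""
    data = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # Response body is not needed; HTTP errors raise an exception.
```

Alerting only on changes, rather than on every poll, keeps channels quiet while a service stays in the same state.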
The Two Approaches to Status Page Aggregation
There are two ways to get status page aggregation: use a ready-made solution or build your own tool.
DIY Solution
A DIY status page aggregator usually involves two key elements to get started: the checker and the display.
- The checker’s role is to regularly assess the status of your services, gathering and storing this information. You can use a basic script or a more advanced monitoring system. Make sure to include checkers specific to the services, components, and locations you want to monitor.
- The display component showcases these results in a clear and easy format for your users to understand. For setting up the display, you have options like using a third-party hosted status page or integrating a custom status page on your website.
You’ll need to determine where to host your status page. It can be a webpage on your own server or a hosted service like StatusPage.io.
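To make the checker-plus-display split concrete, here is a minimal sketch of a DIY loop that runs a check for each service and writes the results to a static HTML file. The names, the five-minute default interval, and the idea of serving a generated file are assumptions for illustration; `check` is any function you supply that takes a URL and returns a status string.

```python
import html
import time
from typing import Callable


def run_checks(
    services: dict[str, str], check: Callable[[str], str]
) -> dict[str, str]:
    """Run the supplied check for each service URL and collect the results."""
    results = {}
    for name, url in services.items():
        try:
            results[name] = check(url)
        except Exception:
            # A failing checker should not take the whole page down.
            results[name] = "unknown"
    return results


def render_page(results: dict[str, str]) -> str:
    """Render the collected statuses as a simple HTML table."""
    rows = "\n".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(status)}</td></tr>"
        for name, status in sorted(results.items())
    )
    return f"<table>\n<tr><th>Service</th><th>Status</th></tr>\n{rows}\n</table>"


def poll_forever(
    services: dict[str, str],
    check: Callable[[str], str],
    output_path: str,
    interval_seconds: int = 300,
) -> None:
    """Re-check every service on a fixed interval and rewrite the page."""
    while True:
        page = render_page(run_checks(services, check))
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.write(page)
        time.sleep(interval_seconds)
```

Writing a static file keeps the display trivially hostable on any web server; a hosted status page service would replace `render_page` and the file write with calls to that service.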
Ready Solutions
With ready-made solutions, the checker, data normalization, and display are taken care of for you. Examples include StatusGator and StatusTicker, among others.
Crowd Status Solutions
Another solution that is worth mentioning is Downdetector. It collects user feedback on service availability rather than the official information from status pages.
Conclusion
In conclusion, the role of status page aggregation is to enhance incident communication and monitoring processes. This article has outlined three key components of a typical status page aggregator and explored two distinct approaches to its implementation. If you have any questions or need further technical clarification, please feel free to ask in the comments section below. We're here to help!