The Future of Mobile Observability Is OpenTelemetry

Examine how OpenTelemetry lets teams extend their observability practices all the way to the impact and behavior of their users.

Andrew Tunall

Jun. 14, 24 · Opinion

OpenTelemetry is built on the premise of transparent, portable, and extensible data collection. While these practices are changing the way development teams work for server-side infrastructure and application monitoring, these same principles have not been realized for the client-side layer, often labeled "RUM" in legacy terminology.

But that’s changing rapidly.

New observability vendors are building on OpenTelemetry from day one, while legacy vendors are adding basic support for OpenTelemetry across their platforms. For companies that want to modernize their observability practices, the writing is on the wall: OpenTelemetry is the future of full-stack visibility.

The Drive for Modern Observability

Modern, forward-thinking organizations that really value their users don’t want their observability practices to stop at the edge of the data center. They want them to extend all the way to understanding the impact on (and behavior of) their users.

Users connect to business KPIs while SLAs really don’t. Businesses want their engineering teams to work on what matters, and that means measuring work based on how it affects the company. Engineering teams thus need to connect their observability directly to user outcomes so that they can understand the true business impact of technical failures.

For backend teams working in a silo of infrastructure and service health metrics, this requires better collaboration and data sharing with frontend teams. Engineering teams need to speak the same language and access the same datasets to draw insights directly from user experiences.

Let’s briefly address the philosophy behind these changes and what the longer-term vision to modernize observability practices looks like.

There’s a Glaring Problem That’s Separating Frontend and Backend Teams

One of the biggest challenges site reliability and developer teams consistently face is an inability to integrate insights from their user-facing web and mobile apps into their observability practice. Ideally, frontend teams collect data about the health of end-user experiences, and backend teams collect data about the health of infrastructure and services.

Today it's common for these to be entirely separate tools that don’t share a common set of telemetry, don’t interoperate, and thus prevent engineering teams from speaking the same language.

Companies want to work on what matters, and that requires understanding where to invest engineering resources to deliver the best user experiences. Engineering teams are increasingly being judged on business KPIs, but they do not have the visibility to effectively collaborate across frontend and backend.

Broadly speaking, there are some key observability needs for companies building best-in-class mobile apps and user experiences:

Prioritizing issues and outages by understanding the actual user impact, which is only possible by connecting backend observability data directly to end-user experiences.
Providing visibility into complete user experiences with deep context that highlights root causes among combinations of behavioral and technical factors.
Making decisions based on business impact by connecting backend and frontend issues directly to business KPIs.
Resolving issues with streamlined workflows thanks to connected data and collaboration among teams.
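
As a rough illustration of the first point, the sketch below ranks failing backend endpoints by the number of distinct user sessions they touched rather than by raw error count. It assumes that frontend and backend telemetry carry a shared session identifier; the record shape, the `session_id` field, and the `rank_by_user_impact` helper are hypothetical and not part of any OpenTelemetry API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendError:
    endpoint: str    # service route that failed
    session_id: str  # assumed to be propagated from the client app


def rank_by_user_impact(errors):
    """Return (endpoint, affected_sessions, error_count), most sessions first."""
    sessions = defaultdict(set)
    counts = defaultdict(int)
    for err in errors:
        sessions[err.endpoint].add(err.session_id)
        counts[err.endpoint] += 1
    ranked = [(ep, len(sessions[ep]), counts[ep]) for ep in sessions]
    # Sort by affected sessions; ties are broken by raw error count.
    return sorted(ranked, key=lambda r: (r[1], r[2]), reverse=True)


# Illustrative data only: three users fail at checkout once each,
# while one user retries a broken avatar upload four times.
errors = [
    BackendError("/checkout", "s1"),
    BackendError("/checkout", "s2"),
    BackendError("/checkout", "s3"),
    BackendError("/avatar", "s4"),
    BackendError("/avatar", "s4"),
    BackendError("/avatar", "s4"),
    BackendError("/avatar", "s4"),
]
ranking = rank_by_user_impact(errors)
```

In this made-up dataset, `/avatar` has more raw errors, but `/checkout` ranks first because it affects three sessions instead of one.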

And yet, existing solutions leave most teams and companies wanting. They consist of limited crash and error reporting tools, or legacy real-user monitoring tools. They might provide some sampled stack traces with a set of breadcrumbs, or a highly sampled and extrapolated dataset that tries to answer a few key performance questions. But they don’t really allow frontend teams to build observability into their everyday engineering practices with the same rigor as their peers who build services or manage infrastructure.

Enter Open Standards for Instrumentation and Language Across Best-In-Class Tools

There’s no shortage of options for observability tooling. However, traditional vendors favor proprietary, closed codebases, resulting in a lack of common standards. They may support OTel, but do they really adhere to its principles of openness, portability, and extensibility? No. Everyone models and collects telemetry differently, which forces teams to invest in proprietary instrumentation across the stack. Changing vendors thus incurs significant engineering costs.

Open standards allow company-wide investment in instrumentation practices, so teams don’t have to re-instrument when changing vendors. Site reliability and developer teams can use the same language and semantics to create telemetry that’s accessible to everyone across the entire engineering org.
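
One way to picture this is instrumentation written once against a neutral interface, with the vendor chosen only at the point where telemetry is exported. The sketch below is a toy model in plain Python, not the OpenTelemetry SDK; `Exporter`, `InMemoryExporter`, `record_span`, and the attribute names are hypothetical stand-ins.

```python
import time
from typing import Protocol


class Exporter(Protocol):
    def export(self, span: dict) -> None: ...


class InMemoryExporter:
    """Stand-in for any vendor backend; it just collects spans in a list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.spans: list[dict] = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


def record_span(exporter: Exporter, name: str, attributes: dict) -> None:
    """Instrumentation layer: knows the span shape, not the vendor."""
    exporter.export({"name": name, "attributes": attributes, "ts": time.time()})


def load_cart(exporter: Exporter) -> None:
    # Application code is identical whichever exporter is passed in.
    record_span(exporter, "load_cart", {"app.screen": "cart"})


vendor_a = InMemoryExporter("vendor-a")
vendor_b = InMemoryExporter("vendor-b")
load_cart(vendor_a)
load_cart(vendor_b)  # switching vendors changes only this argument
```

The design choice being illustrated is that `load_cart` never names a vendor, so replacing one backend with another does not touch the instrumented code.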

In addition, with the rise of open standards solutions, teams are free to build their own tech stacks across any combination of supporting software. With OpenTelemetry becoming the telemetry standard for observability, software vendors now need to do real innovation, based on specialization, to avoid becoming commodity data ingest and visualization tools. After all, the push for open standards is in large part a reaction to the immense toil and lock-in created by existing vendors. Loyalty now requires building the best product, not just a platform that’s difficult to escape from.

The success of modern open-source companies is a testament to sustainable business models built around providing services for non-proprietary software. By collaborating with the worldwide software community, open-source software vendors benefit not just from a larger base of core code contributors, but also from a healthy ecosystem of supporting products like plugins, extensions, and connectors to better interoperate with other software.

Finally, let’s not ignore the fact that observability customers greatly outnumber vendors, and endlessly find new ways to innovate with their own SDKs, third-party libraries, and development patterns. Observability vendors cannot keep up with this endless complexity, and when they tell you that instrumentation for your custom library is on the roadmap and they’ll be delivering it soon, they’re probably lying to you. Without common standards, this often means teams have to go without visibility into key functionality or are forced to build their own custom tooling.

Telemetry From User-Facing Apps Doesn’t Make Sense When It Looks Like APM Because Your Customers Are Not Computers

Open standards alone are unfortunately not enough to ensure telemetry is actionable for site reliability and developer teams. You need investment from experts to ensure data modeling and collection truly capture the key signals and context needed to understand user experiences. That way, when backend teams investigate an outage, they can connect it to not just a raw number of affected users, but also the usage patterns and, ultimately, the negative business impact it’s causing.

Connecting frontend and backend observability is crucial because otherwise, site reliability and developer teams are operating from guesswork:

Site reliability teams are forced to assume user and business impact from service and infrastructure metrics.
Developer teams are tasked with magically solving issues that are out of their control.

Site reliability and developer teams need visibility into endless combinations of variables across user behaviors, network conditions, heterogeneous devices, operating systems, app versions, and more. While backend sessions take milliseconds and are mostly chains of individual service calls, mobile sessions can span minutes, hours, or even days, and the “state of the system” is often unique to the individual user and how they used your app.
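
To make the contrast concrete, here is a hypothetical data shape for a mobile session: a long-lived container that carries device context and an ordered list of timed events, as opposed to a single request trace. The field names (`device_model`, `os_version`, `app_version`, `network`) and the sample values are illustrative assumptions, not a defined OpenTelemetry schema.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    start_s: float     # seconds since session start
    duration_s: float


@dataclass
class MobileSession:
    session_id: str
    device_model: str
    os_version: str
    app_version: str
    network: str
    events: list = field(default_factory=list)

    def span_seconds(self) -> float:
        """Time from session start to the end of the latest-ending event."""
        return max((e.start_s + e.duration_s for e in self.events), default=0.0)


session = MobileSession("s1", "Pixel 8", "Android 14", "3.2.0", "cellular")
session.events.append(Event("app_launch", 0.0, 1.5))
session.events.append(Event("browse_catalog", 2.0, 240.0))
# The user backgrounds the app and returns two hours later.
session.events.append(Event("checkout", 7200.0, 12.0))
```

Here `session.span_seconds()` covers just over two hours, even though each individual event is short, which is the kind of context a per-request trace does not capture on its own.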

And finally, simply knowing you had a crash, an error, or a simple increase in duration for a key activity in your app isn’t enough. Ultimately, you want to know the real impact, not just how a metric moved in a vacuum. Did increasing the duration of that activity materially decrease app engagement, and is the root cause a service getting incrementally worse? The difference between truly excellent engineering organizations and those checking the boxes is extending their field of vision to the user.