Why Replace External Database Caches?
Although external caches are a great companion for reducing latencies, they often introduce more problems than benefits. Learn more in this post.
Teams often consider external caches when the existing database cannot meet the required service-level agreement (SLA). This is a clear performance-oriented decision. Putting an external cache in front of the database is commonly used to compensate for subpar latency stemming from various factors, such as inefficient database internals, driver usage, infrastructure choices, traffic spikes, and so on.
Caching might seem like a fast and easy solution because the deployment can be implemented without tremendous hassle and without incurring the significant cost of database scaling, database schema redesign, or even a deeper technology transformation. However, external caches are not as simple as they are often made out to be. In fact, they can be one of the more problematic components of a distributed application architecture.
In some cases, it’s a necessary evil, such as when you require frequent access to transformed data resulting from long and expensive computations, and you’ve tried all the other means of reducing latency. But in many cases, the performance boost just isn’t worth it. You solve one problem, but create others.
Here are some often-overlooked risks of external caches, along with a way to gain both a performance boost and cost savings: replacing your core database and external cache with a faster database.
Why Not Cache?
We’ve worked with countless teams struggling with the costs, hassles, and limits of traditional attempts to improve database performance. Here are the top struggles we’ve seen teams experience with putting an external cache in front of their database.
An External Cache Adds Latency
A separate cache means another hop on the way. When a cache sits in front of the database, every request goes to the cache layer first. If the data isn’t in the cache, the request is then sent to the database. This adds latency to the already slow path of uncached data. One may claim that when the entire data set fits in the cache, the additional latency doesn’t come into play. However, unless your data set is quite small, storing it entirely in memory drives costs up sharply and is thus prohibitively expensive for most organizations.
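The extra hop can be put into numbers with a simple weighted average. The sketch below is a back-of-the-envelope model, not a measurement: the function name and the latency figures are assumptions chosen for illustration, and it assumes every read checks the cache before falling through to the database.

```python
def expected_read_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Average latency of a read that always checks the external cache first.

    hit_ratio: fraction of reads served by the cache (0.0 to 1.0).
    cache_ms:  round trip to the cache (illustrative figure).
    db_ms:     round trip to the database (illustrative figure).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # A hit costs one cache round trip.
    hit_cost = cache_ms
    # A miss pays for the cache round trip AND the database round trip.
    miss_cost = cache_ms + db_ms
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


if __name__ == "__main__":
    # Hypothetical numbers: 1 ms cache hop, 5 ms database read.
    print(expected_read_latency_ms(0.75, 1.0, 5.0))  # average over all reads
    print(expected_read_latency_ms(0.0, 1.0, 5.0))   # every read is a miss
```

With these made-up figures, a 75% hit ratio averages 2.25 ms, but every miss costs 6 ms instead of the 5 ms a direct database read would take. The uncached path, which was already the slow one, is the path that gets slower.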
An External Cache is an Additional Cost
Caching means expensive DRAM, which translates to a higher cost per gigabyte than solid-state disks (see this P99 CONF talk by Grafana’s Danny Kopping for more details on that). Rather than provisioning an entirely separate infrastructure for caching, it is often best to use the existing database memory, and even increase it for internal caching. Modern database caches can be just as efficient as traditional in-memory caching solutions when sized correctly. When the working set is too large to fit in memory, databases often shine at optimizing I/O access to flash storage, making the database alone (with no external cache) the preferred and cheaper option.
External Caching Decreases Availability
No cache’s high availability solution can match that of the database itself. Modern distributed databases have multiple replicas; they also are topology-aware and speed-aware and can sustain multiple failures without data loss.
For example, a common replication pattern is three local replicas, which generally lets reads be balanced across those replicas so that each replica’s internal cache is put to use. Consider a nine-node cluster with a replication factor of three: each node holds roughly a third of your total data set. As requests are balanced among different replicas, this grants you more room for caching your data, which could completely eliminate the need for an external cache. Conversely, if an external cache happens to invalidate entries right before a surge of cold requests, availability could be impeded for a while since the database won’t have that data in its internal cache (more on this below).
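The nine-node arithmetic above generalizes easily. The following is a minimal sketch that assumes data is spread evenly across nodes, which real clusters only approximate; the function name is hypothetical and not part of any database's API.

```python
from fractions import Fraction


def per_node_share(nodes: int, replication_factor: int) -> Fraction:
    """Fraction of the total data set stored on each node.

    Assumes an even distribution: the cluster stores `replication_factor`
    full copies of the data, divided equally among `nodes` nodes.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    if not 1 <= replication_factor <= nodes:
        raise ValueError("replication_factor must be between 1 and nodes")
    return Fraction(replication_factor, nodes)


if __name__ == "__main__":
    share = per_node_share(nodes=9, replication_factor=3)
    print(share)  # 1/3, the "roughly a third" from the example above
```

Because each node is responsible for only a fraction of the data, its memory only has to cache the hot part of that fraction, not of the whole data set.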
Caches often lack high availability properties and can easily fail or invalidate records depending on their heuristics. Partial failures, which are more common, are even worse in terms of consistency. When the cache inevitably fails, the database will get hit by the unmitigated firehose of queries and likely wreck your SLAs. In addition, even if a cache itself has some high availability features, it can’t coordinate handling such failure with the persistent database it is in front of. The bottom line: rely on the database, rather than making your latency SLAs dependent on a cache.
Application Complexity: Your Application Needs to Handle More Cases
External caches introduce application and operational complexity. Once you have an external cache, it is your responsibility to keep the cache up to date with the database. Irrespective of your caching strategy (such as write-through, cache-aside, etc.), there will be edge cases where your cache can fall out of sync with your database, and you must account for these during application development. Your client settings (such as failover, retry, and timeout policies) need to match the properties of both the cache and the database so that the application keeps functioning when the cache is unavailable or goes cold. Such scenarios are usually hard to implement and even harder to test.
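One such edge case is a race between a reader and a writer under the cache-aside strategy. The sketch below replays that interleaving by hand, using two plain dictionaries as stand-ins for the database and the cache. The class and function names are hypothetical, and a real system would hit this through concurrent clients rather than sequential steps.

```python
class CacheAsideStore:
    """Toy cache-aside setup: two dicts stand in for the database and the cache."""

    def __init__(self):
        self.db = {}
        self.cache = {}

    def write(self, key, value):
        # Cache-aside write: update the database, then invalidate the cache.
        self.db[key] = value
        self.cache.pop(key, None)

    def read(self, key):
        # Cache-aside read: serve from the cache, or load and populate on a miss.
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]
        self.cache[key] = value
        return value


def demonstrate_stale_read():
    """Replay a reader/writer interleaving that leaves a stale cache entry."""
    store = CacheAsideStore()
    store.db["price"] = 100

    # 1. Reader misses the cache and fetches from the database,
    #    but is delayed before it can populate the cache.
    fetched_by_reader = store.db["price"]

    # 2. Writer updates the database and invalidates the (empty) cache entry.
    store.write("price", 120)

    # 3. Reader resumes and populates the cache with the old value.
    store.cache["price"] = fetched_by_reader

    # Later reads are served from the cache and disagree with the database.
    return store.read("price"), store.db["price"]


if __name__ == "__main__":
    cached, stored = demonstrate_stale_read()
    print(cached, stored)  # 100 120
```

The stale entry stays in place until it is evicted, expires, or is overwritten, and nothing in the read or write path signals that anything went wrong.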
External Caching Ruins the Database Caching
Modern databases have embedded caches and complex policies to manage them. When you place a cache in front of the database, most read requests will reach only the external cache, and the database won’t keep these objects in its memory. As a result, the database cache is rendered ineffective. When requests eventually reach the database, its cache will be cold and the responses will come primarily from the disk. The round trip from the cache to the database and then back to the application is therefore likely to add latency rather than remove it.
External Caching Might Increase Security Risks
An external cache adds a whole new attack surface to your infrastructure. Encryption, isolation, and access control on data placed in the cache are likely to be different from the ones at the database layer itself.
External Caching Ignores the Database Knowledge and Database Resources
Databases are quite complex and built for specialized I/O workloads on the system. Many of the queries access the same data, and some amount of the working set size can be cached in memory to save disk accesses. A good database should have sophisticated logic to decide which objects, indexes, and accesses it should cache. The database also should have eviction policies that determine when new data should replace existing (older) cached objects.
An example is scan-resistant caching. When scanning a large data set, say a large range or a full-table scan, a lot of objects are read from the disk. The database can recognize that this is a scan (not a regular query) and choose to leave these objects outside its internal cache. However, an external cache (following a read-through strategy) would treat the result set just like any other and attempt to cache the results, evicting hotter entries in the process. In addition, the database automatically synchronizes the content of its internal cache with the disk according to the incoming request rate, so neither the user nor the developer needs to do anything to make sure that lookups of recently written data are performant and consistent.
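The scan-resistance idea described above can be sketched with a small LRU cache that lets scan reads bypass it. This is an illustration only, not how any particular database implements it: the class name is hypothetical, and the caller passes an `is_scan` flag, whereas a real database would detect scans from the query itself.

```python
from collections import OrderedDict


class ScanResistantCache:
    """LRU cache in which reads flagged as scans never populate or reorder entries."""

    def __init__(self, capacity, load_from_disk):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._load = load_from_disk      # callable standing in for a disk read
        self._entries = OrderedDict()    # oldest entry first, newest last

    def get(self, key, is_scan=False):
        if key in self._entries:
            if not is_scan:
                # Only regular queries mark an entry as recently used.
                self._entries.move_to_end(key)
            return self._entries[key]
        value = self._load(key)
        if not is_scan:
            # Regular queries populate the cache and may evict the oldest entry.
            self._entries[key] = value
            if len(self._entries) > self._capacity:
                self._entries.popitem(last=False)
        return value

    def cached_keys(self):
        return list(self._entries)


if __name__ == "__main__":
    cache = ScanResistantCache(capacity=2, load_from_disk=lambda k: k * 10)
    cache.get(1)
    cache.get(2)
    for key in range(3, 100):       # a large scan
        cache.get(key, is_scan=True)
    print(cache.cached_keys())      # the hot keys 1 and 2 are still cached
```

A plain read-through cache without the `is_scan` check would have evicted keys 1 and 2 as soon as the scan began.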
Therefore, if your database doesn’t respond fast enough, it usually means one of the following:
- The cache is misconfigured
- It doesn’t have enough RAM for caching
- The working set size and request pattern don’t fit the cache
- The database cache implementation is poor
A Better Option: Let the Database Handle It
How can you meet your SLAs without the risks of external database caches? Many teams have found that by moving to a faster database with a specialized internal cache, they’re able to meet their latency SLAs with less hassle and lower costs.
Although external caches are a great companion for reducing latencies (such as serving static content and personalization data not requiring any level of durability), they often introduce more problems than benefits when placed in front of a database.
The top tradeoffs include elevated costs, increased application complexity, additional round trips to your database, and a larger attack surface. By rethinking your existing caching strategy and switching to a modern database that provides predictable low latencies at scale, you can simplify your infrastructure and minimize costs. At the same time, you can still meet your SLAs without the extra hassles and complexities introduced by external caches.
Published at DZone with permission of Felipe Mendes. See the original article here.