Caching With Apache HTTP Client and Spring RestTemplate
We take a look at server and client caching, and how Apache and Spring can help developers implement them in their applications.
Server Caching Is a Hard Job
In a typical SOA web application, we have a web server that gathers data from many backend services and generates HTML output for the user's browser.
Chances are that some services are performing too slowly and you are thinking about adding a caching mechanism to them. The Spring Framework has a cache abstraction out of the box, and Hibernate has a second-level cache that can help you improve the performance of services. But hold on! Caching on the service layer is not as easy as you might think. From our experience, there are two burdens that can make you regret deciding to do so.
1. You always have to be concerned about cache eviction whenever you change or develop new APIs for the service. Let's see some example situations. Suppose you have a very simple product service whose only APIs are CRUD operations. At this stage, it is not yet too difficult to manage caching for this service: you just have to cache the READ API and clear the cache when the UPDATE and DELETE APIs are called.
Imagine you have to add a new API to search for a list of products using the input criteria. Instead of just thinking only about the application logic, you now have to define another cache storage for the output of this API and be careful not to forget to clear the cache in the CREATE, UPDATE, and DELETE APIs.
It can get harder still. Say you have to create a new API to update product stock when users update their carts or confirm orders. You have to write some not-so-simple code to clear the cache of all products in the given cart. Chances are that you will get called on your happy Saturday night because you forgot to add logic to clear the cached product list of the search API, which is causing customers to see the wrong product stock on the search result page of your website.
2. It will introduce a nightmare into the horizontal scaling of your service. Horizontal scaling is simple: you deploy more instances of your service to share the work distributed by a load balancer. No code changes are needed, and you develop the service without worrying that it will run as multiple instances. That's true only when you haven't added caching to the service. With a cache, you have to find some way to keep it consistent across all instances. Mature cache engines such as Ehcache support this, but it is still not an easy task.
HTTP Caching to the Rescue!
HTTP standards already define a mechanism to handle caching efficiently by having the client manage the cache storage and having the server check the validity of the cached resources. All HTTP client engines are supposed to support this standard, and the web browsers we use every day are an example of an HTTP client. In the simplest form, the cache storage is in the web browser, which works when the web server returns an HTTP response with appropriate headers.
When the web browser sends a request to http://mywebsite.com/popular-products for the first time, the web server sends back the response with the following headers:
Cache-Control: public
Last-Modified: Fri, 27 Jul 2018 12:45:26 GMT
The web browser then understands that this response is "cacheable" and that it was last modified on Fri, 27 Jul 2018 12:45:26 GMT, so it stores the response in its cache. When the user refreshes the page, the browser sends the same request, this time with the additional header If-Modified-Since: Fri, 27 Jul 2018 12:45:26 GMT. The web server, finding the header, uses its date to check whether the resource has changed since that time. If so, it returns the new version of the resource as if it were a first-time request; otherwise, it responds only with HTTP status 304: Not Modified and an empty body, telling the browser to use the resource from its cache.
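The server-side decision described above can be sketched with the JDK alone. This is only an illustration under assumptions: the class name ConditionalGet and its methods are hypothetical and are not part of Spring or of this article's projects.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical helper: decides between 200 and 304 for a conditional GET. */
final class ConditionalGet {

    // Fixed-length HTTP date, e.g. "Fri, 27 Jul 2018 12:45:26 GMT".
    static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    private ConditionalGet() {}

    /** Formats an instant as a Last-Modified header value. */
    static String toHttpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /**
     * Returns 304 when the resource has not changed since the date sent by
     * the client, and 200 otherwise (including when the header is missing
     * or cannot be parsed).
     */
    static int statusFor(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return 200; // first-time request: send the full response
        }
        try {
            Instant since = ZonedDateTime.parse(ifModifiedSince, HTTP_DATE).toInstant();
            // HTTP dates have one-second resolution, so compare at that precision.
            Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
            return modified.isAfter(since) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200; // ignore a malformed header and send the full response
        }
    }
}
```

For example, with a resource last modified at 2018-07-27T12:45:26Z, a request carrying If-Modified-Since: Fri, 27 Jul 2018 12:45:26 GMT yields 304, while a request without the header yields 200.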
Explicit Modified Date Re-Validation Style
There are some variant flows for HTTP caching supported by the standard, for cases such as when you cannot use the modified date to determine whether the resource has changed. You can hash the entire response and put the hash value in an ETag header, or you can tell the browser to keep using the resource from its cache without asking the server at all by specifying the resource's max-age in the Cache-Control header.
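For comparison only, the ETag variant mentioned above could look like the following sketch. The class ETags is hypothetical, and the choice of SHA-256 is an assumption; the standard does not prescribe how the tag is computed. The client echoes the tag back in the If-None-Match request header.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives an ETag by hashing the whole response body. */
final class ETags {

    private ETags() {}

    /** Returns a quoted, lowercase hex SHA-256 digest of the body. */
    static String of(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : digest) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** True when the tag sent in If-None-Match still matches the body. */
    static boolean matches(String ifNoneMatch, byte[] body) {
        return ifNoneMatch != null && ifNoneMatch.equals(of(body));
    }
}
```

Note that this approach still requires the server to build the full response before it can answer 304, which is one reason the rest of the article prefers the modified date.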
In this article, we will focus only on using Last-Modified together with If-Modified-Since, which we will refer to as the explicit modified date re-validation style. It has some advantages over the other variants. The only constraint is that you always have to track the last modified date of the resource, which you should do by design even when it is not for caching.
With this style, the resource is validated by the origin server every time, which gives a safe control flow in which the origin server has the authority to refresh the resource. The cost of transferring data over the network is not an issue, as the parties mostly communicate with an empty body. The origin server itself has the chance to skip the heavy business logic when it finds that the resource is not modified.
Browser Caching in SOA
In the first section, we talked about the SOA application where our product service is too slow. Let's see what it looks like when we add browser caching support to our application.
The browser sends a first-time request to http://mywebsite.com/popular-products. The web server then requests the popular products from the service using a predefined query, http://product-service/products?orderBy=purchaseCount&page=0&size=10. The product service puts the following headers in the response. The web server then renders the HTML and forwards the headers to the browser.
Cache-Control: public
Last-Modified: Fri, 27 Jul 2018 12:45:26 GMT
When the browser sends the second request with If-Modified-Since in the headers, the web server forwards this header to the product service. The service then uses the value in the header to check whether the products table in the database has had any modifications since the specified time. When it has not been modified, the service responds with the HTTP status 304. The web server, in turn, forwards the 304 response to the browser.
Browser Caching Is Not Enough!
Caching in the web browser can help us a lot in the case where a single user accesses the same resource many times. But what about when many users access the resource? Our poor new users still have to deal with the slow responses caused by the product service.
Fortunately, in our SOA application, we have the lightweight web server in the middle tier between the sluggish product service and the users. Our web server is another kind of the aforementioned HTTP client and should therefore support the HTTP caching standard. We can handle caching between the web server and the product service in the same way we did between the browser and the web server.
With this architecture, new users get a fast result from the web server's cache. Later requests can still, optionally, use the browser cache to further reduce the data transfer load.
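To make the idea concrete before bringing in any library, here is a minimal sketch of a shared middle-tier cache that re-validates with the origin on every request. All names (RevalidatingCache, Origin, Response) are hypothetical, and the origin is abstracted as a plain function rather than a real HTTP call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical middle-tier cache shared by all users of the web server. */
final class RevalidatingCache {

    /** Minimal response model: status, Last-Modified value, and body. */
    static final class Response {
        final int status;
        final String lastModified;
        final String body;

        Response(int status, String lastModified, String body) {
            this.status = status;
            this.lastModified = lastModified;
            this.body = body;
        }
    }

    /** The origin (the product service here), abstracted as a function. */
    interface Origin {
        Response get(String url, String ifModifiedSince);
    }

    private final Map<String, Response> store = new ConcurrentHashMap<>();
    private final Origin origin;

    RevalidatingCache(Origin origin) {
        this.origin = origin;
    }

    /** Always asks the origin, but reuses the stored body on a 304. */
    String fetch(String url) {
        Response cached = store.get(url);
        String validator = cached == null ? null : cached.lastModified;
        Response fresh = origin.get(url, validator);
        if (fresh.status == 304 && cached != null) {
            return cached.body; // origin confirmed the stored copy is current
        }
        if (fresh.status == 200 && fresh.lastModified != null) {
            store.put(url, fresh); // keep it for every later user
        }
        return fresh.body;
    }
}
```

The point of the design is that the second user of the same URL triggers only a cheap validation at the origin, not the expensive query.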
In the subsequent sections, we will show how to handle cache control headers in a product service, a typical Spring REST API, and how we use Spring's RestTemplate together with Apache HTTP Client at the web server to handle caching and forwarding the cache control header to the web browser.
Getting Started
Initialize two Maven projects for the product service and the web server.
Product Service Dependencies
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.0.RELEASE</version>
    <relativePath/>
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>
Web Server Dependencies
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.0.RELEASE</version>
    <relativePath/>
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-cache</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient-cache</artifactId>
    </dependency>
</dependencies>
Product Service
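In the product service project, we have a simple REST controller with a single API for searching products using criteria from the request parameters.
import java.time.ZonedDateTime;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.http.CacheControl;
import org.springframework.http.HttpHeaders;
import org.springframework.http.ResponseEntity;
import org.springframework.util.MultiValueMap;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.WebRequest;

@RestController
@RequestMapping("/products")
public class ProductApiController {

    private final ProductService productService;

    public ProductApiController(ProductService productService) {
        this.productService = productService;
    }

    @GetMapping
    public ResponseEntity<List<ProductDTO>> searchProducts(
            @RequestParam MultiValueMap<String, String> params,
            WebRequest webRequest) {
        ZonedDateTime productTableLastModifiedDate = productService.getProductTableLastModifiedDate();
        // checkNotModified expects epoch milliseconds; passing epoch seconds
        // would produce a Last-Modified date in January 1970.
        if (webRequest.checkNotModified(productTableLastModifiedDate.toInstant().toEpochMilli())) {
            return null; // Spring has already set the 304 status on the response
        }
        // Resource changed or no If-Modified-Since header: run the slow query.
        List<ProductDTO> productList = productService.searchProducts(params)
                .stream()
                .map(ProductDTO::new)
                .collect(Collectors.toList());
        return ResponseEntity.ok()
                .header(HttpHeaders.CACHE_CONTROL, CacheControl.empty().cachePublic().getHeaderValue())
                .body(productList);
    }
}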
There are three notable points in this API.
1. We have a service method to get the last modified date of the product table. Many RDBMSs can provide this information out of the box.
2. We use Spring's WebRequest#checkNotModified() to handle some of the boring work. We give it the last modified date of our resource; it then compares the date with the If-Modified-Since request header and sets the response headers and the 304 status appropriately. We only have to return a null body if the method returns true.
3. If the resource has been modified, or the request has no If-Modified-Since header, we process the normal case by calling the sluggish method to find the products in the database. We then set Cache-Control: public in the response header to tell the client that this resource is cacheable.
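One detail worth checking when calling WebRequest#checkNotModified(long) is the unit: the method takes the timestamp in milliseconds since the epoch. The sketch below uses only the JDK to show the conversion and what happens when epoch seconds are passed by mistake; the class LastModifiedUnits is a hypothetical name used for illustration.

```java
import java.time.Instant;
import java.time.ZonedDateTime;

/** Hypothetical helper illustrating epoch seconds versus epoch milliseconds. */
final class LastModifiedUnits {

    private LastModifiedUnits() {}

    /** The value to hand to an API that expects epoch milliseconds. */
    static long toEpochMillis(ZonedDateTime lastModified) {
        return lastModified.toInstant().toEpochMilli();
    }

    /** What a millisecond-based API "sees" when it is given epoch seconds. */
    static Instant misreadSecondsAsMillis(ZonedDateTime lastModified) {
        return Instant.ofEpochMilli(lastModified.toEpochSecond());
    }
}
```

For a date in July 2018, the misread value lands in January 1970, so a Last-Modified header dated 1970 is a useful symptom to look for in the logs.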
Web Server
At the web server, we have an MVC controller that uses RestTemplate to call the product search API and render the result in popular-products.html.
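import java.util.List;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.core.ParameterizedTypeReference;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.util.UriComponents;
import org.springframework.web.util.UriComponentsBuilder;

@Controller
@RequestMapping("/")
public class ProductWebController {

    private final RestTemplate restTemplate;

    public ProductWebController(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @GetMapping("popular-products")
    public String renderPopularProductPage(
            Model model,
            HttpServletRequest request,
            HttpServletResponse response) {
        UriComponents uriComponents = UriComponentsBuilder
                .fromHttpUrl("http://localhost:8081/products")
                .queryParam("page", 0)
                .queryParam("size", 10)
                .queryParam("orderBy", "purchaseCount")
                .build();

        // Forward the browser's validator to the product service, if present.
        String ifModifiedSince = request.getHeader(HttpHeaders.IF_MODIFIED_SINCE);
        HttpHeaders headers = new HttpHeaders();
        if (ifModifiedSince != null) {
            headers.set(HttpHeaders.IF_MODIFIED_SINCE, ifModifiedSince);
        }
        HttpEntity<Object> httpEntity = new HttpEntity<>(headers);

        ResponseEntity<List<ProductDTO>> apiResponse = restTemplate.exchange(
                uriComponents.toUri(),
                HttpMethod.GET,
                httpEntity,
                new ParameterizedTypeReference<List<ProductDTO>>() {});

        if (apiResponse.getStatusCode().equals(HttpStatus.OK)) {
            List<ProductDTO> productList = apiResponse.getBody();
            model.addAttribute("productList", productList);
            // Copy the caching headers from the API response to the browser response.
            String lastModified = apiResponse.getHeaders().getFirst(HttpHeaders.LAST_MODIFIED);
            if (lastModified != null) {
                response.setHeader(HttpHeaders.LAST_MODIFIED, lastModified);
            }
            String cacheControl = apiResponse.getHeaders().getFirst(HttpHeaders.CACHE_CONTROL);
            if (cacheControl != null) {
                response.setHeader(HttpHeaders.CACHE_CONTROL, cacheControl);
            }
            return "popular-products";
        } else if (apiResponse.getStatusCode().equals(HttpStatus.NOT_MODIFIED)) {
            // Nothing changed: pass the 304 on with an empty body.
            response.setStatus(HttpStatus.NOT_MODIFIED.value());
            return null;
        } else {
            throw new RuntimeException("Got unexpected response from product service");
        }
    }
}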
In the code, there is work to do before and after calling the API. First, we check whether the request from the browser has If-Modified-Since in the header; if so, we forward it to the API. After the call, if the response is 200: OK, we copy the Cache-Control and Last-Modified headers to the browser response before rendering the HTML. But if the response is 304: Not Modified, we forward the response status to the browser with an empty response body.
For the injected RestTemplate, let's define the simple version for now. We will come back to add caching configuration in this class later.
@Configuration
class RestTemplateConfiguration {

    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}
And for popular-products.html, we print the current time when the page is rendered. The time should not change if the page comes from the cache.
<html>
<body>
    <h1>
        Response At : <span th:text="${#dates.format(#dates.createNow(), 'dd MMM yyyy HH:mm:ss')}"></span>
    </h1>
</body>
</html>
Browser Caching Test
Let's test if the caching is working in the browser.
Since we have to run the two applications together, we have to set them to run on different ports. We can do this by creating the file src/main/resources/application.properties in both projects and putting the following line in the product service so that it runs on port 8081.
server.port = 8081
And for the web server, we want it to run on port 8080:
server.port = 8080
Now, start both applications together, open a web browser, and navigate to http://localhost:8080/popular-products. You will see the actual generation time of the response written on the web page.
When you press F5 to refresh the page, you will see that the time doesn't change. But if you press Ctrl+F5 to force the browser to get a fresh resource from the server, you will see it change.
If you use Google Chrome or a browser with similar features, you can press F12 to bring up the developer tools. Go to the "Network" tab and press F5 again; you will see that the response status is 304.
HTTP Caching Configuration
Let's configure caching for RestTemplate. Change the code in RestTemplateConfiguration to the following:
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.cache.CacheConfig;
import org.apache.http.impl.client.cache.CachingHttpClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.BufferingClientHttpRequestFactory;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfiguration {

    @Bean
    public RestTemplate restTemplate() {
        // BufferingClientHttpRequestFactory allows us to read the response more than once - Necessary for debugging.
        return new RestTemplate(new BufferingClientHttpRequestFactory(
                new HttpComponentsClientHttpRequestFactory(httpClient())));
    }

    @Bean
    public HttpClient httpClient() {
        return CachingHttpClientBuilder
                .create()
                .setCacheConfig(cacheConfig())
                .build();
    }

    @Bean
    public CacheConfig cacheConfig() {
        return CacheConfig
                .custom()
                .setMaxObjectSize(500000) // 500KB
                .setMaxCacheEntries(2000)
                // Set this to false and a response with queryString
                // will be cached when it is explicitly cacheable
                .setNeverCacheHTTP10ResponsesWithQueryString(false)
                .build();
    }
}
Most of the code is self-explanatory. There are other configuration options to try. But in order to do it properly, you have to understand the behavior of the cache engine. Chances are that you have to download the source code of the cache engine and gradually debug the application and explore its implementation. The following are some notes from our investigation of Apache Caching HTTP Client.
1. The cache key is formed by combining the following elements of the request URL to the API endpoint: hostname + port + path + query-string. This means you should be careful if the API can return different results depending on some value in the request header.
2. By default, it does not cache requests with query strings in the URL, so you have to enable it like so: CacheConfig.custom().setNeverCacheHTTP10ResponsesWithQueryString(false)
3. To explicitly declare that the response is cacheable, the API should either put Expires and Date in the headers, with the value of Expires later than the value of Date, or put Cache-Control in the headers with one of the following values: max-age, s-maxage, must-revalidate, proxy-revalidate, or public.
4. Only requests with the methods GET or HEAD will be cached.
5. Only responses with the status 200, 203, 300, 301, or 410 will be cached.
6. Responses with the header Content-Length greater than the configured maxObjectSize will not be cached.
7. Responses with the header Age greater than 0 will not be cached. Note that the header Age is the time in seconds that the object has been stored in a proxy cache. In this case, it means that only responses from the origin server will be cached.
8. Responses without the header Date will not be cached.
9. Responses with the header Expires not later than Date will not be cached.
10. Responses with the header Vary: * will not be cached.
11. Responses with Cache-Control: no-store or Cache-Control: no-cache will not be cached.
12. If the cache is configured as a shared cache, it will not cache a response with the header Cache-Control: private.
13. If the cache is shared, it will not cache a request with the header Authorization unless the response explicitly has a Cache-Control value of s-maxage, must-revalidate, or public.
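A few of the notes above can be restated as code to make them easier to reason about. The sketch below is our own illustration of those rules as described in this article; it is not the Apache HttpClient implementation, and the class CacheNotes and its methods are hypothetical names.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical restatement of some of the caching notes; not library code. */
final class CacheNotes {

    private static final List<String> EXPLICIT_DIRECTIVES = Arrays.asList(
            "max-age", "s-maxage", "must-revalidate", "proxy-revalidate", "public");

    private static final List<Integer> CACHEABLE_STATUSES =
            Arrays.asList(200, 203, 300, 301, 410);

    private CacheNotes() {}

    /** hostname + port + path + query-string, as described in the notes. */
    static String cacheKey(URI uri) {
        int port = uri.getPort();
        if (port == -1) {
            // No explicit port in the URL: fall back to the scheme default.
            port = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        }
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return uri.getHost() + ":" + port + uri.getRawPath() + query;
    }

    /** True when Cache-Control carries one of the "explicitly cacheable" directives. */
    static boolean hasExplicitCacheControl(String cacheControl) {
        if (cacheControl == null) {
            return false;
        }
        for (String directive : cacheControl.split(",")) {
            String name = directive.trim().toLowerCase(Locale.ROOT);
            int eq = name.indexOf('=');
            if (eq >= 0) {
                name = name.substring(0, eq).trim(); // drop the "=value" part
            }
            if (EXPLICIT_DIRECTIVES.contains(name)) {
                return true;
            }
        }
        return false;
    }

    /** Method and status checks from the notes. */
    static boolean isCandidate(String method, int status) {
        return ("GET".equals(method) || "HEAD".equals(method))
                && CACHEABLE_STATUSES.contains(status);
    }
}
```

Because request headers are not part of the key in this description, two requests that differ only in a header would map to the same entry, which is the caution raised in the first note.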
Web Server Caching Test
To test the caching of the Apache Caching HTTP Client, let's enable its log by putting this line in application.properties of the web server project.
logging.level.org.apache.http = TRACE
Start the two applications and navigate to http://localhost:8080/popular-products.
For the first request, you will see a log similar to the following, which shows that there was a cache miss and that the API returned a response with status 200.
o.a.h.i.c.cache.CacheableRequestPolicy : Request was serveable from cache
o.a.http.impl.client.cache.CachingExec : Cache miss
o.a.http.impl.client.cache.CachingExec : Cache miss [host: http://localhost:8081; uri: /products?page=1&size=10&purchaseCount=50]
o.a.http.impl.client.cache.CachingExec : Calling the backend
...
org.apache.http.headers : http-outgoing-0 << HTTP/1.1 200
org.apache.http.headers : http-outgoing-0 << Last-Modified: Sun, 18 Jan 1970 17:45:17 GMT
org.apache.http.headers : http-outgoing-0 << Cache-Control: public
...
o.a.http.impl.client.cache.CachingExec : Handling Backend response
Press F5 in the browser and then come back to the log. You will see that the cache was hit but needed re-validation from the API server, which then returned 304 to tell the cache engine that the cached entry is still valid.
o.a.h.i.c.cache.CacheableRequestPolicy : Request was serveable from cache
o.a.http.impl.client.cache.CachingExec : Cache hit [host: http://localhost:8081; uri: /products?page=1&size=10&purchaseCount=50]
h.i.c.c.CachedResponseSuitabilityChecker : Cache entry was not fresh enough
o.a.http.impl.client.cache.CachingExec : Revalidating cache entry
...
org.apache.http.headers : http-outgoing-1 >> GET /products?page=1&size=10&purchaseCount=50 HTTP/1.1
org.apache.http.headers : http-outgoing-1 >> If-Modified-Since: Sun, 18 Jan 1970 17:45:17 GMT
...
org.apache.http.headers : http-outgoing-1 << HTTP/1.1 304
Conclusion
Caching at the resource origin is hard. It's better to delegate the caching work to the client, so that you can scale out the resource server easily.
Caching in the web browser is free as it is implicitly supported by most browsers but it is not enough to replace the server cache because it is not shared across many users.
Middle tier caching is the best of both worlds. It removes the complexity of caching at the resource server and can also serve the cached resource for many users.
Finally, you can find the complete source code for this article on GitHub.