Compress Your Data Within Elasticsearch
Learn how to compress your data within Elasticsearch to reduce network latency.
Compressing is awesome: making something smaller than its original size sounds like magic, but it is possible. We know it from WinRAR, 7-Zip, and other tools. Elasticsearch also has properties to compress the data exchanged between nodes and clients, which can be very useful for reducing network latency when handling huge responses from Elasticsearch. In this article we will cover the following topics:
- Enable HTTP/TCP compression
- Handling compressed responses
- Elasticsearch 7.7 and below
- Elasticsearch 7.8 and upwards
- Future Elasticsearch release 7.9 and 8.0
Most of us are already familiar with Elasticsearch from Elastic when working with application logs, but a lot of people have never heard of it. Below is a short summary:
What is Elasticsearch?
Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open-source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.
Enable HTTP/TCP Compression
Elastic has made it really easy to enable HTTP compression on their nodes. Providing the following properties in the elasticsearch.yml file will do the trick:
http.compression: true
http.compression_level: 1
Or the following property for TCP compression:
transport.compress: true
Handling Compressed Responses
With the changes from the previous section, we enabled compression. When enabling TCP compression, you don't need to do anything to handle the compressed data: Elasticsearch uses the TCP transport protocol to communicate between its nodes and is able to decompress that traffic by itself. But when you have enabled HTTP compression, your client (terminal, Postman, Java client) needs to know how to decompress the response, or you will get non-human-readable data. In this article we will focus on the Java client.
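To illustrate what such a client actually has to do, here is a minimal, self-contained sketch of a gzip round trip using only the JDK (no Elasticsearch dependency; the class name and sample payload are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a string to gzip bytes, mimicking what the server does on the wire.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(bos)) {
            gzipOut.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Decompress gzip bytes back to a string, as an HTTP client must do for a gzipped body.
    static String gunzip(byte[] compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining());
        }
    }

    public static void main(String[] args) throws IOException {
        String body = "{\"hits\":{\"total\":42}}";
        byte[] compressed = gzip(body);
        System.out.println(gunzip(compressed).equals(body)); // prints "true"
    }
}
```

The same `GZIPInputStream` pattern reappears in the low-level-client example later in this article; the only difference there is that the bytes come from the HTTP response entity instead of an in-memory buffer.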
On the 18th of June 2020, Elastic released Elasticsearch 7.8 together with a Java library version that makes handling compressed data easier; see the release notes: Elasticsearch 7.8 release notes.
Even though you enabled Elasticsearch to send compressed data, Elasticsearch will only compress the response when the client requests it. The Java client can request it by sending additional request options with the HTTP request; see below for an example:
RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
.addHeader("Accept-Encoding", "gzip")
.addHeader("Content-type", "application/json");
Handling Compressed Responses Elasticsearch 7.7 and Below
The Java library of Elastic provides two clients: the Rest High Level Client and the Low Level Rest Client. The high-level client doesn't support handling compressed responses; it will throw a runtime exception when it receives one. The low-level client provides you the raw response sent by Elasticsearch, and therefore it is possible to decompress it yourself. There are multiple ways to do this; we will cover two in this article:
RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
.addHeader("Accept-Encoding", "gzip")
.addHeader("Content-type", "application/json");
Request request = new Request("GET", "test/_search");
request.setOptions(requestOptions);
Response response = client.getLowLevelClient().performRequest(request);
byte[] entity = EntityUtils.toByteArray(response.getEntity());
String decompressedResponse = "";
try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(entity);
GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8))) {
decompressedResponse = bufferedReader.lines()
.collect(Collectors.joining());
}
System.out.println(decompressedResponse);
It can also be decompressed with the following snippet:
RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
.addHeader("Accept-Encoding", "gzip")
.addHeader("Content-type", "application/json");
Request request = new Request("GET", "test/_search");
request.setOptions(requestOptions);
Response response = client.getLowLevelClient().performRequest(request);
String decompressedResponse = EntityUtils.toString(new GzipDecompressingEntity(response.getEntity()));
System.out.println(decompressedResponse);
Handling Compressed Responses Elasticsearch 7.8
With release 7.8, the Rest High Level Client can automatically decompress compressed data. The example above can be rewritten as:
RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
.addHeader("Accept-Encoding", "gzip")
.addHeader("Content-type", "application/json");
SearchRequest searchRequest = new SearchRequest("twitter");
SearchResponse searchResponse = client.search(searchRequest, requestOptions.build());
System.out.println(searchResponse);
So you, as a developer, don't need to write additional logic to handle the response. If you are using the low-level REST client, you still need to write your own custom decompression logic, as seen in the previous examples.
Handling Compressed Responses with Future Elasticsearch Release 7.9 and 8.0
With the upcoming releases, the low-level client will also get a built-in decompression feature; see the details in this pull request: 55413. This makes the developer experience for Java-library users much better, as it no longer requires any custom decompression logic in your code base. The code example for the low-level client will be:
RequestOptions.Builder requestOptions = RequestOptions.DEFAULT.toBuilder()
.addHeader("Accept-Encoding", "gzip")
.addHeader("Content-type", "application/json");
Request request = new Request("GET", "test/_search");
request.setOptions(requestOptions);
Response response = client.getLowLevelClient().performRequest(request);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println(responseBody);
This change in the low-level client is also a breaking change for its end users. If you already have your own decompression logic, it will probably throw a runtime exception, as it will try to decompress data that is not compressed (because the client already decompressed it). The Rest High Level Client remains untouched and still decompresses compressed data out of the box.
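If you need decompression logic that survives this breaking change, one defensive option (a sketch of my own, not an official Elasticsearch API) is to check the two-byte gzip magic number before attempting decompression, so already-decompressed bodies pass through untouched:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

public class SafeDecompressor {

    // Every gzip stream starts with the magic bytes 0x1f 0x8b.
    static boolean isGzip(byte[] body) {
        return body.length > 1
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    // Decompress only when the payload is actually gzipped; otherwise return it as-is.
    static String toText(byte[] body) throws IOException {
        if (!isGzip(body)) {
            return new String(body, StandardCharsets.UTF_8);
        }
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(body)), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining());
        }
    }
}
```

With the low-level client you would then call something like `SafeDecompressor.toText(EntityUtils.toByteArray(response.getEntity()))`, and the same code keeps working whether or not the client already decompressed the body for you.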
Hope you enjoyed reading about this small yet significant change in the Java API across the different versions of Elasticsearch!