Time Series Compression Algorithms and Their Applications

In this article, learn more about time series compression algorithms along with their role in real-world applications in different sectors.

Rosana de Oliveira Gomes

Sep. 26, 22 · Analysis

Likes (8)

Comment

Save

7.5K Views

This is an article from DZone's 2022 Database Systems Trend Report.

For more:

Read the Report

Time series is present in our daily lives in multiple sectors of society, such as finance, healthcare, and energy management. Some of these domains require high data volume so that insights from analysis or forecasting the behavior of target variables can be obtained. Transferring and processing high data rates and volume across platforms with several users requires storage and computer power availability. Compression techniques are a powerful approach to avoid overwhelming systems. In what follows, time series compression algorithms will be discussed along with their role in real-world applications in different sectors.

What Is Time Series?

Time series is defined as a sequence of values of a quantity obtained at successive times, often with equally spaced intervals. We experience the use of timestamped data from when we monitor our exercises with a fitness app to when we track our pizza delivery traveling through the city all the way to our doorstep. Time series is relevant to problems when understanding the evolution of a variable over time is needed, such as understanding the time profile of a variable or forecasting their values.

Time series most commonly appear in the form of timestamped numerical data in a tabular format. Audio data itself is already represented as a time series, as it is defined in terms of frequencies. However, although time series are themselves a data type, they can also be combined with other data types in order to produce more complex entities, which contain an embedded temporal aspect such as:

A sequence of images over time defines a video.
A time series of coordinate pairs from geospatial data defines a tracking path.

Time series data requires specific techniques in order to obtain insights not only from the different patterns in data, but also among those over time. In a data science pipeline, these techniques are employed during the preprocessing, analysis, and modeling steps. The most common time series techniques are illustrated in Figure 1 below:

Figure 1: Time series methods

Data Compression

Data compression is the process of transforming data in order to reduce the number of bits necessary to represent it. This process is done via the alteration of the data through encoding or the rearrangement of its bits structure. Compression is a valuable technique utilized in scenarios where resources are crucial for storage, processing, and transmission of data. Two extremes of such scenarios are:

Limited resources scenario, where the available storage and processing are limited by costs
Big data scenario, where a high-frequency data influx requires efficient data management

The process of data compression involves encoding the data into a smaller format. In order to perform the reverse process, a decoder is needed to decompress the data, as illustrated in Figure 2:

Figure 2: Data compression scheme

Compression expresses the same information present in the data in a smaller format with less bits. Since it searches for patterns in data that can be encoded, it is a computationally expensive procedure that may demand time and memory. State-of-the-art compression techniques are:

Lossless compression identifies and removes data redundancyin a way that no information is lost in the process. The decoded data is restored exactly to its original state.
- Common uses: databases, emails, spreadsheets, documents, and source code
Lossy compression identifies and permanently removes redundant data bits, not making it possible to recover the original data after decompression.
- Common uses: audio, images, graphics, videos, and scanned files

The trade-off between accuracy and compression, present in the attempt to preserve the data while still addressing the storage bound, is an often-encountered challenge in data compression. Lossless data compression can only shrink data to a certain extent, having Shannon’s information as a threshold. For high-frequency data, lossy compression is needed in order to perform an effective reduction in size.

Table 1:

ADVANTAGES AND DISADVANTAGES OF DATA COMPRESSION
Advantages of Compression	Disadvantages of Compression
Reduction of file size and storage usage costs	Time-consuming for large data volume
Increase data reading/writing speed due to the reduction of memory gaps during disk storage	Algorithms need intensive processing from the system, which becomes costly for large data volume
Faster file transfer via the internet, requiring less computational resources	Quality of decompressed data may depend on level of compression
Algorithms can be used to approximate and/or predict the data, as well as identify noise	Requires a decoder program in order to decompress the files

Compression Methods for Time Series

The rise of big data and use of smart devices reveal a demand for powerful compression techniques able to fulfill the processing needs of industries that rely on time series data. In the case of high frequencies (around 10kHz), even databases that specialize in time series data can get overloaded. Compression algorithms are widely explored due to their high-value returns. The quality of a compression technique is measured by its compression ratio (between compressed and original files), speed (measured in cycles per bite), and the accuracy of the restored data.

Time series compression algorithms take advantage of specific characteristics in time series produced by sensors — such as the fact that some time series segments often repeat themselves in the same or other related time series (redundancy), or the possibility to recover a time series via approximating it by functions or predicting them through neural network models. The state-of-the-art methods are listed below:

Table 2:

TIME SERIES COMPRESSION ALGORITHMS
Algorithm	Description	Common Methods	Performance
Dictionary-based	Represents time series through a series of common segments, using a dictionary to translate the segments into content. The segments’ size determines accuracy and compression.	TRISTAN is an algorithm divided into a learning and a compression phase, with a dictionary that may contain typical patterns or that learns them from a training set. CORAD is an extension of the latter that considers autocorrelations to improve compression and accuracy	Effective for datasets with high redundancy. Can be lossy and lossless.
Function approximation	Divides the time series into segments and applies a function to approximate each of them. Each method follows a different family of functions.	Piecewise polynomial approximation (PPA) and Chebyshev polynomial transform (CPT) are two lossy techniques that split a time series into several segments and fit polynomial functions to them. The Discrete Wavelet Transform (DWT) method approximates time series to wavelet functions.	Suitable for smooth time series, low compression ratios, and high accuracy.
Sequential algorithms	Sequential combination of several compression techniques. The most common are Huffman coding, delta encoding, run-length encoding, and Fibonacci binary coding.	Delta encoding, run-length, and Huffman (DRH) is a method that requires low computational power. Spritz is designed for high decompression speed and low energy consumption. Run-Length Binary Encoding (RLBE) is developed for low memory and computational resources devices. RAKE is an algorithm with a preprocessing and a compression phase that utilizes sparsity to compress the data.	Majority of methods are lossless and computationally efficient. Suitable for Internet of Things (IoT) devices that have limited computational resources.
Autoencoders	Neural network architectures composed of a symmetric pair of encoder and decoder. Trained to generate an output that reproduces the input passed to it.	Recurrent Neural Network Autoencoder (RNNA) methods consider a time-dependent neural network that has a lossy compression and a loss threshold parameter.	Accuracy and compression ratio depend strongly on the ability of the RNN of finding patterns in the training set.

Time Series Applications

The compression algorithm to be chosen for a certain problem depends on the domain of the application and data in question. Applications of compression can be found in many sectors, with multimedia through the compression of images, video, and audio data being the most popular. In particular, time series compression is used in crucial industries. Time series use cases in different sectors and the highlights on compression in such applications are shown in Table 3. In all use cases, the advantages presented in Table 1 are also applicable.

Table 3:

TIME SERIES USE CASES
	Medicine	Maintenance	Energy	Economics
Use case	Monitoring of multiple life signals of patients integrated into a warning system to guarantee full time assistance.	Monitoring of industrial equipment and further automated report of equipment status, ensuring safety and efficient production.	Short-term forecast of energy consumption by smart meters.	Data collected at high frequencies reports the status of stock market statistics in real time.
Compression added value	Faster data processing for performing calculations, such as triggering warnings.	Easier and cheaper storage of large data volumes, making it affordable for manufacturing companies to adopt data-driven solutions.	Encoding algorithms help gather insights from the data, like noise and behavior, making more accurate forecasts.	Faster transmission of information through a large network, permitting users to make decisions in real-time.

Conclusion

We live in the era of big data, where over 250 exabytes of data are produced every day, from which a large portion is present in the form of time series in a broad range of industries (note: 250 exabytes = 250×10 to the 18th power). Time series compression techniques are a powerful approach to efficiently collect, store, manipulate, and transfer data, which is crucial for database maintenance and the implementation of robust data management pipelines.

The great adoption of smart devices has also increased the need for compression techniques that are suitable for scenarios of low computational resources, as in IoT. This article presented both lossy and lossless techniques that are suitable for multiple time series profiles and applications scenarios and discussed the limitations and strengths of such methods.