Battle of the Views: ClickHouse Window View vs. Live View
ClickHouse Window Views and Live Views are alternatives to query newly arrived data from event streams. In this article, we compare and contrast their usage.
Join the DZone community and get the full member experience.
Join For FreeClickHouse added experimental Window Views starting from 21.12. Window Views aggregate data over a period of time and then automatically post at the end of the period. They add another exciting tool for stream processing and analytics to ClickHouse’s toolbox while expanding its collection of view tables. In general, the concepts behind Window View can also be found in other systems, such as Azure Stream Analytics, Kafka Streams, and Apache Spark, among others.
Window Views follow the addition of experimental Live View tables added in 19.14.3.3. In this article, we will look at both of them to find the differences and similarities between the two. Let’s get a better view of these two different views!
General Applications
Window View allows you to compute aggregates over time windows, as records arrive in real-time into the database, and provides similar functionality that is typically found in other stream processing systems.
Typical applications include keeping track of statistics, such as calculated average values over the last five minutes or a one-hour window, which can be used to detect anomalies in a data stream while grouping data by time into one or more active time windows.
On the other hand, Live View is not specialized for time window processing and is meant to provide real-time results for common queries without the need to necessarily group arriving records by time.
Some typical applications, as pointed out in the documentation, include providing real-time push notifications for query result changes to avoid polling, caching the results of the most frequently used queries, detecting table changes and triggering follow-up queries, as well as real-time monitoring of system table metrics using periodic refresh.
Window View
Window View is designed to work with Time Window Functions, in particular Tumble and Hop. These time window functions are used to group records together over which an aggregate function can be applied. The time reference can either be set to the current time, as provided by the now()
function, or as a table column.
To better understand Time Window Functions, we can visualize them as follows:
For processed results, the Window View can either push to a specified table or push real-time updates using the WATCH query.
Live View
On the other hand, Live View is not tied to any group by function. Instead, it allows you to provide streaming query results for queries where the result of the query can be computed by applying the same query on the current data and the new data separately and merging the results of the two to compute the final result. Live View query processing can be visualized as follows:
Live View can output the computed result by SELECT
ing from it, or it can push real-time updates using the WATCH
query.
Comparison in Action
Let’s use the following source table to see how Window View and Live View work in action. We will use ClickHouse 22.3.6.5 for our examples.
CREATE TABLE data (`id` UInt64, `timestamp` DateTime) ENGINE = MergeTree() ORDER BY tuple();
Because both features are experimental, we need to ‘set’ the corresponding settings to make them available.
set allow_experimental_live_view = 1 set allow_experimental_window_view = 1
Example: Counting the Number Of Events Per Interval
The default example provided in the Window View documentation shows an example of how we can use Window View to count the number of events per 10 seconds using the live data that comes into our data
table. In this case, the tumble() time window function is ideal to provide non-overlapping fixed interval time windows. The tumbleStart() function can be used to get the front edge of the time window.
CREATE WINDOW VIEW wv AS SELECT count(id), tumbleStart(w_id) AS window_start FROM data GROUP BY tumble(timestamp, toIntervalSecond('10')) AS w_id
Let’s try to do something similar using a Live View. Our first attempt uses the following query:
CREATE LIVE VIEW lv AS SELECT count(id), toStartOfInterval(timestamp, toIntervalSecond(10)) AS window_start FROM data GROUP BY window_start
Now, let’s start WATCH
ing both views while inserting records into the source table and observing the results. We will need three active ClickHouse client sessions, as two of them will be used to run WATCH
queries, and the third will be used to perform INSERT
s.
Open clickhouse-client1
and execute:
WATCH wv
Then, open clickhouse-client2
and execute:
WATCH lv
You should see that both WATCH
queries are stuck without any output, as the WATCH
query blocks to output an infinite stream of results until it is aborted if the LIMIT
clause is not specified. There is no data in our source data
table and, therefore, no output yet.
Next, let’s insert some rows into the source data
table in clickhouse-client3
:
INSERT INTO data VALUES(1,now())
We can immediately observe that WATCH lv
query provides immediate output while WATCH wv
output is delayed until the time window closes.
Both outputs after 10 sec are the following.
| WATCH lv
┌─count(id)─┬────────window_start─┬─_version─┐ │ 1 │ 2022-06-28 15:23:20 │ 2 │ └───────────┴─────────────────────┴──────────┘
| WATCH wv
┌─count(id)─┬────────window_start─┐ │ 1 │ 2022-06-28 15:23:20 │ └───────────┴─────────────────────┘
Disregarding the _version
column in the WATCH lv
output, we see that the results are the same.
Now, let’s do exactly the same INSERT
one more time. The new result for both is as follows:
INSERT INTO data VALUES(1,now())
| WATCH lv
┌─count(id)─┬────────window_start─┬─_version─┐ │ 1 │ 2022-06-28 15:27:00 │ 3 │ │ 1 │ 2022-06-28 15:23:20 │ 3 │ └───────────┴─────────────────────┴──────────┘
| WATCH wv
┌─count(id)─┬────────window_start─┐ │ 1 │ 2022-06-28 15:27:00 │ └───────────┴─────────────────────┘
As we can see, the Live View’s second result contains data for both time intervals, while the result of the Window View only shows the result of the latest time window.
Let’s confirm this behavior by doing two more identical INSERT
s as before, one after another:
INSERT INTO data VALUES(1,now()) INSERT INTO data VALUES(1,now())
| WATCH lv
┌─count(id)─┬────────window_start─┬─_version─┐ │ 2 │ 2022-06-28 15:30:00 │ 5 │ │ 1 │ 2022-06-28 15:27:00 │ 5 │ │ 1 │ 2022-06-28 15:23:20 │ 5 │ └───────────┴─────────────────────┴──────────┘
| WATCH wv
┌─count(id)─┬────────window_start─┐ │ 2 │ 2022-06-28 15:30:00 │ └───────────┴─────────────────────┘
The results are as expected. The Live View keeps accumulating results for all the time intervals, while the Window View with the tumble()
time window function only outputs the result for the last closed time window.
Let’s abort WATCH lv
query and DROP
the current Live View, and replace it with the following view that tries to mimic the behavior of the Window View.
CREATE LIVE VIEW lv AS SELECT count(id), toStartOfInterval(timestamp, toIntervalSecond(10)) AS window_start FROM data GROUP BY window_start ORDER BY window_start DESC LIMIT 1
Now, let’s restart the WATCH lv
query in clickhouse-client2
:
WATCH lv
We can see that the WATCH lv
now returns the same data as WATCH wv
.
┌─count(id)─┬────────window_start─┬─_version─┐ │ 2 │ 2022-06-28 15:30:00 │ 1 │ └───────────┴─────────────────────┴──────────┘
Let’s do two more INSERT
s:
INSERT INTO data VALUES(1,now()) INSERT INTO data VALUES(1,now())
We can see that the last result for the Window View and the Live View match.
| WATCH wv
┌─count(id)─┬────────window_start─┐ │ 2 │ 2022-06-28 15:42:50 │ └───────────┴─────────────────────┘
| WATCH lv
┌─count(id)─┬────────window_start─┬─_version─┐ │ 2 │ 2022-06-28 15:42:50 │ 3 │ └───────────┴─────────────────────┴──────────┘
Using Time Window Functions in Live View
It’s time to have some fun and use the tumble()
and tumbleStart()
functions in Live View instead of Window View:
CREATE LIVE VIEW lv AS SELECT count(id), tumble(timestamp, toIntervalSecond('10')) AS w_id, tumbleStart(w_id) AS window_start FROM data GROUP BY w_id
Now, if we do WATCH lv
, we get:
┌─count(id)─┬─w_id──────────────────────────────────────────┬────────window_start─┬─_version─┐ │ 2 │ ('2022-06-28 15:30:00','2022-06-28 15:30:10') │ 2022-06-28 15:30:00 │ 1 │ │ 1 │ ('2022-06-28 15:27:00','2022-06-28 15:27:10') │ 2022-06-28 15:27:00 │ 1 │ │ 2 │ ('2022-06-28 15:42:50','2022-06-28 15:43:00') │ 2022-06-28 15:42:50 │ 1 │ │ 1 │ ('2022-06-28 15:23:20','2022-06-28 15:23:30') │ 2022-06-28 15:23:20 │ 1 │ └───────────┴───────────────────────────────────────────────┴─────────────────────┴──────────┘
This shows that Live View can be used with tumble()
and tumbleStart()
functions.
Let’s modify our Live View query as before to match the output of Window View:
CREATE LIVE VIEW lv AS SELECT count(id), tumble(timestamp, toIntervalSecond('10')) AS w_id, tumbleStart(w_id) AS window_start FROM data GROUP BY w_id ORDER BY w_id DESC LIMIT 1
Now, we get the following for WATCH lv
:
┌─count(id)─┬─w_id──────────────────────────────────────────┬────────window_start─┬─_version─┐ │ 2 │ ('2022-06-28 15:42:50','2022-06-28 15:43:00') │ 2022-06-28 15:42:50 │ 1 │ └───────────┴───────────────────────────────────────────────┴─────────────────────┴──────────┘
Again, let’s perform a couple of new INSERT
s to compare the results of Window View and Live View:
INSERT INTO data VALUES(1,now()) INSERT INTO data VALUES(1,now())
Note that, depending on query timing, different inserts can go into different time windows. If that happens, just try your INSERT
s again.
| WATCH wv
┌─count(id)─┬────────window_start─┐ │ 2 │ 2022-06-28 15:53:00 │ └───────────┴─────────────────────┘
| WATCH lv
┌─count(id)─┬─w_id──────────────────────────────────────────┬────────window_start─┬─_version─┐ │ 1 │ ('2022-06-28 15:53:00','2022-06-28 15:53:10') │ 2022-06-28 15:53:00 │ 4 │ └───────────┴───────────────────────────────────────────────┴─────────────────────┴──────────┘ ┌─count(id)─┬─w_id──────────────────────────────────────────┬────────window_start─┬─_version─┐ │ 2 │ ('2022-06-28 15:53:00','2022-06-28 15:53:10') │ 2022-06-28 15:53:00 │ 5 │ └───────────┴───────────────────────────────────────────────┴─────────────────────┴──────────┘
After 10 sec passes, Window View closes the time window, and the results of Live View and Window View are the same. However, if you pay attention, you will see that the Live View provides immediate results for the current time window, as it does not wait for the time window to close.
When Window View and Live View Results Will Differ
So far, we have seen that we can make Live View behave very close to Window View. However, this is only because we used a table column for the time attribute to the tumble()
time window function. When we try to use the now()
function instead, the results from Live View will not be what you expect. You can see this if we drop the previous tables and create new ones as follows:
CREATE WINDOW VIEW wv AS SELECT count(id), tumbleStart(w_id) AS window_start FROM data GROUP BY tumble(now(), toIntervalSecond('10')) AS w_id
CREATE LIVE VIEW lv AS SELECT count(id), tumble(now(), toIntervalSecond('10')) AS w_id, tumbleStart(w_id) AS window_start FROM data GROUP BY w_id ORDER BY w_id DESC LIMIT 1
If you execute WATCH lv
, you will see something like this:
| WATCH lv
┌─count(id)─┬─w_id──────────────────────────────────────────┬────────window_start─┬─_version─┐ │ 10 │ ('2022-06-28 16:34:50','2022-06-28 16:35:00') │ 2022-06-28 16:34:50 │ 1 │ └───────────┴───────────────────────────────────────────────┴─────────────────────┴──────────┘
The WATCH wv
will not provide output until you do an INSERT
. If you try to do an INSERT
, the Live View keeps assigning records to the same time window bucket as the now()
function is not re-evaluated in Live View when new data arrives but stays fixed, while the Window View properly creates time windows based on the current value of now()
as expected.
When Live View Must Be Used Instead of Window View
It is also useful to know when Live View must be used and Window View is not an option. Let’s see an example where we try to create a Window View that does not include the GROUP BY
clause, which causes an error:
CREATE WINDOW VIEW wv AS SELECT count(id) FROM data Received exception from server (version 22.3.6): Code: 80. DB::Exception: Received from localhost:9000. DB::Exception: GROUP BY query is required for WindowView. (INCORRECT_QUERY)
This shows that Window View is meant to be used only with the GROUP BY
clause. We can also check if the Window View can be used with GROUP BY
that does not use one of the two-time window functions. Again, we get an error:
CREATE WINDOW VIEW wv AS SELECT count(id) FROM data GROUP BY toStartOfInterval(timestamp, toIntervalSecond(10)) Received exception from server (version 22.3.6): Code: 80. DB::Exception: Received from localhost:9000. DB::Exception: TIME WINDOW FUNCTION is not specified for WindowView. (INCORRECT_QUERY)
Both of the above queries will work fine for Live View, as we can see below:
CREATE LIVE VIEW wv AS SELECT count(id) FROM data
| WATCH lv
┌─count(id)─┬─w_id──────────────────────────────────────────┬────────window_start─┬─_version─┐ │ 13 │ ('2022-06-28 16:37:10','2022-06-28 16:37:20') │ 2022-06-28 16:37:10 │ 4 │ └───────────┴───────────────────────────────────────────────┴─────────────────────┴──────────┘
CREATE LIVE VIEW lv AS SELECT count(id) FROM data GROUP BY toStartOfInterval(timestamp, toIntervalSecond(10))
| WATCH lv
┌─count(id)─┬─_version─┐ │ 2 │ 1 │ │ 1 │ 1 │ │ 1 │ 1 │ │ 2 │ 1 │ │ 1 │ 1 │ │ 2 │ 1 │ │ 2 │ 1 │ │ 1 │ 1 │ │ 1 │ 1 │ └───────────┴──────────┘
Note that the Live View result in the last view grows with input and causes the server to run out of memory. Ensure you keep in mind that the result of the query should either be an aggregated value or be fixed using the LIMIT
clause in the innermost query.
Other Differences
The other difference between Live View and Window View tables is that Window View stores its results in an inner table engine, which by default is AggregatingMergeTree
if not specified explicitly, using the INNER ENGINE
clause when creating the view. Live View, on the other hand, currently stores all intermediate results only in memory, which can cause server memory issues when proper care is not taken.
Also, Window View can write its results to a dedicated table by specifying the TO
clause, while Live View can only be used to SELECT
and WATCH
. A workaround for Live View is to use the INSERT INTO db.table WATCH [lv]
statement to manually output results into a table.
Another big difference between Window View and Live View is that Window View supports defining windows based on processing time or event time while also allowing control of how late events are processed. The event time is defined as the time embedded in the arriving records and processing time is defined as the local machine’s time. In practical applications, using event time is more appropriate as it provides determinism, whereas using processing time does not.
Conclusion
While Window View and Live View tables have some similarities, they are best used for their intended purpose. It is also important to note that both of these features are still in an experimental mode and most likely will be modified in a backward incompatible mode as ClickHouse generalizes support for streaming queries and stream analytics processing.
Nonetheless, these two features show the power of ClickHouse as an open-source project that allows third-party contributors to show how the core project can evolve to solve needs that were not initially anticipated by the core development team. These features are definitely worth a closer look, so keep WATCH
ing how Window View and Live View tables evolve in future ClickHouse releases.
Published at DZone with permission of Vitaliy Zakaznikov. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments