A Guide to Data Warehousing Clickstream Data, Part 1
In Part 1 of this two-part series, we take a look at the benefits that collecting and analyzing clickstream data brings.
Table of Contents
Part 1
- Why clickstream is so important to your online business
- No data science without data
- Understanding the customer - a key advantage
- Going beyond charts and dashboards
- What is clickstream data?
- Example data output
Part 2
- Clickstream analysis
- Traffic analysis
- Sales funnel analysis
- Browse/Cart abandonment and recovery
- Personalization
- Tracking Experiments (A/B testing)
- Identity Stitching
- Conclusion
Why Clickstream Is So Important to Your Online Business
Clickstream data allows you to see what actions customers are taking on your website. Given how commerce is shifting more and more online, this data is becoming essential for your business to stay competitive. Before defining what kind of data this is, let's take a look at the main reasons why a business needs to own it in the first place.
No Data Science Without Data
The first reason to collect and own clickstream data is to be able to take advantage of data science. As the name implies, data comes first: without it, even the most sophisticated models won't work. This is why you would want to pursue strategic data acquisition, which will make your business more defensible in the long run.
Understanding the Customer - A Key Advantage
Clickstream is often associated with web analytics because it lets you analyze your customers' behavior. For example, you can find out how many customers drop off along the path from the landing page to a completed purchase. The advantage of owning such data is that you can filter by any trackable metric, down to the individual visitor level, without the limitations of the reporting dashboards provided by web analytics tools.
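As a rough illustration of the drop-off analysis described above, the sketch below counts how many visitors reach each step of a purchase funnel. The step names, the `(visitor_id, step)` event shape, and the sample data are assumptions made for this example, not the schema of any particular tracker. For simplicity it ignores the order in which a visitor's events occurred.

```python
from collections import defaultdict

# Hypothetical funnel steps, in the order a visitor is expected to pass through them.
FUNNEL = ["landing", "product", "cart", "purchase"]

def funnel_counts(events):
    """Count visitors who reached each step and every step before it."""
    seen = defaultdict(set)  # visitor_id -> set of steps seen
    for visitor_id, step in events:
        seen[visitor_id].add(step)
    counts = []
    for i, _ in enumerate(FUNNEL):
        required = set(FUNNEL[: i + 1])
        counts.append(sum(1 for steps in seen.values() if required <= steps))
    return counts

def drop_off_rates(counts):
    """Fraction of visitors lost between each pair of consecutive steps."""
    return [
        1 - nxt / cur if cur else 0.0
        for cur, nxt in zip(counts, counts[1:])
    ]

# Made-up sample events: (visitor_id, step)
events = [
    ("v1", "landing"), ("v1", "product"), ("v1", "cart"), ("v1", "purchase"),
    ("v2", "landing"), ("v2", "product"),
    ("v3", "landing"),
    ("v4", "landing"), ("v4", "product"), ("v4", "cart"),
]
counts = funnel_counts(events)
rates = drop_off_rates(counts)
print(dict(zip(FUNNEL, counts)))
print(rates)
```

Because the raw events are kept per visitor, the same function can be run on any filtered subset (one campaign, one device type, one day) without relying on a dashboard to offer that breakdown.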
Also, you are free to combine reports with any other data source at your disposal. For example, you can stitch together orders, paid advertising reports, geographic data, and other sources, which increases the utility of your data assets. Of course, this is possible only when you have full access to the collected data set and it's available in one unified location.
Going Beyond Charts and Dashboards
Tracking KPIs with charts and dashboards is helpful for monitoring business health and detecting problems in real time. Though this is useful when making high-level business decisions, to truly bring the business to the next level the data must be used to optimize activity down to the level of each customer. One of the most popular examples is personalizing the customer experience.
Personalization can be done at different customer touch points. For example, when a customer visits your website, the data tells you what they have bought before and which pages they have visited. By combining a single customer's data with that of other customers, you can recommend relevant products or content tailored specifically to the customer who is browsing your website. The same approach can be extended to email, advertising campaigns, or even physical stores. This way, the customer experience can stay consistent across all touch points. For any business, this can serve as a key differentiator.
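One simple way to combine a single customer's history with that of other customers is to count which products tend to be viewed by the same visitors. The sketch below is only an illustration of that idea under assumed data: the `histories` mapping, the product names, and the scoring rule are all hypothetical, and real recommenders are usually far more involved.

```python
from collections import Counter
from itertools import combinations

def co_view_counts(histories):
    """Count how often two products appear in the same visitor's history."""
    pairs = Counter()
    for products in histories.values():
        for a, b in combinations(sorted(set(products)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def recommend(visitor_id, histories, top_n=2):
    """Suggest products that were co-viewed with what this visitor already saw."""
    own = set(histories.get(visitor_id, []))
    scores = Counter()
    for (a, b), n in co_view_counts(histories).items():
        if a in own and b not in own:
            scores[b] += n
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]

# Made-up viewing histories: visitor_id -> products viewed
histories = {
    "v1": ["gloves", "helmet"],
    "v2": ["gloves", "helmet", "lights"],
    "v3": ["gloves", "lights"],
    "v4": ["gloves"],
}
suggestions = recommend("v4", histories)
print(suggestions)
```

The same scores could feed an email or an advertising campaign, which is how the experience stays consistent across touch points.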
A good case study showing how owned data can drive a business is Zara. Using data as its backbone, the company manages the inventory of each of its 2,000 stores, and what's on display, on a daily basis. This would be impossible without full access to the collected data set.
What Is Clickstream Data?
To understand how we can use a clickstream dataset, we first need to define what kind of data it contains and how it is collected. We can define clickstream as a sequence of events that represent visitor actions on a website.
The most common and useful event is called a ‘click,’ which indicates what a visitor has viewed. Of course, we are not limited to collecting just clicks; we can also look at impressions, purchases, and any other events relevant to the business.
Furthermore, an event can include multiple contexts that enrich it, like how long the page load took or what type of browser/device the visitor is using. Essentially, good clickstream data clearly defines a full set of events which allows you to get a complete picture of customer behavior. Conceptually, we can look at events as having their own grammar.
Traditionally, such events are collected using a JavaScript tracker which is loaded with the page on every request. The tracker sends a JSON POST request to a collector website, which stores, validates, and enriches the event with additional data, and finally sends it to the data warehouse for further analysis. It can be visualized as below:
Image credit: https://github.com/snowplow/snowplow
Later in the article, we’ll take a look at different options for tracking events.
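The collector flow described above (receive a JSON POST, validate it, enrich it, and pass it on to the warehouse) could be sketched roughly as follows. This is a minimal in-memory illustration, not the Snowplow implementation: the field names, the list standing in for the warehouse, and the crude user-agent check are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical set of fields every event must carry to be accepted.
REQUIRED_FIELDS = {"event_type", "app_id", "event_id", "device_created_tstamp"}

def validate(event):
    """Return the sorted list of required fields missing from the event."""
    return sorted(REQUIRED_FIELDS - set(event))

def enrich(event, user_agent, received_at):
    """Return a copy of the event with collector-side fields added."""
    enriched = dict(event)
    enriched["collector_tstamp"] = received_at.isoformat()
    # Deliberately naive device detection, for illustration only.
    enriched["is_mobile"] = "Mobile" in user_agent
    return enriched

def handle_post(body, user_agent, received_at, warehouse, bad_rows):
    """Process one JSON POST body: validate, enrich, then store."""
    event = json.loads(body)
    missing = validate(event)
    if missing:
        bad_rows.append({"event": event, "missing": missing})
        return False
    warehouse.append(enrich(event, user_agent, received_at))
    return True

# Lists standing in for the data warehouse and a store of rejected events.
warehouse, bad_rows = [], []
now = datetime(2019, 4, 25, 8, 33, 3, tzinfo=timezone.utc)
good = json.dumps({
    "event_type": "Pageview",
    "app_id": "joes_bikes",
    "event_id": "e8468c4a-5d95-42aa-81e1-c72d27a5018a",
    "device_created_tstamp": "2019-04-25T08:33:03.200Z",
})
ok = handle_post(good, "Mozilla/5.0 (iPhone) Mobile Safari", now, warehouse, bad_rows)
rejected = handle_post(json.dumps({"event_type": "Pageview"}), "Mozilla/5.0", now, warehouse, bad_rows)
```

Keeping rejected events in a separate store, rather than dropping them, makes it possible to find and fix tracking bugs later.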
Example Data Output
The best way to gain a deeper understanding of clickstream data is to have a look at particular examples. Below we provide a sample event for a page view:
APP: joes_bikes
EVENT: Pageview
TIME: Thu, 25 Apr 2019 08:33:03 GMT
COLLECTOR: collector.stacktome.com
METHOD: POST
Beacon
Field | Type | Value
Event Type | string | Pageview
Application ID | string | joes_bikes
Event ID | string | e8468c4a-5d95-42aa-81e1-c72d27a5018a
Device Created Timestamp | string | 2019-04-25T08:33:03.200Z
Device Sent Timestamp | string | 2019-04-25T08:33:03.204Z
Platform | string | web
Tracker Name | string | cf2
Tracker Version | string | js-2.8.2
Context
iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0
0 |
iglu:com.stacktome/page/jsonschema/1-0-2
Browser
|
In the table above you can see a sample of data sent from a fictional online store, joesbikes.com, which is based on a real tracking event. The most essential fields are the event timestamps, which allow events to be analyzed as a time series.
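To make the time series point concrete, the sketch below parses the two device timestamps from the sample event and buckets events by hour. The snake_case field names are assumed stand-ins for "Device Created Timestamp" and "Device Sent Timestamp", and the second and third events are made-up sample data.

```python
from collections import Counter
from datetime import datetime, timedelta

def parse_ts(value):
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def send_delay_ms(event):
    """Milliseconds between the event being created and sent by the device."""
    created = parse_ts(event["device_created_tstamp"])
    sent = parse_ts(event["device_sent_tstamp"])
    return (sent - created) / timedelta(milliseconds=1)

def events_per_hour(events):
    """Count events per hour, keyed by the ISO string of the hour start."""
    counts = Counter()
    for event in events:
        created = parse_ts(event["device_created_tstamp"])
        hour = created.replace(minute=0, second=0, microsecond=0)
        counts[hour.isoformat()] += 1
    return counts

events = [
    # Timestamps taken from the sample page view event.
    {"device_created_tstamp": "2019-04-25T08:33:03.200Z",
     "device_sent_tstamp": "2019-04-25T08:33:03.204Z"},
    # Made-up events for illustration.
    {"device_created_tstamp": "2019-04-25T08:47:10.000Z",
     "device_sent_tstamp": "2019-04-25T08:47:10.010Z"},
    {"device_created_tstamp": "2019-04-25T09:02:45.500Z",
     "device_sent_tstamp": "2019-04-25T09:02:45.520Z"},
]
delay = send_delay_ms(events[0])
hourly = events_per_hour(events)
```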
Another important part is the custom page context, which describes details of the viewed page. A notable field is the search tag, which shows what the user was searching for and whether that matches the page they viewed. Combining such events into a sequence allows us to see if the path the user takes to a purchase is optimal or if there are ways to improve it and, at the same time, improve conversion rates.
Lastly, we can see that we also get browser information. This can be useful for understanding what types of devices your visitors are using, and especially whether there are problems with rendering certain pages. For instance, we can analyze whether our mobile visitors convert at the same rate as desktop users. Given how important the mobile experience is today, it's critical for a business to have this visibility.
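A comparison of mobile and desktop conversion, as mentioned above, might look like the following sketch. It assumes each event already carries a `device` label derived from the browser context, plus a `visitor_id` and an `event_type`; these names and the sample rows are hypothetical.

```python
from collections import defaultdict

def conversion_by_device(events):
    """Share of visitors per device type with at least one Purchase event."""
    visitors = defaultdict(set)
    buyers = defaultdict(set)
    for e in events:
        visitors[e["device"]].add(e["visitor_id"])
        if e["event_type"] == "Purchase":
            buyers[e["device"]].add(e["visitor_id"])
    return {device: len(buyers[device]) / len(ids) for device, ids in visitors.items()}

# Made-up sample events
events = [
    {"visitor_id": "m1", "device": "mobile", "event_type": "Pageview"},
    {"visitor_id": "m1", "device": "mobile", "event_type": "Purchase"},
    {"visitor_id": "m2", "device": "mobile", "event_type": "Pageview"},
    {"visitor_id": "m3", "device": "mobile", "event_type": "Pageview"},
    {"visitor_id": "m4", "device": "mobile", "event_type": "Pageview"},
    {"visitor_id": "d1", "device": "desktop", "event_type": "Pageview"},
    {"visitor_id": "d1", "device": "desktop", "event_type": "Purchase"},
    {"visitor_id": "d2", "device": "desktop", "event_type": "Pageview"},
]
conversion = conversion_by_device(events)
print(conversion)
```

A large gap between the two rates is a hint to look for rendering or usability problems on the weaker device type.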
Now let's have a look at a different event sample, this time a product impression.
Context:
iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0
iglu:com.stacktome/product_impression/jsonschema/1-0-2
Field | Type | Value
id | number | 104463
name | string | Joes Leather Gloves
type | string | Gloves
regularPrice | number | 29.99
currentPrice | number | 19.99
currency | string | usd
nReviews | number | 11732
avReviews | number | 4.3
row | number | 2
column | number | 1
containerName | string | bestsellers
Here, we can see the main attributes of a product shown on the page. The captured impression event should help us determine which product was displayed, where on the page it appeared, and which variable attributes it was shown with. From the above event, we can see that the gloves were displayed in the second row and first column of a container on the page called 'bestsellers.' We can also see the price and review score shown for the product. This information alone is enough to determine how well displayed products perform relative to their exposure across the website. We can also determine how well they "compete" with each other given the same or different variables (price, location, etc.).
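As a sketch of how impressions can be related to exposure, the code below derives the discount from the sample event's prices and computes a click-through rate per page position. The impression fields come from the sample above; the `impressionId` field and the set of clicked impression IDs are assumptions added for illustration, since the sample event does not show how clicks are linked to impressions.

```python
from collections import Counter

def discount_pct(impression):
    """Discount of the current price relative to the regular price, in percent."""
    regular = impression["regularPrice"]
    current = impression["currentPrice"]
    return round(100 * (regular - current) / regular, 1)

def ctr_by_slot(impressions, clicked_ids):
    """Click-through rate per (container, row, column) position."""
    shown = Counter()
    clicked = Counter()
    for imp in impressions:
        slot = (imp["containerName"], imp["row"], imp["column"])
        shown[slot] += 1
        if imp["impressionId"] in clicked_ids:  # hypothetical join key
            clicked[slot] += 1
    return {slot: clicked[slot] / n for slot, n in shown.items()}

# Values from the sample product impression event.
gloves = {"id": 104463, "regularPrice": 29.99, "currentPrice": 19.99,
          "row": 2, "column": 1, "containerName": "bestsellers"}

# Three made-up impressions of the same slot, one of which was clicked.
impressions = [dict(gloves, impressionId=i) for i in ("a", "b", "c")]
discount = discount_pct(gloves)
ctr = ctr_by_slot(impressions, clicked_ids={"b"})
```

Grouping by other attributes, such as price or review score, follows the same pattern and lets products be compared under equal or different conditions.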
As you can see from the examples above, the information that's being tracked is fairly trivial from a single event perspective. The power comes from having access to these events across all the pages that visitors are interacting with, over a period of time. Then you can measure which pages might need improvement or if the overall website can perform much better. We’ll take a look at a few use cases in the next article.
Published at DZone with permission of Evaldas Miliauskas. See the original article here.