How to Set Up Real-Time Analytics over Logs with the ELK Stack
Once we know something, we find it hard to imagine what it was like not to know it.
- Chip & Dan Heath, Authors of Made to Stick, Switch
Update: I have recently published a book on the ELK stack titled Learning ELK Stack; more details can be found here.
What Is the ELK Stack?
The ELK stack is Elasticsearch, Logstash, and Kibana. Together, these three tools provide a fully working real-time analytics platform for extracting useful information from your data.
Elasticsearch
Elasticsearch, built on top of Apache Lucene, is a search engine focused on real-time analysis of data and exposed through a RESTful API. It provides standard full-text search as well as powerful query-based search. Elasticsearch is document-oriented: you can store whatever you want as JSON documents, which makes it powerful, simple, and flexible.
Logstash
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use. In the ELK stack, Logstash plays the important role of shipping the logs and preparing them for indexing in Elasticsearch.
Kibana
Kibana is a user-friendly way to view, search, and visualize your log data. It presents the data that Logstash stores in Elasticsearch through a highly customizable interface, with histograms and other panels that provide real-time analysis and search over the parsed data.
How Do I Get It?
http://www.elasticsearch.org/overview/elkdownloads/
How Do They Work Together?
Logstash is essentially a pipelining tool. In a basic, centralized installation, a Logstash agent known as the shipper reads input from one or more sources and sends that text, wrapped in a JSON message, to a broker. The broker, typically Redis, caches the messages until another Logstash agent, known as the collector, picks them up and sends them to another output. In the common case this output is Elasticsearch, where the messages are indexed and stored for searching. The Elasticsearch store is accessed through the Kibana web application, which lets you visualize and search the logs. The entire system is scalable: many shippers can run on many different hosts, watching log files and shipping messages to a cluster of brokers, while many collectors read those messages and write them to an Elasticsearch cluster.
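The shipper, broker, and collector flow described above can be sketched as two Logstash configurations. This is a minimal sketch rather than a definitive setup. It assumes Logstash 1.4-era plugin syntax, a Redis broker at the hypothetical host broker.example.com, a hypothetical Redis list key named logstash, and an Elasticsearch node on localhost. The file names are also made up for illustration. Option names have changed across Logstash releases (for instance, later versions of the elasticsearch output use hosts instead of host), so check them against the version you run.

```
# shipper.conf (hypothetical file name): runs on each host that produces logs
input {
  file {
    path => ["/var/log/apache.log"]
    type => "saurzcode_apache_logs"
  }
}
output {
  # Push each event as JSON onto a Redis list that acts as the broker queue
  redis {
    host => ["broker.example.com"]   # assumed broker address
    data_type => "list"
    key => "logstash"                # assumed list name
  }
}

# collector.conf (hypothetical file name): runs on the indexing host
input {
  # Read events from the same Redis list the shippers write to
  redis {
    host => "broker.example.com"     # assumed broker address
    data_type => "list"
    key => "logstash"                # must match the shipper's key
  }
}
output {
  # Index the events so that Kibana can search them
  elasticsearch {
    host => "localhost"              # assumed address, 1.4-era option name
  }
}
```

Because the shippers and collectors only share the Redis list, either side can be scaled out independently by starting more agents with the same configuration.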
How Do I Fetch Useful Information Out of Logs?
Fetching useful information from logs is one of the most important parts of this stack. Logstash does it with its grok filters and a set of input, filter, and output plugins. These plugins let it take various kinds of input (file, TCP, UDP, GemFire, stdin, Unix sockets, WebSockets, IRC, Twitter, and many more), filter events (with grok, grep, date filters, and so on), and finally write output to Elasticsearch, Redis, email, HTTP, MongoDB, GemFire, Jira, Google Cloud Storage, and others.
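As a rough illustration of how several input, filter, and output plugins combine in one pipeline, consider the sketch below. It assumes 1.4-era option names; the TCP port, the type values, and the Elasticsearch address are hypothetical and chosen only for the example.

```
input {
  # Two inputs feeding the same pipeline
  file {
    path => ["/var/log/apache.log"]
    type => "apache"                 # hypothetical type label
  }
  tcp {
    port => 5000                     # hypothetical port
    type => "app"                    # hypothetical type label
  }
}
filter {
  # Only Apache events go through the Apache-specific filters
  if [type] == "apache" {
    grok {
      match => ["message", "%{COMBINEDAPACHELOG}"]
    }
    # Use the timestamp from the log line as the event time
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
  }
}
output {
  # Every event is sent to both outputs
  elasticsearch { host => "localhost" }   # assumed address
  stdout { codec => rubydebug }
}
```

The type field set on each input is what lets the filter section treat the two sources differently.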
A Bit More About Logstash
Filters
Filters let you transform logs as they pass through the pipeline, either on the shipper or on the collector, whichever suits your needs better. For example, an Apache HTTP log entry can have each element (request, response code, response size, and so on) parsed into individual fields so that it can be searched more easily. Information can be dropped if it isn't important, sensitive data can be masked, and messages can be tagged. The list goes on.
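For example:
input {
  # Read lines from the Apache log (the file input expects an absolute path)
  file {
    path => ["/var/log/apache.log"]
    type => "saurzcode_apache_logs"
  }
}
filter {
  # Parse each line into named fields using the combined Apache log pattern
  grok {
    match => ["message", "%{COMBINEDAPACHELOG}"]
  }
}
output {
  # Print the parsed events to the console
  stdout {}
}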
The example above takes input from an Apache log file and applies a grok filter with %{COMBINEDAPACHELOG}, which parses each Apache log entry into named fields. It then writes the resulting events to standard output.
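The other transformations mentioned earlier (dropping, masking, and tagging) can be sketched in the same filter section. This is an illustrative sketch only: it assumes the field names clientip, request, and response that the 1.4-era COMBINEDAPACHELOG pattern produces, and the request path and tag name are made up for the example.

```
filter {
  grok {
    match => ["message", "%{COMBINEDAPACHELOG}"]
  }

  # Drop events that are not worth keeping (the path is a made-up example)
  if [request] == "/favicon.ico" {
    drop {}
  }

  # Mask sensitive data: overwrite the last numeric part of the client IP
  mutate {
    gsub => ["clientip", "[.][0-9]+$", ".0"]
  }

  # Tag server errors (5xx responses) so they are easy to find later
  if [response] =~ /^5/ {
    mutate {
      add_tag => ["server_error"]    # hypothetical tag name
    }
  }
}
```

Filters run in the order they are written, so the grok filter comes first: the later steps rely on the fields it creates.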
Writing Grok Filters
Writing grok filters to extract information is the only task that requires serious effort. Done properly, it gives you great insight into your data, such as the number of transactions performed over time or which types of products get the most hits.
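To make this concrete, here is a small sketch of a custom grok expression. The log line format, the field names, and the values are all hypothetical and invented for this example; only the building blocks (TIMESTAMP_ISO8601, LOGLEVEL, INT, WORD, NUMBER) come from the standard grok pattern set.

```
# Hypothetical application log line this filter is written for:
#   2014-10-01 12:00:03 INFO txn=1234 product=book amount=12.50
filter {
  grok {
    # Each %{PATTERN:name} captures one piece of the line into a field;
    # the :float suffix asks grok to store the amount as a number
    match => ["message", "%{TIMESTAMP_ISO8601:logged_at} %{LOGLEVEL:level} txn=%{INT:txn_id} product=%{WORD:product} amount=%{NUMBER:amount:float}"]
  }
}
```

With fields such as product and amount extracted, questions like "which products get the most hits" become simple searches and aggregations instead of text matching.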
The links below make it much easier to write and test grok filters:
Grok Debugger
http://grokdebug.herokuapp.com/
Grok Patterns Lookup
https://github.com/elasticsearch/logstash/tree/v1.4.2/patterns
References
- http://www.elasticsearch.org/overview/
- http://logstash.net/
- http://rashidkpc.github.io/Kibana/about.html