We mentioned previously that ELK stands for Elasticsearch, Logstash, and Kibana, the three applications that form the building blocks of a complete monitoring and reporting solution. Each part has its own purpose: Logstash gathers all the data and brings it into a consistent form, Elasticsearch is able to quickly go through all of that data, and Kibana is here to turn search results into something that is both informative and visually appealing. Having said all this, ELK has since changed its name. Although almost the entirety of the internet still calls it the ELK Stack, it is now officially named the Elastic Stack, because, at the time of writing, the stack includes a fourth component. This component is called Beats, and it represents a significant addition to the whole system.
But let’s start from the beginning and try to describe the whole system the way its creators describe it.
The first component that was created and that gained traction in the community was Elasticsearch, built to be a flexible, scalable system for indexing and searching large datasets. It has been used for thousands of different purposes, including searching for specific content in documents, websites, or logs. Its main selling point, and the reason a lot of people started using it, is that it combines this flexibility and scalability with being extremely fast.
When we think of searching, we usually think about creating some kind of query and then waiting for the database to give us back some form of answer. In complex searches, the waiting is usually the problem, since it is exhausting to tweak a query again and again and wait each time for it to produce results. A lot of modern data science relies on unstructured data, meaning that much of the data we need to search has no fixed structure, or no structure at all, so creating a fast way to search inside this pool of data is a tough problem.
Imagine you need to find a certain book in a library. Also, imagine you do not have a database of all the books, authors, publishing information, and everything else that a normal library has; you are only allowed to search through all the books themselves.
Having a tool that is able to recognize patterns in those books and that can answer questions such as “who wrote this book?” or “how many times is KVM mentioned in all the books that are longer than 200 pages?” is really useful. This is what a good search solution does.
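To make the library analogy a little more concrete, here is a minimal sketch of an inverted index, the kind of data structure that Apache Lucene (the library Elasticsearch is built on) relies on. It is only a toy: the book titles, authors, page counts, and the helper names tokenize, build_index, and count_mentions are all invented for illustration, and a real search engine does far more than this.

```python
import re
from collections import defaultdict

# Hypothetical mini library: title -> (author, page count, text).
BOOKS = {
    "Virtualization Basics": ("A. Author", 320, "KVM is a hypervisor. KVM runs on Linux."),
    "Short Notes": ("B. Writer", 90, "KVM in a nutshell."),
    "Cooking at Home": ("C. Chef", 250, "Salt, pepper, and patience."),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books):
    """Map each word to {title: number of occurrences in that book}."""
    index = defaultdict(dict)
    for title, (_author, _pages, text) in books.items():
        for word in tokenize(text):
            index[word][title] = index[word].get(title, 0) + 1
    return index


def count_mentions(index, books, word, min_pages):
    """Total occurrences of `word` in books longer than `min_pages` pages."""
    postings = index.get(word.lower(), {})
    return sum(n for title, n in postings.items() if books[title][1] > min_pages)


INDEX = build_index(BOOKS)

# "How many times is KVM mentioned in all the books longer than 200 pages?"
print(count_mentions(INDEX, BOOKS, "KVM", 200))  # prints 2
```

The index is built once, after which each question becomes a dictionary lookup instead of a fresh read through every book.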
Being able to search for a machine that is running the Apache web server and has problems with a certain page requested by a certain IP address is essential if we want to quickly and efficiently administer a cluster or a multitude of clusters of physical and virtual servers.
The same goes for system information when we are monitoring even a single metric, such as memory allocation, across hundreds of hosts. Even presenting that data is a problem, and searching through it in real time is almost impossible without the right tool.
Elasticsearch does exactly that: it gives us a way to quickly go through enormous amounts of barely structured data and come up with results that make sense. What makes Elasticsearch different is its ability to scale, which means you can create search queries on your laptop and later run the same queries on a multi-node cluster that searches through a petabyte of data.
Elasticsearch is also fast, and speed does more than just save time. Getting search results faster lets you learn more about your data, because you can create and modify queries and see their results right away.
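As an illustration of what such a query might look like, the following sketch builds a search body in the Elasticsearch Query DSL for the Apache scenario described earlier. The field names (service, url_path, client_ip, and status) and the helper apache_error_query are assumptions made up for this example; real field names depend entirely on how the logs were indexed. We also assume that "problems" means an HTTP status of 500 or higher.

```python
import json


def apache_error_query(page, client_ip):
    """Build a search body for Apache events: one page, one client IP, status >= 500."""
    return {
        "query": {
            "bool": {
                # Filter clauses narrow the result set without affecting scoring.
                "filter": [
                    {"term": {"service": "apache"}},      # hypothetical field
                    {"term": {"url_path": page}},         # hypothetical field
                    {"term": {"client_ip": client_ip}},   # hypothetical field
                    {"range": {"status": {"gte": 500}}},  # hypothetical field
                ]
            }
        }
    }


body = apache_error_query("/index.html", "192.0.2.10")
print(json.dumps(body, indent=2))
```

Nothing in the body refers to the size of the cluster, so the same JSON could be sent to the _search endpoint of a single-node test instance or of a large production deployment.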
Since this is just a simple introduction to what ELK actually does, we will switch to the next component, Logstash, and come back to searching a bit later.
Logstash has a simple purpose. It is designed to ingest any number of logs and other event streams, process them, and pass them on for future use. Its outputs are equally varied: besides Elasticsearch, it can send events to destinations such as email, files, HTTP endpoints, and others.
What is important about how Logstash works is its versatility in accepting different input streams. It is not limited to using only logs; it can even accept things such as Twitter feeds.
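Logstash pipelines are organized as inputs, filters, and outputs. The sketch below imitates that flow in plain Python; it is not Logstash configuration and does not use any Logstash API. It assumes that the input is a web server access log in the Apache Common Log Format, and the names LOG_PATTERN, parse_line, and run_pipeline, as well as the sample line, are invented for this example.

```python
import json
import re

# Regular expression for the Apache Common Log Format. It plays the role
# that a filter (for example, grok) would play in a real Logstash pipeline.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Filter step: turn one raw log line into a structured event, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    # The format uses "-" when no bytes were sent.
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


def run_pipeline(lines):
    """Input -> filter -> output: yield one JSON document per parsable line."""
    for line in lines:
        event = parse_line(line)
        if event is not None:
            yield json.dumps(event)


# Hypothetical input: one valid access log entry and one unrelated line.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 500 1024',
    "this line is not an access log entry",
]

for document in run_pipeline(SAMPLE):
    print(document)
```

Only the first sample line matches the pattern, so a single JSON document is printed. The point of the step is that free-form text goes in and named, typed fields come out, which is what makes the data searchable later.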
The last part of the old ELK stack is Kibana. If Logstash is the intake and Elasticsearch does the storing and searching, then Kibana is the output engine. Simply put, Kibana is a way to use the results of Elasticsearch queries to create visually impressive and highly customizable layouts. Although the output of Kibana is usually some kind of dashboard, it can be many things, depending on the user’s ability to create new layouts and visualize data. Having said all this, don’t be afraid: the internet offers at least a partial, if not a full, solution to almost every imaginable scenario.
Using the ELK stack is, in many ways, like running a server: what you need to do depends on what you actually want to accomplish. It takes only a couple of minutes to get the ELK stack running, but the real effort only starts there.
Of course, for us to fully understand how the ELK stack is used in a live environment, we need to deploy it and set it up first. We’ll do that next.