Various software products and components are available in the market to manage massive data and extract meaningful information from it. These systems typically work in stages: data collection, pre-processing, aggregation, storage, search & indexing, and data analytics. In this note, we will look at the ELK stack architecture.
ELK Stack Architecture
The classic Elasticsearch, Logstash, Kibana (ELK) stack is a well-known architecture for log management and analytics. Thousands of organizations currently use it, and the list keeps growing. It is one of the main applications of Elasticsearch, and ELK is one of the most popular open-source stacks for managing and analyzing logs (text data). We will look at each component before the deep dive into the ELK architecture.
Logstash
Logstash is used for data aggregation and processing of data from various sources and in different formats. The support for different formats is enabled with input and output plugins. It works as follows: it pulls data using the available input plugins, aggregates and enriches the received data by applying filters, if any, and in the end stores the data using the output plugins (a minimal sketch of this flow appears after the list below). Examples of input plugins:
- Reads events from a GitHub webhook
- Decodes the output of an HTTP API into events
- Reads mail from an IMAP server
- Reads events from an IRC server
- Reads events from a Redis instance
- Reads events from the Twitter Streaming API
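To make the input → filter → output flow concrete, here is a minimal sketch in Python of what such a pipeline does conceptually. This is not Logstash code: the log line format, the field names, and the use of stdin as the input source are all illustrative assumptions.

```python
import json
import re
import sys
from datetime import datetime, timezone

# Hypothetical log line format assumed for this sketch:
#   2024-01-15T10:32:07Z ERROR payment-service Timeout calling upstream
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>\w+)\s+(?P<service>\S+)\s+(?P<message>.*)$")

def input_stage(stream):
    """Input: pull raw lines from a source (stdin stands in for an input plugin)."""
    for line in stream:
        line = line.strip()
        if line:
            yield line

def filter_stage(lines):
    """Filter: parse and enrich each raw line into a structured event."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # drop lines that do not match the expected format
        event = match.groupdict()
        event["ingested_at"] = datetime.now(timezone.utc).isoformat()  # enrichment
        yield event

def output_stage(events):
    """Output: ship structured events to a sink (JSON on stdout here;
    in the real stack this would be an Elasticsearch output plugin)."""
    for event in events:
        print(json.dumps(event))

if __name__ == "__main__":
    output_stage(filter_stage(input_stage(sys.stdin)))
```

Each stage is a generator, so events stream through one at a time; this mirrors the staged design Logstash uses between its input, filter, and output plugins.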
This is a long list; moreover, Logstash is designed so that if a certain type of plugin is not available, we can write one based on our data sources and requirements. And although the name suggests log data only, it can be used for any text data. Once the data is aggregated, it needs to be stored somewhere, and Elasticsearch acts as that storage mechanism. It indexes the received data so that it can be queried and made quickly available for data visualization in the Kibana interface.
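As a concrete illustration of Elasticsearch as the storage and indexing layer, the following sketch assumes a local Elasticsearch node at localhost:9200 and the official Python client (pip install elasticsearch, v8.x API); the index name and document fields are made up for the example.

```python
from elasticsearch import Elasticsearch

# Connect to an assumed local Elasticsearch node.
es = Elasticsearch("http://localhost:9200")

# Index a structured log event; Elasticsearch builds an inverted index
# over its fields so they can be searched quickly.
es.index(index="app-logs", document={
    "ts": "2024-01-15T10:32:07Z",
    "level": "ERROR",
    "service": "payment-service",
    "message": "Timeout calling upstream",
})

# Full-text search over the indexed events -- the same kind of query
# a Kibana dashboard issues behind the scenes.
resp = es.search(index="app-logs", query={"match": {"message": "timeout"}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
```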
Kibana
Kibana is used to analyze the log data stored in Elasticsearch: it provides queries and options to search that data, and it offers the tools to build dashboards for data visualization.
There are other popular products for managing logs, such as Splunk, Grafana, Fluentd, Beats, and many more. It is also possible to keep Elasticsearch and integrate other data aggregation, processing, and visualization software around it. Elasticsearch is the heart of the stack: it handles all the indexing and makes searches simpler. For example, Grafana can be used for building dashboards instead of Kibana.
This architecture is well suited for small organizations where log volume stays within a certain limit. However, it does not scale well when massive amounts of data are generated by many log-producing clients. For those kinds of requirements, we need to add other components to the ELK architecture.
ELK Stack Architecture for Massive Amounts of Data
We will first understand why Logstash alone is not able to handle massive data. By default, Logstash uses an in-memory bounded queue between pipeline stages, such as between inputs and pipeline workers, to buffer events. If Logstash experiences a temporary machine failure, that in-memory queued data is lost. If we put persistent storage in between, this problem is solved. For that purpose, we can use Redis, RabbitMQ, Apache Kafka, or many other systems as persistent storage.
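Here is a minimal sketch of this buffering pattern with Redis, assuming a local Redis server and the redis-py package (pip install redis); the list key and event fields are illustrative. In a real deployment the producer and consumer halves run in separate processes.

```python
import json
import redis

# Assumed local Redis instance acting as the buffer between
# log producers and Logstash.
r = redis.Redis(host="localhost", port=6379)

# Producer side: a shipper pushes events onto a Redis list, so events
# survive a restart of the downstream consumer (Logstash in the real stack).
event = {"level": "ERROR", "service": "payment-service", "message": "Timeout"}
r.rpush("log-buffer", json.dumps(event))

# Consumer side: Logstash's redis input plugin does the equivalent of this,
# blocking until an event is available and then processing it.
_, raw = r.blpop("log-buffer")
print(json.loads(raw))
```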
Segregating data collection from data aggregation & processing speeds up the entire system and provides a certain level of scalability. For data collection, we can use Beats, Fluentd, or any other agent package; these run as daemons and push data to the buffering units, from which Logstash fetches the data and performs its defined tasks (a minimal agent sketch follows).
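For illustration, here is a hypothetical minimal shipping agent in Python that stands in for Beats or Fluentd: it follows a log file like tail -f and pushes each new line into the Redis buffer from the previous sketch. The file path, Redis host, and key name are assumptions.

```python
import json
import time
import redis

# Assumed local Redis buffer, shared with the consumer (Logstash).
r = redis.Redis(host="localhost", port=6379)

def tail_and_ship(path: str, key: str = "log-buffer") -> None:
    """Follow a file like `tail -f` and push every new line to the buffer."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the end of the file, as a fresh daemon would
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # no new data yet; wait and retry
                continue
            r.rpush(key, json.dumps({"source": path, "raw": line.rstrip()}))

if __name__ == "__main__":
    tail_and_ship("/var/log/app.log")  # illustrative path
```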
However, recent versions of the Elastic Stack provide persistent queues within Logstash itself, which can eliminate the need for these additional buffering components.
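For reference, Logstash's persistent queue is enabled in its settings file (logstash.yml). A minimal excerpt might look like the following; queue.type: persisted is the documented setting, while the size limit and path shown here are illustrative values:

```yaml
# logstash.yml: buffer events on disk instead of in memory,
# so queued data survives a Logstash restart or crash.
queue.type: persisted
queue.max_bytes: 1gb                   # illustrative upper bound on queue size
path.queue: /var/lib/logstash/queue    # illustrative on-disk location
```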