Elasticsearch: Understanding the basic architecture

Jeevan George John

Published Mar 16, 2023

Elasticsearch is a distributed, open-source search engine used for full-text search and analytics. It is designed to handle large volumes of data while keeping searches fast and flexible. Its architecture is built from nodes, which are the basic building blocks of a cluster.

A node is a single running instance of Elasticsearch that can store data and take part in the cluster's indexing and search work. Depending on the volume of data and the workload, a cluster can run as a single node on one machine or as many nodes spread across multiple machines.

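Each node can take on one or more roles. The two roles most central to the cluster's architecture are data nodes and master-eligible nodes (Elasticsearch also supports other node types, such as ingest nodes and coordinating-only nodes).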

Data Nodes: These nodes store data and perform data-related operations such as indexing, searching, and aggregations. Data nodes hold the primary and replica shards of an index.
Master-Eligible Nodes: These nodes can be elected as the cluster's master node. The elected master performs lightweight, cluster-wide management tasks such as creating or deleting indices, tracking which nodes belong to the cluster, and deciding which shards to allocate to which nodes. If the current master fails, the remaining master-eligible nodes elect a new one.
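A master can only be elected when a majority (a quorum) of the voting master-eligible nodes can reach each other, which prevents two halves of a split network from each electing their own master. The Python sketch below illustrates that majority check under a simplifying assumption: every master-eligible node votes. Elasticsearch 7.x and later manage the voting configuration automatically, so the real rules have more detail; the Node class, function names, and node names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical model of a node: a name plus a set of roles such as "master" or "data".
    name: str
    roles: set[str] = field(default_factory=set)


def quorum_size(voting_nodes: int) -> int:
    """Smallest strict majority of the voting (master-eligible) nodes."""
    return voting_nodes // 2 + 1


def can_elect_master(nodes: list[Node], alive: set[str]) -> bool:
    """True if enough master-eligible nodes are still alive to form a majority."""
    eligible = [n for n in nodes if "master" in n.roles]
    alive_eligible = [n for n in eligible if n.name in alive]
    return len(alive_eligible) >= quorum_size(len(eligible))


cluster = [
    Node("node-1", {"master", "data"}),
    Node("node-2", {"master", "data"}),
    Node("node-3", {"master"}),
    Node("node-4", {"data"}),
]
print(can_elect_master(cluster, alive={"node-1", "node-2", "node-4"}))  # True: 2 of 3 eligible alive
print(can_elect_master(cluster, alive={"node-2", "node-4"}))            # False: 1 of 3 eligible alive
```

This is why production clusters usually have three master-eligible nodes: a majority of three is two, so the cluster can still elect a master after losing any one of them.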

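Each node has a name (set with node.name, which defaults to the machine's hostname) and communicates with the other nodes in the cluster over the network. When a node starts, it runs a discovery process to find the other nodes and join the cluster: it contacts a list of seed hosts, which can be supplied in the node's configuration (discovery.seed_hosts), in a separate hosts file, or through cloud-specific plugins for platforms such as AWS EC2 and Google Compute Engine. Multicast discovery, which early versions of Elasticsearch supported, was removed in Elasticsearch 5.0.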
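To show which settings drive discovery, the Python sketch below renders an elasticsearch.yml fragment for one node of a hypothetical three-node cluster. The setting names (cluster.name, node.name, node.roles, discovery.seed_hosts, cluster.initial_master_nodes) are real Elasticsearch settings; node.roles applies to recent versions, while older releases used separate node.master and node.data flags. The helper function, host names, and cluster name are hypothetical, and cluster.initial_master_nodes is only needed the first time a brand-new cluster starts.

```python
import json


def node_config(cluster_name: str, node_name: str, roles: list[str],
                seed_hosts: list[str], initial_masters: list[str]) -> str:
    """Render a hypothetical elasticsearch.yml fragment for one node."""
    # json.dumps writes lists as ["a", "b"], which is also valid YAML flow syntax.
    settings = {
        "cluster.name": cluster_name,  # nodes only join a cluster with the same name
        "node.name": node_name,
        "node.roles": json.dumps(roles),
        "discovery.seed_hosts": json.dumps(seed_hosts),  # addresses contacted during discovery
        "cluster.initial_master_nodes": json.dumps(initial_masters),  # first bootstrap only
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


seeds = ["es-host-1:9300", "es-host-2:9300", "es-host-3:9300"]  # 9300: default transport port
masters = ["node-1", "node-2", "node-3"]
print(node_config("demo-cluster", "node-1", ["master", "data"], seeds, masters))
```

Every node would get the same seed list and cluster name but its own node.name, so each one can find the others regardless of which starts first.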

A cluster in Elasticsearch is a group of one or more nodes, sharing the same cluster name, that work together to store and manage data. When several nodes form a cluster, Elasticsearch automatically distributes data across them. Any node can accept a request and coordinate it by forwarding the work to the nodes that hold the relevant shards, which spreads the query load across the cluster.
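Because any node can coordinate a search, a query follows a scatter-gather pattern: the coordinating node sends it to one copy of every relevant shard, each shard returns its own best matches, and the coordinator merges them into a single ranked result. The Python sketch below models only that merge, assuming each shard already holds precomputed relevance scores; the function names and sample data are hypothetical, and real Elasticsearch adds a second fetch phase to retrieve the matching documents themselves.

```python
import heapq


def search_shard(shard_docs: dict[str, float], k: int) -> list[tuple[float, str]]:
    """Return one shard's local top-k hits as (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard_docs.items()))


def coordinate_search(shards: list[dict[str, float]], k: int) -> list[str]:
    """Scatter the query to every shard, then merge the local hits into a global top-k."""
    local_hits = [hit for shard in shards for hit in search_shard(shard, k)]
    return [doc_id for score, doc_id in heapq.nlargest(k, local_hits)]


shards = [
    {"a": 0.9, "b": 0.2},   # shard 0
    {"c": 0.7, "d": 0.95},  # shard 1
    {"e": 0.5},             # shard 2
]
print(coordinate_search(shards, k=3))  # ['d', 'a', 'c']
```

Each shard only has to return k hits, so the coordinator merges at most k results per shard no matter how large the shards are.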

Sharding is the process of splitting an index into smaller parts called shards, which can be distributed across multiple nodes in a cluster. Each shard is a self-contained Apache Lucene index that can be stored and managed independently of the other shards. By splitting an index into shards and spreading them across multiple nodes, Elasticsearch can handle large amounts of data and scale horizontally.
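Each document is stored in exactly one primary shard. Elasticsearch's documentation describes the basic choice as shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document's _id. The Python sketch below follows that simplified formula; it uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses and ignores the extra arithmetic newer versions add to support index splitting. The function name shard_for and the document IDs are hypothetical.

```python
import zlib


def shard_for(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a document: hash(routing) % number of primaries.

    zlib.crc32 is only a deterministic stand-in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(6)]
for doc_id in doc_ids:
    # The same document can land on a different shard if the primary count changes,
    # which is one reason the number of primary shards is fixed when an index is created.
    print(doc_id, "-> shard", shard_for(doc_id, 3), "of 3 |", shard_for(doc_id, 4), "of 4")
```

Because the mapping is deterministic, a get-by-ID request can go straight to the right shard instead of asking every shard.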

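Elasticsearch balances data across the nodes in the cluster through shard allocation. Each index is divided into one or more primary shards, and each primary can have replica copies. Elasticsearch distributes these shard copies across the nodes and never places a replica on the same node as its primary, so losing a single node does not lose data and searches can be served by either copy. The number of primary shards is set when the index is created, while the number of replicas can be changed at any time.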
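The placement rule can be sketched in Python as a simplified allocator, assuming every node is equally suitable: each shard's copies go to the least-loaded nodes, and a copy that would have to share a node with another copy of the same shard is left unassigned instead. That mirrors a single-node cluster with replicas configured, which reports yellow health because its replicas cannot be placed. Elasticsearch's real allocator weighs more factors, such as disk usage, and the function name allocate and node names are hypothetical.

```python
def allocate(num_primaries: int, num_replicas: int,
             nodes: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Assign primary (P) and replica (R) copies to nodes; return (assignment, unassigned)."""
    assignment: dict[str, list[str]] = {node: [] for node in nodes}
    unassigned: list[str] = []
    for shard in range(num_primaries):
        # Least-loaded nodes first; sorted() is stable, so ties keep the input order.
        candidates = sorted(nodes, key=lambda n: len(assignment[n]))
        copies = [f"P{shard}"] + [f"R{shard}"] * num_replicas
        for i, label in enumerate(copies):
            if i < len(candidates):
                # Each copy of this shard goes to a different node.
                assignment[candidates[i]].append(label)
            else:
                # No node is left without a copy of this shard: leave it unassigned.
                unassigned.append(label)
    return assignment, unassigned


layout, pending = allocate(num_primaries=3, num_replicas=1, nodes=["node-1", "node-2", "node-3"])
print(layout)   # {'node-1': ['P0', 'R1'], 'node-2': ['R0', 'P2'], 'node-3': ['P1', 'R2']}
print(pending)  # []
```

With three nodes, any single node can fail and every shard still has a surviving copy on one of the other two.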

In summary, the basic architecture of Elasticsearch consists of nodes with different roles (such as data and master-eligible nodes) working together in a cluster to store and manage data, with Elasticsearch automatically distributing shards across those nodes for scalability and reliability.
