What is Elasticsearch: Elasticsearch is a search and analytics engine. Its core is built on Lucene, an information retrieval library. Lucene stores text in an inverted index, which maps each term to the documents that contain it.
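To make the idea of an inverted index concrete, here is a minimal Python sketch. It is a toy illustration only, not how Lucene is implemented: the function names and the sample documents are hypothetical, and the "analysis" step is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Elasticsearch is built on Lucene",
}
index = build_inverted_index(docs)
print(index["lucene"])                 # [2, 3]
print(search(index, "search engine"))  # [1]
```

A lookup only touches the posting lists of the query terms, so it does not need to scan every document. That is the main reason search engines use this structure.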
Elasticsearch History: Elasticsearch was first released in February 2010, and there have been many releases since then. The following screenshot shows the versions and their release dates.
- Document: A document is the basic unit of information that can be indexed. It is like a row in a table in a relational DB, or an object in an object-oriented programming language. Documents are stored as JSON objects. The following is an example of a document.
- Type: A type is a collection of documents of the same kind. It is like a ‘Table’ in a relational DB. A type consists of documents and their mapping. Because Lucene, on which Elasticsearch is built, has no concept of document types, the type name is stored in a _type field on each document. Internally, when you search for a specific type of document, Elasticsearch applies a filter on this field. Note that types were deprecated in Elasticsearch 6.x and removed in later versions.
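- Index: An index is a collection of types, and therefore of documents. It can be considered the equivalent of a database in a relational DB.
- Node: A node is a single running instance of Elasticsearch that stores the data of one or more indexes. It is very similar to a relational DB server.
- Cluster: A cluster is a group of one or more nodes that work together.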
- Shard: An index can hold a very large amount of data, which may exceed the hardware limits of a single node. To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you simply define the number of shards you want. Each shard is in itself a fully functional and independent “index” that can be hosted on any node in the cluster. Shards provide horizontal scaling and parallel operations, which increases performance and throughput.
- Replica: Replicas are the failover mechanism of Elasticsearch. To protect against network failure or any other kind of unexpected failure, Elasticsearch allows you to make one or more copies of each shard, which are called replicas.
Besides providing high availability, replicas also increase search performance, since search operations can run in parallel on each replica.
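The way replicas provide failover can be sketched with a small simulation. This is a simplified stand-in under stated assumptions, not Elasticsearch's real allocation logic: the node names, the round-robin placement, and both helper functions are hypothetical. The one rule it illustrates is that the copies of a shard sit on different nodes, so losing one node does not lose the shard.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Place each copy of a shard on a different node, round-robin.

    Returns {shard_id: [node holding the primary, node holding a replica, ...]}.
    Assumption: a simplified stand-in, not Elasticsearch's real allocator.
    """
    copies = 1 + number_of_replicas
    if copies > len(nodes):
        raise ValueError("need at least one node per copy of a shard")
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(copies)]
        for shard in range(number_of_shards)
    }


def surviving_copies(allocation, failed_node):
    """Drop the failed node and return the copies of each shard that remain."""
    return {
        shard: [node for node in holders if node != failed_node]
        for shard, holders in allocation.items()
    }


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
allocation = allocate(nodes, number_of_shards=3, number_of_replicas=1)
print(allocation)
# {0: ['node-1', 'node-2'], 1: ['node-2', 'node-3'], 2: ['node-3', 'node-1']}

after_failure = surviving_copies(allocation, "node-2")
print(after_failure)
# {0: ['node-1'], 1: ['node-3'], 2: ['node-3', 'node-1']}
```

In this sketch every shard still has at least one copy after node-2 fails, which is the property that makes a single replica enough to survive the loss of one node.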
By default, each index in Elasticsearch versions before 7.0 is allocated 5 primary shards and 1 replica. From version 7.0 onward, the default is 1 primary shard and 1 replica.
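As a rough illustration of the terms above, the following Python sketch builds a sample document and the body of a create-index request with explicit shard and replica counts. The document fields and helper functions are hypothetical. `number_of_shards` and `number_of_replicas` are the index setting names, and the values shown are the 5 primary shards and 1 replica mentioned above.

```python
import json

# A hypothetical document: one JSON object, comparable to a row in a table.
document = {"title": "Intro to Elasticsearch", "author": "Jane Doe", "views": 42}


def index_settings(number_of_shards=5, number_of_replicas=1):
    """Build a create-index request body with explicit shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(body):
    """Primary shards plus all of their replica copies."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings()
print(json.dumps(document))
print(json.dumps(body))
print(total_shard_copies(body))  # 10
```

With 5 primaries and 1 replica of each, the cluster has to hold 10 shard copies for this one index, which is worth keeping in mind when choosing these numbers.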
The following is a simple analogy to help beginners understand these terms.
What are its benefits (why do we use it):
- Distributed – Shards and replicas give Elasticsearch a distributed architecture. An index is split into shards, and each shard is a Lucene index. Each shard has a primary copy and one or more replica copies, which are held on different nodes. Following is an example of a distributed architecture.
- Scales Massively – We can scale it from one node to many nodes. It can be scaled horizontally (by adding nodes) and vertically (by adding resources to a node).
- High Availability – Elasticsearch keeps redundant copies of data in replica shards. If one node goes down, Elasticsearch is able to redistribute the load to the other nodes at run time, so data is not lost as long as a copy of each shard survives on another node. When creating an index, one can define the number of shards and replicas.
- RESTful API: Elasticsearch exposes simple RESTful APIs that use JSON. Its APIs are easy to use and understand.
- Schema Free: Elasticsearch uses dynamic mapping. If no mapping is provided for an index, Elasticsearch infers the field types from the documents that are indexed.
- Open-Source: Elasticsearch is an open source project.
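To give a feel for the RESTful API, the sketch below builds three HTTP requests in Python without sending them, so it runs without a cluster. Several things here are assumptions rather than details from this article: a node listening at `http://localhost:9200`, a hypothetical index named `articles`, the sample document, and the typeless `_doc` endpoint form used by recent versions (older versions put the type name in that position). No mapping is sent for the index, so the field types would be left to dynamic mapping.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def json_request(method, path, body=None):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index name ("articles") and document.
create_index = json_request(
    "PUT", "/articles",
    {"settings": {"number_of_shards": 5, "number_of_replicas": 1}},
)
index_doc = json_request(
    "PUT", "/articles/_doc/1",
    {"title": "Intro to Elasticsearch", "views": 42},
)
search = json_request(
    "GET", "/articles/_search",
    {"query": {"match": {"title": "elasticsearch"}}},
)

for req in (create_index, index_doc, search):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running node. Every operation is an HTTP verb, a URL path, and a JSON body, which is what makes the API straightforward to call from any language.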
Further reading: To learn more about Elasticsearch, please see the following links.