Elasticsearch and Its Basics

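What is Elasticsearch: Elasticsearch is a distributed search and analytics engine. Its core is built on Apache Lucene, an information retrieval library. Lucene stores data in an inverted index, a structure that maps each term to the documents containing it, which is what makes full-text search fast.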
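
To make the idea of an inverted index concrete, here is a minimal Python sketch. It is not how Lucene is implemented; the function names build_inverted_index and search, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Elasticsearch is built on Lucene",
}
index = build_inverted_index(docs)
print(sorted(search(index, "search")))                # [1, 2]
print(sorted(search(index, "elasticsearch lucene")))  # [3]
```

A lookup only touches the entries for the query terms, so the cost of a search does not depend on scanning every document.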

Elasticsearch History: Elasticsearch was first released in February 2010, and there have been many releases since then. The following screenshot shows the versions and their release dates.

[Image: Elasticsearch versions and release dates]

Basic Concepts:

  1. Document: A document is the basic unit of information that can be indexed. It is like a row in a table in a relational database, or an object in an object-oriented programming language. Documents are stored as JSON objects. The following is an example of a document:

    {
      "id": "722683",
      "url": "testurl",
      "name": "testName",
      "fields": {},
      "type": "testType",
      "templateName": "testtemplate",
      "path": [],
      "keyWords": [
        {
          "key": "test2",
          "value": 2
        },
        {
          "key": "test",
          "value": 18
        }
      ]
    }

  2. Type: A type is a collection of documents of the same kind, similar to a table in a relational database. A type consists of documents and their mapping. Because Lucene, which Elasticsearch is built on, has no concept of document types, the type name is stored in a _type field on each document. When you search for a specific type of document, Elasticsearch internally applies a filter on this field. Note that mapping types were deprecated in Elasticsearch 6.0 and removed in later versions, so this concept applies to older releases.
  3. Index: An index is a collection of types. It can be thought of as a database in a relational database system.
  4. Node: A node is a single running instance of Elasticsearch that holds indexes (or parts of them), much like a relational database server.
  5. Cluster: A cluster is a network of nodes.
  6. Shard: An index can hold a very large amount of data, which may exceed the hardware limits of a single node. To solve this problem, Elasticsearch lets you subdivide an index into multiple pieces called shards. When you create an index, you can define the number of shards you want. Each shard is in itself a fully functional and independent "index" that can be hosted on any node in the cluster. Shards provide horizontal scaling and allow operations to run in parallel, which increases performance and throughput.
  7. Replica: A replica is Elasticsearch's failover mechanism for network failures or any other kind of unexpected failure. Elasticsearch lets you make one or more copies of a shard, which are called replicas.

Besides providing high availability, replicas also improve search performance, since search operations can run in parallel across the replicas.
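
How documents end up spread across shards can be sketched as follows. This is a simplified illustration and not Elasticsearch's actual code: Elasticsearch hashes a routing value (the document id by default) with its own hash function, whereas zlib.crc32 here is a stand-in, and pick_shard and group_by_shard are hypothetical names.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Pick a primary shard as hash(routing) modulo the primary shard count."""
    # crc32 is a stand-in hash (assumption); it is stable across runs.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def group_by_shard(doc_ids, num_primary_shards):
    """Group document ids by the shard each one would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


# Hypothetical document ids routed across 5 primary shards.
doc_ids = ["722683", "722684", "722685", "722686", "722687", "722688"]
for shard, ids in group_by_shard(doc_ids, 5).items():
    print(f"shard {shard}: {ids}")
```

Because the target shard is derived from the number of primary shards, that number is fixed when the index is created, while the number of replicas can be changed later.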

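By default, each index in Elasticsearch is allocated 5 primary shards and 1 replica. (This was the default before version 7.0; from 7.0 onward the default is 1 primary shard and 1 replica.)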
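
Setting these numbers explicitly when creating an index might look like the sketch below. It only builds the HTTP request and does not send it; the host http://localhost:9200, the index name articles, and the helper build_create_index_request are assumptions for illustration. number_of_shards and number_of_replicas are the index settings Elasticsearch uses for this.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, shards, replicas):
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,      # primary shards
            "number_of_replicas": replicas,  # copies of each primary shard
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local cluster and index name.
request = build_create_index_request("http://localhost:9200", "articles", 5, 1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it against a running cluster: urllib.request.urlopen(request)
```

With 5 primary shards and 1 replica of each, the index consists of 10 shards in total.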

The following simple analogy may help beginners understand these terms.

[Image: Elasticsearch terms compared with relational database terms]

What are its benefits (why do we use it):

  1. Distributed: Shards and replicas give Elasticsearch a distributed architecture. An index is split into shards, and each shard is a Lucene index. Each shard has a primary copy and, optionally, replica copies hosted on other nodes. [Image: example of the Elasticsearch distributed architecture]
  2. Scales Massively: Elasticsearch can scale from one node to many nodes. It can be scaled both horizontally and vertically.
  3. High Availability: Elasticsearch keeps redundant copies of data in replica shards. If a node goes down, Elasticsearch can shift the load to other nodes at run time, so data is not lost as long as another copy of each affected shard exists. The number of shards is defined when the index is created.
  4. RESTful API: Elasticsearch exposes simple RESTful APIs and uses JSON. Its APIs are easy to use and understand.
  5. Schema Free: Elasticsearch uses dynamic mapping. If no mapping is provided for an index, it infers one from the documents that are indexed.
  6. Open Source: Elasticsearch is an open-source project.
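
Dynamic mapping can be pictured with the rough sketch below, which guesses a field type from each JSON value. It is a simplification and not Elasticsearch's implementation: the function names infer_field_type and infer_mapping are hypothetical, and real dynamic mapping does more (for example, date detection on strings and a keyword sub-field for text).

```python
def infer_field_type(value):
    """Guess an Elasticsearch-style field type from a JSON value (simplified)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
    return None  # null or empty array: no field is mapped yet


def infer_mapping(document):
    """Infer a type for each top-level field that has a usable value."""
    mapping = {}
    for field, value in document.items():
        field_type = infer_field_type(value)
        if field_type is not None:
            mapping[field] = field_type
    return mapping


# Hypothetical document, loosely modelled on the example earlier in this post.
document = {
    "id": "722683",
    "name": "testName",
    "path": [],
    "keyWords": [{"key": "test", "value": 18}],
    "published": True,
    "score": 4.5,
}
print(infer_mapping(document))
```

The empty path array produces no entry, which mirrors the fact that a field with no value cannot be given a type until a document supplies one.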

Further reading: To learn more about Elasticsearch, please see the following links.

Elasticsearch website

Wiki

 

Author: Rupesh

Hi! I'm Rupesh, a funophile and technophile. I'm an Application Developer, Solution Architect and IT-Consultant, and an author in the works. I am a Microsoft certified Professional and Solution Developer (MCP and MCSD).
