An overview of Elasticsearch – Paul Zerkel





pres-es-overview

Presentation about using Elasticsearch and .NET integration with Nest

On Github paulzerkel / pres-es-overview

An overview of Elasticsearch

Paul Zerkel

1/8/2014

Topics

  • Overview of Elasticsearch
  • Storage concepts
  • Queries and Filters
  • Nest, a .NET client library

Overview

What is Elasticsearch?

At its core, Elasticsearch is an open source, distributed, document-oriented full-text search engine that indexes data in real time through a RESTful API.

Buzzword Bingo!

Features

  • Full text search engine
  • Open source
  • Distributed
  • Document oriented
  • RESTful API
  • Real-time

Requirements

Elasticsearch and its underlying engine, Lucene, are written in Java and require Java 6 or higher to run. You can download and install Java, or if you already have it installed you can check your version with the following command:

$ java -version

That is the only requirement for Elasticsearch.

Installation & Running

Download Elasticsearch and extract it onto your computer. After that, navigate to the directory where you extracted it and run:

$ bin/elasticsearch -f

or on a Windows machine:

bin\elasticsearch.bat

Testing the New Instance

You can check that Elasticsearch is up and running. Once it has started, open a new browser window and navigate to http://localhost:9200

If everything is working properly you should see a JSON document with information describing the running instance of Elasticsearch.
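The same check can be scripted. The following is a minimal sketch, assuming Python 3 and the default address shown above. The helper name fetch_instance_info is hypothetical, and its opener parameter exists only so the function can be exercised without a live server.

```python
import json
import urllib.request


def fetch_instance_info(base_url="http://localhost:9200",
                        opener=urllib.request.urlopen):
    """Request the root endpoint and return the decoded JSON document."""
    with opener(base_url) as response:
        return json.loads(response.read().decode("utf-8"))


# With a local instance running, this would print the description document:
#   print(json.dumps(fetch_instance_info(), indent=2))
```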

Sidenote - cURL

cURL is a command line tool that can be used to transfer data across many protocols such as HTTP. It is often used to quickly interact with RESTful APIs. You will see it in much of the Elasticsearch documentation.

Documents

Data that is added to Elasticsearch is called a document. Documents are represented in JSON format, and you are not required to create a schema before adding them.

Document Example

{
	"id": "1",
	"title": "Ulysses",
	"author": "James Joyce",
	"publish_date": "1922-02-02",
	"description": "Ulysses is a modernist novel by James Joyce."
}
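
As a rough sketch of how a document like this might be sent to Elasticsearch, the Python below builds (but does not send) an HTTP PUT request. The index name "library" and the type name "book" are assumptions chosen for illustration, and build_index_request is a hypothetical helper, not part of any client library.

```python
import json
import urllib.request

book = {
    "id": "1",
    "title": "Ulysses",
    "author": "James Joyce",
    "publish_date": "1922-02-02",
    "description": "Ulysses is a modernist novel by James Joyce.",
}


def build_index_request(base_url, index, doc_type, doc):
    """Build a PUT request that stores doc under its own "id" value."""
    url = f"{base_url}/{index}/{doc_type}/{doc['id']}"
    body = json.dumps(doc).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


request = build_index_request("http://localhost:9200", "library", "book", book)
# urllib.request.urlopen(request) would send it to a running instance.
```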
					

Indexes

Elasticsearch stores data in an index. A cluster can contain multiple indexes to keep data separate, and an index can be sharded across multiple nodes in the cluster. A search can span multiple indexes.

Indexes can be created and deleted within Elasticsearch.

$ curl -XPOST http://localhost:9200/library/
$ curl -XDELETE http://localhost:9200/library/

Types

A type is how you keep documents with different schemas separate. As an example, if you had an ecommerce site, you might create an index containing documents of a Product type and also a Review type.

Each of these would have different data structures, but could be included together in a search.
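
One way to picture this is through the search URL, where index and type names are given as comma-separated lists. The sketch below only assembles the path. The helper name search_path and the index name "shop" are hypothetical, and the path layout reflects the API as it stood at the time of this presentation.

```python
def search_path(indexes, types=()):
    """Return the _search path covering the given index and type names."""
    parts = [",".join(indexes)]
    if types:
        parts.append(",".join(types))
    return "/" + "/".join(parts) + "/_search"


# Search both the product and review types of one index together.
combined = search_path(["shop"], ["product", "review"])
# Search across two whole indexes.
across_indexes = search_path(["library", "shop"])
```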

Analysis

Documents that are added must be analyzed so that they can be searched for. This analysis work is done by an Analyzer and can be configured per field in the document.

Several analyzers are built in, each suited to different circumstances. It is also possible to turn off analysis for a field so that its value is indexed as a single exact term, or to exclude a field from the index entirely.
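
To make the idea concrete, here is a deliberately simplified model of analysis in Python. It is not how Elasticsearch or Lucene implement analyzers. It only illustrates that text is broken into normalized tokens, and that searches are matched against those tokens rather than against the original string. All names here are illustrative.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, field):
    """Map each token of the given field to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index[token].add(doc_id)
    return index


docs = {
    "1": {"title": "Ulysses"},
    "2": {"title": "The Great Gatsby"},
}
title_index = build_inverted_index(docs, "title")
# Looking up "gatsby" finds document 2; the capitalized "Gatsby" was never stored.
```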

API Basics - CRUD Operations

It is possible to Create, Read, Update, and Delete documents within an index. In addition you can bulk load data into an index.

By default, Elasticsearch will store the entire source of a document in a special field named _source. This is required for certain operations, such as an update.
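
The sketch below shows one way these operations might map onto HTTP requests, under the assumption of the index/type/id URL layout and the newline-delimited bulk format used by Elasticsearch around the time of this presentation. The helpers crud_request and bulk_body are hypothetical names, and nothing is sent over the network.

```python
import json


def crud_request(operation, index, doc_type, doc_id, doc=None):
    """Return (HTTP method, path, JSON body or None) for one CRUD operation."""
    path = f"/{index}/{doc_type}/{doc_id}"
    if operation == "create":
        return ("PUT", path, json.dumps(doc))
    if operation == "read":
        return ("GET", path, None)
    if operation == "update":
        # A partial update sends only the changed fields, wrapped in "doc".
        return ("POST", path + "/_update", json.dumps({"doc": doc}))
    if operation == "delete":
        return ("DELETE", path, None)
    raise ValueError(f"unknown operation: {operation}")


def bulk_body(index, doc_type, docs):
    """Build a bulk payload: an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

For example, crud_request("update", "library", "book", "1", {"title": "Ulysses"}) describes a POST to /library/book/1/_update, and the string returned by bulk_body would be posted to the _bulk endpoint.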

Searching

Once data is added to an index it can be searched. Searching can be accomplished either through the URL or via a GET with a JSON body.

The main way to query Elasticsearch is through its Query DSL (domain-specific language). A DSL query is a JSON document that describes how the search should be put together.

{
  "query" : {
    "term" : {
      "title" : "gatsby"
    }
  }
}
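
The two ways of searching can be compared side by side. The Python sketch below builds a URL-based search and the equivalent JSON body for the term query above. The index name "library" and both helper names are assumptions made for illustration.

```python
import json
import urllib.parse


def uri_search_url(base_url, index, field, value):
    """Build a search URL that passes the query in the q parameter."""
    query = urllib.parse.urlencode({"q": f"{field}:{value}"})
    return f"{base_url}/{index}/_search?{query}"


def term_query_body(field, value):
    """Build the JSON body for a term query on a single field."""
    return json.dumps({"query": {"term": {field: value}}})


url = uri_search_url("http://localhost:9200", "library", "title", "gatsby")
body = term_query_body("title", "gatsby")
```

The URL form is convenient for quick checks, while the JSON body can express the nested queries that the DSL is designed for.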
					

Search Types

  • Term
  • Terms
  • Match
  • Multi Match
  • Phrase Match
  • Query String
  • Prefix
  • Fuzzy
  • etc

Filters

Filters are a way to select a subset of data as part of a query. They can be used to include or exclude data from the query. They do not affect the scoring of the results (how relevant each result is to the query). Use a filter instead of a query when a criterion should not influence the score of the result.

{
  "filter" : {
    "bool" : {
      "must" : { 
        "term" : { "user" : "pzerkel" }
      }
    }
  }
}
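
The difference between filtering and scoring can be shown with a toy model. The Python below is not how Elasticsearch computes relevance; the scoring function, the sample data, and all names are invented for illustration. It only demonstrates that a filter narrows the candidate set while leaving the scores of the remaining results unchanged.

```python
def score(doc, term):
    """Toy relevance: how many times the term appears in the description."""
    return doc["description"].lower().split().count(term)


def search(docs, term, keep=None):
    """Return (score, id) pairs, best first; keep is an optional filter."""
    candidates = [d for d in docs if keep is None or keep(d)]
    hits = [(score(d, term), d["id"]) for d in candidates]
    return sorted((h for h in hits if h[0] > 0), reverse=True)


docs = [
    {"id": "1", "user": "pzerkel", "description": "search engine notes on search"},
    {"id": "2", "user": "someone", "description": "search tips"},
    {"id": "3", "user": "pzerkel", "description": "cooking"},
]

unfiltered = search(docs, "search")
filtered = search(docs, "search", keep=lambda d: d["user"] == "pzerkel")
# Document 1 keeps the same score in both result lists.
```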
					

Filter Types

  • Term
  • Terms
  • Bool
  • Range
  • Script
  • etc

Questions?

Thanks!