training-elasticsearch





On Github vspiewak / training-elasticsearch

Training ES

Created by Vincent Spiewak / @vspiewak

Agenda

Introduction

  • Document-oriented (JSON)
  • Schema-free
  • Full-text search
  • Analytics
  • Real time
  • RESTful API

Properties

  • Distributed
  • High availability
  • Multi-tenant
  • Conflict management
  • Built on Lucene
  • Apache 2 license

Installation

https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.0.zip

unzip elasticsearch-$VERSION.zip

Distribution

.
├── ...
├── bin
│ ├── elasticsearch
│ ├── plugin
│ └── ...
├── config
│ ├── elasticsearch.yml
│ └── logging.yml 
├── data
│ └── ...
├── lib
│ ├── elasticsearch-x.y.z.jar
│ └── ...
└── logs
  ├── elasticsearch.log
  └── ...

Startup

cd elasticsearch-$VERSION

./bin/elasticsearch
[2014-07-18 20:19:57,837][INFO ][node                     ] [Leonard Samson] version[1.2.1], pid[687], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-18 20:19:57,838][INFO ][node                     ] [Leonard Samson] initializing ...
[2014-07-18 20:19:57,844][INFO ][plugins                  ] [Leonard Samson] loaded [], sites [head]
[2014-07-18 20:19:59,979][INFO ][node                     ] [Leonard Samson] initialized
[2014-07-18 20:19:59,979][INFO ][node                     ] [Leonard Samson] starting ...
[2014-07-18 20:20:00,073][INFO ][transport                ] [Leonard Samson] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.43.97:9300]}
[2014-07-18 20:20:03,102][INFO ][cluster.service          ] [Leonard Samson] new_master [Leonard Samson][r_gpL3SXQcWwGki0cI2UBg][mbp-de-vincent][inet[/192.168.43.97:9300]], reason: zen-disco-join (elected_as_master)
[2014-07-18 20:20:03,124][INFO ][discovery                ] [Leonard Samson] elasticsearch/r_gpL3SXQcWwGki0cI2UBg
[2014-07-18 20:20:03,138][INFO ][http                     ] [Leonard Samson] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.43.97:9200]}
[2014-07-18 20:20:03,150][INFO ][gateway                  ] [Leonard Samson] recovered [0] indices into cluster_state
[2014-07-18 20:20:03,151][INFO ][node                     ] [Leonard Samson] started

Startup with parameters

# foreground
./bin/elasticsearch

# background (daemon)
./bin/elasticsearch -d

# write a PID file
./bin/elasticsearch -p /var/run/elasticsearch.pid

Health check

curl 'http://localhost:9200'
{
  "status" : 200,
  "name" : "Leonard Samson",
  "version" : {
    "number" : "1.2.1",
    "build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
    "build_timestamp" : "2014-06-03T15:02:52Z",
    "build_snapshot" : false,
    "lucene_version" : "4.8"
  },
  "tagline" : "You Know, for Search"
}

Count documents

curl 'http://localhost:9200/_count'
{"count":0,"_shards":{"total":0,"successful":0,"failed":0}}

curl 'http://localhost:9200/_count?pretty'
{
  "count" : 0,
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  }
}
The pretty parameter enables pretty-printed output.

Inserting a document

curl -XPOST 'http://localhost:9200/twitter/tweet?pretty' -d '{ "user":"vspiewak", "content":"My first tweet #yolo" }'

{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "9RNpJqcCRouzD79Drbexew",
  "_version" : 1,
  "created" : true
}
This indexes a document of type tweet into the twitter index on the local cluster listening on port 9200.

Specifying an ID

curl -XPOST 'http://localhost:9200/twitter/tweet/1?pretty' -d '{ "user":"vspiewak", "content":"Another tweet #NerdBrigade" }'

{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}
Useful when indexing data that comes from another database.
You can use PUT if the index and the type already exist (bad REST).

Retrieving a document

curl -XGET http://localhost:9200/twitter/tweet/1?pretty
{
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{ "user":"vspiewak", "content":"Another tweet #NerdBrigade" }
}

Existence

curl -i -XHEAD http://localhost:9200/twitter/tweet/1
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

curl -i -XHEAD http://localhost:9200/twitter/tweet/mystrangeuuid
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

Search

curl -XGET http://localhost:9200/twitter/tweet/_search?pretty
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "aYmf05FqTlCcMRxBUt71aQ",
      "_score" : 1.0,
      "_source":{ "user":"vspiewak", "content":"My first tweet #yolo" }
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{ "user":"vspiewak", "content":"Another tweet #NerdBrigade" }
    } ]
  }
}
Returns the top 10 results by default.
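
To go beyond the default of 10 hits, the search URL can carry the size and from query parameters. The following Python sketch only builds such a URL; search_url is a hypothetical helper name, and the host, index and type are the ones used on the slides.

```python
from urllib.parse import urlencode


def search_url(base, index, doc_type, query, size=10, offset=0):
    """Build a URI-search URL; `size` and `from` control pagination."""
    params = urlencode({"q": query, "size": size, "from": offset})
    return f"{base}/{index}/{doc_type}/_search?{params}"


# Ask for the first 20 hits instead of the default 10.
url = search_url("http://localhost:9200", "twitter", "tweet",
                 "user:vspiewak", size=20)
print(url)
```

The resulting string can be passed to curl in single quotes, like the other search examples.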

Search on a specific field

curl 'http://localhost:9200/twitter/tweet/_search?pretty&q=user:vspiewak'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "aYmf05FqTlCcMRxBUt71aQ",
      "_score" : 0.30685282,
      "_source":{ "user":"vspiewak", "content":"My first tweet #yolo" }
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source":{ "user":"vspiewak", "content":"Another tweet #NerdBrigade" }
    } ]
  }
}

Deletion

curl -XDELETE http://localhost:9200/twitter/tweet/1?pretty
{
  "found" : true,
  "_index" : "twitter",
  "_type" : "tweet",
  "_id" : "1",
  "_version" : 2
}

Popular plugins

  • mobz/elasticsearch-head
  • lukas-vlcek/bigdesk
  • karmi/elasticsearch-paramedic
  • lmenezes/elasticsearch-kopf
  • royrusso/elasticsearch-HQ
  • elasticsearch-marvel

Plugin management

 # install from a URL / file (plugin-name is a placeholder for the name to install under)
 bin/plugin --url http://website.com/plugin.zip --install plugin-name

 # install from GitHub
 bin/plugin --install mobz/elasticsearch-head

 # remove
 bin/plugin --remove elasticsearch-marvel

Plugin head

Elasticsearch vs SQL

MySQL   |   Partition  |   Database    |    Table   |   Row      |   Column
--------+--------------+---------------+------------+------------+----------
ES      |   Node       |    Index      |    Type    |   Document |   Field

Glossary

  • Cluster: one or more nodes sharing the same cluster.name

  • Node: an Elasticsearch instance belonging to a cluster

  • Index: a collection of documents

  • Shard: a portion of an index

  • Primary / Replica shard: a shard can have several copies (one primary, zero or more replicas)

Configuration

Essentials

config/elasticsearch.yml

cluster.name: my_cluster 
node.name: "Franz Kafka"
network.host: localhost

At startup

bin/elasticsearch --cluster.name=my_cluster --network.host=localhost

Environment variables

  • ES_HOME
  • ES_HEAP_SIZE
  • ES_JAVA_OPTS

Path configuration

  • path.conf
  • path.data
  • path.work
  • path.logs
  • path.plugins

Structure

├── data
│ ├── my_cluster_1
│ │ ├── nodes
│ │ │ ├── 0
│ ├── my_cluster_2
│ │ ├── nodes
│ │ │ ├── 0
│ │ │ ├── 1

Logging

  • config/logging.yml
  • log4j
index.gateway: DEBUG

indices.recovery: DEBUG

Discovery

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2", "host3" ]

Ports

  • 9200: HTTP protocol (REST / JSON)
  • 9300: Elasticsearch Transport protocol

API

Java API

  • Port 9300
  • Node Client
  • Transport Client
  • /!\ the client version must match the cluster version

HTTP API

  • GET, POST, PUT, HEAD, DELETE
  • any node in the cluster can serve a request
  • request path
  • optional query-string parameters
  • request body (JSON)
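
The parts of an HTTP API call listed above (verb, path, optional query string, JSON body) can be assembled as in the following Python sketch. It only constructs the request object and sends nothing; build_request is a hypothetical helper, and the host and document are taken from the earlier twitter examples.

```python
import json
from urllib.request import Request


def build_request(method, path, params="", body=None,
                  host="http://localhost:9200"):
    """Assemble verb + path + optional query string + optional JSON body."""
    url = host + path + (("?" + params) if params else "")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(url, data=data, method=method)


req = build_request("POST", "/twitter/tweet/1", params="pretty",
                    body={"user": "vspiewak", "content": "Another tweet"})
print(req.get_method(), req.full_url)
```

Passing the object to urllib.request.urlopen would actually send it to a running node.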

Document

{
    "id" : "1",
    "user" : "vspiewak",
    "birth" : 1984,
    "size" : 185,
    "like" : [ "java", "agile" ],
    "location" : {
        "city" : "Paris",
        "geo" : {
            "lat" : 48.860553,
            "lon" : 2.3404509
        }
    }
}

Types

  • string, integer, float, boolean, date, binary, null
  • array, object
  • geo_point, geo_shape, ip, multi-field

Metadata

  Name        |  Default    |     Description
--------------+-------------+-----------------------------------------------
              |             |
  _id         |             |  document identifier
              |             |
  _type       |             |  document type
              |             |
  _source     |   enabled   |  the original document (as indexed)
              |             |
  _all        |   enabled   |  indexes the values of all fields
              |             |
  _timestamp  |  disabled   |  date associated with the document
              |             |
  _ttl        |  disabled   |  time before expiration
              |             |
  _size       |  disabled   |  uncompressed size of the _source content
              |             |

Index creation

curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 2
        }
    },
    "mappings": {
        "type1": {
            "_source": {
                "enabled": false
            },
            "properties": {
                "field1": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    },
    "warmers": {
        "warmer_1": {
            "source": {
                "query": {
                    "match_all": {}
                },
                "aggs": {
                    "aggs_1": {
                        "terms": {
                            "field": "field1"
                        }
                    }
                }
            }
        }
    },
    "aliases": {
        "alias_1": {},
        "alias_2": {
            "filter": {
                "term": {
                    "user": "kimchy"
                }
            },
            "routing": "kimchy"
        }
    }
}'
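
As a quick sanity check on the settings above: each primary shard gets number_of_replicas copies, so the cluster has to host primaries × (1 + replicas) shards in total. A minimal Python sketch of that arithmetic (total_shards is a hypothetical helper name):

```python
def total_shards(number_of_shards, number_of_replicas):
    """Primaries plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


# Values from the index creation request above: 3 primaries, 2 replicas each.
print(total_shards(3, 2))  # 9
```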

Insertion

curl -i -XPUT 'http://localhost:9200/comics/hero/1?pretty' -d '{
   "firstname" : "Bruce",
   "lastname" : "Wayne"
}'

Indexing - Creation

curl -i -XPUT 'http://localhost:9200/comics/hero/1?pretty' -d '{
   "firstname" : "Bruce",
   "lastname" : "Wayne"
}'

HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 99

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}

Indexing - Re-indexing

curl -i -XPUT 'http://localhost:9200/comics/hero/1?pretty' -d '{
   "firstname" : "Peter",
   "lastname" : "Parker"
}'

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 100

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 2,
  "created" : false
}

Indexing - Explicit creation

curl -i -XPUT 'http://localhost:9200/comics/hero/1/_create?pretty' -d '{
   "firstname" : "Flash",
   "lastname" : "Gordon"
}'

HTTP/1.1 409 Conflict
Content-Type: application/json; charset=UTF-8
Content-Length: 199

{
  "error" : "RemoteTransportException[[Diamond Lil][inet[/10.253.1.225:9301]][index]]; nested: DocumentAlreadyExistsException[[comics][2] [hero][1]: document already exists]; ",
  "status" : 409
}

Indexing - _id

  • specified in the URL
  • auto-generated (timestamp + increment)
  • extracted from a field (path)

Indexing - Timestamp

{
    "hero" : {
        "_timestamp" : { "enabled" : true }
    }
}

curl -XPUT 'http://localhost:9200/comics/hero/1?timestamp=1963-03-01T00:00:00' -d '{
   "firstname" : "Tony",
   "lastname" : "Stark"
}'

Indexing - TTL

{
    "hero" : {
        "_ttl" : { "enabled" : true }
    }
}

curl -XPUT 'http://localhost:9200/comics/hero/1?ttl=1d' -d '{
   "firstname" : "Anna",
   "lastname" : "Marie"
}'
Prefer a cron-based cleanup job where possible.

Indexing - Automatic creation

action.auto_create_index: false

index.mapper.dynamic: false

Indexing - Execution

  • compute hash(routing) % number_of_primary_shards
  • execute the request on, or forward it to, the primary shard
  • replicate to the replica shards
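
The first step above can be sketched in Python. This is an illustration only, not Elasticsearch's implementation: zlib.crc32 stands in for the real hash function, pick_shard is a hypothetical helper, and the routing value is assumed to default to the document _id.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value to a primary shard number."""
    # zlib.crc32 is a stand-in hash, NOT the function Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value always lands on the same shard.
for doc_id in ["1", "2", "mystrangeuuid"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the modulus is the number of primary shards, changing that number would send existing documents to different shards, which is why it is fixed when the index is created.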

Retrieval

curl http://localhost:9200/comics/hero/1?pretty

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{
    "firstname" : "Flash",
    "lastname" : "Gordon"
  }
}
A GET is real time, unlike search.

Retrieval - Not found

curl -i http://localhost:9200/comics/hero/2?pretty

HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 80

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "2",
  "found" : false
}

Retrieval - Original document

curl http://localhost:9200/comics/hero/1/_source

{
  "firstname" : "Flash",
  "lastname" : "Gordon"
}

Retrieval - Specific fields

curl 'http://localhost:9200/comics/hero/1?_source=firstname,birth&pretty'

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{"firstname":"Flash"}
}

Retrieval - Execution

  • compute hash(routing) % number_of_primary_shards
  • round-robin the request across the copies of that shard (primary or replica)
  • ?preference=_local
  • ?preference=_primary
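
A toy model of the read path above, in Python. It is a simplified sketch under stated assumptions: ShardCopies and its methods are hypothetical names, shard copies are identified by the node that hosts them, and _local is modelled as "use the copy on the receiving node if there is one".

```python
from itertools import cycle


class ShardCopies:
    """All copies (primary + replicas) of one shard, keyed by hosting node."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.copies = [primary] + list(replicas)
        self._round_robin = cycle(self.copies)

    def select(self, preference=None, local_node=None):
        if preference == "_primary":
            return self.primary
        if preference == "_local" and local_node in self.copies:
            return local_node
        # Default: spread reads over every copy in turn.
        return next(self._round_robin)


shard = ShardCopies("node1", ["node2", "node3"])
print([shard.select() for _ in range(4)])
print(shard.select(preference="_primary"))
print(shard.select(preference="_local", local_node="node2"))
```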

Existence

curl -i -XHEAD http://localhost:9200/comics/hero/1?pretty
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

curl -i -XHEAD http://localhost:9200/comics/hero/2?pretty
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

Deletion

curl -i -XDELETE http://localhost:9200/comics/hero/1?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 97

{
  "found" : true,
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 2
}

curl -i -XDELETE http://localhost:9200/comics/hero/2?pretty
HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 98

{
  "found" : false,
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "2",
  "_version" : 1
}

Deletion - Execution

  • compute hash(routing) % number_of_primary_shards
  • execute on, or forward to, the primary shard
  • replicate to the other shards

Update - Partial

curl -i -XPOST http://localhost:9200/comics/hero/1/_update?pretty -d '{
  "doc" : {
    "view_count" : 0
  }
}'

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 79

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 2
}

Update - Scripting

curl -i -XPOST http://localhost:9200/comics/hero/1/_update?pretty -d '{
  "script" : "ctx._source.view_count += 1"
}'

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 79

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 3
}

Update - Named parameters

curl -i -XPOST http://localhost:9200/comics/hero/1/_update?pretty -d '{
  "script" : "ctx._source.view_count += count",
  "params" : {
    "count" : 4
  }
}'

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 79

{
  "_index" : "comics",
  "_type" : "hero",
  "_id" : "1",
  "_version" : 4
}

Update - Upsert

curl -XPOST localhost:9200/comics/hero/1/_update -d '{
  "script" : "ctx._source.view_count += 1",
  "upsert" : {
    "view_count" : 1
  }
}'

Update - Enabling scripting

#script.default_lang: mvel
script.default_lang: groovy
script.disable_dynamic: false

Update - Execution

  • compute hash(routing) % number_of_primary_shards
  • execute on, or forward to, the primary shard
  • get the document (_source), apply the update, re-index
  • replicate to the other shards
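
The get / update / re-index step can be pictured with an in-memory stand-in for a shard. This Python sketch is illustrative only: store is a plain dict, partial_update is a hypothetical helper, and the merge is shallow, a simplification compared with a real partial update.

```python
def partial_update(store, doc_id, partial_doc):
    """Get the current _source, merge the partial doc, re-index as a new version."""
    current = store[doc_id]                              # 1. get
    new_source = {**current["_source"], **partial_doc}   # 2. update (shallow merge)
    store[doc_id] = {                                    # 3. re-index
        "_source": new_source,
        "_version": current["_version"] + 1,
    }
    return store[doc_id]


store = {"1": {"_source": {"firstname": "Bruce", "lastname": "Wayne"},
               "_version": 1}}
result = partial_update(store, "1", {"view_count": 0})
print(result)
```

The whole document is rewritten even though only one field changed, which is why an update increments _version just like a re-index.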

Versioning

  • every document has a version
  • optimistic concurrency control
  • incremented on re-index, update or delete

Versioning - Internal

curl -i -XDELETE 'localhost:9200/comics/hero/1?version=1'

HTTP/1.1 409 Conflict
Content-Type: application/json; charset=UTF-8
Content-Length: 123

{"error":"VersionConflictEngineException[[comics][2] [hero][1]: version conflict, current [5], provided [1]]","status":409}

Versioning - External

curl -i -XPUT 'localhost:9200/website/blog/1?version=5&version_type=external' -d '
{
  "title": "My first external blog entry",
  "text":  "Starting to get the hang of this..."
}'

HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 73

{"_index":"website","_type":"blog","_id":"1","_version":5,"created":true}

Versioning - External (update)

curl -i -XPUT 'localhost:9200/website/blog/1?version=10&version_type=external' -d '
{
  "title": "My first external blog entry",
  "text":  "This is a piece of cake..."
}'

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 75

{"_index":"website","_type":"blog","_id":"1","_version":10,"created":false}

Versioning - Internal vs External

  • Internal: the provided version must equal the current _version
  • External: the current _version must be lower than the provided version
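
The two rules can be written down as a small Python sketch. The names check_version and VersionConflict are hypothetical; the function models only the comparison, not the rest of the write path, and assumes the document already exists.

```python
class VersionConflict(Exception):
    """Raised when the provided version fails the check (HTTP 409 on the slides)."""


def check_version(current, provided, version_type="internal"):
    """Return the version the document gets if the write is accepted."""
    if version_type == "internal":
        if provided != current:
            raise VersionConflict(f"current [{current}], provided [{provided}]")
        return current + 1
    if version_type == "external":
        if provided <= current:
            raise VersionConflict(f"current [{current}], provided [{provided}]")
        return provided
    raise ValueError(f"unknown version_type: {version_type}")


print(check_version(5, 5))               # internal, versions match
print(check_version(5, 10, "external"))  # external, provided is higher
```

With the values from the internal example above (current 5, provided 1), check_version raises VersionConflict.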

Multi Get - Request

curl 'localhost:9200/_mget?pretty' -d '
{
    "docs": [
        {
            "_index": "comics",
            "_type": "hero",
            "_id": "1"
        },
        {
            "_index": "website",
            "_type": "blog",
            "_id": "1",
            "_source": [ "title" ]
        }
    ]
}'

Multi Get - Response

{
  "docs" : [ {
    "_index" : "comics",
    "_type" : "hero",
    "_id" : "1",
    "_version" : 5,
    "found" : true,
    "_source":{"firstname":"Bruce","lastname":"Wayne","nickname":"Batman","view_count":2}
  }, {
    "_index" : "website",
    "_type" : "blog",
    "_id" : "1",
    "_version" : 10,
    "found" : true,
    "_source":{"title":"My first external blog entry"}
  } ]
}

Multi Get - Specific index

curl 'localhost:9200/comics/_mget?pretty' -d '
{
    "docs": [
        {
            "_type": "hero",
            "_id": 1
        }
    ]
}'

{
  "docs" : [ {
    "_index" : "comics",
    "_type" : "hero",
    "_id" : "1",
    "_version" : 5,
    "found" : true,
    "_source":{"firstname":"Bruce","lastname":"Wayne","nickname":"Batman","view_count":2}
  } ]
}

Multi Get - Specific type

curl 'localhost:9200/comics/hero/_mget?pretty' -d '
{
  "ids": [ "1", "2" ]
}'

{
    "docs": [
        {
            "_index": "comics",
            "_type": "hero",
            "_id": "1",
            "_version": 5,
            "found": true,
            "_source": {
                "firstname": "Bruce",
                "lastname": "Wayne",
                "nickname": "Batman",
                "view_count": 2
            }
        },
        {
            "_index": "comics",
            "_type": "hero",
            "_id": "2",
            "_version": 1,
            "found": true,
            "_source": {
                "firstname": "Tony",
                "lastname": "Stark"
            }
        }
    ]
}
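
When the list of documents comes from application code, the _mget request body can be generated rather than written by hand. A minimal Python sketch, where mget_body is a hypothetical helper and the index, type and id values are those used on the slides:

```python
import json


def mget_body(docs):
    """Build an _mget body from (index, type, id) tuples."""
    return {"docs": [{"_index": index, "_type": doc_type, "_id": doc_id}
                     for index, doc_type, doc_id in docs]}


body = mget_body([("comics", "hero", "1"), ("website", "blog", "1")])
print(json.dumps(body, indent=2))
```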