How well each document matches the query
By default, Elasticsearch sorts matching results by their relevance score, that is, by how well each document matches the query.
GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code.
Relational databases: with filtering, aggregations, highlighting, pagination...
Count things and summarize your data, lots of data, often on timestamped data!
Logs > Logstash > Elasticsearch > Kibana
Commonly used in addition to another database...
wget https://download.elasticsearch.org/elasticsearch/release/...
tar -zxvf elasticsearch-2.2.0.tar.gz
cd elasticsearch-2.2.0/bin
./elasticsearch
You can access it at http://localhost:9200 in your web browser, which returns this:
{
  "status": 200,
  "name": "Cypher",
  "cluster_name": "elasticsearch",
  "version": {
    "number": "1.5.2",
    "build_hash": "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp": "2015-04-27T09:21:06Z",
    "build_snapshot": false,
    "lucene_version": "4.10.4"
  },
  "tagline": "You Know, for Search"
}
JSON documents!
{ "title": "Elasticsearch Workshop", "date": "2016-04-08" }
The act of storing data in Elasticsearch is called indexing.
$ curl -X POST localhost:9200/books/computer/1 --data '{ "name": "The Pragmatic Programmer", "category": "Programming", "price": 29.90 }'
$ curl -X POST localhost:9200/books/computer/2 --data '{ "name": "Clean Code", "category": "Programming", "price": 14.90 }'
$ curl -X POST localhost:9200/books/computer/3 --data '{ "name": "Working Effectively with Legacy Code", "category": "Refactoring", "price": 45.50 }'
Indexing is much like the INSERT keyword in SQL, except that if the document already exists, the new document replaces the old one. The second part of the command, the URL path, indicates the index the request is performed on (an index could be compared to an SQL database, though I don’t like this comparison), the type of the document (a type could be compared to an SQL table, though I don’t like this comparison either), and the document ID.
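The way the URL path is assembled from index, type and ID can be sketched in Python. This is only an illustration: `build_index_request` and `BASE_URL` are hypothetical names, the `Content-Type` header is an addition not present in the curl commands, and the request is built but not sent, so no running node is needed.

```python
import json
import urllib.request

# Assumption: a local node on the default port, as in the slides.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    # The path is /<index>/<type>/<id>, e.g. /books/computer/1
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: explicit JSON content type (not in the original curl calls).
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "books", "computer", 1,
    {"name": "The Pragmatic Programmer", "category": "Programming", "price": 29.90},
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, which requires a node listening on that address.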
$ curl -X GET localhost:9200/books/computer/1
Result:
{
  "_index": "books",
  "_type": "computer",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "The Pragmatic Programmer",
    "category": "Programming",
    "price": 29.9
  }
}
$ curl -X PUT localhost:9200/books/computer/1 --data '{ "name": "The Awesome Programmer" }'
Result:
{ "_index": "books", "_type": "computer", "_id": "1", "_version": 2, "created": false }
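The `_version` and `created` fields in this result follow from indexing over an existing ID, which replaces the whole document. A toy in-memory model of that behaviour might look as follows; `ToyIndex` is a hypothetical class for illustration and says nothing about how Elasticsearch stores data internally.

```python
class ToyIndex:
    """In-memory stand-in for one index/type (illustration only)."""

    def __init__(self):
        self._docs = {}  # document id -> (version, source)

    def index(self, doc_id, source):
        """Store the document, replacing any existing one with the same id."""
        previous = self._docs.get(doc_id)
        version = 1 if previous is None else previous[0] + 1
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": previous is None}

    def get(self, doc_id):
        """Return the stored document, or found=False if the id is unknown."""
        if doc_id not in self._docs:
            return {"_id": doc_id, "found": False}
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "found": True, "_source": source}


books = ToyIndex()
first = books.index("1", {"name": "The Pragmatic Programmer", "price": 29.90})
second = books.index("1", {"name": "The Awesome Programmer"})
print(second)          # version bumped, created is False
print(books.get("1"))  # only "name" is left: the old fields are gone
```

Note that the second call drops `category` and `price`: the new body replaces the old document rather than being merged into it.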
$ curl -X DELETE localhost:9200/books/computer/1
Find all books that contain the word "code"
$ curl -X GET 'localhost:9200/books/computer/_search?q=code'
Sorted by relevance!
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "books",
        "_type": "computer",
        "_id": "2",
        "_score": 0.15342641,
        "_source": { "name": "Clean Code", "category": "Programming", "price": 14.9 }
      },
      {
        "_index": "books",
        "_type": "computer",
        "_id": "3",
        "_score": 0.11506981,
        "_source": { "name": "Working Effectively with Legacy Code", "category": "Refactoring", "price": 45.5 }
      }
    ]
  }
}
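Reading such a response from client code mostly means walking `hits.hits`. The sketch below parses a trimmed copy of the response above (fields not needed here are left out) and checks that the hits are in descending `_score` order; the variable names are illustrative.

```python
import json

# Trimmed copy of the search response shown above.
response_text = """
{"hits": {"total": 2, "max_score": 0.15342641, "hits": [
  {"_id": "2", "_score": 0.15342641, "_source": {"name": "Clean Code"}},
  {"_id": "3", "_score": 0.11506981,
   "_source": {"name": "Working Effectively with Legacy Code"}}
]}}
"""

response = json.loads(response_text)
hits = response["hits"]["hits"]

# Hits arrive already sorted by descending relevance score.
scores = [hit["_score"] for hit in hits]
is_sorted_by_relevance = scores == sorted(scores, reverse=True)

for hit in hits:
    print(f'{hit["_score"]:.4f}  {hit["_source"]["name"]}')
```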
Mapping is used to define how a document, and the fields it contains, are stored and indexed.
This is similar to a database schema.
Define the data types of the document fields
{
  "mappings": {
    "computer": {
      "properties": {
        "name": { "type": "string" },
        "category": { "type": "string", "index": "not_analyzed" },
        "price": { "type": "float" }
      }
    }
  }
}
Find the books with a name that contains the word "code"
$ curl -XGET 'localhost:9200/books/computer/_search' -d '{ "query": { "match": { "name": "code" } } }'
Find books belonging to the "Programming" category
$ curl -XGET 'localhost:9200/books/computer/_search' -d '{ "query": { "filtered": { "filter": { "term": { "category": "Programming" } } } } }'
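The difference between the two queries comes from the mapping: `name` is analyzed into tokens, while `category` is `not_analyzed` and kept as a single exact value. A rough Python sketch of that difference, assuming a toy analyzer that only lowercases and splits on non-alphanumeric characters (the real analyzers do more); `analyze`, `match` and `term` are hypothetical helpers.

```python
import re

BOOKS = [
    {"name": "The Pragmatic Programmer", "category": "Programming", "price": 29.90},
    {"name": "Clean Code", "category": "Programming", "price": 14.90},
    {"name": "Working Effectively with Legacy Code", "category": "Refactoring", "price": 45.50},
]


def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def match(docs, field, query):
    """Analyzed field: a document matches if it shares any token with the query."""
    wanted = set(analyze(query))
    return [doc for doc in docs if wanted & set(analyze(doc[field]))]


def term(docs, field, value):
    """not_analyzed field: the stored value must equal the term exactly."""
    return [doc for doc in docs if doc[field] == value]


print([doc["name"] for doc in match(BOOKS, "name", "code")])
print([doc["name"] for doc in term(BOOKS, "category", "Programming")])
print(term(BOOKS, "category", "programming"))  # no match: the case differs
```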
... "aggs" : { "price_ranges" : { "range" : { "field" : "price", "ranges" : [ { "to" : 10 }, { "from" : 10, "to" : 30 }, { "from" : 30 } ] } } } ...
... "buckets": { "*-10.0": { "to": 10, "doc_count": 0 }, "10.0-30.0": { "from": 10, "to": 30, "doc_count": 2 }, "30.0-*": { "from": 30, "doc_count": 1 } } ...
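The bucket counts above can be reproduced by hand on the three book prices. In this sketch `from` is treated as inclusive and `to` as exclusive; `range_buckets` and `bucket_key` are hypothetical helpers that mimic the shape of the response, not Elasticsearch code.

```python
PRICES = [29.90, 14.90, 45.50]
RANGES = [{"to": 10}, {"from": 10, "to": 30}, {"from": 30}]


def bucket_key(bounds):
    """Build a key such as '10.0-30.0', with '*' for an open end."""
    low = str(float(bounds["from"])) if "from" in bounds else "*"
    high = str(float(bounds["to"])) if "to" in bounds else "*"
    return f"{low}-{high}"


def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    buckets = {}
    for bounds in ranges:
        low = bounds.get("from", float("-inf"))
        high = bounds.get("to", float("inf"))
        count = sum(1 for value in values if low <= value < high)
        buckets[bucket_key(bounds)] = {**bounds, "doc_count": count}
    return buckets


price_ranges = range_buckets(PRICES, RANGES)
for key, bucket in price_ranges.items():
    print(key, bucket)
```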
... "aggs" : { "prices" : { "histogram" : { "field" : "price", "interval" : 15 } } } ...
... "prices" : { "buckets": [ { "key": 0, "doc_count": 1 }, { "key": 15, "doc_count": 1 }, { "key": 30, "doc_count": 0 }, { "key": 45, "doc_count": 1 } ] } } ...
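A histogram bucket key is the value rounded down to a multiple of the interval. The sketch below applies that rule to the three book prices and fills the empty bucket between the lowest and highest key, as in the output above; `histogram_buckets` is a hypothetical helper and assumes an integer interval.

```python
import math

PRICES = [29.90, 14.90, 45.50]


def histogram_buckets(values, interval):
    """Group values into fixed-width buckets keyed by their lower bound."""
    counts = {}
    for value in values:
        key = math.floor(value / interval) * interval
        counts[key] = counts.get(key, 0) + 1
    if not counts:
        return []
    # Include empty buckets between the first and last non-empty ones.
    first, last = min(counts), max(counts)
    return [
        {"key": key, "doc_count": counts.get(key, 0)}
        for key in range(first, last + interval, interval)
    ]


prices_histogram = histogram_buckets(PRICES, 15)
for bucket in prices_histogram:
    print(bucket)
```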
... "aggs" : { "categories" : { "terms" : { "field" : "category" } } } ...
... "buckets": [ { "key": "programming", "doc_count": 2 }, { "key": "refactoring", "doc_count": 1 } ...
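A terms aggregation is essentially a count per distinct value, ordered by descending count. The lowercase keys in the output above are what an analyzed field produces; with the `not_analyzed` mapping shown earlier the keys would keep their original case. The sketch below covers both cases through a hypothetical `lowercase` flag.

```python
from collections import Counter

CATEGORIES = ["Programming", "Programming", "Refactoring"]


def terms_buckets(values, lowercase=False):
    """Count documents per distinct term, most frequent first."""
    terms = [value.lower() if lowercase else value for value in values]
    return [
        {"key": key, "doc_count": count}
        for key, count in Counter(terms).most_common()
    ]


print(terms_buckets(CATEGORIES, lowercase=True))  # keys as in the output above
print(terms_buckets(CATEGORIES))                  # keys as stored when not_analyzed
```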
... "aggs" : { "min_price" : { "min" : { "field" : "price" } } } ...
... "aggregations": { "min_price": { "value": 14.9 } } ...
... "aggs" : { "avg_price" : { "avg" : { "field" : "price" } } } ...
... "aggregations": { "avg_price": { "value": 30.099999999999998 } } ...
... "aggs" : { "price_stats" : { "stats" : { "field" : "price" } } } ...
... "aggregations": { "price_stats": { "count": 3, "min": 14.9, "max": 45.5, "avg": 30.099999999999998, "sum": 90.3 } } ...
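The `min`, `avg` and `stats` aggregations are plain arithmetic over the field values. A minimal sketch on the three book prices (`stats` is a hypothetical helper); the last digits of `avg` and `sum` depend on floating-point rounding, which is also why the response above shows 30.099999999999998 rather than 30.1.

```python
PRICES = [29.90, 14.90, 45.50]


def stats(values):
    """Compute the same five numbers as the stats aggregation."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


price_stats = stats(PRICES)
print(price_stats)
```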
{ "query": { "match": { "name": "legacy code" } }, "highlight": { "fields": { "name": {} } } }
... "highlight": { "name": [ "Working Effectively with <em>Legacy</em> <em>Code</em>" ] } ... "highlight": { "name": [ "Clean <em>Code</em>" ] } ...
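Conceptually, highlighting wraps each word of the field that matches a query term in `<em>` tags. A rough sketch, assuming a case-insensitive whole-word comparison; `highlight` is a hypothetical helper and much simpler than the real highlighters.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of `text` that equals a query term, ignoring case."""
    terms = {term for term in re.split(r"[^0-9a-z]+", query.lower()) if term}

    def wrap(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if word.lower() in terms else word

    return re.sub(r"[0-9A-Za-z]+", wrap, text)


print(highlight("Working Effectively with Legacy Code", "legacy code"))
print(highlight("Clean Code", "legacy code"))
```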
19 tasks - learning Query DSL
The data used during the workshop is a list of pizzas, with the mapping
Feature: Topic of the task
  // Use https://www.elastic.co/guide/en/...
  Scenario: Description of the task
    Given all pizzas are indexed
    When I make a query
      """
      { todo }
      """
    Then the response should contain
      """
      { subset }
      """
Total
{ "workshop": "Elasticsearch", "date" : "2016-04-08" }
Subset
{ "date" : "2016-04-08" }
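The "response should contain" step can be read as a recursive subset check: every key in the expected subset must be present in the total response with a matching value. The workshop's actual matching rules are not shown here, so the sketch below is one plausible reading; `contains_subset` is a hypothetical helper, and the rule for lists (each expected item must match some item of the response) is an assumption.

```python
def contains_subset(total, subset):
    """Return True if `subset` is contained in `total`, recursively."""
    if isinstance(subset, dict):
        return isinstance(total, dict) and all(
            key in total and contains_subset(total[key], value)
            for key, value in subset.items()
        )
    if isinstance(subset, list):
        # Assumption: each expected item must match at least one actual item.
        return isinstance(total, list) and all(
            any(contains_subset(item, expected) for item in total)
            for expected in subset
        )
    return total == subset


total = {"workshop": "Elasticsearch", "date": "2016-04-08"}
print(contains_subset(total, {"date": "2016-04-08"}))  # matching subset
print(contains_subset(total, {"date": "2016-04-09"}))  # value differs
```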