Elasticsearch – at – soysuper.com





soysuper-esearch-talk

Slides for a talk given to elasticsearch barcelona user group

On Github diegok / soysuper-esearch-talk

Elasticsearch

at

soysuper.com

Diego Kuperman | @freekey

What is Soysuper ?

[Supermarket]

Pro tools

Soysuper Visibility

Soysuper Insights

Our data

7

Supermarkets

4476

Zipcodes with delivery

1292

Clusters

250244

Products from origin

181050

Clustered products

112173

Available

8761707

Active prices

194000

Daily price updates (avg)

Using elasticsearch since 0.13.1

Elastic::Model

Thanks Clint ;-)

Products search

...and navigation

Most users have a zipcode and a supermarket

(warehouse)

Each product is available in one or more warehouses...

Each warehouse may have a different price and/or deal

One product

One ES doc

Basic data

name: "Cerveza",
brand: "Estrella Damm",
variant: "Pack 12x25 cl",
v: { measure: "cl", quantifier: "25", container: "botellin", multiplier: "12" },
keywords: [ "bebida", "cerveza", "nacional", "rubia", "extra" ],
category_path: [ "bebidas/cerveza/nacional" ],
brands: [ ... ],
...

Price object

price: {
  condis: {
    511e1b30a2c8bc21740006aa: [ 5.39 ],
  },
  carrefour: {
    4ff30e32c27e95590200001d: [ 5.56, 'Deal' ],
    4ff30e2fc27e95590200000c: [ 5.35 ],
    ...
  },
  mercadona: {
    53d838fa9717d5d27d000000: [ 5.39 ],
    53a2c4f78e6fd283ac000000: [ 4.3 ],
    ...
  },
  ...
  _mean: 5.34
}
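
A minimal sketch of how a client could read one warehouse's price out of this object, assuming the layout shown above: supermarket, then warehouse id, then a `[price, optional deal label]` array. The function name `price_for` and the sample document are illustrative and not taken from the Soysuper code.

```python
from typing import Optional, Tuple

# Sample shaped like the price object above (values copied from the slide).
product = {
    "price": {
        "condis": {"511e1b30a2c8bc21740006aa": [5.39]},
        "carrefour": {
            "4ff30e32c27e95590200001d": [5.56, "Deal"],
            "4ff30e2fc27e95590200000c": [5.35],
        },
        "_mean": 5.34,
    }
}


def price_for(doc: dict, supermarket: str, warehouse: str) -> Optional[Tuple[float, Optional[str]]]:
    """Return (price, deal_label) for one warehouse, or None if not sold there."""
    by_warehouse = doc.get("price", {}).get(supermarket)
    # "_mean" holds a plain number, so only dict entries are per-warehouse maps.
    if not isinstance(by_warehouse, dict):
        return None
    entry = by_warehouse.get(warehouse)
    if not entry:
        return None
    deal = entry[1] if len(entry) > 1 else None
    return entry[0], deal
```

Because the whole product lives in one document, this lookup needs no grouping step on the search side.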

Pros

  • No term weight deviation (IDF)
  • No need for grouping (worked on 0.13.x)
  • Easy to fetch partially

Cons

price: {
  type: "object",
  enabled: false
}

  • Can't filter
  • Can't sort

:-(

Filter

warehouse: [ 
  "511e1b30a2c8bc21740006aa",
  "4ff30e6a5832e48b02000000",
  "5440df745488b4a35500003c",
  ...
],

supermarket: [ "condis", "mercadona", "corteingles", "carrefour", "eroski" ],

deal: [ "carrefour", "511e1b30a2c8bc21740006aa", ... ],
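
A hedged sketch of how these flattened fields could drive a filter. It assumes the three fields are indexed as exact, untokenized values and that `deal` lists the supermarkets and warehouse ids where the product is on offer, which is an interpretation of the slide. The helper name `availability_filter` is hypothetical, and the `bool`/`filter` wrapper is the syntax of Elasticsearch 2.0 and later; older releases expressed the same idea with a `filtered` query.

```python
def availability_filter(warehouse=None, supermarket=None, deals_only=False):
    """Build a filter clause restricting results to one warehouse or supermarket."""
    clauses = []
    if warehouse:
        # Most specific case: the user's warehouse is known.
        clauses.append({"term": {"warehouse": warehouse}})
    elif supermarket:
        clauses.append({"term": {"supermarket": supermarket}})
    scope = warehouse or supermarket
    if deals_only and scope:
        # "deal" holds both supermarket names and warehouse ids.
        clauses.append({"term": {"deal": scope}})
    return {"bool": {"filter": clauses}}
```

For example, `availability_filter(warehouse="511e1b30a2c8bc21740006aa", deals_only=True)` yields two `term` clauses, one on `warehouse` and one on `deal`.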

Sort

mean_price: {
  condis: 5.39,
  mercadona: 5.34,
  corteingles: 5.25,
  carrefour: 5.52,
  mean: 5.29,
  eroski: 4.95
}
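
With one numeric value per supermarket, sorting by price reduces to choosing the right sub-field. The sketch below assumes `mean_price` is mapped as an indexed object with numeric sub-fields; `price_sort` is an illustrative name, not part of the original code.

```python
def price_sort(supermarket=None, descending=False):
    """Build a sort clause on the mean price of one supermarket (or overall)."""
    # Fall back to the overall "mean" when no supermarket is selected.
    field = "mean_price." + (supermarket or "mean")
    return [{field: {"order": "desc" if descending else "asc"}}]
```

The returned list can be placed under the `sort` key of a search request body.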

:-)

Query string query

{
...
  "query" : {
    "query_string" : {
      "fields" : [
        "name^40", "name.stem^35",
        "brands^10", "brands.stem^2",
        "variant^5", "variant.stem",
        "keywords", "tags",
        "category_name^4"
      ],
      "query" : "cerveza",
      "default_operator" : "AND"
    }
...
   "_source" : {
      "include" : [
         "price.mercadona.51577e06e9725936090417ea",
         ...
      ]
   },
...
}
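
The request above can be assembled programmatically. This is a sketch under assumptions: the function name `product_search` is hypothetical, the parts of the request elided on the slide are left out rather than guessed, and the `_source.include` key follows the slide (newer Elasticsearch releases spell it `includes`).

```python
import json

# Boosted fields copied from the slide.
FIELDS = [
    "name^40", "name.stem^35",
    "brands^10", "brands.stem^2",
    "variant^5", "variant.stem",
    "keywords", "tags",
    "category_name^4",
]


def product_search(text, supermarket, warehouse):
    """Build a search body that returns only the price of the user's warehouse."""
    return {
        "query": {
            "query_string": {
                "fields": FIELDS,
                "query": text,
                "default_operator": "AND",
            }
        },
        # Partial fetch: only this warehouse's entry of the price object.
        "_source": {"include": [f"price.{supermarket}.{warehouse}"]},
    }


body = product_search("cerveza", "mercadona", "51577e06e9725936090417ea")
print(json.dumps(body, indent=2))
```

Source filtering is what makes the disabled `price` object practical: it is never indexed, yet any slice of it can still be returned with the hit.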

Segmented ads

Real-time event indexing

Real-time conversion funnel

Real-time segmented stats

It was so easy!

Real-time indexing (again)

MongoDB capped collection

Sync Daemon: mongo to elastic

(Inflate and add to index in bulk)
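
A rough sketch of the bulk step, assuming events arrive as dictionaries carrying a MongoDB-style `_id`. The "inflate" step is not described on the slides, so it is omitted here. `bulk_payload` is a hypothetical helper that only builds the newline-delimited body expected by the Elasticsearch `_bulk` endpoint; older Elasticsearch versions would also need a `_type` in each action line.

```python
import json


def bulk_payload(index, events):
    """Serialize events as a bulk body: one action line, then one source line."""
    lines = []
    for event in events:
        # Reuse the Mongo id so re-syncing the same event overwrites it.
        action = {"index": {"_index": index, "_id": str(event["_id"])}}
        source = {k: v for k, v in event.items() if k != "_id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

Reusing the source id keeps the sync idempotent, which helps when the daemon restarts and replays part of the capped collection.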

Elasticsearch index aliases

1 index per log type and month

Alias = Many indexes
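
One way the per-month indexes and their alias could be wired together is sketched below. The naming pattern `<log type>-<YYYY.MM>` and both function names are assumptions for illustration; the returned dictionary follows the `actions` format accepted by the Elasticsearch `_aliases` endpoint.

```python
from datetime import date


def index_name(log_type, day):
    """Name of the monthly index for a log type (naming pattern is assumed)."""
    return f"{log_type}-{day:%Y.%m}"


def alias_actions(alias, log_type, months):
    """Build an _aliases request body pointing one alias at many monthly indexes."""
    return {
        "actions": [
            {"add": {"index": index_name(log_type, month), "alias": alias}}
            for month in months
        ]
    }
```

Queries then target the alias, while old months can be dropped by deleting whole indexes instead of deleting documents.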

Brand market share

Share evolution

Segmented by anything

Real-time analysis

Market share

Price elasticity

Deal/campaign performance

Buyers profile

Search keywords

Display/selling rate

System stats and failure detection

...


Thanks!

Questions?