Solr – Structure, Environment, Optimize – Solr - Java Product






lzd-solr-tune


On Github nizsheanez / lzd-solr-tune

Solr

Structure, Environment, Optimize

Author - Alex Sharov

What is it?

  • Search engine.
  • Can store documents in schema or schemaless mode.
  • Can find documents by field values:
      • compare (exact match)
      • fulltext
      • range
      • multi value
      • geo
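To make the lookup types above concrete, here is a minimal Java sketch that builds a Solr query string for each of them. The class and field names (`brand`, `title`, `price`, `color`, `location`) are hypothetical. The syntax itself (`field:value`, `[from TO to]` ranges, `OR` over a multi-valued field, the `{!geofilt}` parser with `d` in kilometers) is standard Solr query syntax. Values are not escaped here, so real input with spaces or special characters would need escaping.

```java
import java.util.List;

// Hypothetical helper: builds Solr query strings for the lookup types on the slide.
final class SolrQueries {
    private SolrQueries() {}

    // Exact comparison on a non-tokenized (string) field, e.g. brand:Samsung
    static String exact(String field, String value) {
        return field + ":" + value;
    }

    // Full-text search on a tokenized text field; Solr analyzes the terms
    static String fulltext(String field, String terms) {
        return field + ":(" + terms + ")";
    }

    // Inclusive range, e.g. price:[100 TO 500]; "*" works as an open bound
    static String range(String field, String from, String to) {
        return field + ":[" + from + " TO " + to + "]";
    }

    // Multi-valued field: matches if ANY stored value equals one of the given values
    static String anyOf(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    // Geo filter: documents within distKm kilometers of the point (lat, lon)
    static String geo(String field, double lat, double lon, double distKm) {
        return "{!geofilt sfield=" + field + " pt=" + lat + "," + lon + " d=" + distKm + "}";
    }
}
```

For example, `SolrQueries.range("price", "100", "500")` returns `price:[100 TO 500]`, which would typically be sent as an `fq` (filter query) parameter.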

How Lazada uses Solr

Solr - Java Product

That means we need a whole other environment: the Java world

App server - Tomcat / Jetty

Tomcat

  • Runs, controls and supports Java apps
  • HTTP server
  • Catalina - the servlet container part of Tomcat (no need to know the details)

Jetty

Same role as Tomcat: HTTP server and servlet container

How to run

  • /etc/init.d/solr-live
  • /opt/solr/live/bin/startup.sh
  • /opt/solr/live/bin/catalina.sh
  • exec "$_RUNJDB" "$LOGGING_CONFIG" $JAVA_OPTS $CATALINA_OPTS \
                    -Djava.endorsed.dirs="$JAVA_ENDORSED_DIRS" \
                    -classpath "$CLASSPATH" \
                    -sourcepath "$CATALINA_HOME"/../../java \
                    -Djava.security.manager \
                    -Djava.security.policy=="$CATALINA_BASE"/conf/catalina.policy \
                    -Dcatalina.base="$CATALINA_BASE" \
                    -Dcatalina.home="$CATALINA_HOME" \
                    -Djava.io.tmpdir="$CATALINA_TMPDIR" \
                    org.apache.catalina.startup.Bootstrap "$@" start
  • A lot of Java boilerplate (endorsed dirs, classpath, security policy, system properties)
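Each `-Dname=value` flag in the launch command becomes a JVM system property; that is how Tomcat learns paths such as `catalina.base` and `java.io.tmpdir`. A minimal sketch of reading them from Java code (the class name is hypothetical):

```java
// Hypothetical helper: reads the -D properties that catalina.sh passes to the JVM.
final class CatalinaProps {
    private CatalinaProps() {}

    // Value of -Dcatalina.base=..., or the fallback if the flag was not given
    static String catalinaBase(String fallback) {
        return System.getProperty("catalina.base", fallback);
    }

    // Temp dir from -Djava.io.tmpdir (the JVM always defines a default)
    static String tmpDir() {
        return System.getProperty("java.io.tmpdir");
    }
}
```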

Solr - Inside

Query life cycle

Solr - Inside

Indexing life cycle. Transactional, REST API

Cores - used in the sandbox

Request Handler

  • Routes, with predefined handlers for different goals:
  • Suggest / Ping / Spellcheck / Update / Admin panel / Debug
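Conceptually, a request handler is chosen by the request path, and each path is bound to a handler built for one job. The sketch below is a toy model of that routing in plain Java, not Solr's actual classes; paths like `/select`, `/update` and `/admin/ping` are common Solr handler names, but which handlers exist depends on your `solrconfig.xml`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of path-based request routing (NOT Solr's real API).
final class HandlerRouter {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    // Bind a path such as "/select" to a handler function
    void register(String path, Function<String, String> handler) {
        handlers.put(path, handler);
    }

    // Dispatch the request parameters to the handler registered for this path
    String handle(String path, String params) {
        Function<String, String> handler = handlers.get(path);
        if (handler == null) {
            return "404 unknown handler " + path;
        }
        return handler.apply(params);
    }
}
```

Registering `/admin/ping` with a handler that returns `OK` makes `handle("/admin/ping", "")` return `OK`; an unregistered path falls through to the 404 branch.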

Searchers

Objects that handle search requests and own the caches

Solr closes a Searcher and opens a new one after a commit (for example, when the auto-commit document limit or time limit is reached)

Indexer

Object that handles inserts and updates. Transactional

Query Cache

  • Key - query
  • Value - IDs of the first N matching documents
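A minimal sketch of the query cache idea: the key is the query string and the value is the first N document IDs of the ordered result. Eviction here is LRU via `LinkedHashMap` in access order; the class name and sizes are hypothetical. In Solr itself, N is controlled by `queryResultWindowSize` in `solrconfig.xml`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a query result cache: query string -> first N document IDs (LRU eviction).
final class QueryResultCache {
    private final int windowSize; // how many top IDs to keep per query
    private final LinkedHashMap<String, int[]> map;

    QueryResultCache(int maxEntries, int windowSize) {
        this.windowSize = windowSize;
        // accessOrder = true: iteration goes from least to most recently used
        this.map = new LinkedHashMap<String, int[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, int[]> eldest) {
                return size() > maxEntries; // evict the least recently used query
            }
        };
    }

    // Store only the first windowSize IDs of the ordered result
    void put(String query, int[] orderedDocIds) {
        map.put(query, Arrays.copyOf(orderedDocIds, Math.min(windowSize, orderedDocIds.length)));
    }

    // Cached IDs, or null on a miss
    int[] get(String query) {
        return map.get(query);
    }

    int size() {
        return map.size();
    }
}
```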

Filter Cache

  • If we already have the ordered list, we can apply filters from this cache without touching the disk.
  • Must be larger than the number of all facet/filter combinations.
  • Eats a lot of memory
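One way to picture the filter cache: one bitset per filter, with bit i set when document i matches. Applying several filters to an already ordered result is then an in-memory AND of bitsets with no disk access. This is a simplified sketch (hypothetical class, documents numbered 0..maxDoc-1), not Solr's implementation. It also shows why the cache eats memory: each entry costs about maxDoc / 8 bytes, roughly 125 KB for a million documents.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Sketch: cache one BitSet per filter (bit i set = document i matches).
final class FilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final int maxDoc;

    FilterCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Compute the bitset once (the expensive index scan), then reuse it
    BitSet get(String filter, IntPredicate matches) {
        return cache.computeIfAbsent(filter, f -> {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (matches.test(doc)) bits.set(doc);
            }
            return bits;
        });
    }

    // Keep only documents of the ordered list that pass every filter (needs >= 1 filter)
    static int[] apply(int[] orderedDocIds, BitSet... filters) {
        BitSet all = (BitSet) filters[0].clone();
        for (int i = 1; i < filters.length; i++) {
            all.and(filters[i]); // in-memory intersection
        }
        return Arrays.stream(orderedDocIds).filter(all::get).toArray();
    }

    int size() {
        return cache.size();
    }
}
```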

Filter Cache - problems

  • What about price??
  • 3500 categories * 1700 brands * 5 ratings * ?? (soon different categories will have different filters)
  • Soon: multiple categories and multiple brands
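For scale, the three filters listed above already multiply out to almost 30 million combinations, before price or per-category filters are added:

```latex
3500 \times 1700 \times 5 = 29\,750\,000
```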

Don't worry

  • 512 entries are enough for a 95% hit rate. People don’t use filters that much :-(
  • But with price filters it can fill all 8000 entries, so price needs to be excluded from caching

Document Cache

  • Just store documents in RAM.
  • Must be greater than
    max_results * max_concurrent_queries
    otherwise documents have to be refetched during the request.
  • Invalidated on update
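With hypothetical numbers (20 results per page, 50 concurrent queries), the sizing rule works out as:

```latex
\text{documentCache size} > \text{max\_results} \times \text{max\_concurrent\_queries} = 20 \times 50 = 1000
```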

Let's optimize!

Start with the hit rate

Before

1 - Query Cache

Just increase the cache size: hit rate 54% -> 65%.
Can't go higher: every time, the hit rate falls back down.

AutoWarming

When the Searcher is replaced, we lose the cache. We need to keep it :-)
With AutoWarming we can push the hit rate up to 98%.
AutoWarming reduces update speed.
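A toy model of what autowarming does when a new Searcher replaces the old one: take the most recently used keys from the old cache and recompute their results with the new searcher before it starts serving requests. Names are hypothetical and this is not Solr's code; in Solr the number of warmed entries is set with the cache's `autowarmCount` attribute. The loop also shows why autowarming slows updates: every warmed key is a query re-executed on each searcher switch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;

// Toy model of autowarming (hypothetical names, not Solr classes).
final class Autowarm {
    private Autowarm() {}

    // oldCache must be in access order (least recently used first, most recent last).
    // Re-executes the autowarmCount most recently used queries with the new searcher.
    static <V> LinkedHashMap<String, V> warm(LinkedHashMap<String, V> oldCache,
                                             int autowarmCount,
                                             Function<String, V> newSearcher) {
        List<String> keys = new ArrayList<>(oldCache.keySet()); // iteration does not reorder
        int from = Math.max(0, keys.size() - autowarmCount);
        LinkedHashMap<String, V> fresh = new LinkedHashMap<>(16, 0.75f, true);
        for (String query : keys.subList(from, keys.size())) {
            // Results are recomputed, not copied: the new index may return different docs
            fresh.put(query, newSearcher.apply(query));
        }
        return fresh;
    }
}
```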

Why not 100%?

Solr uses LFU/LRU algorithms for cache eviction. The bigger the cache, the more time goes into maintaining it (eviction bookkeeping), so you lose speed.

Percentages alone say nothing, show milliseconds!

Caches matter when the data is bigger than RAM. Our data is smaller, so we don't touch the disk anyway.
2-20 ms before and after, depending on the venture and the weather.
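A minimal sketch for reporting milliseconds: time calls with the monotonic clock and summarize with nearest-rank percentiles. The class name is hypothetical, and how you collect the samples (logs, a load test) is up to you.

```java
import java.util.Arrays;

// Sketch: summarize request latencies in milliseconds instead of quoting hit-rate %.
final class Latency {
    private Latency() {}

    // Nearest-rank percentile (1 <= p <= 100) of latencies in ms
    static double percentile(double[] latenciesMs, int p) {
        if (latenciesMs.length == 0 || p < 1 || p > 100) {
            throw new IllegalArgumentException("need samples and 1 <= p <= 100");
        }
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // ceil(p / 100 * n), 1-based
        return sorted[rank - 1];
    }

    // Time one call in milliseconds using System.nanoTime (monotonic)
    static double timeMs(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }
}
```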

2 - Document Cache

Just increase the cache size: hit rate 25% -> 60%.
Can't go higher: every time, the hit rate falls back down.

Come on!

AutoWarming again!

Easy!

From Solr docs:

Note: This cache cannot be used as a source for autowarming because document IDs will change when anything in the index changes so they can't be used by a new searcher.

What can we do?

  • Change the cache algorithm to LFU: frequently used documents stay, so fewer useful entries get evicted.
  • It can give us 5-10%
  • At night the numbers are much better
  • Disable this cache: our data fits in RAM
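The difference between the two eviction policies in a toy sketch: LRU evicts the entry untouched the longest, LFU evicts the entry with the fewest hits, so a popular entry survives a burst of one-off lookups. Older Solr versions ship a `solr.LFUCache` implementation; the class below is a hypothetical illustration of the policy only.

```java
import java.util.HashMap;
import java.util.Map;

// Toy LFU cache: evicts the key with the lowest hit count (ties broken arbitrarily).
final class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> hits = new HashMap<>();

    LfuCache(int capacity) {
        this.capacity = capacity;
    }

    V get(K key) {
        V v = values.get(key);
        if (v != null) {
            hits.merge(key, 1, Integer::sum); // count the hit
        }
        return v;
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            // O(n) scan for the least frequently used key; fine for a sketch
            K victim = null;
            int fewest = Integer.MAX_VALUE;
            for (Map.Entry<K, Integer> e : hits.entrySet()) {
                if (e.getValue() < fewest) {
                    fewest = e.getValue();
                    victim = e.getKey();
                }
            }
            values.remove(victim);
            hits.remove(victim);
        }
        values.put(key, value);
        hits.putIfAbsent(key, 0);
    }

    boolean contains(K key) {
        return values.containsKey(key);
    }
}
```

With capacity 2, putting `popular`, reading it three times, then putting `a` and `b` evicts `a` (0 hits). An LRU cache given the same sequence would evict `popular` instead, since it was touched least recently.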

3 - Filter Cache

  • No need to cache range filters on float fields.
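Solr can skip caching for a single filter via the `cache=false` local parameter, so a float range filter such as price does not fill the filterCache with entries that are rarely reused. A sketch that builds such an `fq` value (class and field names are hypothetical); the result still needs URL-encoding when sent over HTTP:

```java
// Sketch: build an fq value whose result is not stored in the filterCache.
final class UncachedFilter {
    private UncachedFilter() {}

    // e.g. {!cache=false}price:[100.0 TO 500.0]
    static String floatRange(String field, float from, float to) {
        return "{!cache=false}" + field + ":[" + from + " TO " + to + "]";
    }
}
```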

OK, the hit rate is fine. What next?

Java and CPU

The Java world usually doesn't use the 'fork' call (from the POSIX world). Java uses multithreading instead.

Threads Count

N_threads = N_cpu * U_cpu * (1 + W / C)

where:

  • N_threads - the optimal number of threads
  • N_cpu - the number of processors
  • U_cpu - the target CPU utilization (1 if you want to use all available resources)
  • W / C - the ratio of wait time to compute time (0 for CPU-bound tasks, maybe 10 or 100 for slow I/O tasks)
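The formula as a small Java helper (the class name is hypothetical); `Runtime.availableProcessors()` supplies N_cpu for the current machine.

```java
// Sketch of the sizing formula: N_threads = N_cpu * U_cpu * (1 + W / C).
final class ThreadSizing {
    private ThreadSizing() {}

    // nCpu: processors; uCpu: target utilization in (0, 1]; waitToCompute: W / C
    static int optimalThreads(int nCpu, double uCpu, double waitToCompute) {
        return (int) Math.round(nCpu * uCpu * (1 + waitToCompute));
    }

    // Same formula with the processor count the JVM actually sees
    static int forThisMachine(double uCpu, double waitToCompute) {
        return optimalThreads(Runtime.getRuntime().availableProcessors(), uCpu, waitToCompute);
    }
}
```

For the Live box (24 CPUs) at full utilization, a CPU-bound workload (W / C = 0) gives 24 threads and W / C = 10 gives 264. The current 200 threads correspond to W / C of roughly 7.3 (200 / 24 - 1).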

Current Threads

  • Jetty - 200
  • Tomcat - 200
  • Indexer - 8

Current CPUs

  • Sandbox - 8
  • Live - 24

JVM configuration

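  • Run garbage collection in several parallel threads
    -XX:+UseParallelGC and -XX:ParallelGCThreads=n
  • Can save memory (Java 6 only; the flag was removed in Java 7)
    -XX:+UseCompressedStrings
  • Can save memory (and GC work) on 64-bit JVMs with compressed object pointers
    -XX:+UseCompressedOops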
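To check which flags and collectors a running JVM actually uses, the standard `java.lang.management` API is enough. A minimal sketch (hypothetical class name); under `-XX:+UseParallelGC`, HotSpot typically reports collectors named `PS Scavenge` and `PS MarkSweep`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: inspect from inside the JVM which options and collectors are in effect.
final class JvmInfo {
    private JvmInfo() {}

    // The -XX / -D options the JVM was started with
    static List<String> startupFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Names of the active garbage collectors
    static List<String> collectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }
}
```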

Anything else?

  • Move Solr consumers to the same server as the Solr instance
  • Set 'cache=false' on range filters over float fields
  • More threads for indexing
  • If you have a lot of CPU and big responses, turn on traffic compression (Tomcat/Jetty)
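Compression trades CPU for bandwidth. A quick way to estimate the gain for your own responses is to gzip a sample body; this sketch uses only the JDK (hypothetical class name), and the ratio depends entirely on how repetitive the real responses are.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Sketch: measure how many bytes a response body takes after gzip.
final class GzipEstimate {
    private GzipEstimate() {}

    static int gzippedSize(String body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } // closing the stream writes the gzip trailer
        return bytes.size();
    }
}
```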

Enough

Slides - https://github.com/nizsheanez/lzd-solr-tune.git

