Solr

Structure, Environment, Optimize

Author - Alex Sharov

What is it?

Search engine.
Can store documents in schema or schemaless mode.
Can find documents by field values:
compare
fulltext
range
multi value
geo
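
A minimal sketch of how these query types might look as Solr request parameters. The field names (`brand`, `name`, `price`, `tags`, `location`), the values and the coordinates are hypothetical and not taken from the Lazada schema; only the filter syntax is standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical field names; each value is a standard Solr filter query.
FILTERS = {
    "compare": "brand:Samsung",
    "fulltext": "name:(wireless headphones)",
    "range": "price:[100 TO 500]",
    "multi value": "tags:(sale OR new)",
    "geo": "{!geofilt sfield=location pt=14.59,120.98 d=5}",
}


def build_query(kind: str) -> str:
    """Return a URL query string that applies one filter to a match-all query."""
    return urlencode({"q": "*:*", "fq": FILTERS[kind]})


print(build_query("compare"))  # q=%2A%3A%2A&fq=brand%3ASamsung
```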

How Lazada uses Solr

Solr - Java Product

That means we need an environment from another world

App server - Tomcat / Jetty

Tomcat

Runs, controls and supports the Java app
HTTP server
Catalina - part of Tomcat - servlet container (no need to know what it is)

Jetty

Same

How to run

/etc/init.d/solr-live
/opt/solr/live/bin/startup.sh
/opt/solr/live/bin/catalina.sh

exec "$_RUNJDB" "$LOGGING_CONFIG" $JAVA_OPTS $CATALINA_OPTS \
                -Djava.endorsed.dirs="$JAVA_ENDORSED_DIRS" \
                -classpath "$CLASSPATH" \
                -sourcepath "$CATALINA_HOME"/../../java \
                -Djava.security.manager \
                -Djava.security.policy=="$CATALINA_BASE"/conf/catalina.policy \
                -Dcatalina.base="$CATALINA_BASE" \
                -Dcatalina.home="$CATALINA_HOME" \
                -Djava.io.tmpdir="$CATALINA_TMPDIR" \
                org.apache.catalina.startup.Bootstrap "$@" start

A lot of Java shit

Solr - Inside

Query life cycle

Solr - Inside

Indexing life cycle. Transactional, REST API

Cores - used in sandbox

Request Handler

Routes, with predefined ones for different purposes.
Suggest/Ping/Spellcheck/Update/AdminPanel/debug

Searchers

Objects that handle requests, hold the caches, etc.

Solr kills a Searcher object after a request limit or TTL

Indexer

Object that handles inserts and updates. Transactional

Query Cache

Key - query
Value - first N document IDs
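
The key/value layout above can be sketched as a toy cache. This is an illustration only, not Solr's implementation: the class name, the LRU eviction choice and the `search` callback are assumptions made for the example.

```python
from collections import OrderedDict


class QueryResultCache:
    """Toy LRU cache: query string -> first N matching document IDs."""

    def __init__(self, size: int, window: int):
        self.size = size      # max number of cached queries
        self.window = window  # N: how many leading IDs to keep per query
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def get(self, query, search):
        """Return cached IDs, or run search(query) and cache its first N IDs."""
        self.lookups += 1
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        ids = search(query)[: self.window]
        self.entries[query] = ids
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict least recently used
        return ids

    def hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0
```

Calling `get` twice with the same query runs the search once; the second call is a hit, so `hit_rate()` returns 0.5.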

Filter Cache

If we already have an ordered list, then we can apply filters from this cache without touching the HDD.
Must be larger than the number of all possible facet combinations.
Eats a lot of memory

Filter Cache - problems

What about price??
3500 categories * 1700 brands * 5 ratings * ?? (soon we will have different filters in different categories)
Soon: multiple categories and multiple brands
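
A quick back-of-the-envelope check of the numbers above, assuming every combination would need its own cache entry. Price is left out because it is the unknown factor.

```python
# Figures from the slide; the price factor (??) is not included.
categories, brands, ratings = 3500, 1700, 5

combinations = categories * brands * ratings
print(combinations)  # 29750000
```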

Don't worry

512 is enough for a 95% hit rate. People don't use filters that much :-(
But it can fill all 8000 entries (because of price, which needs to be excluded)

Document Cache

Just store documents in RAM.
Must be greater than
```
max_results * max_concurrent_queries
```
otherwise documents will have to be refetched during the request.
Invalidated on update
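
The sizing rule as a small calculation. The load figures are hypothetical and only illustrate the arithmetic; the cache should be larger than the returned bound.

```python
def min_document_cache_size(max_results: int, max_concurrent_queries: int) -> int:
    """Lower bound: every in-flight request keeps all of its documents cached."""
    return max_results * max_concurrent_queries


# Hypothetical load: 50 documents per response, 200 simultaneous requests.
print(min_document_cache_size(50, 200))  # 10000
```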

Let's optimize!

Start with Hit Rate

Before

1 - Query Cache

Just increase the cache size: Hit Rate 54% -> 65%
Can't push it higher
It keeps dropping every time

AutoWarming

After the Searcher changes we lose the cache
We need to keep it :-)
With AutoWarming we can push Hit Rate up to 98%
AutoWarming reduces update speed
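
A rough sketch of the autowarming idea: when a new Searcher replaces the old one, the most recently used queries from the old cache are re-run against the new index and stored in the new cache. The function name, the `search` callback and the recency-ordered `OrderedDict` are assumptions for illustration, not Solr internals.

```python
from collections import OrderedDict


def autowarm(old_cache: OrderedDict, autowarm_count: int, search) -> OrderedDict:
    """Build a new searcher's cache by re-running the most recently used queries."""
    new_cache = OrderedDict()
    # old_cache is ordered from least to most recently used.
    recent = list(old_cache.keys())[-autowarm_count:] if autowarm_count > 0 else []
    for query in recent:
        new_cache[query] = search(query)  # re-executed against the new index
    return new_cache
```

Every warmed query is executed again before the new Searcher goes live, which is why a larger autowarm count slows down updates.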

Why not 100%?

Solr uses LFU/LRU algorithms for cache eviction.
When you increase the cache size, you lose speed on maintaining cache invalidation

% - it's empty shit, show milliseconds!

We must use a cache when our data is bigger than RAM
Our data is smaller, so we don't touch the HDD anyway
2-20 ms before and after - depends on the venture and the weather

2 - Document Cache

Just increase the cache size: 25% -> 60%
Can't push it higher, it keeps dropping every time

Come on!

It's AutoWarming again

Easy!

From Solr docs:

Note: This cache cannot be used as a source for autowarming because document IDs will change when anything in the index changes so they can't be used by a new searcher.

What can we do?

Change the cache algorithm to LFU. We will have fewer invalidations.
It can give us 5-10%
At night we have much better numbers
Disable this cache - our data is smaller than RAM
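
To show why LFU can keep popular documents longer than LRU, here is a toy LFU cache. It is a simplified illustration with hypothetical names, not the implementation Solr ships.

```python
from collections import Counter


class LFUCache:
    """Toy LFU cache: evicts the entry with the fewest accesses when full."""

    def __init__(self, size: int):
        self.size = size
        self.values = {}
        self.counts = Counter()

    def get(self, key):
        if key in self.values:
            self.counts[key] += 1
            return self.values[key]
        return None

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.size:
            # Evict the least frequently used key.
            victim = min(self.values, key=lambda k: self.counts[k])
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] += 1
```

With a size of 2, a document read many times survives the arrival of two rarely used documents, whereas an LRU cache would evict it as the oldest entry.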

3 - Filter Cache

No need to cache range filters on float fields.
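
A small sketch of building such a filter with Solr's `cache=false` local parameter, so the range filter is not stored in the filter cache. The helper name and the `price` field are hypothetical.

```python
def uncached_range_filter(field: str, low: float, high: float) -> str:
    """Build a range fq that asks Solr not to put it in the filter cache."""
    return f"{{!cache=false}}{field}:[{low} TO {high}]"


print(uncached_range_filter("price", 99.5, 500.0))
# {!cache=false}price:[99.5 TO 500.0]
```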

Ok, Hit Rate is ok. What's next?

Java and CPU

The Java world usually doesn't use 'fork' (from the POSIX world). Java uses multithreading.

Threads Count

N_threads = N_cpu * U_cpu * (1 + W / C)

N_threads - the optimal number of threads
N_cpu - the number of processors
U_cpu - the target CPU utilization (1 if you want to use the full available resources)
W / C - the ratio of wait time to compute time (0 for a CPU-bound task, maybe 10 or 100 for slow I/O tasks)
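
The formula as a small function, evaluated for 24 CPUs. The W / C values used here are illustrative assumptions, not measurements.

```python
def optimal_threads(n_cpu: int, u_cpu: float, wait_to_compute: float) -> int:
    """N_threads = N_cpu * U_cpu * (1 + W / C), rounded to a whole thread."""
    return round(n_cpu * u_cpu * (1 + wait_to_compute))


print(optimal_threads(24, 1.0, 0))   # 24  (CPU-bound)
print(optimal_threads(24, 1.0, 10))  # 264 (slow I/O, assumed W / C = 10)
```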

Current Threads

Jetty - 200
Tomcat - 200
Indexer - 8

Current CPUs

Sandbox - 8
Live - 24

JVM configuring

Move the garbage collector to separate threads

-XX:+UseParallelGC and -XX:ParallelGCThreads=n

Can save memory
```
-XX:+UseCompressedStrings
```
Can speed up GC on 64-bit
```
-XX:+UseCompressedOops
```

Anything else?

Move Solr consumers to the same server as the Solr instance
Set 'cache=false' on range filters over float fields
More threads for indexing
If you have a lot of CPU and big responses - turn on traffic compression (Tomcat/Jetty)

Enough

Slides - https://github.com/nizsheanez/lzd-solr-tune.git
