Search Theory – Internal WordPress Search & The tools to make it better – Search Analytics




On Github thoronas / thoronas.github.io

Search Theory

Internal WordPress Search & The tools to make it better

By Flynn O’Connor - @thoronas

My name is Flynn O’Connor

  • Developed WordPress themes for 5 years.
  • Lead developer at Forge and Smith
  • Spent the last year investigating ways to improve WordPress's internal search
A large part of this talk will be about Elasticsearch. Before we look at the tools we will use to solve each site's unique search needs, though, we need to talk about the theory that makes search useful in specific situations. In part one I will cover search patterns and anti-patterns, along with some of the plugins and solutions available for improving WordPress's default search. In part two we'll look at the basics of Elasticsearch and how it can improve WordPress search using the concepts we've covered.

In the beginning...

  • WordPress search runs a simple text match (SQL LIKE) on post content and post titles.
  • Before WordPress 3.7, search results were ordered by post date.
  • 3.7 added basic relevance ordering to search results.
WordPress's built-in search is simple: enter a keyword and WordPress runs a text match against post titles and post content. Before 3.7, all results were sorted by the date they were posted. Version 3.7 added basic relevance ordering. This is an improvement, but it's still very basic search functionality that is best suited to smaller sites.

WordPress is growing

  • Text search in MySQL scales poorly.
  • Performance takes a major hit as you approach 10,000 posts, and search becomes practically unusable at that point.
  • Users need more fine-grained control.
  • Search is an important part of content strategy.
As the amount of content in WordPress grows, the search functionality does not scale with it. A solution for finding content among 500 blog posts is not going to be as useful or as effective with 5,000 or 50,000 posts. We need to start looking at how users search and interact with our content. As developers, we need to understand user behaviour and content strategy in order to build the right technical solutions for improving search.

What are your users doing?

By tracking and analyzing user search behaviour, we can learn:

  • Where our site navigation is failing.
  • What people are interested in.
  • How we can improve site profitability (for e-commerce sites).
By tracking what people search our site for, we can get some understanding of their intent while they are on our site. If we can determine where our sites are failing our users and what they are interested in, we can improve user engagement. When e-commerce is involved, you can also increase your site's profitability. If people can’t find what they want, they’re not going to give you money.

Search Analytics

The cheapest and easiest way is via Google Analytics.

There are a lot of things search analytics can provide beyond basic search queries. (Discuss bounce rates.) Search analytics are relatively easy to set up for your existing site, especially if you're already using Google Analytics. There are other

Where to set search analytics

Turn on site search under Admin > View Settings.

What to set

By default, WordPress uses "s" as the search query parameter. It is also the search parameter in WP_Query.

Provide GA with the search query parameter, which for WordPress is "s" 99.9% of the time. Look at commonly submitted queries and determine whether the results are what you intended. E-commerce sites can determine whether top results are leading to pages they can't monetize, and are in turn losing money. Search analytics can also show you how much money people spend on your site when they use site search and when they don't.
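Google Analytics picks the search term out of the page URL's query string using the parameter name you give it. As a rough illustration of what that means for a WordPress search URL, here is a small PHP sketch using the built-in parse_url() and parse_str(); extract_search_term() is a hypothetical helper, not part of WordPress or GA.

```php
<?php
// Hypothetical helper: read the WordPress search term ("s" by default)
// from a page URL, the same query parameter you tell Google Analytics about.
function extract_search_term(string $url, string $param = 's'): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // the URL has no query string at all
    }
    parse_str($query, $params); // decodes "+" and %XX escapes
    return isset($params[$param]) && is_string($params[$param]) ? $params[$param] : null;
}
```

For example, https://example.com/?s=craft+beer yields the term "craft beer", which is what shows up in the GA search terms report.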

Google Analytics provides fantastic in-depth search stats.

For more information about how to analyze site search data, read Internal Site Search Analysis: Simple, Effective, Life Altering! by Avinash Kaushik.

I highly recommend reading Internal Site Search Analysis on A List Apart. It gives you a good overview of how to use the Google Analytics data. Even though it’s five years old, the information is still very relevant and will probably make you really appreciate what an amazing tool internal site search analysis can be.

User Behaviour

  • Search functionality can differ from site to site.
  • This will depend on your content.
  • It will also depend on your users.
    • Are your users technologically savvy?
    • Are they familiar with your content?
    • How did they find your site in the first place?

User Goals

Dr. Andrei Broder wrote a paper called "A Taxonomy of Web Search". He identified three broad "types" of search queries.

  • Navigational searches
  • Informational searches
  • Transactional searches
Navigational search: the City of Calgary invested approximately 2 million dollars in improving their site, and a large part of that was Google Custom Search. Informational search: Wikipedia or IMDB, sites with lots of informational posts. Transactional search: Amazon product search, where users intend to purchase.

Precision or Recall?

  • Precision: Finding the most relevant documents
  • Recall: Finding all the relevant documents
If we understand our users, we can tune our search engine to respond in a way that better matches their goals. Precision is better suited to navigational searches.
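To make the two terms concrete: precision is the share of returned results that are relevant, and recall is the share of all relevant documents that were returned. A minimal PHP sketch, assuming you already know which post IDs a query returned and which ones a person would judge relevant (precision_recall() is a hypothetical helper):

```php
<?php
// Precision = relevant results returned / all results returned.
// Recall    = relevant results returned / all relevant documents that exist.
function precision_recall(array $retrieved, array $relevant): array
{
    $retrieved = array_unique($retrieved);
    $relevant  = array_unique($relevant);
    $found = count(array_intersect($retrieved, $relevant));

    return array(
        'precision' => count($retrieved) > 0 ? (float) $found / count($retrieved) : 0.0,
        'recall'    => count($relevant) > 0 ? (float) $found / count($relevant) : 0.0,
    );
}
```

A search that returns posts 1 to 4 when posts 2, 4, 5, 6, 7 and 8 are relevant has a precision of 0.5 but a recall of only one third. Tightening the query usually raises precision at the cost of recall, and vice versa.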

Search Anti Patterns

Common patterns that result in counterproductive search experiences.

Search in WordPress doesn't offer a lot to improve the situation. Sometimes, as developers and designers, we exacerbate the situation with our decisions or our lack of thought about the patterns our users go through.

Thrashing

If users are unfamiliar with your content, they may enter keywords that don't reflect their intentions. Trying variations of flawed search terms returns poor results, which leads to frustrated users.

Pogo Sticking

When a user constantly jumps back and forth between the SERP and individual results.

Search Patterns

Your search engine results page (SERP) needs to be structured in a way that helps users find the results they're looking for.

  • Ensure relevant info is returned per post.
  • Style visited links
  • Highlight search term
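Highlighting the search term is mostly string handling. Here is a minimal PHP sketch (highlight_term() is a hypothetical helper, not a WordPress function) that escapes the text first, then wraps each case-insensitive match in a mark element. In a theme you would pass it get_search_query( false ) as the term.

```php
<?php
// Wrap each case-insensitive occurrence of $term in <mark> tags.
// The text is HTML-escaped first so post content cannot inject markup, and the
// term is escaped the same way so it still matches escaped characters like "&".
// Simple sketch: a term such as "amp" could match inside an escaped entity.
function highlight_term(string $text, string $term): string
{
    $safe_text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $term = trim($term);
    if ($term === '') {
        return $safe_text;
    }
    $safe_term = htmlspecialchars($term, ENT_QUOTES, 'UTF-8');
    $pattern = '/' . preg_quote($safe_term, '/') . '/iu';
    $highlighted = preg_replace($pattern, '<mark>$0</mark>', $safe_text);
    return $highlighted ?? $safe_text; // fall back if the regex fails
}
```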

Auto Complete

Help users find exactly what they are looking for faster.

WP Search Suggest by Konstantin Obenland
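The core idea behind autocomplete is prefix matching: as the user types, suggest titles containing a word that starts with what they've typed so far. This is a generic sketch, not how WP Search Suggest works internally; suggest_titles() is hypothetical, and it assumes the candidate titles are already in memory (a real implementation would fetch them over AJAX).

```php
<?php
// Return up to $limit titles that contain a word starting with the typed prefix.
function suggest_titles(string $typed, array $titles, int $limit = 5): array
{
    $typed = strtolower(trim($typed));
    $matches = array();
    if ($typed === '') {
        return $matches;
    }
    foreach ($titles as $title) {
        // Split the title into words and check whether any word starts with the prefix.
        foreach (preg_split('/\s+/', strtolower($title)) as $word) {
            if (strpos($word, $typed) === 0) {
                $matches[] = $title;
                break;
            }
        }
        if (count($matches) >= $limit) {
            break;
        }
    }
    return $matches;
}
```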

Suggest Alternative Queries

Guide the user when they are exploring your content.

Relevanssi by Ville Saari
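One common way to suggest an alternative query is a "did you mean" check: compare the user's term against terms you know exist and offer the closest one by edit distance. This sketch uses PHP's built-in levenshtein(); it is not Relevanssi's implementation, and did_you_mean() plus the source of $known_terms (for example taxonomy term names or past successful searches) are assumptions.

```php
<?php
// Suggest the known term closest to a (possibly misspelled) query.
// Returns null when the query is already a known term or nothing is close enough.
function did_you_mean(string $query, array $known_terms, int $max_distance = 2): ?string
{
    $query = strtolower(trim($query));
    $best = null;
    $best_distance = $max_distance + 1;
    foreach ($known_terms as $term) {
        $distance = levenshtein($query, strtolower($term));
        if ($distance === 0) {
            return null; // exact match, nothing to suggest
        }
        if ($distance < $best_distance) {
            $best = $term;
            $best_distance = $distance;
        }
    }
    return $best;
}
```

A search for "calgery" against the terms Calgary, Edmonton and beer would suggest "Calgary".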

Advanced Functionality

Control how your content is weighted for relevance.

Search meta, taxonomies, custom post types.

Keyword stemming

SearchWP by Jonathan Christopher
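Weighting boils down to counting matches per field and multiplying by that field's weight, so a title match counts for more than a body match. A hypothetical sketch of the idea (weighted_score() and the weights are illustrative; SearchWP's actual scoring is not described here). Note that substr_count() matches substrings, so "beer" also counts inside "beers".

```php
<?php
// Hypothetical weighted relevance score for one post.
// $weights maps field names to multipliers, e.g. title matches worth 5x content matches.
function weighted_score(string $term, array $post, array $weights): float
{
    $term = strtolower(trim($term));
    if ($term === '') {
        return 0.0; // substr_count() rejects an empty needle
    }
    $score = 0.0;
    foreach ($weights as $field => $weight) {
        $text = strtolower($post[$field] ?? '');
        $score += $weight * substr_count($text, $term);
    }
    return $score;
}
```

Sorting posts by this score in descending order gives a crude relevance ranking.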

What are Stemmers?

Creating tokens from the roots of words

"Computers, Computing, Computes"

[Comput]
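To show the idea behind the slide, here is a deliberately naive suffix-stripping stemmer in PHP. It is only an illustration: naive_stem() is hypothetical, and production stemmers such as the Porter stemmer use far more careful rules.

```php
<?php
// Naive stemmer: strip the first matching suffix, longest suffixes checked first,
// while keeping at least three characters of root so short words survive.
function naive_stem(string $word): string
{
    $word = strtolower($word);
    foreach (array('ers', 'ing', 'es', 'er', 's') as $suffix) {
        $len = strlen($suffix);
        if (strlen($word) - $len >= 3 && substr($word, -$len) === $suffix) {
            return substr($word, 0, -$len);
        }
    }
    return $word;
}
```

Applied to "Computers", "Computing" and "Computes", it produces the single token "comput" for all three, so a search for any one of them matches the others.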

Facets

Filter search results by taxonomies or meta data.

FacetWP by Matt Gibbs
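A facet is essentially a count of how many results fall under each term, shown next to a filter link. A minimal sketch, assuming search results are available as plain arrays with a list of term names per taxonomy (facet_counts() is hypothetical; FacetWP's internals are not described here):

```php
<?php
// Count how many posts fall under each term of a taxonomy,
// i.e. the numbers shown next to each facet link.
function facet_counts(array $posts, string $taxonomy): array
{
    $counts = array();
    foreach ($posts as $post) {
        foreach ($post[$taxonomy] ?? array() as $term) {
            $counts[$term] = ($counts[$term] ?? 0) + 1;
        }
    }
    arsort($counts); // most common terms first
    return $counts;
}
```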

Growing beyond internal solutions

Eventually, performance issues become too difficult to overcome.

Third-party solutions that hijack WordPress's search functionality are a better option.

Resources

Questions?

Before we dive into Elasticsearch, are there questions so far?

Why Use Elasticsearch

  • Very fast
  • Scales well
  • Advanced queries
  • Robust faceting
  • Great geo search!

How to integrate Elasticsearch with WordPress

  • Create an index and map fields
  • Index WordPress posts in Elasticsearch
  • Hijack WordPress default search
    • Query Elasticsearch
    • Replace WordPress search query with results

Connecting to Elasticsearch

The HTTP API

Make remote requests from WordPress to Elasticsearch

$url = 'http://local.wordpress.dev:9200/{index}/{type}/{action}';
$args = array('method' => 'GET');
$response = wp_remote_request( $url, $args );

Let's create an index

Creating a basic index in Elasticsearch.

$url = 'http://local.wordpress.dev:9200/wp-index';
$args = array('method' => 'PUT');
$response = wp_remote_request( $url, $args );

Mapping WordPress post data

Specify a document type called post in our index.

$mapping = array(
  'mappings' => array(
    'post' => array(
      // post property field mappings go here.
    ),
  ),
  'settings' => array(
    // custom settings & analysis go here.
  ),
);

Inside the post mappings

'post' => array(
  'properties' => array(
    'post_title' => array( 'type' => 'string' ),
    'post_content' => array( 'type' => 'string' ),
    'post_id' => array( 'type' => 'long' ),
    'post_date' => array(
      'type' => 'date',
      'format' => 'yyyy-MM-dd HH:mm:ss',
    ),
  ),
)

Elasticsearch Core Field Types

  • string - text
  • integer - 32 bit integers
  • long - 64 bit integers
  • float - IEEE floats
  • double - double precision floats
  • boolean - true/false
  • date - UTC date/time
  • geo_point - Lat/Long

Advanced Field Types

Elasticsearch also supports array, object, and multi-field types.

'post_author' => array(
  'type' => 'multi_field',
  'fields' => array(
    // analyzed for full-text search
    'author' => array( 'type' => 'string' ),
    // stored verbatim for exact matching and filtering
    'author_raw' => array(
      'type' => 'string',
      'index' => 'not_analyzed'
    )
  )
)

Dynamic templates!

What happens if you add new data fields?

Example: Custom Taxonomies

"dynamic_templates" => array(
  array(
    "template_terms" => array(
      // applies to any new field under "terms", e.g. terms.category
      "path_match" => "terms.*",
      "mapping" => array(
        "type" => "object",
        "properties" => array(
          "name" => array( "type" => "string" ),
          "term_id" => array( "type" => "long" )
        )
      )
    )
  )
)
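Putting the pieces together: the settings and mappings are JSON encoded and sent as the body of the same PUT request that creates the index. This sketch follows the Elasticsearch 1.x-era syntax used throughout the talk; build_index_body() and the number_of_shards value are illustrative assumptions. Newer Elasticsearch releases replaced string with text/keyword and later removed mapping types, so check the mapping docs for your version.

```php
<?php
// Assemble the index settings, dynamic templates and field mappings
// into one JSON body for the index-creation PUT request.
function build_index_body(): string
{
    $mapping = array(
        'settings' => array(
            'number_of_shards' => 1, // assumption: a single-node development setup
        ),
        'mappings' => array(
            'post' => array(
                'dynamic_templates' => array(
                    array(
                        'template_terms' => array(
                            'path_match' => 'terms.*',
                            'mapping' => array(
                                'type' => 'object',
                                'properties' => array(
                                    'name' => array( 'type' => 'string' ),
                                    'term_id' => array( 'type' => 'long' ),
                                ),
                            ),
                        ),
                    ),
                ),
                'properties' => array(
                    'post_title' => array( 'type' => 'string' ),
                    'post_content' => array( 'type' => 'string' ),
                    'post_id' => array( 'type' => 'long' ),
                    'post_date' => array( 'type' => 'date', 'format' => 'yyyy-MM-dd HH:mm:ss' ),
                ),
            ),
        ),
    );
    return json_encode($mapping);
}
```

In WordPress, the returned string would go in the 'body' argument of the wp_remote_request() PUT to http://local.wordpress.dev:9200/wp-index.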

Passing Post Content to Elasticsearch

With the index and mapping done, we can now populate the index with content.

To do so, we need to:

  • Get the posts within WordPress
  • JSON encode the post data
  • Send the post data to Elasticsearch

Getting the posts

Match the post data to our mapping.

// inside The Loop
$post_for_ES = array(
  'post_title' => get_the_title(),
  'post_content' => get_the_content(),
  'post_id' => get_the_ID(),
  // match the date format declared in the mapping
  'post_date' => get_the_date( 'Y-m-d H:i:s' ),
);

Once the posts have been encoded, we have two options for sending them to Elasticsearch:

  • Single posts - use the HTTP verb PUT
  • Multiple posts - use the Elasticsearch Bulk API

Single Post

$post_content = json_encode( $post_for_ES );
// use the WordPress post ID as the Elasticsearch document ID
$url = 'http://local.wordpress.dev:9200/wp-index/post/' . $post_for_ES['post_id'];
$args = array( 'method' => 'PUT', 'body' => $post_content );
$response = wp_remote_request( $url, $args );
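For many posts at once, the Bulk API expects newline-delimited JSON: an action line followed by the document itself, for each post, with a trailing newline at the end. A plain-PHP sketch of building that body; build_bulk_body() is hypothetical, and the _type field matches the 1.x-era mapping types used in this talk (newer versions drop types).

```php
<?php
// Build a newline-delimited JSON body for the Elasticsearch Bulk API:
// one "index" action line, then the document line, for every post.
function build_bulk_body(array $posts, string $index = 'wp-index', string $type = 'post'): string
{
    $body = '';
    foreach ($posts as $post) {
        $action = array('index' => array(
            '_index' => $index,
            '_type'  => $type,
            '_id'    => $post['post_id'], // reuse the WordPress post ID
        ));
        $body .= json_encode($action) . "\n" . json_encode($post) . "\n";
    }
    return $body; // the Bulk API requires the final newline
}
```

The resulting string would be sent as the body of a POST to the _bulk endpoint, for example http://local.wordpress.dev:9200/_bulk.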

Searching!

In order to hijack WordPress default search functionality we need to do the following:

  • Grab the search query before querying the Database
  • Query Elasticsearch instead
  • Parse the post ids from the results
  • Replace the search query with ES results

pre_get_posts to the rescue!

Use pre_get_posts to capture the search query.

Query Elasticsearch and return an array of post IDs.

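function search_filter( $query ) {
  if ( ! is_admin() && $query->is_main_query() && $query->is_search() ) {
    $search_query = stripslashes( get_search_query( false ) );
    // elasticsearch_function() is a placeholder for your own helper that returns post IDs
    $elasticsearch_posts = elasticsearch_function( $search_query );
    // WordPress ignores an empty post__in, so force "no results" explicitly
    $query->set( 'post__in', $elasticsearch_posts ? $elasticsearch_posts : array( 0 ) );
    // keep Elasticsearch's relevance order
    $query->set( 'orderby', 'post__in' );
  }
}
add_action( 'pre_get_posts', 'search_filter' );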

Make sure to nuke the default WordPress search.

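function clear_sql_search_clause( $search, $query ) {
  // only strip the SQL search clause from the main front-end search query
  if ( ! is_admin() && $query->is_main_query() && $query->is_search() ) {
    $search = '';
  }
  return $search;
}
add_filter( 'posts_search', 'clear_sql_search_clause', 10, 2 );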

Querying Elasticsearch

Querying documents in Elasticsearch uses several APIs nested in the Search API:

  • Query DSL - 39 different query types
  • Filter API - 27 different filter types
  • Aggregations API - 20 different facet types

Querying WordPress data

Querying multiple fields is a common requirement.

Example: query post title and post content.

$ES_query = array(
  'query' => array(
    // specify query type
    'multi_match' => array(
      // the query term
      'query' => 'beer',
      // what fields to search through
      'fields' => array('post_title^2', 'post_content')
    )
  )
);

Take the constructed query and pass it to the HTTP API.

//search our wp-index
$url = "http://local.wordpress.dev:9200/wp-index/_search";
$method = "POST";
//pass the query we constructed in the previous slide
$body = json_encode($ES_query);
$arg = array ( 'method' => $method, 'body' => $body);
$request = wp_remote_request ($url, $arg);

Filter Queries

Run a nested query after applying filters

$ES_query = array(
  'query' => array(
    'filtered' => array(
      'query' => array(
        'multi_match' => array(
          'query' => 'beer',
          'fields' => array('post_title^2', 'post_content')
        )
      ),
      'filter' => array(
        'term' => array(
          "post_author.author_raw" => "Flynn"
        )
      )
    )
  )
);
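The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0 in favour of a bool query with a filter clause. A sketch of the same search in that form, assuming the same post_author.author_raw not-analyzed sub-field exists; build_filtered_search() is a hypothetical helper.

```php
<?php
// The same "beer by Flynn" search as a bool query: the multi_match clause
// is scored for relevance, the term filter only includes or excludes documents.
function build_filtered_search(string $term, string $author): string
{
    $ES_query = array(
        'query' => array(
            'bool' => array(
                'must' => array(
                    'multi_match' => array(
                        'query'  => $term,
                        'fields' => array('post_title^2', 'post_content'),
                    ),
                ),
                'filter' => array(
                    'term' => array( 'post_author.author_raw' => $author ),
                ),
            ),
        ),
    );
    return json_encode($ES_query);
}
```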

Parse the results

We've queried Elasticsearch and got results!

We need to parse the results.

//our Elasticsearch query from previous slides
$request = wp_remote_request ($url, $arg); 

//grab the body of request which has the found posts
$results = json_decode(wp_remote_retrieve_body($request));

//pass the results into variable.
$hits = $results->hits->hits;

// do what you want with the data from here. 
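For the pre_get_posts approach, what we want from the hits is an ordered list of post IDs. A sketch of that step, assuming each indexed document carries the post_id field from our mapping and falling back to the document _id otherwise; parse_post_ids() is hypothetical and takes the raw body from wp_remote_retrieve_body().

```php
<?php
// Turn a raw Elasticsearch search response (JSON string) into post IDs,
// keeping Elasticsearch's relevance order. Returns an empty array on errors.
function parse_post_ids(string $response_body): array
{
    $results = json_decode($response_body);
    if (!isset($results->hits->hits) || !is_array($results->hits->hits)) {
        return array();
    }
    $ids = array();
    foreach ($results->hits->hits as $hit) {
        // Prefer the post_id stored in the document, fall back to the document _id.
        $ids[] = (int) ($hit->_source->post_id ?? $hit->_id);
    }
    return $ids;
}
```

The returned array is what gets passed to post__in.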

Just the beginning

Some examples of Elasticsearch functionality

  • Improve e-commerce product searching with price ranges
  • Search for posts within Google Maps boundaries
  • Real-time autocomplete of almost any data type
  • Create custom analyzers for unique search applications

Aggregations

When querying Elasticsearch, you can specify particular data to be returned as facets or aggregations.

Useful for getting aggregate data on:

  • Posts in Categories
  • Number of products within price ranges
  • Geo distance from a location
  • Combine aggregations to make custom aggregates
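Aggregations are requested alongside the query in the same search body. This sketch asks for post counts per category (a terms aggregation) and product counts per price bracket (a range aggregation). The field names terms.category.name and price and the bracket boundaries are hypothetical. For exact category names you would normally aggregate on a not-analyzed field, like the author_raw sub-field earlier. The counts come back under aggregations, e.g. aggregations.posts_per_category.buckets.

```php
<?php
// Search body that returns the usual hits plus two aggregations.
function build_aggregation_query(string $term): string
{
    $ES_query = array(
        'query' => array(
            'multi_match' => array(
                'query'  => $term,
                'fields' => array('post_title^2', 'post_content'),
            ),
        ),
        'aggs' => array(
            // one bucket per category name, with a document count
            'posts_per_category' => array(
                'terms' => array( 'field' => 'terms.category.name' ),
            ),
            // under 20, 20 to 50, and 50 and up ("to" is exclusive)
            'price_ranges' => array(
                'range' => array(
                    'field'  => 'price',
                    'ranges' => array(
                        array( 'to' => 20 ),
                        array( 'from' => 20, 'to' => 50 ),
                        array( 'from' => 50 ),
                    ),
                ),
            ),
        ),
    );
    return json_encode($ES_query);
}
```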

Third-Party APIs

WordPress.com related posts are available through Jetpack.

Swiftype provides managed Elasticsearch functionality.

Both can be customized using Elasticsearch APIs.

Useful Tools

Resources

Thank you for listening!

Questions?