heweb14-aim7




HighEdWeb 2014 - Don't like your Google Search Interface? Make your Own!

On GitHub: cdchase/heweb14-aim7

Don't like your Google Search Interface? Make your Own!

C. Daniel Chase — @cdchase

The University of Tennessee at Chattanooga

#aim7 #heweb14

Puzzle Parts

  • Comparisons
  • Search Form
  • Search Request Processing
  • Search API
  • Result Processing
  • Customizing Output
  • Integrating into Website
  • Page Not Found (404) Handling

Comparisons

  • Google Custom Search Engine
  • Google Site Search
  • Google Search Appliance (GSA)

Google Custom Search Engine

  • Free
  • Cannot Customize Results
  • Ads on results pages (can be disabled for non-profit)
  • Google Branded

Google Site Search

  • Formerly Google Custom Search Business Edition
  • NOT Free!
  • Licensed by Number of Searches
  • Indexes 13 file formats
  • Our Search count 934,000+/year = Over $2,000 for license
  • Larger license is for off-line engine
  • XML Results Query Reference

Google Search Appliance (GSA)

  • Hardware
  • Licensed by Document Count
  • Indexes over 220 file formats
  • Can index sites requiring authentication

Unmodified GSA Search Interface

Unmodified GSA Search Engine Results Page (SERP)

UT System GSA Search Interface

UT System GSA Search Engine Results Page

UTC Customized Search Engine Results Page

Making a Search Query

    
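    // Build the GSA request only when the form posted a non-empty query.
    if (isset($_POST['q']) && $_POST['q'] != '') {
        $url = "http://google.tennessee.edu/search?"
            . "client=utk_frontend&"   // front end and its policies
            . "output=xml_no_dtd&"     // XML results
            . "sort=date:D:L:d1&"      // sort order
            . "entqr=3&"               // full query expansion
            . "ie=UTF-8&"              // query character encoding
            . "ud=1&"                  // include ud tags (IDN-encoded URLs)
            . "site=Chattanooga&"      // collection to search
            . "start=0&"               // index of the first result
            // stripslashes() undoes magic quotes on older PHP installs.
            . "q=" . urlencode(stripslashes($_POST['q']));
        // Plain-text copy of the query; escape it with htmlspecialchars() before echoing.
        $q = html_entity_decode(strip_tags($_POST['q']));
    }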
    
http://google.tennessee.edu/search?client=utk_frontend&output=xml_no_dtd&
    sort=date:D:L:d1&entqr=3&ie=UTF-8&ud=1&site=Chattanooga&
    start=0&q=university%20web%20services

Search Query - Response

        
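    <GSP>
        <TM>0.256643</TM>
        <Q>university web services</Q>
        <RES>
            <M>1940</M>
            <NB>
                <NU>/search?q=university+web+services&amp;site=Chattanooga&amp;lr=&amp;ie=UTF-8&amp;output=xml_no_dtd&amp;client=utk_frontend&amp;access=p&amp;sort=date:D:L:d1&amp;start=10&amp;sa=N</NU>
            </NB>
            <R>
                <U>http://www.utc.edu/university-web-services/</U>
                <UE>http://www.utc.edu/university-web-services/</UE>
                <UD>http://www.utc.edu/university-web-services/</UD>
                <T>&lt;b&gt;University Web Services&lt;/b&gt;</T>
                <RK>10</RK>
                <ENT_SOURCE>T4-ALRXDAUDJC2WK</ENT_SOURCE>
                <S>&lt;b&gt;...&lt;/b&gt; &lt;b&gt;University Web Services&lt;/b&gt;. &lt;b&gt;...&lt;/b&gt; Remember, this list only goes to the &lt;b&gt;University Web&lt;/b&gt;&lt;br&gt; &lt;b&gt;Services&lt;/b&gt; team, not all editors as it did previously. Managing Websites. &lt;b&gt;...&lt;/b&gt;</S>
                <LANG>en</LANG>
            </R>
        </RES>
    </GSP>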
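
A minimal sketch of reading a response like this with PHP's SimpleXML, assuming the standard GSA element names (Q for the query, M for the estimated total, R for each result with U, T and S for URL, title and snippet). The function name parseGsaResults and the trimmed sample document are illustrative, not part of the original slides.

```php
<?php
/**
 * Hypothetical helper: turn a GSA XML response into a plain PHP array.
 */
function parseGsaResults(string $xml): array
{
    // Collect XML parse errors quietly instead of emitting warnings.
    libxml_use_internal_errors(true);
    $gsp = simplexml_load_string($xml);
    if ($gsp === false) {
        return ['query' => '', 'total' => 0, 'results' => []];
    }

    $results = [];
    // RES is missing when the search returns no hits.
    if (isset($gsp->RES)) {
        foreach ($gsp->RES->R as $r) {
            $results[] = [
                'url'     => (string) $r->U,
                'title'   => (string) $r->T,
                'snippet' => (string) $r->S,
            ];
        }
    }

    return [
        'query'   => (string) $gsp->Q,
        'total'   => isset($gsp->RES) ? (int) $gsp->RES->M : 0,
        'results' => $results,
    ];
}

// Trimmed sample response (assumption: attributes and most tags omitted).
$sample = <<<XML
<GSP>
  <TM>0.256643</TM>
  <Q>university web services</Q>
  <RES>
    <M>1940</M>
    <R>
      <U>http://www.utc.edu/university-web-services/</U>
      <T>&lt;b&gt;University Web Services&lt;/b&gt;</T>
      <S>Managing Websites.</S>
    </R>
  </RES>
</GSP>
XML;

$parsed = parseGsaResults($sample);
echo $parsed['total'], " results for ", $parsed['query'], "\n";
```

Titles and snippets come back with the appliance's highlighting markup decoded (for example `<b>`), so they still need to be handled carefully before being written into a page.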
    

Search API

Bookmark the Reference documentation!

https://support.google.com/gsa/answer/3890846?hl=en&ref_topic=2709671

More specifically, the Search Protocol Reference:

http://www.google.com/support/enterprise/static/gsa/docs/admin/72/gsa_doc_set/xml_reference/

Required Search Parameters

  • site: Limits search results to the contents of the specified collection.
    site=Chattanooga
  • client: A string that indicates a valid front end and the policies defined for it, including KeyMatches, related queries, filters, remove URLs, and OneBox Modules.
    client=utk_frontend
  • output: Selects the format of the search results.
    output=xml_no_dtd
  • q: Search query as entered by the user.
    q=university%20web%20services

Preference Search Parameters

  • sort: Results can be sorted by relevance, date or metadata.
    sort=date:D:L:d1
  • entqr: Sets the query expansion policy. 3 is Full: uses both standard and local synonym files.
    entqr=3
  • ie: Sets the character encoding that is used to interpret the query.
    ie=UTF-8
  • ud: Specifies whether results include ud tags. A ud tag contains internationalized domain name (IDN) encoding for a result URL.
    ud=1

More Search Parameters

  • start: Specifies the index number of the first entry in the result set that is to be returned. (Use with num.)
    start=0
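
The parameters above can also be kept in an array and assembled with http_build_query, which makes paging with start easier to manage. This is only a sketch: the function name buildGsaUrl and its argument order are assumptions, and the default values are the ones shown in these slides.

```php
<?php
/**
 * Hypothetical helper: assemble a GSA search URL from named parameters.
 */
function buildGsaUrl(string $base, string $query, int $start = 0, array $overrides = []): string
{
    $params = array_merge([
        'client' => 'utk_frontend',
        'output' => 'xml_no_dtd',
        'sort'   => 'date:D:L:d1',
        'entqr'  => '3',
        'ie'     => 'UTF-8',
        'ud'     => '1',
        'site'   => 'Chattanooga',
    ], $overrides, [
        // Applied last so callers cannot accidentally override them.
        'start' => (string) max(0, $start),
        'q'     => $query,
    ]);

    // PHP_QUERY_RFC3986 encodes spaces as %20 rather than "+".
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Second page of results, assuming ten results per page.
echo buildGsaUrl('http://google.tennessee.edu/search', 'university web services', 10), "\n";
```

Note that http_build_query percent-encodes the colons in the sort value (date%3AD%3AL%3Ad1); the value decodes to the same string on the receiving end.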

Result Processing

We base our search result handling on a customized version of the template provided with the GSA, but you can build your own.

  • Remove SERP <head> content to wrap with your template.
  • Replace references to search in links and form action to point at your new page.
  • Review the settings at the top of the GSA default XSL for configurable options.
  • Remove or fine-tune page top & bottom content.
  • Remove conflicting CSS.
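
For the build-your-own route, a rough sketch of rendering results inside your own page template might look like the following. It assumes each result has already been reduced to an array with url, title and snippet keys; the function names and the search-results class are hypothetical. Everything is escaped except the `<b>` and `<br>` highlighting tags that appear in GSA titles and snippets.

```php
<?php
/**
 * Hypothetical helper: escape a title or snippet, keeping only the
 * <b> and <br> highlighting tags.
 */
function safeHighlight(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return str_replace(
        ['&lt;b&gt;', '&lt;/b&gt;', '&lt;br&gt;'],
        ['<b>', '</b>', '<br>'],
        $escaped
    );
}

/**
 * Hypothetical helper: render a list of results as an HTML ordered list.
 */
function renderResults(array $results): string
{
    $html = "<ol class=\"search-results\">\n";
    foreach ($results as $r) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a><p>%s</p></li>\n",
            htmlspecialchars($r['url'], ENT_QUOTES, 'UTF-8'),
            safeHighlight($r['title']),
            safeHighlight($r['snippet'])
        );
    }
    return $html . "</ol>\n";
}

echo renderResults([[
    'url'     => 'http://www.utc.edu/university-web-services/',
    'title'   => '<b>University Web Services</b>',
    'snippet' => 'Managing Websites. <b>...</b>',
]]);
```

Escaping first and then restoring a short list of known tags keeps any other markup in the results from reaching the page as live HTML.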

Default Header

Default Footer

Customizing Output

  • Start with built-in options
    • Replaced the Google logo
    • Added the header used on the organization's web site.
    • Changed search button text
    • Changed the advanced search anchor text
    • ...
  • Review output for other changes
  • Don't be afraid of the sections marked "(do not customize)"

UTC Search Header

UTC Search Footer

Integrating into Website

  • Every page should have a search form!
  • Customize page content to improve search (SEO)
  • Add standard description & keyword meta tags
  • Add custom meta tags

Custom Meta Tags
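
As an illustration of standard and custom meta tags, here is a small sketch that emits them from a PHP array with proper escaping. The function name metaTags, the department tag name, and the sample values are all hypothetical; use whatever tag names fit your own site.

```php
<?php
/**
 * Hypothetical helper: render name/content pairs as <meta> tags.
 */
function metaTags(array $fields): string
{
    $out = '';
    foreach ($fields as $name => $content) {
        $out .= sprintf(
            "<meta name=\"%s\" content=\"%s\">\n",
            htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $content, ENT_QUOTES, 'UTF-8')
        );
    }
    return $out;
}

echo metaTags([
    'description' => 'News and resources from University Web Services.',
    'keywords'    => 'web services, training, editors',
    'department'  => 'University Web Services', // hypothetical custom tag
]);
```

The Search Protocol Reference describes request parameters such as getfields and requiredfields for returning or filtering on meta tag values, which is where custom tags like this can pay off.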

Page Not Found (404) Handling

  • Don't redirect directly to search page!
  • Must send a 404 error to search engine crawlers
  • Be nice to people — Do a search for them!
  • Historic page redirects
  • Parse requested URL and use it to search!
  • The Trick: Plain HTML 404 page with JavaScript redirect

404 Example

Old URL: http://www.utc.edu/Administration/UniversityRelations/staff.php
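
One way to sketch the "parse the URL and search" idea in PHP is shown below. It is an illustration under stated assumptions, not the handler used at UTC: the function names, the /search/ page path, and passing the terms as a q query-string parameter are all hypothetical. The page is served with a real 404 status so crawlers see the error, while the JavaScript redirect sends people on to a search.

```php
<?php
/**
 * Hypothetical helper: derive search terms from a missing page's URL.
 */
function searchTermsFromPath(string $url): string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Drop a trailing file extension such as ".php" or ".html".
    $path = preg_replace('/\.[A-Za-z0-9]{2,5}$/', '', $path);
    // Treat slashes, underscores, dots and hyphens as word breaks.
    $path = preg_replace('#[/_.\-]+#', ' ', rawurldecode($path));
    // Split CamelCase: "UniversityRelations" becomes "University Relations".
    $path = preg_replace('/(?<=[a-z])(?=[A-Z])/', ' ', $path);
    return trim(preg_replace('/\s+/', ' ', $path));
}

/**
 * Hypothetical helper: plain HTML 404 body with a JavaScript redirect.
 */
function notFoundPage(string $requestUri, string $searchPage = '/search/'): string
{
    $target = $searchPage . '?q=' . rawurlencode(searchTermsFromPath($requestUri));
    return "<!DOCTYPE html>\n<title>Page Not Found</title>\n"
        . "<p>Page not found. Searching for it...</p>\n"
        . '<script>window.location.replace(' . json_encode($target) . ");</script>\n";
}

// In a real 404 handler (assumption about how it is wired up):
// http_response_code(404);
// echo notFoundPage($_SERVER['REQUEST_URI']);

echo searchTermsFromPath('http://www.utc.edu/Administration/UniversityRelations/staff.php'), "\n";
// prints "Administration University Relations staff"
```

Historic page redirects would be checked before falling back to this search, so known old URLs still go straight to their replacements.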

Questions?

C. Daniel Chase — @cdchase

Dan-Chase@UTC.edu

The University of Tennessee at Chattanooga

#aim7 #heweb14