SlideDeck.io – A repository of great HTML presentations
Search Engine with Lucene – Summary – Clean Dump XML
macilry
On GitHub: macilry / slide-Search-Engine
Search Engine with Lucene
Outline
Clean the XML dump
Create the index with Lucene
Query the index
Display the results
Clean the XML dump
Extract the title and entities of each page
Tools used: SAX (Java XML parser) and regular expressions
To avoid overloading memory:
Write to the output file at each page node
Write the output through a Java stream
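A minimal sketch of this cleaning step, assuming a MediaWiki-style dump (`<page>`, `<title>`, `<id>`, `<revision>`, `<text>`) and assuming that the "entities" are wiki links such as `[[France]]` or `[[Seine|la Seine]]`. The class name `DumpCleaner` and the output format (one `<page id="...">` element with `<title>` and `<entity>` children) are hypothetical. It relies only on the JDK's SAX parser and a regular expression, and writes each page as soon as its closing tag is seen, so only one page is held in memory at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical cleaner: streams a MediaWiki-style dump and writes one small page element per page. */
final class DumpCleaner extends DefaultHandler {
    // Assumption: entities are wiki links such as [[France]] or [[Seine|la Seine]]; group 1 is the target.
    private static final Pattern LINK = Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    private final Writer out;
    private final StringBuilder buffer = new StringBuilder();
    private String id = "";
    private String title = "";
    private String text = "";
    private boolean inRevision;

    private DumpCleaner(Writer out) { this.out = out; }

    /** Parses the dump from in and writes the cleaned XML to out, one page at a time. */
    static void clean(InputStream in, Writer out) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DumpCleaner(out));
        } catch (Exception e) {
            throw new IllegalStateException("Could not clean the dump", e);
        }
    }

    @Override
    public void startDocument() throws SAXException { write("<pages>\n"); }

    @Override
    public void endDocument() throws SAXException { write("</pages>\n"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // keep only the text of the element being read
        if (qName.equals("page")) { id = ""; title = ""; text = ""; }
        if (qName.equals("revision")) inRevision = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) { buffer.append(ch, start, length); }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        switch (qName) {
            case "title": title = buffer.toString(); break;
            case "id": if (!inRevision && id.isEmpty()) id = buffer.toString(); break; // page id, not revision id
            case "revision": inRevision = false; break;
            case "text": text = buffer.toString(); break;
            case "page": writePage(); break; // write now instead of accumulating the whole dump
            default: break;
        }
    }

    private void writePage() throws SAXException {
        StringBuilder page = new StringBuilder();
        page.append("<page id=\"").append(escape(id)).append("\"><title>").append(escape(title)).append("</title>");
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            page.append("<entity>").append(escape(m.group(1).trim())).append("</entity>");
        }
        write(page.append("</page>\n").toString());
    }

    private void write(String s) throws SAXException {
        try { out.write(s); } catch (IOException e) { throw new SAXException(e); }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

In practice `out` would be a buffered writer on the cleaned file, for example one obtained from `Files.newBufferedWriter(path, StandardCharsets.UTF_8)`, closed once parsing ends.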
Create the index with Lucene
Read the cleaned XML file
For each page, create a document (using Lucene's Document class)
Content of this document: the concatenated entities, the title and the id
Increase the boost of the title field by 1
Another point:
Configure Lucene with the French analyzer so that accented characters are handled
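The loop below sketches this indexing step without depending on Lucene, so it runs with the JDK alone. It streams the cleaned XML with StAX and builds one document per page, whose `content` field concatenates the entities, the title and the id, with the title boost raised from 1 to 2. `PageDoc` and `IndexFeeder` are hypothetical names, the input format is an assumption (`<page id="...">` with `<title>` and `<entity>` children), and the French analyzer is not modeled. In the real project each document would instead be a Lucene `Document` passed to `IndexWriter.addDocument`, with the `IndexWriterConfig` built around `FrenchAnalyzer`. Per-field index-time boosts (`Field.setBoost`) only exist in older Lucene versions (they were removed in 7.0), so on recent versions the title boost is usually applied at query time instead.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical stand-in for a Lucene Document: field values plus per-field boosts. */
final class PageDoc {
    final Map<String, String> fields = new LinkedHashMap<>();
    final Map<String, Float> boosts = new LinkedHashMap<>();
}

final class IndexFeeder {
    static final float DEFAULT_BOOST = 1.0f;

    /** Streams the cleaned XML and hands one PageDoc per page to sink (e.g. an index writer). */
    static void readCleanedXml(Reader in, Consumer<PageDoc> sink) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            String id = "";
            String title = "";
            List<String> entities = new ArrayList<>();
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        case "page":
                            id = r.getAttributeValue(null, "id");
                            title = "";
                            entities.clear();
                            break;
                        case "title": title = r.getElementText(); break;
                        case "entity": entities.add(r.getElementText()); break;
                        default: break;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && r.getLocalName().equals("page")) {
                    sink.accept(toDocument(id, title, entities)); // one document per page
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the cleaned XML", e);
        }
    }

    /** The content field concatenates entities, title and id; the title field gets a +1 boost. */
    static PageDoc toDocument(String id, String title, List<String> entities) {
        PageDoc doc = new PageDoc();
        doc.fields.put("id", id);
        doc.fields.put("title", title);
        doc.fields.put("content", String.join(" ", entities) + " " + title + " " + id);
        doc.boosts.put("title", DEFAULT_BOOST + 1);
        return doc;
    }
}
```

Passing a `Consumer` rather than returning a list keeps the streaming property of the cleaning step: each page is handed on and can be discarded before the next one is read.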
Query the index
Search on multiple fields
Parse the query with the French analyzer
The query returns the first 20 results
Create an object for each result and sort its entities by number of occurrences
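In Lucene, this step would typically parse the query with a `MultiFieldQueryParser` built on the same `FrenchAnalyzer` and ask the `IndexSearcher` for the top 20 hits (`searcher.search(query, 20)`). The sketch below covers only the plain-Java part that follows: a hypothetical `SearchResult` object created for each hit, whose entities are counted and sorted by descending number of occurrences. It assumes each hit exposes its id, its title and the list of entities found in the page, repetitions included.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical result object built for each of the (at most 20) hits returned by the search. */
final class SearchResult {
    final String id;
    final String title;
    /** (entity, occurrences) pairs, most frequent first. */
    final List<Map.Entry<String, Integer>> entities;

    SearchResult(String id, String title, List<String> rawEntities) {
        this.id = id;
        this.title = title;
        // Count how many times each entity occurs in the page.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String entity : rawEntities) {
            counts.merge(entity, 1, Integer::sum);
        }
        // Sort by descending count; List.sort is stable, so ties keep first-seen order.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        this.entities = sorted;
    }
}
```

Counting into a `LinkedHashMap` before sorting makes the tie-breaking deterministic: entities with the same count appear in the order they were first met in the page.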
Display the results
Technologies Used
Java Enterprise Edition (Servlet, JSP, JSTL)
Apache Tomcat
jQCloud library for generating tag clouds in JavaScript
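jQCloud builds the cloud from a JavaScript array of `{text, weight}` objects, so the servlet or JSP has to turn each result's entity counts into that form. A minimal sketch, assuming the counts are available as a `Map` from entity to number of occurrences; `TagCloudJson` is a hypothetical helper, and its JSON escaping is deliberately minimal (quotes, backslashes, control characters, and `<` so that the array can be embedded in a `<script>` tag).

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: turns (entity -> occurrences) into a jQCloud word array. */
final class TagCloudJson {
    static String toJQCloudWords(Map<String, Integer> counts) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // jQCloud reads the label from "text" and the relative size from "weight".
            json.add("{\"text\":" + quote(e.getKey()) + ",\"weight\":" + e.getValue() + "}");
        }
        return json.toString();
    }

    /** Minimal JSON string quoting; '<' is escaped so the array can sit inside a script tag. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '<': sb.append("\\u003c"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                    break;
            }
        }
        return sb.append('"').toString();
    }
}
```

The generated string could then be written into the page and passed to the plugin, for example `$("#cloud").jQCloud(words);` where `words` holds the array.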