Data Science Capstone – Text mining






Slide deck for the Capstone final project

On GitHub: olalie57/capstone

Data Science Capstone

Text mining

Final Project - May 2016

Presented by Ola Lie

Explore

Blogs, news and Twitter

The content of these texts is explored in the Milestone Report.

Algorithm

Using tm and RWeka

  • Create a corpus and clean the data with tm
  • Create bi-, tri- and tetragrams with RWeka

In server.R (shiny):
  • Reduce the user input to its last three (or two, or one) words
  • Search for those three words among the first three words of the tetragrams
  • If there are no matches, search the first two words of the trigrams
  • If there are still no matches, search the first word of the bigrams
  • Calculate percentages for the matches
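The backoff steps above can be sketched as follows. The original app is written in R with tm, RWeka and shiny, so this Python version is a re-sketch under stated assumptions, not the author's code.

Several details are assumptions:
  • Cleaning is reduced to lowercasing and dropping everything except letters and apostrophes.
  • "Percentages" is read as each candidate word's share of all matches at the first n-gram order that matched.
  • The names `build_ngram_tables` and `predict` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase and keep only letters, apostrophes and spaces.

    Assumption: a rough stand-in for the tm cleaning steps.
    """
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return text.split()


def build_ngram_tables(lines, max_n=4):
    """Return {n: {prefix tuple: Counter(next word -> count)}} for n = 2..max_n."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        tokens = clean(line)
        for n in range(2, max_n + 1):
            # Slide a window of n words over the line.
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                tables[n][tuple(gram[:-1])][gram[-1]] += 1
    return tables


def predict(tables, user_input, max_n=4, top=3):
    """Back off from tetragrams to trigrams to bigrams.

    Returns up to `top` (word, percent) pairs. Assumption: percent is the
    word's share of all matches at the first order that matched.
    """
    tokens = clean(user_input)
    for n in range(max_n, 1, -1):
        k = n - 1  # prefix length: 3 words for tetragrams, 2 for trigrams, 1 for bigrams
        if len(tokens) < k:
            continue  # input too short for this order
        prefix = tuple(tokens[-k:])  # keep only the last k words of the input
        matches = tables[n].get(prefix)  # .get() does not insert into the defaultdict
        if matches:
            total = sum(matches.values())
            return [(word, round(100 * count / total, 1))
                    for word, count in matches.most_common(top)]
    return []  # no match at any order


corpus = [
    "Thanks for the follow!",
    "Thanks for the help.",
    "Thanks for the help again",
    "See you at the game",
]
tables = build_ngram_tables(corpus)
print(predict(tables, "Thanks for the"))  # [('help', 66.7), ('follow', 33.3)]
```

Returning at the first order that matches mirrors the plain backoff listed on the slide. It is not a smoothed scheme such as Katz backoff, so lower-order evidence is ignored whenever a higher-order match exists.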

 

Performance

Response time is under five seconds.

The first search may take a bit longer while the app wakes up.

Visit the Web App