Capstone Project: Text Prediction Shiny App - Dec 2014





On Github adnarel / Capstone

Capstone Project:

Text Prediction Shiny App - Dec 2014

Author: adnarel
Class: dsscapstone-002, Johns Hopkins Data Science Specialization

Text Prediction Problem

The picture shows an example of a text prediction app. The suggested next words are "degree", "specialization", and "skills". The top predicted word is "specialization". How can we use our Data Science skills to create a text prediction app?

Build Corpus

We are provided with large files of text from Blogs, Twitter, and News sources.

We use a sample of text drawn from each of these three sources so that we have about 1,000,000 total words, with approximately one-third from each text source.
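The sampling step could be sketched as follows. This is only an illustration in base R: the helper name `sample_lines_by_words`, the seed, and the tiny stand-in vectors are assumptions, not the author's script. With the real data, each source would be read with `readLines()` and given a budget of roughly one-third of 1,000,000 words.

```r
# Hypothetical helper: draw random lines from one source until roughly
# `target_words` words have been collected.
sample_lines_by_words <- function(lines, target_words) {
  shuffled <- sample(lines)                        # random order, no replacement
  # Count words per line by splitting on runs of whitespace.
  n_words <- lengths(strsplit(trimws(shuffled), "\\s+"))
  keep <- cumsum(n_words) <= target_words          # stop once the budget is used
  shuffled[keep]
}

set.seed(2014)  # assumed seed, only so the sketch is reproducible

# Tiny stand-in vectors; the real inputs would come from the three text files.
blogs   <- rep("this is a short blog sentence", 50)
twitter <- rep("a tweet here", 100)
news    <- rep("breaking news about data science today", 40)

# Same word budget for each source, so each contributes about one-third.
corpus_sample <- c(
  sample_lines_by_words(blogs,   60),
  sample_lines_by_words(twitter, 60),
  sample_lines_by_words(news,    60)
)
```

Giving each source the same word budget, rather than the same number of lines, keeps short tweets from being under-represented relative to long blog posts.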

We use specialized text-mining libraries in R ("tm" and "RWeka") to clean and summarize the text data and build a 'Corpus' - a collection of text in a format ready for analysis.
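The slides do not list the exact cleaning steps, so the following is a minimal sketch of what such cleaning might look like. It uses base R instead of "tm" so that it runs on its own; the function name `clean_text` and the choice of steps (lower-casing, dropping digits and punctuation, collapsing whitespace) are assumptions.

```r
# Hypothetical cleaning function (assumed steps, not the author's exact ones).
clean_text <- function(x) {
  x <- tolower(x)                   # fold case so "Happy" and "happy" match
  x <- gsub("[^a-z' ]+", " ", x)    # keep only letters, apostrophes, and spaces
  x <- gsub("\\s+", " ", x)         # collapse runs of whitespace
  trimws(x)                         # drop leading and trailing spaces
}

clean_text("Happy  New Year, 2015!!")
# "happy new year"
```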

We build a "Document Term Matrix" from the Corpus: a table of the phrases found in the Corpus and how often each one occurs. This matrix is the cornerstone of our predictive text model.
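To make the idea of phrase frequencies concrete, here is a small base R sketch that counts 3-word phrases in already-cleaned text. The source builds its matrix with "tm" and "RWeka"; this stand-in, the function name `count_trigrams`, and the example sentences are assumptions for illustration only.

```r
# Hypothetical helper: count every 3-word phrase in a vector of cleaned lines.
count_trigrams <- function(lines) {
  trigrams <- unlist(lapply(strsplit(lines, " ", fixed = TRUE), function(w) {
    w <- w[nzchar(w)]                       # drop empty tokens
    n <- length(w)
    if (n < 3) return(character(0))         # too short to contain a trigram
    paste(w[1:(n - 2)], w[2:(n - 1)], w[3:n])
  }))
  if (length(trigrams) == 0) {
    return(data.frame(phrase = character(0), count = integer(0),
                      stringsAsFactors = FALSE))
  }
  freq <- sort(table(trigrams), decreasing = TRUE)   # most frequent first
  data.frame(phrase = names(freq), count = as.integer(freq),
             stringsAsFactors = FALSE)
}

trigram_counts <- count_trigrams(c(
  "happy new year",
  "happy new year to you",
  "happy new car"
))
trigram_counts$phrase[1]   # "happy new year"
trigram_counts$count[1]    # 2
```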

Build Prediction Model

We are concerned with fast execution, so we use this algorithm for text prediction:

1. Read the user input, clean it, and check that it contains at least two words.

2. Extract the last two words of the user input.

3. Search the Document Term Matrix for the most frequent 3-word phrase that begins with these two words.

4. Take the third word of that phrase as the predicted word.

5. If no phrase matches, pick a word at random from a list of very common words.
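The steps above can be sketched in base R as follows. The function name `predict_next_word`, the toy frequency table, its counts, and the fallback word list are all made up for illustration; the author's actual script may differ.

```r
# Toy frequency table (phrase, count); values are invented for illustration.
trigram_counts <- data.frame(
  phrase = c("happy new year", "happy new car", "data science specialization"),
  count  = c(5L, 2L, 3L),
  stringsAsFactors = FALSE
)
common_words <- c("the", "and", "to")   # assumed fallback list

predict_next_word <- function(input, counts, fallback) {
  # Clean: lower-case, keep letters/apostrophes/spaces, split on whitespace.
  cleaned <- trimws(gsub("[^a-z' ]+", " ", tolower(input)))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[nzchar(words)]
  # Validate: at least two words are required.
  if (length(words) < 2) stop("Please enter at least two words.")
  # Extract the last two words.
  prefix <- paste(tail(words, 2), collapse = " ")
  # Keep the 3-word phrases that begin with those two words.
  hits <- counts[startsWith(counts$phrase, paste0(prefix, " ")), ]
  # No match: fall back to a random common word.
  if (nrow(hits) == 0) return(sample(fallback, 1))
  # Otherwise return the third word of the most frequent matching phrase.
  best <- hits$phrase[which.max(hits$count)]
  strsplit(best, " ", fixed = TRUE)[[1]][3]
}

predict_next_word("Have a Happy New", trigram_counts, common_words)
# "year"
```

A single lookup in a precomputed frequency table keeps each prediction cheap, which fits the stated goal of fast execution.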

Shiny App: Text Prediction

Use the Text Box on the left side to enter a phrase (2 words or more). Click "Submit". The predicted word is displayed on the right side. Have fun with the app and have a "Happy New Year!"

Note: The GitHub repository (adnarel / Capstone) contains the R scripts and data used for the app.