On GitHub: outrigger / DSSCapstoneDeck
The goal of this project is to develop a predictive keyboard application that takes in a phrase and outputs a prediction of the next word.
The training dataset comes from a corpus called HC Corpora. It consists of three files of unstructured text drawn from blogs, news articles and Twitter.
This dataset was first sampled and processed to generate the n-grams required.
The n-grams were then used to build a prediction engine for the app.
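To make the sample, n-gram and predict steps concrete, here is a minimal Python sketch. It is not the app's actual code: the function names (`sample_lines`, `tokenize`, `build_ngrams`, `predict_next`), the regex tokenizer, the trigram limit and the simple "fall back to a shorter context" lookup are all assumptions made for illustration.

```python
import random
import re
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=0):
    """Keep roughly `fraction` of the lines (hypothetical sampling step)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngrams(lines, max_n=3):
    """Map each context (tuple of 0..max_n-1 words) to counts of the next word."""
    model = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, word = tokens[i:i + n]
                model[tuple(context)][word] += 1
    return model


def predict_next(model, phrase, max_n=3):
    """Try the longest context first, then fall back to shorter ones."""
    tokens = tokenize(phrase)
    for k in range(max_n - 1, -1, -1):
        context = tuple(tokens[-k:]) if k else ()
        # Skip contexts longer than the phrase itself.
        if len(context) == k and context in model:
            return model[context].most_common(1)[0][0]
    return None


# Toy corpus standing in for the sampled blog, news and Twitter text.
corpus = [
    "I love the sea",
    "I love the sea breeze",
    "we love the mountains",
]
model = build_ngrams(corpus)
suggestion = predict_next(model, "they love the")
```

Storing counts keyed by context keeps the lookup to a single dictionary access per n-gram order, which is one reasonable design for a keyboard app that has to respond on each keystroke.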
Summary of workflow: