Coursera Capstone Project – Final Data Product

GitHub repository: billxiao0 / courseracapstone

Coursera Capstone Project

Final Data Product

Created by Bill Xiao

Background of Project

Final project of the Data Science Capstone
The main task is to develop a word prediction model
Source data is from the Coursera dataset
The model predicts the next word after the user has entered a few words

Tasks of the Project

The project has 8 tasks:
- Understanding the problem
- Data acquisition and cleaning
- Exploratory analysis
- Statistical modeling
- Predictive modeling
- Creative exploration
- Creating a data product
- Creating a short slide deck pitching your product

Description of Algorithm

Based on Natural Language Processing
All 3 sets of data were used to train the model:
- blogs
- news
- twitter
An n-gram back-off model was used for development
Data was cleaned and processed into data frames of n-grams with their corresponding sorted cumulative frequencies, one data frame per order:
- 1-gram
- 2-gram
- 3-gram
- 4-gram
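As an illustration of the n-gram tables listed above, here is a minimal Python sketch. It is not the code from the repository: the toy corpus, the `tokenize` and `ngram_counts` helpers, and the cleaning rule (lowercase, keep letters and apostrophes only) are all assumptions made for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(lines, n):
    """Count every n-gram of order n, one line at a time."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Hypothetical toy corpus standing in for the blogs, news, and twitter files.
corpus = [
    "Thank you for the follow",
    "Thank you for the help",
    "thank you for your time",
]

# One table per order (1-gram to 4-gram), sorted by descending frequency.
tables = {n: ngram_counts(corpus, n).most_common() for n in range(1, 5)}

print(tables[3][:2])
```

Each table is a list of `(n-gram, count)` pairs, which plays roughly the role of one sorted data frame per order.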
The Katz back-off algorithm was used for the word prediction
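The back-off idea can be sketched in Python as follows. This is a simplified illustration, not full Katz back-off: it omits the discounting and back-off weights that Katz uses and simply falls back to a shorter context when the longer one was never seen. The `build_model` and `predict_next` names and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_order=4):
    """Map each context (tuple of preceding words) to a Counter of next words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                *context, word = tokens[i:i + n]
                model[tuple(context)][word] += 1
    return model

def predict_next(model, text, max_order=4):
    """Return (word, order): try the longest context first, then back off."""
    tokens = text.lower().split()
    for n in range(max_order, 0, -1):
        # An order-n lookup uses the last n - 1 words as context.
        context = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
        if len(context) == n - 1 and context in model:
            word, _ = model[context].most_common(1)[0]
            return word, n
    return None, 0  # empty model: nothing to predict

# Hypothetical toy training data.
model = build_model([
    "thank you for the help",
    "thank you for the follow",
    "see you later",
])

print(predict_next(model, "thank you for"))    # found at the 4-gram level
print(predict_next(model, "many thanks for"))  # backs off to the 2-gram level
print(predict_next(model, "hello"))            # backs off to the 1-gram level
```

The returned order shows which n-gram level produced the prediction, which is the kind of information a note beside the predicted word could report.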

Description of Shiny App

Access the Shiny app here
The Shiny app takes the words you enter and predicts the next word
Enter the words in the input box and press the "Predict" button
The Next Word will be shown in the output box
The Note explains which n-gram model was used to predict the next word
Thank you for taking the time to evaluate my app, and have a good day :)