Coursera Capstone Project – Final Data Product







On GitHub: billxiao0/courseracapstone

Coursera Capstone Project

Final Data Product

Created by Bill Xiao

Background of Project

  • Final project of the Data Science Capstone

  • Main task is to develop a predictive text model

  • Source data is from the Coursera dataset

  • The model predicts the next word after the user has entered a few words

Tasks of the Project

  • The project has 8 tasks:

    • Understanding the problem
    • Data acquisition and cleaning
    • Exploratory analysis
    • Statistical modeling
    • Predictive modeling
    • Creative exploration
    • Creating a data product
    • Creating a short slide deck pitching your product

Description of Algorithm

  • Based on Natural Language Processing

  • All 3 sets of data were used to train the model:

    • blogs
    • news
    • twitter
  • An n-gram back-off model was used for development

  • Data was cleaned and processed into n-gram data frames, each sorted by frequency with corresponding cumulative frequencies:

    • 1-gram
    • 2-gram
    • 3-gram
    • 4-gram
  • The Katz back-off algorithm was used for the word prediction
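
The pipeline above (count 1- to 4-grams, then back off from the longest matching context to shorter ones) can be illustrated with a minimal sketch. This is not the app's actual code: the app is built in R with Shiny, while this sketch uses Python, and the names tokenize, build_ngram_tables and predict_next are hypothetical. It is also a simplification: it picks the most frequent continuation at the highest n-gram order that matches and does not implement the discounting step of full Katz back-off.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_tables(lines, max_n=4):
    """Return {n: Counter mapping n-gram tuples to counts} for n = 1..max_n."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables


def predict_next(tables, phrase, max_n=4):
    """Simplified back-off (assumption: no Katz discounting).

    Tries the longest context first and backs off to shorter ones.
    Returns (word, n), where n is the n-gram order that produced the match,
    or (None, 0) if the tables are empty.
    """
    tokens = tokenize(phrase)
    for n in range(max_n, 1, -1):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough input words for this order
        followers = {
            gram[-1]: count
            for gram, count in tables[n].items()
            if gram[:-1] == context
        }
        if followers:
            # Highest count wins; ties are broken alphabetically.
            best = min(followers, key=lambda w: (-followers[w], w))
            return best, n
    if tables[1]:
        # No context matched: fall back to the most frequent single word.
        best = min(tables[1], key=lambda g: (-tables[1][g], g))
        return best[0], 1
    return None, 0


# Tiny made-up corpus, for illustration only (not the Coursera data).
corpus = [
    "thank you for the help",
    "thank you for the ride",
    "thank you for the help today",
    "see you for lunch",
]
tables = build_ngram_tables(corpus)

print(predict_next(tables, "thank you for the"))  # ('help', 4)
print(predict_next(tables, "waiting for the"))    # ('help', 3)
```

The second call shows the back-off step: no 4-gram starts with "waiting for the", so the lookup drops to the 3-gram table and matches on "for the". Returning the order alongside the word mirrors the kind of note the app displays about which n-gram was used.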

Description of Shiny App

  • Access the shiny app here

  • The Shiny App takes words as input and predicts the next word

  • Enter the words in the input box and press the "Predict" button

  • The Next Word will be shown in the output box

  • The Note explains which n-gram order (1-gram to 4-gram) was used to predict the next word

  • Thank you for taking time to evaluate my app and have a good day :)