Dumbbell curl performed in 5 different ways:
Use machine learning to build a classifier that identifies which way each curl was performed:
12 MB data set: ~20k observations of 160 variables
Initially just threw everything into a randomForest model: not good!
Cut the number of variables down from 160 to 53:
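The slides don't say how the 53 variables were chosen. One common approach for this kind of sensor data, assumed here, is to drop bookkeeping columns (row IDs, user names, timestamps) and columns that are mostly empty. A minimal sketch in plain Python; the column names, the missing-value encoding and the `max_missing` threshold are all hypothetical.

```python
import csv
import io

# Hypothetical bookkeeping columns that identify rows rather than describe the movement.
ID_COLUMNS = {"row_id", "user_name", "timestamp"}


def is_missing(value):
    """Treat empty strings and 'NA' as missing (an assumption about the CSV encoding)."""
    return value.strip() in ("", "NA")


def select_columns(rows, header, max_missing=0.9):
    """Keep non-bookkeeping columns missing in at most max_missing of the rows."""
    keep = []
    for i, name in enumerate(header):
        if name in ID_COLUMNS:
            continue
        missing = sum(is_missing(row[i]) for row in rows)
        if missing / len(rows) <= max_missing:
            keep.append(name)
    return keep


# Tiny made-up sample: 'avg_roll' is empty in 3 of 4 rows.
sample = """row_id,user_name,roll_belt,avg_roll,label
1,user1,1.41,NA,A
2,user1,1.42,,A
3,user2,1.48,NA,B
4,user2,1.50,3.2,B
"""
reader = csv.reader(io.StringIO(sample))
header = next(reader)
rows = list(reader)
print(select_columns(rows, header, max_missing=0.5))  # prints ['roll_belt', 'label']
```

With a looser threshold such as the default 0.9, `avg_roll` (75% missing) would be kept, so the threshold matters as much as the rule itself.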
Used random forests and partitioned the training data further into training and validation sets:
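Holding out part of the training data gives a set the model never saw while fitting, which is what makes the later validation accuracy meaningful. A minimal sketch of a seeded random hold-out split in plain Python; the 30% validation fraction and the seed are assumptions, since the slides don't give the split ratio.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split off a validation set (fraction is an assumption)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


rows = list(range(20))  # stand-ins for 20 observations
train, validation = train_validation_split(rows)
print(len(train), len(validation))  # prints 14 6
```

A stratified split (sampling the same fraction from each of the 5 classes) is a common refinement when classes are unevenly represented.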
Model was incredibly accurate (0.5% error rate): overfit?
Tried boosted decision trees: similar accuracy
Random forest model was accurate on the validation set too: 99.35% accuracy, i.e. 0.65% error (close to the previous error rate)
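Accuracy and error rate are complements: error = 1 - accuracy, so 99.35% validation accuracy corresponds to 0.65% of predictions being wrong. A small sketch of the computation in plain Python; the labels are made up for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


actual = ["A", "A", "B", "C", "D", "E", "E", "B"]
predicted = ["A", "A", "B", "C", "D", "E", "B", "B"]  # one mistake out of 8
acc = accuracy(predicted, actual)
print(f"accuracy={acc:.4f} error={1 - acc:.4f}")  # prints accuracy=0.8750 error=0.1250
```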
Still suspected overfitting, yet got full marks on the test set
No idea why
Came back to this later and read the documentation for the data: the participants were supervised by a weight lifting coach!
Cluster your viewers, then build a classifier that assigns other people to those clusters
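One way to read this suggestion, assumed here since the slides don't name a method: cluster the people you already have without labels (k-means below), then classify a new person by assigning them to the nearest cluster centre. A minimal sketch in plain Python on 2-D points; the data, `k` and the iteration count are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid and recomputing means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # New centroid = mean of its cluster; keep the old one if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Two made-up groups of people described by two hypothetical features.
people = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids = kmeans(people, k=2)
new_person = (4.9, 5.1)
print(nearest(new_person, centroids) == nearest((5.0, 5.2), centroids))  # prints True
```

Nearest-centroid assignment is the simplest classifier for this step; any supervised model could instead be trained on the cluster labels.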