Predicting Car Acceleration

A Random Forest Prediction Exercise

for the Johns Hopkins Bloomberg School of Public Health

Data Science Specialization -- Developing Data Products

via Coursera.org

May 2015

What does it do?

This Shiny app uses the Auto dataset from the ISLR package. It fits a Random Forest model that predicts a car's acceleration from several of its attributes.

Data Exploration - Auto dataset

  1. 392 observations (vs. 32 in mtcars) and 9 variables (vs. 11 in mtcars).
  2. Two variables (year, origin) are stored as numeric but should be treated as ordinal.
  3. Three variables have a relatively high correlation with acceleration.
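The correlation screening mentioned in the list above can be illustrated with a short sketch. The app itself is built in R; the Python below is only an illustration of the idea, and the helper names (`pearson`, `rank_by_correlation`) and the toy values are assumptions, not code or data from the app or the Auto dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (name, r) pairs sorted by |r| against the target, strongest first."""
    scores = [
        (name, pearson(values, columns[target]))
        for name, values in columns.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up toy values for illustration only; NOT rows from the Auto dataset.
toy = {
    "acceleration": [12.0, 11.5, 15.0, 16.5, 19.0],
    "horsepower": [130.0, 165.0, 95.0, 88.0, 60.0],
    "year": [70.0, 72.0, 71.0, 75.0, 73.0],
}
ranking = rank_by_correlation(toy, "acceleration")
```

Ranking by absolute value keeps strongly negative relationships near the top, which matters when the candidate predictors may move in the opposite direction from the response.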

The prediction model

Feature Selection and Cross Validation Fine-Tuning

  • horsepower
  • displacement
  • weight
  • cylinders

     
Fine-tune with 10-fold cross-validation
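As a rough illustration of what 10-fold cross-validation does with the 392 observations, the sketch below splits row indices into ten shuffled folds and yields one train/test pair per fold. It is written in Python for illustration only; the app's actual resampling code is not shown in these slides, and the function names and the `seed` parameter are assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Split row indices 0..n-1 into k shuffled folds whose sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index spreads the remainder evenly over the first folds.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 392 is the number of observations stated in the slides.
splits = list(train_test_splits(392, k=10))
```

Each candidate model setting would be fitted on `train` and scored on `test` for every split, and the per-fold scores averaged to compare settings.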

Sample Error Rate

Random Forest has the lowest RMSE (1.4171) and the highest $R^2$ (0.7508) of the three models, so it explains the most variability in acceleration.

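| Model                             | RMSE   | $R^2$  | RMSE SD | $R^2$ SD |
|-----------------------------------|--------|--------|---------|----------|
| Random Forest                     | 1.4171 | 0.7508 | 0.2087  | 0.0730   |
| Bayesian Generalized Linear Model | 1.5208 | 0.7210 | 0.2411  | 0.0970   |
| Generalized Additive Model        | 1.4838 | 0.7121 | 0.1437  | 0.0850   |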
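For readers unfamiliar with the metrics in the table, here is a minimal Python sketch of how RMSE, $R^2$, and their spread across folds could be computed. It assumes $R^2$ is the coefficient of determination, $1 - SS_{res}/SS_{tot}$; some tools report the squared correlation between predictions and observations instead, and the slides do not say which definition was used. The function names and the toy numbers are illustrative assumptions, not the app's code or results.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def summarize(per_fold_scores):
    """Mean and sample standard deviation of a metric across folds."""
    return statistics.mean(per_fold_scores), statistics.stdev(per_fold_scores)


# Made-up values for one hypothetical fold; NOT results from the app.
actual = [12.0, 15.0, 16.0, 19.0]
predicted = [13.0, 14.0, 17.0, 18.0]
fold_rmse = rmse(actual, predicted)
fold_r2 = r_squared(actual, predicted)
```

Calling `summarize` on the ten per-fold values of each metric would give a mean and a standard deviation of the kind reported in the RMSE SD and $R^2$ SD columns.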