Predicting Car Acceleration

A Random Forest Prediction Exercise

for the Johns Hopkins Bloomberg School of Public Health

Data Science Specialization -- Developing Data Products

via Coursera.org

May 2015

What does it do?

This Shiny app uses the Auto dataset from the ISLR package. It fits a Random Forest model that predicts a car's acceleration from four attributes: horsepower, displacement, weight, and cylinders.

Data Exploration - Auto dataset

392 observations and 9 variables (for comparison, mtcars has 32 observations and 11 variables).
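2 variables (year, origin) are stored as numeric but are better treated as categorical: year is ordinal, and origin is a nominal code (American, European, Japanese).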
3 variables are relatively strongly correlated with acceleration.

The prediction model

Feature Selection and Cross Validation Fine-Tuning

horsepower
displacement
weight
cylinders

Fine-tune with 10-fold cross validation
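
The 10-fold procedure can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the app's own R code. It runs on synthetic data rather than the Auto dataset, and a one-variable least-squares line stands in for the Random Forest. The helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical.

```python
import math
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def cross_validate(xs, ys, k=10, seed=0):
    """Return the held-out RMSE of each fold in k-fold cross-validation."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the other k-1 folds, then score on the held-out fold only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in fold]
        scores.append(rmse([ys[i] for i in fold], preds))
    return scores


# Synthetic data (NOT the Auto dataset): 392 rows of a noisy linear trend.
rng = random.Random(1)
xs = [rng.uniform(50, 230) for _ in range(392)]
ys = [22.0 - 0.05 * x + rng.gauss(0, 1.5) for x in xs]

scores = cross_validate(xs, ys, k=10)
print(f"RMSE mean = {statistics.fmean(scores):.4f}, sd = {statistics.stdev(scores):.4f}")
```

Each row is held out exactly once, so every score comes from data the model did not see during fitting. The mean and standard deviation across the 10 folds summarize typical error and how much it varies between folds.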

Sample Error Rate

Random Forest has the lowest RMSE (1.4171) and the highest $R^2$ (0.7508) of the three models compared, so it explains the most variability in acceleration.

Model	RMSE	$R^2$	RMSE SD	$R^2$ SD
Random Forest	1.4171	0.7508	0.2087	0.0730
Bayesian Generalized Linear Model	1.5208	0.7210	0.2411	0.0970
Generalized Additive Model	1.4838	0.7121	0.1437	0.0850
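
For readers unfamiliar with the two metrics in the table, here is a minimal Python sketch of how RMSE and $R^2$ can be computed from observed and predicted values. The slides do not say which $R^2$ definition was used. This sketch assumes $1 - SS_{res}/SS_{tot}$, whereas some tools report the squared correlation between predictions and observations instead. The numbers are made up for illustration and are not from the Auto dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Share of variance explained, assumed here to be 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up acceleration values, for illustration only.
actual = [12.0, 15.5, 18.0, 14.0]
predicted = [13.0, 15.0, 17.0, 14.5]

print(round(rmse(actual, predicted), 4))       # 0.7906
print(round(r_squared(actual, predicted), 4))  # 0.8697
```

A lower RMSE means smaller errors in the units of the response, and an $R^2$ closer to 1 means more of the variation is explained.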