ACTL3142 & ACTL5110 Statistical Machine Learning for Risk Applications
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
Source: Actuaries Institute
Lecture Outline
Statistical learning
Assessing model accuracy
Prediction
Inference
| | Statistical Learning | Machine Learning |
|---|---|---|
| Origin | Statistics | Computer Science |
| f(X) | Model | Algorithm |
| Emphasis | Interpretability, precision and uncertainty | Large-scale application and prediction accuracy |
| Jargon | Parameters, estimation | Weights, learning |
| Confidence intervals | Uncertainty about parameters | No notion of uncertainty |
| Assumptions | Explicit a priori assumptions | No prior assumptions; we learn from the data |
See Breiman (2001) and "Why a Mathematician, Statistician, & Machine Learner Solve the Same Problem Differently"
Recall that in regression, we model an outcome against the factors that might affect it:
Y = f(X) + \epsilon
\Rightarrow Our objective is to find an appropriate f for the problem at hand. This is harder than it sounds.
Parametric
Non-parametric
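As a concrete contrast between the two approaches (an illustrative sketch, not the lecture's code), the R snippet below simulates data from Y = f(X) + \epsilon and fits a parametric model (linear regression, which assumes a fixed functional form) and a non-parametric one (a loess smoother, which lets the data determine the shape). The true f, noise level and tuning choices are all assumed for illustration.

```r
# Minimal sketch (assumed example): simulate Y = f(X) + epsilon and
# compare a parametric and a non-parametric estimate of f.
set.seed(1)

n <- 200
x <- runif(n, 0, 10)
f <- function(x) sin(x) + 0.1 * x        # "true" f, chosen for illustration
y <- f(x) + rnorm(n, sd = 0.3)           # add irreducible noise epsilon

# Parametric: assume a linear form f(x) = beta0 + beta1 * x
fit_lm    <- lm(y ~ x)

# Non-parametric: local regression, shape driven by the data
fit_loess <- loess(y ~ x, span = 0.3)

# Compare fitted curves against the true f on a grid
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
plot(x, y, col = "grey", pch = 16, cex = 0.6,
     main = "Parametric (blue) vs non-parametric (orange) fit")
lines(grid$x, f(grid$x), lwd = 2)                                  # true f (black)
lines(grid$x, predict(fit_lm, grid),    col = "blue",   lwd = 2)
lines(grid$x, predict(fit_loess, grid), col = "orange", lwd = 2)
```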
Using Education and Seniority to explain Income (the Income data):
[Figures omitted: fits of Income against Education and Seniority]
Suppose you are interested in prediction. Everything else being equal, which types of methods would you prefer?
Supervised
Unsupervised
Regression
Classification
Lecture Outline
Statistical learning
Assessing model accuracy
What are some potential problems with using the training MSE to evaluate a model?
Consider the example below. The true model is black, and associated ‘test’ data are identified by circles. Three different fitted models are illustrated in blue, green, and orange. Which would you prefer?
The following are the training and test errors for three different problems:
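The qualitative pattern in these plots can be reproduced numerically. The R sketch below is an assumed example (not the lecture's code): it fits polynomials of increasing degree to simulated data and records the training and test MSE for each degree, showing that training MSE keeps falling as flexibility increases while test MSE is U-shaped.

```r
# Sketch: training MSE always decreases with flexibility,
# test MSE is typically U-shaped. All settings below are assumed.
set.seed(2)

f <- function(x) sin(2 * x)
make_data <- function(n) {
  x <- runif(n, 0, 3)
  data.frame(x = x, y = f(x) + rnorm(n, sd = 0.3))
}
train <- make_data(100)
test  <- make_data(10000)   # large test set approximates the expected test MSE

mse <- function(y, yhat) mean((y - yhat)^2)

degrees <- 1:12
results <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(train_mse = mse(train$y, predict(fit, train)),
    test_mse  = mse(test$y,  predict(fit, test)))
}))
print(round(cbind(degree = degrees, results), 4))
# Training MSE decreases monotonically in the degree;
# test MSE decreases and then rises again (overfitting).
```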
The expected test MSE can be written as:
\mathbb{E}\bigl(y_0 - \hat{f}(x_0)\bigr)^2 = \text{Var}(\hat{f}(x_0)) + [\text{Bias}(\hat{f}(x_0))]^2 + \text{Var}(\epsilon)
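Each term in this decomposition can be estimated by simulation: repeatedly draw a training set, refit the model, and record \hat{f}(x_0) at a fixed test point. The R sketch below is a minimal illustration under an assumed data-generating process, comparing a rigid model with a flexible one.

```r
# Sketch: Monte Carlo check of the bias-variance decomposition at a
# single test point x0. The true f, noise level and models are assumed.
set.seed(3)

f      <- function(x) sin(2 * x)
sigma  <- 0.3
x0     <- 1.5
n      <- 50
n_sims <- 2000

pred_lin  <- numeric(n_sims)   # rigid model: straight line
pred_poly <- numeric(n_sims)   # flexible model: degree-10 polynomial
for (s in 1:n_sims) {
  x <- runif(n, 0, 3)
  y <- f(x) + rnorm(n, sd = sigma)
  newd <- data.frame(x = x0)
  pred_lin[s]  <- predict(lm(y ~ x), newd)
  pred_poly[s] <- predict(lm(y ~ poly(x, 10)), newd)
}

summarise_preds <- function(pred) c(
  bias_sq      = (mean(pred) - f(x0))^2,   # [Bias(f-hat(x0))]^2
  variance     = var(pred),                # Var(f-hat(x0))
  exp_test_mse = (mean(pred) - f(x0))^2 + var(pred) + sigma^2
)
print(rbind(linear = summarise_preds(pred_lin),
            poly10 = summarise_preds(pred_poly)))
# The rigid model has higher bias but lower variance; the flexible
# model has the reverse. Var(epsilon) = sigma^2 is irreducible.
```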
There is often a tradeoff between Bias and Variance
The following show the bias-variance tradeoff for three different problems:
Objective
Bayes’ Classifier
K-nearest neighbours
(purple is the Bayes boundary, black is the KNN boundary with K=10)
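A comparison like this can be recreated with simulated data for which the true conditional probabilities, and hence the Bayes classifier, are known. The R sketch below is an assumed setup using knn() from the class package, with K = 10 as in the figure; the data-generating process is chosen purely for illustration.

```r
# Sketch: KNN (K = 10) vs the Bayes classifier on simulated data where
# P(Y = 1 | X = x) is known. The data-generating process is assumed.
library(class)   # provides knn()
set.seed(4)

# Known class-1 probability: the Bayes classifier predicts 1 iff p1(x) > 0.5
p1 <- function(x1, x2) plogis(2 * x1 + 3 * x2 - 1)

make_data <- function(n) {
  x1 <- runif(n, -2, 2); x2 <- runif(n, -2, 2)
  data.frame(x1 = x1, x2 = x2, y = rbinom(n, 1, p1(x1, x2)))
}
train <- make_data(200)
test  <- make_data(5000)

# KNN with K = 10 (as in the figure)
knn_pred   <- knn(train[, c("x1", "x2")], test[, c("x1", "x2")],
                  cl = factor(train$y), k = 10)
# The Bayes classifier uses the true probabilities directly
bayes_pred <- as.integer(p1(test$x1, test$x2) > 0.5)

print(c(knn_error   = mean(knn_pred != test$y),
        bayes_error = mean(bayes_pred != test$y)))
# The Bayes error rate is the lowest achievable test error;
# KNN with a sensible K is usually close to it.
```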