ACTL3142 & ACTL5110 Statistical Machine Learning for Risk Applications
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
Reading
James et al. (2021), Chapters 1 and 2
Lecture Outline
Data Science vs Actuarial Science
Overview of Statistical Learning
Model Accuracy in Regression Problems
Classification Problems
Data Science vs Actuarial Science
Prediction
Inference
| | Statistical Learning | Machine Learning |
|---|---|---|
| Origin | Statistics | Computer Science |
| f(X) | Model | Algorithm |
| Emphasis | Interpretability, precision and uncertainty | Large-scale application and prediction accuracy |
| Jargon | Parameters, estimation | Weights, learning |
| Confidence interval | Uncertainty of parameters | No notion of uncertainty |
| Assumptions | Explicit a priori assumptions | No prior assumptions; we learn from the data |
See Breiman (2001)
Many models have the following general form:
Y = f(X) + \epsilon
\Rightarrow Our objective is to find an appropriate f for the problem at hand, which can be non-trivial. If Y is quantitative, we call this a regression problem; if Y is qualitative, a classification problem.
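The model Y = f(X) + ε can be made concrete with a small simulation. A minimal Python sketch (the course text uses R; the choice of f(x) = sin(2x), the noise level, and the cubic estimate here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative "true" f and noise; in practice f is unknown.
f = lambda x: np.sin(2 * x)
x = rng.uniform(0, 3, size=200)
y = f(x) + rng.normal(scale=0.3, size=200)      # Y = f(X) + epsilon

# One possible estimate f_hat: a cubic polynomial fit by least squares.
f_hat = np.poly1d(np.polyfit(x, y, deg=3))

# f_hat approximates f but never recovers it exactly, because of epsilon.
train_mse = np.mean((y - f_hat(x)) ** 2)
```

Even with the right family of models, the irreducible error Var(ε) puts a floor under how well any f_hat can predict Y.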
Parametric
Non-parametric
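To contrast the two approaches, a hedged Python sketch (the data-generating function, K, and sample size are illustrative assumptions): a parametric linear fit against a non-parametric K-nearest-neighbours average.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 3, 100)
y = np.sin(2 * x) + rng.normal(scale=0.3, size=100)   # nonlinear truth

# Parametric: assume a form (here linear, Y ~ b0 + b1*X), estimate parameters.
b1, b0 = np.polyfit(x, y, deg=1)
linear_pred = b0 + b1 * x

# Non-parametric: no assumed form; predict by averaging the K nearest responses.
def knn_predict(x0, K=5):
    nearest = np.argsort(np.abs(x - x0))[:K]
    return y[nearest].mean()

knn_pred = np.array([knn_predict(x0) for x0 in x])

linear_mse = np.mean((y - linear_pred) ** 2)   # rigid: misses the curvature
knn_mse = np.mean((y - knn_pred) ** 2)         # flexible: tracks it locally
```

Here the parametric model's assumed form is wrong, so the non-parametric fit does better on the training data; the price of that flexibility is the topic of the bias-variance discussion later in the lecture.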
[Figure: Income data — using Education and Seniority to explain Income.]

Suppose you are interested in prediction. Everything else being equal, which type of method would you prefer?
Supervised
Unsupervised
Regression
Classification
Model Accuracy in Regression Problems
Do you see potential problems with using the training MSE to evaluate a model?
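One way to see the problem: training error always rewards flexibility, while test error does not. A simulation sketch (using polynomial degree as the flexibility knob is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(2 * x)

# Small training set, large test set drawn from the same model.
x_tr = rng.uniform(0, 3, 50)
y_tr = f(x_tr) + rng.normal(scale=0.3, size=50)
x_te = rng.uniform(0, 3, 1000)
y_te = f(x_te) + rng.normal(scale=0.3, size=1000)

train_mse, test_mse = [], []
for deg in range(1, 11):                      # increasing flexibility
    p = np.poly1d(np.polyfit(x_tr, y_tr, deg))
    train_mse.append(np.mean((y_tr - p(x_tr)) ** 2))
    test_mse.append(np.mean((y_te - p(x_te)) ** 2))

# Training MSE keeps falling as the degree grows; test MSE is U-shaped,
# so the most flexible model is not the best one.
```

Selecting the model that minimises training MSE would therefore always pick the most flexible candidate, even when it overfits.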
The following are the training and test errors for three different problems:
Consider x_0, the predictor(s) of an observation that is not in the training data. The expected test MSE at x_0 can be written as:
\mathbb{E}\Bigl[\bigl(y_0 - \hat{f}(x_0)\bigr)^2\Bigr] = \text{Var}\bigl(\hat{f}(x_0)\bigr) + \bigl[\text{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2 + \text{Var}(\epsilon)
In machine learning problems, there is usually a tradeoff between Bias and Variance (i.e., you can’t improve both simultaneously). In general, “as we use more flexible methods, the variance will increase and the bias will decrease” (James et al., 2021).
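The two terms can be estimated by simulation: refit the same class of model on many fresh training sets and examine the systematic error and the spread of f_hat(x_0). A sketch (the true f, the point x_0, and the chosen degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * x)
x0, sigma = 0.75, 0.3          # evaluation point and noise sd (assumed)

def bias2_and_var(deg, n_sims=500, n=50):
    """Refit a degree-`deg` polynomial on fresh training sets; summarise f_hat(x0)."""
    preds = np.empty(n_sims)
    for s in range(n_sims):
        x = rng.uniform(0, 3, n)
        y = f(x) + rng.normal(scale=sigma, size=n)
        preds[s] = np.poly1d(np.polyfit(x, y, deg))(x0)
    return (preds.mean() - f(x0)) ** 2, preds.var()

# Squared bias and variance of f_hat(x0) at three flexibility levels.
results = {deg: bias2_and_var(deg) for deg in (1, 3, 9)}
```

In this setup the linear fit (deg = 1) is badly biased but very stable, while the degree-9 fit is nearly unbiased but varies much more from one training set to the next, matching the quoted tradeoff.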
The following figures show the bias-variance tradeoff for three different problems:
Classification Problems
(purple is the Bayes boundary, black is the KNN boundary with K=10)
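The Bayes boundary is only available when the true class distributions are known, as in a simulation. A sketch in which the Bayes rule has a closed form (the Gaussian class-conditional setup is an assumption, not the figure's actual data):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n):
    """Two equiprobable classes; X2 is pure noise, class means differ only in X1."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2))
    X[:, 0] += np.where(y == 1, 1.0, -1.0)
    return X, y

X_tr, y_tr = simulate(200)
X_te, y_te = simulate(2000)

# Bayes classifier: with these classes, predict 1 exactly when X1 > 0.
bayes_err = np.mean((X_te[:, 0] > 0).astype(int) != y_te)

# KNN with K = 10: majority vote among the 10 nearest training points.
def knn_classify(X, K=10):
    out = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        nearest = np.argsort(((X_tr - x) ** 2).sum(axis=1))[:K]
        out[i] = int(y_tr[nearest].mean() > 0.5)
    return out

knn_err = np.mean(knn_classify(X_te) != y_te)
```

KNN's estimated boundary is jagged but tracks the Bayes boundary; its test error approaches, but cannot systematically beat, the Bayes error rate.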