Examinable Topics for Revision

Statistical Learning

What is statistical (machine) learning?
prediction and inference
interpretability and flexibility
supervised and unsupervised learning

regression and classification problems
mean squared error
bias-variance trade-off
K-nearest neighbours

Linear Regression

Linear Regression
Simple Linear Regression
Assumptions of the Model
Least Squares Estimates (LSE)
LSE Properties & Uncertainty
Maximum Likelihood Estimates
Confidence Intervals
Multiple Linear Regression
Qualitative predictors
Linear Algebra / Matrix Approach

Residual standard error (RSE)
R-squared (R^2)
Linear model as an approximation
Random error \epsilon
dummy variables
Interaction terms (X_iX_j)
Quadratic terms (X_i^2)

Linear Regression II

Linear Model Selection
- Subset selection
- Indirect methods
Subset Selection
- Best subset
- Forward stepwise
- Backward stepwise
- Hybrid stepwise
C_p, AIC, BIC, and Adjusted R^2

Problems with Linear Regression
- Non-linearity of response-predictor relationships
- Correlation of error terms
- Non-constant variance of error terms
- Outliers, high-leverage points, collinearity, and confounding effects

Logistic Regression

binary classification
Logistic regression
odds, log-odds
logits
error rate (misclassification)

confusion matrix
thresholds
ROC curve
AUC
Poisson regression

Generalised Linear Models

generalised linear models
link functions
distributions
exponential family
mean function
variance function

canonical link functions
null model
full model
deviance & scaled deviance
residuals
degrees of freedom

Cross-validation and regularisation

Train, validation & test sets
Validation Set Approach
k-fold Cross-Validation
Leave-One-Out Cross-Validation
Test error

Regularisation (shrinkage)
- ridge
- lasso
- ~~elastic net~~

Moving Beyond Linearity

interpolation & extrapolation
polynomial regression
- monomials
- orthogonal polynomials
step functions
- basis function expansion
- piecewise polynomial functions

regression splines
- knots
- natural splines
- cubic splines
smoothing splines
local regression
generalised additive models (GAMs)

Tree-based Methods

decision trees
classification trees
regression trees
pruning
ensemble methods

bagging
random forests
boosting
~~gradient boosting~~

Unsupervised Learning

k-means clustering
hierarchical clustering
principal components analysis