\star (Question #9, ACTL3003/5106 Final Exam 2005)
A random variable Y is said to have an exponential dispersion model if its density can be expressed in the form
f_Y(y; \theta, \psi) = \exp\left(\frac{y\theta-b(\theta)}{\psi}+c(y;\psi)\right)
where \theta and \psi are parameters and b(\cdot) and c(\cdot;\cdot) are both functions.
Automobile insurance claims experience data of a French insurance company for a two-year period, beginning January 2001 and ending December 2002, is being modeled using a Generalized Linear Model (GLM) framework.
Assume that the number of claims for risk class i, Y_i, has a Poisson distribution with probability density function of the form
f_Y(y;\mu) = \frac{\mathrm{e}^{-\mu}\mu^y}{y!}, \text{ for } y = 0,1,2, \dots
where its mean \mu>0 is related to the variables SEX, VEH_AGE, AGE, and LOYALTY as
\log{\left(\mu_i\right)}=\log{(\texttt{EXP}_i)}+\beta_0+\beta_1\texttt{SEX}_i+\beta_2\texttt{VEH}\_\texttt{AGE}_i(1)+\beta_3\texttt{VEH}\_\texttt{AGE}_i(2)+\beta_4\texttt{AGE}_i+\beta_5\texttt{LOYALTY}_i
where the variables are defined as follows:
SEX | 1 = male; 2 = female. |
VEH_AGE | 1 = less than 1 year; 2 = between 1-2 years; 3 = 2 years and more. |
AGE | 1 = 20 years and below; 2 = above 20 years. |
LOYALTY | 1 = has been client for past 36 months; 0 = otherwise. |
Y | total number of claims |
EXP | total number of policies exposed to claims |
Note that the variables VEH_AGE(1) and VEH_AGE(2) in the regression equation are the respective indicator variables for VEH_AGE of types 1 and 2.
SAS output for running PROC GENMOD on the data is provided below.
Show that the Poisson distribution can be written in exponential dispersion form. Identify the dispersion and canonical parameters, \psi and \theta respectively, in terms of \mu, to the extent possible.
Derive the expression for the deviance of the Poisson GLM model.
Based on the deviances provided in the SAS output, analyze the adequacy of the model.
Explain the meaning of overdispersion in the context of a Poisson GLM model.
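As a numerical sanity check on the first two parts, the following minimal Python sketch compares the Poisson probability function with the exponential dispersion form under one candidate identification, \theta=\log\mu, b(\theta)=\mathrm{e}^{\theta}, \psi=1 and c(y;\psi)=-\log y!, and evaluates the corresponding deviance 2\sum_i\left(y_i\log(y_i/\hat\mu_i)-(y_i-\hat\mu_i)\right). The function names are illustrative assumptions and are not taken from the SAS output or the exam paper.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability e^{-mu} * mu^y / y!."""
    return math.exp(-mu) * mu**y / math.factorial(y)


def poisson_edm(y: int, mu: float) -> float:
    """Same probability written as exp((y*theta - b(theta)) / psi + c(y; psi))."""
    theta = math.log(mu)      # candidate canonical parameter
    psi = 1.0                 # candidate dispersion parameter
    b = math.exp(theta)       # b(theta) = e^theta, which equals mu
    c = -math.lgamma(y + 1)   # c(y; psi) = -log(y!)
    return math.exp((y * theta - b) / psi + c)


def poisson_deviance(ys, mus) -> float:
    """Deviance 2 * sum[y*log(y/mu) - (y - mu)], taking y*log(y) = 0 when y = 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


# The two forms should agree for any count y and any mean mu > 0.
for y in range(6):
    assert abs(poisson_pmf(y, 2.5) - poisson_edm(y, 2.5)) < 1e-12
```

The deviance is zero when every fitted mean equals its observation and grows as the fitted means move away from the data, which is the basis for using it to judge model adequacy.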
Solution
An insurance company has a set of n risks (i=1,2,\dots,n) for which it has recorded the number of claims per month, Y_{ij}, for m months (j=1,2,\dots,m). It is assumed that the number of claims for each risk, for each month, are independent Poisson random variables with \mathbb{E}(Y_{ij}) = \mu_{ij}. These random variables are modelled using a Generalized Linear Model, with \log{\mu_{ij}} = \beta_i,\ \text{for} \ i = 1,2,\dots,n.
Derive the maximum likelihood estimator of \beta_i.
Show that the deviance for this model is
2\sum_{i=1}^{n}\sum_{j=1}^{m}\left(y_{ij}\log{\frac{y_{ij}}{\bar{y}_i}}-(y_{ij}-\bar{y}_i)\right), where \bar{y}_i=\frac{1}{m}\sum_{j=1}^{m}y_{ij}.
A company has data for each month over a 2-year period. For one risk, the average number of claims per month was 17.45. In the most recent month for this risk, there were 9 claims. Calculate the contribution that this observation makes to the deviance.
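The arithmetic in the last part can be sketched in Python as follows. The sketch assumes that the fitted mean for the risk equals its monthly average of 17.45, which is what the maximum likelihood estimate \hat\beta_i=\log\bar{y}_i implies; the function name is an illustrative assumption.

```python
import math


def deviance_contribution(y: float, fitted_mean: float) -> float:
    """Single-observation Poisson deviance term 2*[y*log(y/mu) - (y - mu)]."""
    log_term = y * math.log(y / fitted_mean) if y > 0 else 0.0
    return 2.0 * (log_term - (y - fitted_mean))


# 9 claims in the most recent month against a fitted mean of 17.45 (assumed equal to the average).
contribution = deviance_contribution(9, 17.45)
print(round(contribution, 2))  # 4.98
```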
Solution
There are m male drivers in each of three age groups, and data on the number of claims made during the last year are available. Assume that the numbers of claims are independent Poisson random variables. If Y_{ij} is the number of claims for the jth male driver in group i (i = 1,2,3; j = 1,2,\dots,m), let \mathbb{E}(Y_{ij}) = \mu_{ij} and suppose \log{(\mu_{ij})} = \alpha_i.
Show that this is a Generalized Linear Model, identifying the link function and the linear predictor.
Determine the log-likelihood, and the maximum likelihood estimators of \alpha_i for i = 1,2,3.
For a particular data set with 20 observations in each group, several models are fitted, with deviances as shown below:
Model | Link function | Deviance |
Model 1 | \log{(\mu_{ij})}=\alpha_i | 60.40 |
Model 2 | \log{(\mu_{ij})}= \begin{cases} \alpha ,& \text{if} \ i=1,2 \\ \beta ,& \text{if} \ i=3 \end{cases} | 61.64 |
Model 3 | \log{(\mu_{ij})}=\alpha | 72.53 |
Determine whether or not Model 2 is a significant improvement over Model 3, and whether or not Model 1 is a significant improvement over Model 2.
Interpret these three models.
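One way to organise the deviance comparisons is sketched below in Python. It assumes the usual asymptotic result that the drop in deviance between nested models is approximately chi-square distributed, with degrees of freedom equal to the number of extra parameters (one for each step here, since the models have 1, 2 and 3 parameters). For one degree of freedom the upper-tail probability is P(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2}), so only the standard library is needed. The variable and function names are illustrative.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Deviances quoted in the question.
deviance = {"model1": 60.40, "model2": 61.64, "model3": 72.53}

# Each comparison adds exactly one parameter to the smaller model.
drop_3_to_2 = deviance["model3"] - deviance["model2"]  # about 10.89
drop_2_to_1 = deviance["model2"] - deviance["model1"]  # about 1.24

p_3_to_2 = chi2_sf_1df(drop_3_to_2)
p_2_to_1 = chi2_sf_1df(drop_2_to_1)

print(f"Model 3 -> Model 2: drop = {drop_3_to_2:.2f}, significant at 5%: {p_3_to_2 < 0.05}")
print(f"Model 2 -> Model 1: drop = {drop_2_to_1:.2f}, significant at 5%: {p_2_to_1 < 0.05}")
```

Under these assumptions the first drop exceeds the 5% critical value of 3.841 and the second does not, which suggests that separating group 3 from groups 1 and 2 matters while separating groups 1 and 2 from each other does not.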
Solution
An insurance company analysed claim sizes under two factors: **CAR**, the insurance group into which the car was placed, and **AGE**, the age of the policyholder (i.e. a two-way contingency table). It was assumed that the claim size y_i follows a gamma distribution, i.e.
f(y_i)=\frac{1}{\Gamma(\nu_i)\:y_i}\left(\frac{y_i\:\nu_i}{\mu_i}\right)^{\nu_i}\exp{\left(-\frac{y_i\:\nu_i}{\mu_i}\right)}\;\; \text{for}\;\;y_i\geq0,\; \mu_i >0,\;\nu_i=1
with a log-link function. Analysis of a set of data for which n=8 provided the following SAS output:
1 | 27 | 1 | 1 | 25.53 | 3.24 | 0.30 |
2 | 16 | 1 | 2 | 24.78 | 3.21 | -1.90 |
3 | 36 | 1 | 1 | | 3.41 | 1.03 |
4 | 45 | 1 | 2 | 38.09 | 3.64 | 1.11 |
5 | 38 | 2 | 1 | 40.85 | 3.71 | -0.46 |
6 | 27 | 2 | 2 | 36.97 | 3.61 | -1.73 |
7 | 14 | 2 | 1 | | 2.45 | 0.69 |
8 | 6 | 2 | 2 | 14.59 | 2.68 | -2.55 |
Calculate the fitted claim sizes missing in the table.
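A possible way to recover the missing entries is sketched below in Python. The table's column headers are not shown, so the sketch assumes that the sixth value in each record is the linear predictor \eta_i and the fifth is the fitted mean. That reading is supported by the complete records, where the exponential of the sixth value reproduces the fifth to two decimals, as expected under a log link with \mu_i=\mathrm{e}^{\eta_i}.

```python
import math

# (fitted value, assumed linear predictor) for the six complete records.
complete_rows = {
    1: (25.53, 3.24),
    2: (24.78, 3.21),
    4: (38.09, 3.64),
    5: (40.85, 3.71),
    6: (36.97, 3.61),
    8: (14.59, 2.68),
}

# Check the assumption: exp(linear predictor) should match the printed fitted value.
for obs, (fitted_value, eta) in complete_rows.items():
    assert abs(math.exp(eta) - fitted_value) < 0.01, obs

# Apply the inverse of the log link to the two records with a missing fitted value.
missing_eta = {3: 3.41, 7: 2.45}
missing_fitted = {obs: math.exp(eta) for obs, eta in missing_eta.items()}

for obs, value in missing_fitted.items():
    print(obs, round(value, 2))
# 3 30.27
# 7 11.59
```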
Solution