Lab 2: Linear Regression I
Questions
Conceptual Questions
Simple linear regression questions
\star Prove that the least squares coefficient estimates (LSE) \widehat{\beta}_0 and \widehat{\beta}_1 are:
\widehat{\beta}_0 = \overline{y} - \widehat{\beta}_1 \overline{x}
\widehat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i-\overline{x}) \cdot (y_i-\overline{y})}{\sum_{i=1}^{n} (x_i-\overline{x})^2} = \frac{S_{xy}}{S_{xx}}
Prove that the estimates in Q1 are unbiased.
Prove that the MLE estimates of \widehat{\beta}_0 and \widehat{\beta}_1 are equal to the ones given by LSE (from Q1).
Prove SST=SSE+SSM
Express SSM in terms of a) \widehat{\beta}_1 and b) \widehat{\beta}_1^2.
Prove the following variance formulas:
\begin{aligned} \mathbb{V}\left( \widehat{\beta}_0 |X\right) &= \sigma^2\left(\frac{1}{n}+\frac{\overline{x}^2}{S_{xx}}\right)\\ \mathbb{V}\left( \widehat{\beta}_1 |X\right) &= \frac{\sigma^2}{S_{xx}} \\%= \frac{n\sigma^2}{nS_{xx}-S_{x}^2}\\ \text{Cov}\left( \widehat{\beta}_0, \widehat{\beta}_1 | X\right) &= -\frac{ \overline{x} \sigma^2}{S_{xx}} %s_\epsilon^2 = \mathbb{V}\left(\widehat{\epsilon}\right) &= \frac{nS_{yy}-S_y^2-\widehat{\beta}_1^2(nS_{xx}-S_x^2)}{n(n-2)} \end{aligned}
Prove \mathbb{V}(\widehat{y}_0 | X ) = \left( \dfrac{1}{n}+\frac{\left( \overline{x}-x_{0}\right) ^{2}}{S_{xx}}\right) \sigma ^{2}, where \widehat{y}_0 = \widehat{\beta}_0 + \widehat{\beta}_1 x_0.
Remember that (x_0, y_0) is a new (but fixed) observation, i.e. not in the training set used to find \widehat{\beta}_0 and \widehat{\beta}_1.
Prove:
\mathbb{E}[{Y}_0-{\widehat{y}}_0 | X ] = 0
\mathbb{V}({Y}_0-{\widehat{y}}_0 | X ) = \sigma^2\left(1+{1\over n}+{(\overline{x} - x_0)^2\over S_{xx}}\right)
Forensic scientists use various methods for determining the likely time of death from post-mortem examination of human bodies. A recently suggested objective method uses the concentration of a compound (3-methoxytyramine or 3-MT) in a particular part of the brain. In a study of the relationship between post-mortem interval and the concentration of 3-MT, samples of the appropriate part of the brain were taken from coroners' cases for which the time of death had been determined from eye-witness accounts. The intervals (x; in hours) and concentrations (y; in parts per million) for 18 individuals who were found to have died from organic heart disease are given in the following table. For the last two individuals (numbered 17 and 18 in the table) there was no eye-witness testimony directly available, and the time of death was established on the basis of other evidence, including knowledge of the individuals' activities.
| Observation number | Interval (x) | Concentration (y) |
|---|---|---|
| 1 | 5.5 | 3.26 |
| 2 | 6.0 | 2.67 |
| 3 | 6.5 | 2.82 |
| 4 | 7.0 | 2.80 |
| 5 | 8.0 | 3.29 |
| 6 | 12.0 | 2.28 |
| 7 | 12.0 | 2.34 |
| 8 | 14.0 | 2.18 |
| 9 | 15.0 | 1.97 |
| 10 | 15.5 | 2.56 |
| 11 | 17.5 | 2.09 |
| 12 | 17.5 | 2.69 |
| 13 | 20.0 | 2.56 |
| 14 | 21.0 | 3.17 |
| 15 | 25.5 | 2.18 |
| 16 | 26.0 | 1.94 |
| 17 | 48.0 | 1.57 |
| 18 | 60.0 | 0.61 |

\sum x =337, \sum x^2 = 9854.5, \sum y = 42.98, \sum y^2 = 109.7936, \sum x y = 672.8
In this investigation you are required to explore the relationship between concentration (regarded as the response/dependent variable) and interval (regarded as the explanatory/independent variable).
Construct a scatterplot of the data. Comment on any interesting features of the data and discuss briefly whether linear regression is appropriate to model the relationship between concentration of 3-MT and the interval from death.
Calculate the correlation coefficient for the data, and use it to test the null hypothesis that the population correlation coefficient is equal to zero.
Calculate the equation of the least-squares fitted regression line and use it to estimate the concentration of 3-MT:
after 1 day and
after 2 days.
Comment briefly on the reliability of these estimates.
Calculate a 99% confidence interval for the slope of the regression line. Using this confidence interval, test the hypothesis that the slope of the regression line is equal to zero. Comment on your answer in relation to the answer given in part (2) above.
\star A university wishes to analyse the performance of its students on a particular degree course. It records the scores obtained by a sample of 12 students at the entry to the course, and the scores obtained in their final examinations by the same students. The results are as follows:
| Student | A | B | C | D | E | F | G | H | I | J | K | L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Entrance exam score x (%) | 86 | 53 | 71 | 60 | 62 | 79 | 66 | 84 | 90 | 55 | 58 | 72 |
| Final paper score y (%) | 75 | 60 | 74 | 68 | 70 | 75 | 78 | 90 | 85 | 60 | 62 | 70 |

\sum x = 836, \sum y = 867, \sum x^2 = 60,016, \sum y^2 = 63,603, \sum (x-\overline{x})(y-\overline{y}) = 1,122.
Calculate the fitted linear regression equation of y on x.
Assuming the full normal model, calculate an estimate of the error variance \sigma^2 and obtain a 90% confidence interval for \sigma^2.
By considering the slope parameter, formally test whether the data is positively correlated.
Find a 95% confidence interval for the mean finals paper score corresponding to an individual entrance score of 53.
Test whether these data come from a population with a correlation coefficient equal to 0.75.
Calculate the proportion of variance explained by the model. Hence, comment on the fit of the model.
\star Complete the following ANOVA table for a simple linear regression with 60 observations:
| Source | D.F. | Sum of Squares | Mean Squares | F-Ratio |
|---|---|---|---|---|
| Regression | ____ | ____ | ____ | ____ |
| Error | ____ | ____ | 8.2 | |
| Total | ____ | 639.5 | | |

\star Suppose you are interested in relating the accounting variable EPS (earnings per share) to the market variable STKPRICE (stock price). Then, a regression equation was fitted using STKPRICE as the response variable with EPS as the regressor variable. Following is the computer output from your fitted regression. You are also given that: \overline{x}=2.338, \overline{y}=40.21, s_{x}=2.004, and s_{y}=21.56. (Note that: s_x^2=\frac{S_{xx}}{n-1} and s_y^2=\frac{S_{yy}}{n-1})
Regression Analysis. The regression equation is STKPRICE = 25.044 + 7.445 EPS.

| Predictor | Coef | SE Coef | T | p |
|---|---|---|---|---|
| Constant | 25.044 | 3.326 | 7.53 | 0.000 |
| EPS | 7.445 | 1.144 | 6.51 | 0.000 |

Analysis of Variance:

| SOURCE | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Regression | 1 | 10475 | 10475 | 42.35 | 0.000 |
| Error | 46 | 11377 | 247 | | |
| Total | 47 | 21851 | | | |
Calculate the correlation coefficient of EPS and STKPRICE.
Estimate the STKPRICE given an EPS of $2. Provide a 95% confidence interval of your estimate.
Provide a 95% confidence interval for the slope coefficient \beta.
Compute s and R^{2}.
Describe how you would check if the errors have constant variance.
Perform a test of the significance of EPS in predicting STKPRICE at a level of significance of 5%.
Test the hypothesis H_{0}:\beta =24 against H_{a}:\beta >24 at a level of significance of 5%.
(Modified from an Institute of Actuaries exam problem) An insurance company issues house buildings policies for houses of similar size in four different post-code regions A, B, C, and D. An insurance agent takes independent random samples of 10 house buildings policies for houses of similar size in each of the four regions. The annual premiums (in dollars) were as follows:
Region A: 229 241 270 256 241 247 261 243 272 219 \left( \sum x=2,479\text{, \ }\sum x^{2}=617,163\right)
Region B: 261 269 284 268 249 255 237 270 269 257 \left( \sum x=2,619\text{, \ }\sum x^{2}=687,467\right)
Region C: 253 247 244 245 221 229 245 256 232 269 \left( \sum x=2,441\text{, \ }\sum x^{2}=597,607\right)
Region D: 279 268 290 245 281 262 287 257 262 246 \left( \sum x=2,677\text{, \ }\sum x^{2}=718,973\right)

Perform a one-way analysis of variance at the 5\% level to compare the premiums for all four regions. In other words, test whether the mean premiums of the four regions differ significantly from one another. State briefly the assumptions required to perform this analysis of variance.
(Past Institute Exam) As part of an investigation into health service funding a working party was concerned with the issue of whether mortality could be used to predict sickness rates. Data on standardised mortality rates and standardised sickness rates were collected for a sample of 10 regions and are shown in the table below:
| Region | Mortality rate m (per 100,000) | Sickness rate s (per 100,000) |
|---|---|---|
| 1 | 125.2 | 206.8 |
| 2 | 119.3 | 213.8 |
| 3 | 125.3 | 197.2 |
| 4 | 111.7 | 200.6 |
| 5 | 117.3 | 189.1 |
| 6 | 100.7 | 183.6 |
| 7 | 108.8 | 181.2 |
| 8 | 102.0 | 168.2 |
| 9 | 104.7 | 165.2 |
| 10 | 121.1 | 228.5 |

Data summaries: \sum m=1136.1, \sum m^2=129,853.03, \sum s=1934.2, \sum s^2=377,700.62, and \sum ms=221,022.58.
Calculate the correlation coefficient between the mortality rates and the sickness rates and determine the probability-value for testing whether the underlying correlation coefficient is zero against the alternative that it is positive.
Noting the issue under investigation, draw an appropriate scatterplot for these data and comment on the relationship between the two rates.
Determine the fitted linear regression of sickness rate on mortality rate and test whether the underlying slope coefficient can be considered to be as large as 2.0.
For a region with mortality rate 115.0, estimate the expected sickness rate and calculate 95% confidence limits for this expected rate.
(Past Institute Exam) Consider the following data, which comprise four groups of claim sizes (y), each group containing four observations. In scenario I, information is also given on the sum assured under the policy concerned; the sum assured is the same for all four policies in a group. In scenario II, we regard the policies in the different groups as having been issued by four different companies; the policies in a group are all issued by the same company.
All monetary amounts are in units of £10,000. Summaries of the claim sizes in each group are given in a second table.
| Group | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Claim sizes y | 0.11, 0.46, 0.71, 1.45 | 0.52, 1.43, 1.84, 2.47 | 1.48, 2.05, 2.38, 3.31 | 1.52, 2.36, 2.95, 4.08 |
| I: sum assured x | 1 | 2 | 3 | 4 |
| II: Company | A | B | C | D |

Summaries of claim sizes:
| Group | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \sum y | 2.73 | 6.26 | 9.22 | 10.91 |
| \sum y^2 | 2.8303 | 11.8018 | 23.0134 | 33.2289 |

In scenario I, suppose we adopt the linear regression model Y_i = \alpha + \beta x_i + \epsilon_i where Y_i is the i^{\text{th}} claim size and x_i is the corresponding sum assured, i=1,\ldots,16.
Calculate the total sum of squares and its partition into the regression (model) sum of squares and the residual (error) sum of squares.
Fit the model and calculate the fitted values for the first claim size of group 1 (namely 0.11) and the last claim size of group 4 (namely 4.08).
Consider a test of the hypothesis H_0:\beta=0 against a two-sided alternative. By performing appropriate calculations, assess the strength of the evidence against this “no linear relationship” hypothesis.
In scenario II, suppose we adopt the analysis of variance model Y_{ij} = \mu + \tau_i +e_{ij} where Y_{ij} is the j^{\text{th}} claim size for company i and \tau_i is the i^{\text{th}} company effect, with i=1,2,3,4 indexing companies A, B, C, D and j=1,2,3,4.
Calculate the partition of the total sum of squares into the “between companies” (model) sum of squares and the “within companies” (residual/error) sum of squares.
Fit the model.
Calculate the fitted values for the first claim size of group 1 and the last claim size of group 4.
Consider a test of the hypothesis H_0: \tau_i=0 for all i against a general alternative. By performing appropriate calculations, assess the strength of the evidence against this “no company effects” hypothesis.
Multiple linear regression questions
Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

Suppose we have a data set with five predictors, X_1 = GPA, X_2 = IQ, X_3 = Level (1 for College and 0 for High School), X_4 = Interaction between GPA and IQ, and X_5 = Interaction between GPA and Level. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get \beta_0 = 50, \beta_1 = 20, \beta_2 = 0.07, \beta_3 = 35, \beta_4 = 0.01, \beta_5 = -10.
Which answer is correct, and why?
For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates.
For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates.
For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.
For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates provided that the GPA is high enough.
Predict the salary of a college graduate with IQ of 110 and a GPA of 4.0.
True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.
\star I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. Y =\beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon.
Suppose that the true relationship between X and Y is linear, i.e. Y = \beta_0 + \beta_1 X + \epsilon. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.
Answer (a) using test rather than training RSS.
Suppose that the true relationship between X and Y is not linear, but we don’t know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.
Answer (c) using test rather than training RSS.
Write down the design matrix for the simple linear regression model.
Write out the matrix \boldsymbol{X}^\top\boldsymbol{X} for the simple linear regression model.
Write out the vector \boldsymbol{X}^\top\boldsymbol{y} for the simple linear regression model.
Write out the matrix (\boldsymbol{X}^\top\boldsymbol{X})^{-1} for the simple linear regression model.
Calculate \widehat{\boldsymbol{\beta}}=(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{y} using your results above.
Where \boldsymbol{y} is the vector of the response variable and \boldsymbol{\widehat{\beta}} is the vector of coefficients.
\star The following model was fitted to a sample of supermarkets in order to explain their profit levels: y=\beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} + \varepsilon where
y= profits, in thousands of dollars
x_{1}= food sales, in tens of thousands of dollars
x_{2}= nonfood sales, in tens of thousands of dollars, and
x_{3}= store size, in thousands of square feet.
The estimated regression coefficients are given below: \widehat{\beta}_{1} = 0.027 \text{ and } \widehat{\beta}_{2} = -0.097 \text{ and } \widehat{\beta}_{3} = 0.525.
Which of the following is TRUE?
A dollar increase in food sales increases profits by 2.7 cents.
A 2.7 cent increase in food sales increases profits by a dollar.
A 9.7 cent increase in nonfood sales decreases profits by a dollar.
A dollar decrease in nonfood sales increases profits by 9.7 cents.
An increase in store size by one square foot increases profits by 52.5 cents.
\star In a regression model of three explanatory variables, twenty-five observations were used to calculate the least squares estimates. The total sum of squares and regression sum of squares were found to be 666.98 and 610.48, respectively. Calculate the adjusted coefficient of determination (i.e adjusted R^2).
89.0%
89.4%
89.9%
90.3%
90.5%
\star In a multiple regression model given by: y=\beta _{0}+\beta _{1}x_{1}+\ldots +\beta_{p-1}x_{p-1}+\varepsilon, which of the following gives a correct expression for the coefficient of determination (i.e R^2)?
I. \frac{\text{SSM}}{\text{SST}}
II. \frac{\text{SST}-\text{SSE}}{\text{SST}}
III. \frac{\text{SSM}}{\text{SSE}}
Options:
I only
II only
III only
I and II only
I and III only
The ANOVA table output from a multiple regression model is given below:
| Source | D.F. | SS | MS | F-Ratio | Prob(>F) |
|---|---|---|---|---|---|
| Regression | 5 | 13326.1 | 2665.2 | 13.13 | 0.000 |
| Error | 42 | 8525.3 | 203.0 | | |
| Total | 47 | 21851.4 | | | |

Compute the adjusted coefficient of determination (i.e. adjusted R^2).
52%
56%
61%
63%
68%
\star You have information on 62 purchases of Ford automobiles. In particular, you have the amount paid for the car y in hundreds of dollars, the annual income of the individuals x_1 in hundreds of dollars, the sex of the purchaser (x_2, 1=male and 0=female) and whether or not the purchaser graduated from college (x_3, 1=yes, 0=no). After examining the data and other information available, you decide to use the regression model: y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} + \varepsilon . You are given that: \left( \boldsymbol{X}^{\top}\boldsymbol{X}\right) ^{-1} = \left[ \begin{array}{rrrr} 0.109564 & -0.000115 & -0.035300 & -0.026804 \\ -0.000115 & 0.000001 & -0.000115 & -0.000091 \\ -0.035300 & -0.000115 & 0.102446 & 0.023971 \\ -0.026804 & -0.000091 & 0.023971 & 0.083184 \end{array} \right] and the mean square error for the model is s^{2}=30106. Calculate \text{SE}(\widehat{\beta}_{2}).
0.17
17.78
50.04
55.54
57.43
Suppose in addition to the information in question 9., you are given: \boldsymbol{X}^{\top}\boldsymbol{y} = \left[ \begin{array}{r} 9\,558 \\ 4\,880\,937 \\ 7\,396 \\ 6\,552 \end{array} \right]. Calculate the expected difference in the amount spent to purchase a car between a person who graduated from college and another one who did not.
Possible answers:
- 233.5
- 1,604.3
- 2,195.3
- 4,920.6
- 6,472.1
\star A regression model of y on four independent variables x_{1},x_{2},x_{3} and x_{4} has been fitted to a data set consisting of 212 observations, and the computer output from estimating this model is given below:
Regression Analysis. The regression equation is y = 3894 - 50.3 x1 + 0.0826 x2 + 0.893 x3 + 0.137 x4.

| Predictor | Coef | SE Coef | T |
|---|---|---|---|
| Constant | 3893.8 | 409.0 | 9.52 |
| x1 | -50.32 | 9.062 | -5.55 |
| x2 | 0.08258 | 0.02133 | 3.87 |
| x3 | 0.89269 | 0.04744 | 18.82 |
| x4 | 0.13677 | 0.05303 | 2.58 |
Which of the following statements is NOT true?
All the explanatory variables have insignificant influence on \boldsymbol{y}.
The variable \boldsymbol{x}_{1} is a significant variable.
The variable \boldsymbol{x}_{2} is a significant variable.
The variable \boldsymbol{x}_{3} is a significant variable.
The variable \boldsymbol{x}_{4} is a significant variable.
Where \boldsymbol{x}_i’s are vectors of explanatory variables and \boldsymbol{y} is the vector of response variable.
The estimated regression model of fitting life expectancy from birth (LIFE_EXP) on the country’s gross national product (in thousands) per population (GNP) and the percentage of population living in urban areas (URBAN%) is given by: \text{LIFE\_EXP} = \text{ 48.24 }+\text{ 0.79 GNP }+\text{ 0.154 URBAN\%.} For a particular country, its URBAN% is 60 and its GNP is 3.0. Calculate the estimated life expectancy at birth for this country.
49
50
57
60
65
What is the use of the scatter plot of the fitted values and the residuals?
to examine the normal distribution assumption of the errors
to examine the goodness of fit of the regression model
to examine the constant variation assumption of the errors
to test whether the errors have zero mean
to examine the independence of the errors
KNN question
Consider a k-nearest neighbours model where Y = f(X) + \epsilon, \mathbb{E}(\epsilon) = 0, \mathbb{V}(\epsilon) = \sigma^2, and the estimated model is \widehat{f}(x). The weight function is \frac{1}{k}. Show that \text{EPE}_k(x_0) = \sigma^2 + \left[f(x_0) - \frac{1}{k}\sum_{l \in N(x_0)} f(x_{(l)}) \right]^2 + \frac{\sigma^2}{k} Where N(x_0) are x_0’s k-nearest neighbours. Note that: \text{EPE}_k(x_0) = \mathbb{E}[(Y-\widehat{f}(x_0))^2|X=x_0]
Applied Questions
\star (ISLR2, Q3.8) This question involves the use of simple linear regression on the Auto data set.
Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results. Comment on the output. For example:
Is there a relationship between the predictor and the response?
How strong is the relationship between the predictor and the response?
Is the relationship between the predictor and the response positive or negative?
What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
Plot the response and the predictor. Use the abline() function to display the least squares regression line.
Use the plot() function to produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
(ISLR2, Q3.11) In this problem we will investigate the t-statistic for the null hypothesis H_0:\beta = 0 in simple linear regression without an intercept. To begin, we generate a predictor x and a response y as follows.
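```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
```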
Perform a simple linear regression of y onto x, without an intercept. Report the coefficient estimate \widehat{\beta}, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis H_0:\beta = 0. Comment on these results. (You can perform regression without an intercept using the command lm(y ~ x+0).)
Now perform a simple linear regression of x onto y without an intercept, and report the coefficient estimate, its standard error, and the corresponding t-statistic and p-value associated with the null hypothesis H_0:\beta = 0. Comment on these results.
What is the relationship between the results obtained in (a) and (b)?
For the regression of Y onto X without an intercept, the t-statistic for H_0:\beta = 0 takes the form \widehat{\beta}/\text{SE}(\widehat{\beta}), where \widehat{\beta} is given by (3.38), and where \text{SE}(\widehat{\beta}) = \sqrt{\frac{\sum_{i=1}^n(y_i - x_i\widehat{\beta})^2}{(n-1)\sum_{i'=1}^nx_{i'}^2}} (These formulas are slightly different from those given in Sections 3.1.1 and 3.1.2, since here we are performing regression without an intercept.) Show algebraically, and confirm numerically in R, that the t-statistic can be written as \frac{(\sqrt{n-1})\sum_{i=1}^nx_i y_i}{\sqrt{(\sum_{i=1}^nx_i^2)(\sum_{i'=1}^n y_{i'}^2) - (\sum_{i'=1}^n x_{i'}y_{i'})^2}}
Using the results from (d), argue that the t-statistic for the regression of y onto x is the same as the t-statistic for the regression of x onto y.
In R, show that when regression is performed with an intercept, the t-statistic for H_0: \beta_1 = 0 is the same for the regression of y onto x as it is for the regression of x onto y.
Solutions
Conceptual Questions
Simple linear regression questions
We determine \widehat{\beta}_0 and \widehat{\beta}_1 by minimizing the error. Hence, we use least squares estimates (LSE) for \widehat{\beta}_0 and \widehat{\beta}_1: \underset{\beta_0,\beta_1}{\min}\left\{S\left( \widehat{\beta}_0, \widehat{\beta}_1 \right)\right\}%=\underset{\beta_0,\beta_1}{\min}\left\{\sum_{i=1}^{n}\epsilon_i^2 \right\} =\underset{\beta_0,\beta_1}{\min}\left\{\sum_{i=1}^{n}\left( y_{i}-\left(\widehat{\beta}_0 + \widehat{\beta}_1 x_{i}\right) \right)^{2}\right\}. The minimum is obtained by setting the first order condition (FOC) to zero: \begin{aligned} \frac{\partial S\left( \widehat{\beta}_0,\widehat{\beta}_1\right)}{\partial \widehat{\beta}_0}&=-2\sum_{i=1}^{n}\left( y_{i}-\left(\widehat{\beta}_0+\widehat{\beta}_1x_{i}\right)\right)\\ \frac{\partial S\left( \widehat{\beta}_0,\widehat{\beta}_1\right)}{\partial \widehat{\beta}_1}&=-2\sum_{i=1}^{n}x_{i}\left(y_{i}-\left( \widehat{\beta}_0+\widehat{\beta}_1x_{i}\right) \right). \end{aligned}
The LSE \widehat{\beta}_0 and \widehat{\beta}_1 are given by setting the FOC equal to zero: \begin{aligned} \sum_{i=1}^{n}y_{i} =& n\widehat{\beta}_{0}+\widehat{\beta}_{1}\sum_{i=1}^{n}x_{i}\\ \sum_{i=1}^{n}x_{i}y_{i} =& \widehat{\beta}_{0}\sum_{i=1}^{n}x_{i}+\widehat{% \beta }_{1}\sum_{i=1}^{n}x_{i}^{2}. \end{aligned}
So we have
\begin{aligned} \widehat{\beta}_{0}=& \frac{\sum_{i=1}^{n}y_i-\widehat{\beta}_{1} \sum_{i=1}^{n}x_i}{n}=\overline{y} - \widehat{\beta}_1\overline{x}, \text{ and} \\ \widehat{\beta}_{1}=& \frac{\sum_{i=1}^{n}x_iy_i-\widehat{\beta}_{0} \sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x^2_i} \,. \end{aligned}
Next step: Rearranging so that \widehat{\beta}_0 and \widehat{\beta}_1 become functions of \sum_{i=1}^ny_i, \sum_{i=1}^nx_i, \sum_{i=1}^nx^2_i, and \sum_{i=1}^nx_iy_i
\begin{aligned} \widehat{\beta}_{0}=& \frac{\sum_{i=1}^{n}y_i-\left(\frac{\sum_{i=1}^{n}x_iy_i-\widehat{\beta}_{0} \sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x^2_i}\right) \sum_{i=1}^{n}x_i}{n}\\ \hskip-10mm\left(1-\frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n\sum_{i=1}^{n}x^2_i}\right)\widehat{\beta}_{0}=&\frac{\sum_{i=1}^{n}x^2_i\sum_{i=1}^{n}y_i-\left(\sum_{i=1}^{n}x_iy_i\right) \sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i}\\ \widehat{\beta}_{0}\overset{*}{=}& \frac{\sum_{i=1}^{n}y_i\left(\sum_{i=1}^{n}x_i^2\right)-\sum_{i=1}^{n}x_iy_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i -\left(\sum_{i=1}^{n}x_i\right)^2}. \end{aligned}
*(1-a/b)c = d/b \rightarrow (bc-ac)/b=d/b \rightarrow c=d/(b-a).
Here the normal-equation expression for \widehat{\beta}_1 was substituted into the expression for \widehat{\beta}_0 in the first line above. At this point, \widehat{\beta}_0 is done, so we continue with \widehat{\beta}_1.
From the previous steps we have: \begin{aligned} \widehat{\beta}_{0}=& \frac{\sum_{i=1}^{n}y_i-\widehat{\beta}_{1} \sum_{i=1}^{n}x_i}{n}\\ \widehat{\beta}_{1}=& \frac{\sum_{i=1}^{n}x_iy_i-\widehat{\beta}_{0} \sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x^2_i}. \end{aligned} thus: \begin{aligned} \widehat{\beta}_{1}=& \frac{n\sum_{i=1}^{n}x_iy_i-\left(\sum_{i=1}^{n}y_i-\widehat{\beta}_{1} \sum_{i=1}^{n}x_i\right)\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i}\\ \left(1-\frac{\left(\sum_{i=1}^nx_i\right)^2}{n\sum_{i=1}^{n}x_i^2}\right)\widehat{\beta}_{1}=& \frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i}\\ \widehat{\beta}_{1}\overset{*}{=}& \frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2}. \end{aligned}
*(1-a/b)c = d/b \rightarrow (bc-ac)/b=d/b \rightarrow c=d/(b-a).
Using the notations, we have an easier way to write \widehat{\beta}_{1}: \begin{aligned} \widehat{\beta}_{1}=& \frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2} \\%= \frac{nS_{xy}-S_xS_y}{nS_{xx}-S^2_x}\\ =&\frac{n\left(\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i\cdot\frac{n}{n^2}\right)}{n\left(\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2\cdot\frac{n}{n^2}\right)}\\ =&\frac{\sum_{i=1}^{n}x_iy_i-n\overline{x} \,\overline{y}}{\sum_{i=1}^{n}x^2_i-n\overline{x}^2}\\ \overset{*}{=}&\frac{\sum_{i=1}^{n}x_iy_i- \sum_{i=1}^{n}x_i\overline{y} -\sum_{i=1}^{n}y_i\overline{x} +n \overline{x}\,\overline{y}}{\sum_{i=1}^{n}x^2_i+\sum_{i=1}^{n}\overline{x}^2-2\sum_{i=1}^{n}x_i\overline{x}}\\ =&\frac{\sum_{i=1}^{n}(x_i-\overline{x})\cdot(y_i-\overline{y})}{\sum_{i=1}^{n}(x_i-\overline{x})^2} =\frac{S_{xy}}{S_{xx}} \,. \end{aligned}
*\sum_{i=1}^{n}x_i\overline{y}=\sum_{i=1}^{n}x_i\frac{\sum_{i=1}^{n}y_i}{n}=\sum_{i=1}^{n}y_i\frac{\sum_{i=1}^{n}x_i}{n}=\sum_{i=1}^{n}y_i\overline{x}=n\frac{\sum_{i=1}^{n}x_i}{n}\frac{\sum_{i=1}^{n}y_i}{n}=n\overline{x}\overline{y}.
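As a quick numerical sanity check of these closed-form expressions (a minimal sketch with simulated data; the object names are illustrative only), they can be compared against the coefficients returned by lm():

```r
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

# Closed-form least squares estimates
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
beta1_hat <- Sxy / Sxx
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with lm(); the two pairs of estimates should agree
c(beta0_hat, beta1_hat)
coef(lm(y ~ x))
```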
For \widehat{\beta}_0, using the closed-form expression derived in Q1:
\mathbb{E} \left[ \widehat{\beta}_{0}| X\right] = \mathbb{E}\left[\frac{\sum_{i=1}^{n}y_i\left(\sum_{i=1}^{n}x_i^2\right)-\sum_{i=1}^{n}x_iy_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i -\left(\sum_{i=1}^{n}x_i\right)^2}\right] \begin{aligned} &= \frac{\sum_{i=1}^{n}\mathbb{E}\left[y_i\right]\left(\sum_{i=1}^{n}x_i^2\right)-\sum_{i=1}^{n}x_i\mathbb{E}\left[y_i\right]\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i -\left(\sum_{i=1}^{n}x_i\right)^2}\\ &= \frac{\sum_{i=1}^{n}\left(\beta_0+\beta_1x_i\right)\left(\sum_{i=1}^{n}x_i^2\right)-\sum_{i=1}^{n}x_i\left(\beta_0+\beta_1x_i\right)\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i -\left(\sum_{i=1}^{n}x_i\right)^2}\\ &= \frac{n\beta_0\left(\sum_{i=1}^{n}x_i^2\right)-\sum_{i=1}^{n}\left(\beta_0x_i\right)\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i -\left(\sum_{i=1}^{n}x_i\right)^2}\\ &= \beta_0. \end{aligned}
For \widehat{\beta}_1, using the closed-form expression derived in Q1:
\begin{aligned} \mathbb{E} \left[\widehat{\beta}_{1}| X\right] &= \mathbb{E}\left[\frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2}\right] \\ &= \frac{n\sum_{i=1}^{n}x_i\mathbb{E}\left[y_i\right]-\sum_{i=1}^{n}\mathbb{E}\left[y_i\right]\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2}\\ &= \frac{n\sum_{i=1}^{n}x_i\left(\beta_0+\beta_1x_i\right)-\sum_{i=1}^{n}\left(\beta_0+\beta_1x_i\right)\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2}\\ &= \frac{n\beta_1\sum_{i=1}^{n}x_i^{2}+n\beta_0\sum_{i=1}^{n}x_i-n\beta_0\sum_{i=1}^{n}x_i-\beta_1\left(\sum_{i=1}^{n}x_i\right)^2}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2}\\ &= \beta_1. \end{aligned}
In the regression model there are three parameters to estimate: \beta_0, \beta_1, and \sigma^{2}.
Joint density of Y_{1},Y_{2},\ldots,Y_{n} — under the (strong) normality assumptions — is the product of their marginals (independent by assumption) so that the likelihood is: \begin{aligned} L\left(\beta_0 ,\beta_1 ,\sigma; \{ (x_i, y_i) \}_{i=1}^n \right) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma} \exp \left( -\frac{\left(y_{i}-\left( \beta_0 + \beta_1 x_i \right) \right)^2}{2\sigma^2}\right) \\ &= \frac{1}{\left( 2\pi \right) ^{n/2}\sigma^n} \exp \left( -\frac{1}{2\sigma ^{2}}\sum_{i=1}^{n}\left( y_i -\left( \beta_0 + \beta_1 x_i \right) \right)^2 \right) \\ \ell\left(\beta_0 ,\beta_1 ,\sigma; \{ (x_i, y_i) \}_{i=1}^n \right) &= -n\log\left( \sqrt{2\pi} \sigma\right) -\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left( y_i - \left( \beta_0 + \beta_1 x_i \right) \right)^2. \end{aligned}
Taking the partial derivative of the log-likelihood with respect to \beta_0: \begin{aligned} \frac{\partial \ell}{\partial \beta_0} &= \frac{1}{\sigma^2}\sum_{i=1}^{n} (y_i-\beta_0-\beta_1x_i)\\ &= \frac{1}{\sigma^2}\left(\sum_{i=1}^{n} y_i-n\beta_0-\beta_1\sum_{i=1}^{n}x_i\right) \,. \end{aligned} Equating the above to 0 and solving for \beta_0 gives \widehat{\beta}_0=\overline{y}-\widehat{\beta}_1\overline{x} \,.
Similarly, taking the partial derivative of the log-likelihood with respect to \beta_1: \begin{aligned} \frac{\partial \ell}{\partial \beta_1}&=\frac{1}{\sigma^2}\sum_{i=1}^{n}x_i(y_i-(\beta_0+\beta_1x_i))\\ &=\frac{1}{\sigma^2}\left(\sum_{i=1}^{n}x_iy_i-\beta_0\sum_{i=1}^{n}x_i-\beta_1\sum_{i=1}^{n}x_i^2\right)\\ &=\frac{1}{\sigma^2}\left(\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}x_i(\overline{y}-\beta_1\overline{x})-\beta_1\sum_{i=1}^{n}x_i^2\right)\\ &=\frac{1}{\sigma^2}\left(\sum_{i=1}^{n}x_iy_i-\frac{1}{n}\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i-\beta_1\sum_{i=1}^{n}x_i^2+\frac{\beta_1}{n}\left(\sum_{i=1}^{n}x_i\right)^2\right) \,, \end{aligned} where the third line substitutes \beta_0=\overline{y}-\beta_1\overline{x} from the first FOC.
The last line was derived using the fact that \overline{x}=\frac{\sum_{i=1}^{n}x_i}{n} \,.
Equate the above equation to 0 and solve for \beta_1, we’ll get:
\widehat{\beta}_1= \frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2} \,.
The rest is the same as the derivation from Q1
We have that \begin{aligned} \text{SST} =& \sum_{i=1}^{n} (y_i-\overline{y})^2 = \sum_{i=1}^{n} y_i^2+\overline{y}^2-2\overline{y}y_i \,. \\ \text{SSE}+\text{SSM} =& \sum_{i=1}^{n} (y_i-\widehat{y}_i)^2 + \sum_{i=1}^{n} (\widehat{y}_i-\overline{y})^2 \\ =&\sum_{i=1}^{n} \left(y_i^2+\widehat{y}_i^2-2y_i\widehat{y}_i + \widehat{y}_i^2+\overline{y}^2-2\overline{y}\widehat{y}_i \right)\\ =&\sum_{i=1}^{n}\left( y_i^2+2\widehat{y}_i^2-2y_i\widehat{y}_i +\overline{y}^2-2\overline{y}\widehat{y}_i \right) \\ \overset{*}{=}&\sum_{i=1}^{n} \left( y_i^2+2\widehat{y}_i^2-2(\widehat{y}_i+\widehat{\epsilon}_i)\widehat{y}_i +\overline{y}^2-2\overline{y}(y_i-\widehat{\epsilon}_i) \right) \end{aligned} * using \widehat{\epsilon}_i=y_i-\widehat{y}_i, continue:
\begin{aligned} \text{SSE}+\text{SSM} =& \sum_{i=1}^{n} \left( y_i^2+2\widehat{y}_i^2-2(\widehat{y}_i+\widehat{\epsilon}_i)\widehat{y}_i +\overline{y}^2-2\overline{y}(y_i-\widehat{\epsilon}_i) \right) \\ =& \sum_{i=1}^{n} \left( y_i^2-2\widehat{y}_i\widehat{\epsilon}_i+\overline{y}^2-2\overline{y}y_i+2\overline{y}\widehat{\epsilon}_i \right)\\ \overset{**}{=}& \sum_{i=1}^{n} \left( y_i^2+\overline{y}^2-2\overline{y}y_i \right)= \text{SST} \,. \end{aligned} ** uses \sum_{i=1}^{n}\widehat{\epsilon}_i=0 (which follows from the first normal equation in Q1) and \sum_{i=1}^{n}x_i\widehat{\epsilon}_i=0 (proved at the end of this question), giving \sum_{i=1}^{n}2\overline{y}\widehat{\epsilon}_i=2\overline{y}\sum_{i=1}^{n}\widehat{\epsilon}_i=0 \,, \quad \sum_{i=1}^{n}2\widehat{y}_i\widehat{\epsilon}_i=\sum_{i=1}^{n}2(\widehat{\beta}_0+\widehat{\beta}_1x_i)\widehat{\epsilon}_i=2\widehat{\beta}_0\sum_{i=1}^{n}\widehat{\epsilon}_i+2\widehat{\beta}_1\sum_{i=1}^{n}x_i\widehat{\epsilon}_i=0 \,.
*Proof of \sum x_i\widehat{\epsilon}_i=0:*
Using the estimates of \widehat{\beta}_0 and \widehat{\beta}_1, we have: \begin{aligned} \sum_{i=1}^{n}x_i\widehat{\epsilon}_i&=\sum_{i=1}^{n}x_i\left(y_i-(\widehat{\beta}_0+\widehat{\beta}_1x_i)\right)\\ &=\sum_{i=1}^{n}x_iy_i-\widehat{\beta}_0\sum_{i=1}^{n}x_i-\widehat{\beta}_1\sum_{i=1}^{n}x_i^2\\ &=\sum_{i=1}^{n}x_iy_i-\left(\sum_{i=1}^{n}\frac{y_i}{n}-\widehat{\beta}_1\sum_{i=1}^{n}\frac{x_i}{n}\right)\sum_{i=1}^{n}x_i-\widehat{\beta}_1\sum_{i=1}^{n}x_i^2\\ &=\sum_{i=1}^{n}x_iy_i-\frac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}+\widehat{\beta}_1\left(\frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}-\sum_{i=1}^{n}x_i^2\right)\\ &\overset{***}{=}\sum_{i=1}^{n}x_iy_i-\frac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}+ \frac{n\sum_{i=1}^{n}x_iy_i-\sum_{i=1}^{n}y_i\sum_{i=1}^{n}x_i}{n\sum_{i=1}^{n}x^2_i-\left(\sum_{i=1}^{n}x_i\right)^2}\left(\frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}-\sum_{i=1}^{n}x_i^2\right)\\ &=\sum_{i=1}^{n}x_iy_i-\frac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}-\left(\sum_{i=1}^{n}x_iy_i-\frac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}\right)\\ &=0 \,. \end{aligned}
*** substitutes the closed-form expression for \widehat{\beta}_1 derived in Q1.
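The identity SST = SSE + SSM can also be checked numerically for any fitted simple linear regression, for example with the following small sketch (simulated data, illustrative only):

```r
set.seed(7)
x <- rnorm(30)
y <- 3 - 0.5 * x + rnorm(30)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)
SSE <- sum(resid(fit)^2)
SSM <- sum((fitted(fit) - mean(y))^2)

all.equal(SST, SSE + SSM)  # TRUE up to floating-point error
```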
The SSM is \begin{aligned} \text{SSM} =& \sum_{i=1}^{n}\left(\widehat{y}_i-\overline{y}\right)^2\\ =& \sum_{i=1}^{n}\left(\widehat{\beta}_0+\widehat{\beta}_1\cdot x_i-\overline{y}\right)^2\\ =& \sum_{i=1}^{n}\left( (\overline{y}-\widehat{\beta}_1\cdot \overline{x}) +\widehat{\beta}_1\cdot x_i-\overline{y}\right)^2\\ =& \sum_{i=1}^{n}\widehat{\beta}_1^2\cdot\left(x_i-\overline{x}\right)^2\\ \overset{b)}{=}& \widehat{\beta}_1^2\cdot S_{xx} \overset{a)}{=} \widehat{\beta}_1\cdot S_{xy} \end{aligned} using \widehat{\beta}_1=\frac{S_{xy}}{S_{xx}}.
We first consider \mathbb{V}\left( \widehat{\beta}_{1}|X\right).
Note that we have:
\widehat{\beta}_{1} = \frac{\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) \left( y_{i}-\overline{y}\right) }{\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}} \overset{*}{=} \frac{\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) y_{i}}{\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}}.
*uses:
\begin{aligned} \sum_{i=1}^{n}\left(x_i-\overline{x}\right)\overline{y}=&\overline{y}\sum_{i=1}^{n}x_i-\sum_{i=1}^{n}\overline{x}\,\overline{y}\\ =& n\overline{x}\,\overline{y}-n\overline{x}\,\overline{y}=0 \,. \end{aligned}
Therefore
\begin{aligned} \mathbb{V}\left( \widehat{\beta}_{1}|X\right) &= \mathbb{V}\left( \left. \frac{\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) y_{i}}{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right) ^{2}}\right\vert X \right) \\ &= \frac{\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}\mathbb{V}\left(y_{i}|X\right) }{\left(\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}\right)^{2}} \\ &= \frac{\sigma ^{2}\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}}{\left(\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}\right)^{2}} =\frac{\sigma^2}{S_{xx}} \,. %\\ %&= \sigma ^{2}\left/\sum_{i=1}^{n}\left( x_{i}-\overline{x}\right) ^{2}\right.. \end{aligned}
This uses \mathbb{V}(y_i|X)=\sigma^2 because y_i=\beta_0+\beta_1x_i+\epsilon_i, where the \beta’s are constant and x_i is given, hence \mathbb{V}(y_i|X)=\mathbb{V}(\epsilon|X)=\sigma^2.
We next consider \mathbb{V}\left(\widehat{\beta}_0|X\right).
Using that: \widehat{\beta}_0 = \overline{y} - \widehat{\beta}_1\overline{x},
\begin{aligned} \mathbb{V}\left(\widehat{\beta}_0|X\right) =& \mathbb{V}\left(\overline{y} -\widehat{\beta}_1\overline{x}|X\right)\\ \overset{*}{=}& \mathbb{V}\left(\frac{1}{n} \sum_{i=1}^ny_i \mid X\right) +\overline{x}^2\mathbb{V}\left(\widehat{\beta}_1|X\right)\\ =& \frac{1}{n^2}\sum_{i=1}^n\mathbb{V}\left(y_i |X \right) + \overline{x}^2 \frac{\sigma^{2}}{S_{xx}} \\ =& \sigma^2\left(\frac{1}{n}+\frac{\overline{x}^2}{S_{xx}}\right). \end{aligned}

* The cross term vanishes since \text{Cov}\left(\overline{y},\widehat{\beta}_1|X\right)=\frac{1}{nS_{xx}}\sum_{i=1}^n\left(x_i-\overline{x}\right)\mathbb{V}\left(y_i|X\right)=\frac{\sigma^2}{nS_{xx}}\sum_{i=1}^n\left(x_i-\overline{x}\right)=0.
Finally, we consider \text{Cov}\left(\widehat{\beta}_0,\widehat{\beta}_1|X\right).
Using \widehat{\beta}_0 = \overline{y} -\widehat{\beta}_1\overline{x},
we have: \begin{aligned} \text{Cov}\left(\widehat{\beta}_0,\widehat{\beta}_1|X\right) &= \text{Cov}\left(\overline{y} -\widehat{\beta}_1\overline{x},\widehat{\beta}_1|X\right)\\ &= \text{Cov}\left(-\widehat{\beta}_1\overline{x},\widehat{\beta}_1|X\right)\\ &= -\overline{x}\cdot \text{Cov}\left(\widehat{\beta}_1,\widehat{\beta}_1|X\right)\\ &= -\overline{x}\cdot \mathbb{V}\left(\widehat{\beta}_1|X\right)\\ &= -\frac{ \overline{x} \sigma^2}{S_{xx}} . \end{aligned}
We have that \begin{aligned} \mathbb{V}\left( \widehat{y}_{0} | X \right) &= \mathbb{V}\left( \widehat{\beta}_0+\widehat{\beta}_1x_{0} | X \right) \\ &= \mathbb{V}\left( \widehat{\beta}_0 | X \right) +x_{0}^{2}\mathbb{V}\left( \widehat{\beta}_1 | X \right) +2x_{0}\text{Cov}\left( \widehat{\beta}_0,\widehat{\beta}_1 | X \right) \\ &= \left( \dfrac{1}{n}+\frac{\overline{x}^{2}}{S_{xx}}\right) \sigma ^{2}+x_{0}^{2}\frac{\sigma ^{2}}{S_{xx}}+2x_{0}\left( \frac{-\overline{x}\sigma ^{2}}{S_{xx}}\right) \\ &= \left( \dfrac{1}{n}+\frac{\overline{x}^{2}-2x_{0}\overline{x}+x_{0}^{2}}{S_{xx}}\right) \sigma ^{2} \\ &= \left( \dfrac{1}{n}+\frac{\left( \overline{x}-x_{0}\right) ^{2}}{S_{xx}}\right) \sigma ^{2}. \end{aligned}
Expectation: \begin{aligned} \mathbb{E}\left[ Y_0 - \widehat{y}_0 | X \right] &= \mathbb{E}\left[Y_0 | X \right] - \mathbb{E}\left[\widehat{y}_0 | X \right] \\ &= \mathbb{E}\left[ \beta_0 + \beta_1 x_0 + \epsilon_0 \right]-\mathbb{E}\left[({\widehat{\beta}}_0+{\widehat{\beta}}_1 x_0) | X \right] \\ &\overset{*}{=} \beta_0 +\beta_1 x_0 - (\beta_0 + \beta_1 x_0) \\ &= 0. \end{aligned}
*uses the fact that the expected value of the random error \epsilon is 0.
Variance:
\begin{aligned} \mathbb{V}\left(Y_0-\widehat{y}_0 | X \right) &= \mathbb{V}\left(Y_0 | X \right) + \mathbb{V}\left(\widehat{y}_0 | X \right) -2 \, \text{Cov}\left(Y_0, \widehat{y}_0 | X \right) \\ &= \mathbb{V}\left(\beta_0 +\beta_1 x_0 + \epsilon_0 | X \right) + \sigma^2\left({1\over n}+{(\overline{x} - x_0)^2\over S_{xx}}\right) - 0 \\ &\overset{**}{=} \sigma^2 + \sigma^2\left({1\over n}+{(\overline{x} - x_0)^2\over S_{xx}}\right) \\ &= \sigma^2\left(1+{1\over n}+{(\overline{x} - x_0)^2\over S_{xx}}\right). \end{aligned}
** uses \mathbb{V}(\beta_0 + \beta_1 x_0 | X)=0, since this term contains only constants; the covariance is 0 because the new observation Y_0 is not used in fitting the model and is therefore independent of the prediction \widehat{y}_0; and the conditional variance of \widehat{y}_0 was derived in the previous question.
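A short Monte Carlo sketch (illustrative only; the chosen parameter values are arbitrary) that supports the conditional variance formula for \widehat{\beta}_1 by redrawing the errors many times while holding x fixed:

```r
set.seed(123)
n <- 40
sigma <- 2
x <- runif(n, 0, 10)            # the x values are held fixed across replications
Sxx <- sum((x - mean(x))^2)

beta1_hats <- replicate(5000, {
  y <- 1 + 0.5 * x + rnorm(n, sd = sigma)   # new errors each replication
  coef(lm(y ~ x))[2]
})

var(beta1_hats)   # empirical variance of beta1-hat
sigma^2 / Sxx     # theoretical value sigma^2 / S_xx
```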
Interesting features are that, in general, the concentration of 3-MT in the brain seems to decrease as the post mortem interval increases. Another interesting feature is that we observe two observations with a much higher post mortem interval than the other observations.
The data seem appropriate for linear regression. The linear relationship seems to hold, especially for values of the interval between 5 and 26 hours (where we have enough observations). Care should be taken when evaluating y for x below 5 or above 26 (only two observations), because we do not know whether the linear relationship between x and y still holds there.

We test: H_0: \rho=0 \hskip3mm \text{v.s.} \hskip3mm H_1: \rho\neq0 The corresponding test statistic is given by:
T=\frac{R\sqrt{n-2}}{\sqrt{1-R^2}} \sim t_{n-2}. We reject the null hypothesis for large and small values of the test statistic.
We have n=18 and the correlation coefficient is given by: \begin{aligned} r =& \frac{\sum x_i\cdot y_i - n \overline{x}\overline{y} }{ \sqrt{(\sum x_i^2 - n\overline{x}^2) (\sum y_i^2 - n\overline{y}^2)} }\\ =& \frac{672.8 - 18\cdot337/18\cdot42.98/18}{\sqrt{(9854.5-337^2/18)\cdot(109.7936-42.98^2/18)} } = -0.827 \end{aligned} Thus, the value of our test statistic is given by: T=\frac{-0.827\sqrt{16}}{\sqrt{1-(-0.827)^2}}=-5.89. From Formulae and Tables page 163 we observe \mathbb{P}(t_{16}\leq-4.015)\overset{*}{=}\mathbb{P}(t_{16}\geq4.015)=0.05\%, * using the symmetry of the student-t distribution. The value of our test statistic (-5.89) is smaller than -4.015, so the p-value is smaller than 2\cdot0.05\%=0.1\%. Thus, we can reject the null hypothesis even at a significance level of 0.1%, and we conclude that there is a linear dependency between interval and concentration. Note that the alternative hypothesis here is a (two-sided) linear dependency, not a negative linear dependency, so rejecting the null only allows us to conclude “a linear dependency”; had the alternative instead been negative dependency, we would also have accepted that alternative, but with the test as constructed the conclusion must be phrased as “a linear dependency”.

The linear regression model is given by: y=\alpha + \beta x +\epsilon The estimate of the slope is given by: \begin{aligned} \widehat{\beta} =& \frac{ \sum x_iy_i -n\sum x_i/n\sum y_i/n }{\sum x_i^2 -n(\sum x_i/n)^2}\\ =& \frac{ 672.8 - 337\cdot42.98/18 }{9854.5 -337^2/18} = -0.0372008 \end{aligned} The estimate of the intercept is given by: \begin{aligned} \widehat{\alpha}=& \overline{y}-\widehat{\beta}\overline{x}\\ =& 42.98/18 + 0.0372008 \cdot337/18 = 3.084259 \end{aligned} Thus, the estimate of y given a value of x is given by: \begin{aligned} \widehat{y}=&\widehat{\alpha} + \widehat{\beta} x\\ =&3.084259 - 0.0372008 x \end{aligned}
One day equals 24 hours, i.e., x=24, thus \widehat{y}=\widehat{\alpha} + \widehat{\beta}24=3.084259 - 0.0372008 \cdot 24 = 2.19
Two days equal 48 hours, i.e., x=48, thus \widehat{y}=\widehat{\alpha} + \widehat{\beta}\cdot 48=3.084259 - 0.0372008 \cdot 48 = 1.30
The data set contains accurate data only up to 26 hours, as for observations 17 and 18 (at 48 hours and 60 hours respectively) there was no eye-witness testimony directly available. Predicting 3-MT concentration after 26 hours may therefore not be advisable, even though x=48 is within the range of the x-values (5.5 hours to 60 hours).
The pivotal quantity is given by: \frac{\beta-\widehat{\beta}}{\text{SE}(\widehat{\beta})} \sim t_{n-2}. First, we calculate \begin{aligned} \widehat{\sigma}^2 &= \frac{1}{n-2}\left(\sum y^2_i- (\sum y_i)^2/n - \frac{(\sum x_iy_i - \sum x_i\sum y_i/n)^2}{\sum x_i^2 - (\sum x_i)^2/n}\right) \\ &= \frac{1}{16}\left(109.7936- 42.98^2/18 - \frac{(672.8 - 337\cdot42.98/18)^2}{9854.5 - 337^2/18}\right) = 0.1413014 , \end{aligned} then the standard error is \text{SE}(\widehat{\beta}) = \sqrt{\frac{\widehat{\sigma}^2}{\sum x_i^2 - n\overline{x}^2}} = \sqrt{\frac{0.1413014 }{9854.5 - 337^2/18}} = 0.00631331. From Formulae and Tables page 163 we have t_{16,1-0.005} = 2.921. Using the test statistic, the 99% confidence interval of the slope is given by: \begin{aligned} \widehat{\beta}-t_{16,1-\alpha/2}\text{SE}(\widehat{\beta})<&\beta<\widehat{\beta}+t_{16,1-\alpha/2}\text{SE}(\widehat{\beta})\\ - 0.0372008 -2.921\cdot 0.00631331<&\beta<- 0.0372008 +2.921\cdot 0.00631331\\ -0.055641979<&\beta<-0.0188 . \end{aligned} Thus the 99% confidence interval of \beta is given by: (-0.055641979, -0.0188). Note that \beta=0 in not within the 99% confidence interval, therefore we would reject the null hypothesis that \beta equals zero and accept the alternative that \beta\neq0 at a 1% level of significance. This confirms the result in (2) where the correlation coefficient was shown to not equal zero at the 1% significance level.
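As a check, the numerical answers in this question can be reproduced in R from the given summary statistics alone (a minimal sketch; the object names are arbitrary):

```r
n <- 18
sx <- 337; sxx <- 9854.5; sy <- 42.98; syy <- 109.7936; sxy <- 672.8

Sxx <- sxx - sx^2 / n
Syy <- syy - sy^2 / n
Sxy <- sxy - sx * sy / n

r <- Sxy / sqrt(Sxx * Syy)                        # -0.827
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)         # -5.89

beta_hat  <- Sxy / Sxx                            # -0.0372
alpha_hat <- sy / n - beta_hat * sx / n           #  3.084
alpha_hat + beta_hat * c(24, 48)                  #  2.19 and 1.30

sigma2_hat <- (Syy - Sxy^2 / Sxx) / (n - 2)       # 0.1413
se_beta <- sqrt(sigma2_hat / Sxx)                 # 0.00631
beta_hat + c(-1, 1) * qt(0.995, n - 2) * se_beta  # 99% CI for the slope
```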
The linear regression model is given by: y_i =\alpha + \beta x_i +\epsilon_i, where \epsilon_i \sim N(0,\sigma^2) i.i.d. distributed for i=1,\ldots,n.
The fitted linear regression equation is given by:\widehat{y} = \widehat{\alpha} + \widehat{\beta}x. The estimated coefficients of the linear regression model are given by (see Formulae and Tables page 25): \begin{aligned} \widehat{\beta} &= \frac{s_{xy}}{s_{xx}} = \frac{1122}{\sum_{i=1}^n x_i^2 -n \overline{x}^2} \\ &= \frac{1122}{60016 - 12 \cdot \left(\frac{836}{12}\right)^2} = \frac{1122}{1774.67} = 0.63223 \\ \widehat{\alpha} &= \overline{y} - \widehat{\beta} \overline{x} = \frac{\sum_{i=1}^n y_i}{n} - \widehat{\beta}\frac{\sum_{i=1}^n x_i}{n} \\ &= \frac{867}{12} - 0.63223\cdot \frac{836}{12} = 28.205. \end{aligned} Thus, the fitted linear regression equation is given by: \widehat{y} = 28.205 + 0.63223\cdot x.
The estimate for \sigma^2 is given by: \begin{aligned} \widehat{\sigma}^2 &= \frac{1}{n-2} \sum_{i=1}^n (y_i-\widehat{y}_i)^2 \\ &= \frac{1}{n-2} \text{SSE} \\ &= \frac{1}{n-2} (\text{SST}-\text{SSM}) \\ &\overset{*}{=} \frac{1}{n-2} \left(\sum_{i=1}^n (y_i-\overline{y})^2 - \widehat{\beta}_1^2 \cdot S_{xx} \right) \\ &= \frac{1}{n-2} \left(\sum_{i=1}^n y^2_i-n\cdot \overline{y}^2 - \frac{\left(\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y}) \right)^2}{\sum_{i=1}^n x_i^2 - n \overline{x}^2 }\right) \\ &= \frac{1}{10} \cdot \left(63603 - 12 \cdot \left(\frac{867}{12}\right)^2 - \frac{1122^2}{60016 - 836^2/12}\right) = 25.289 \,. \end{aligned}
* uses the result from Q5 for SSM.
We know the pivotal quantity: \frac{(n-2)s^2}{\sigma^2} \sim \chi^2_{n-2} \,. Note: we have n-2 degrees of freedom because we have to estimate two parameters from the data (\widehat{\alpha} and \widehat{\beta}). We have that s^2=\widehat{\sigma}^2. Thus we have that the 90% confidence interval is given by: \begin{aligned} \frac{10\widehat{\sigma}^2}{\chi^2_{0.95,10}} &< \sigma^2 < \frac{10\widehat{\sigma}^2}{\chi^2_{0.05,10}} \\ \frac{10\cdot 25.289}{18.3} &< \sigma^2 < \frac{10\cdot 25.289}{3.94} \\ 13.8 &< \sigma^2 < 64.2 \end{aligned} Thus the 90% confidence interval of \sigma^2 is given by (13.8, 64.2).
- We test the following:
H_0:\beta=0 \hskip3mm \text{v.s.} \hskip3mm H_1:\beta> 0,
with a level of significance \alpha=0.05.
- The test statistic is: T = \frac{\widehat{\beta}-\beta}{\sqrt{\widehat{\sigma^2}/\left(\sum_{i=1}^n (x_i-\overline{x})^2\right) }} \sim t_{n-2}
- The rejection region of the test is given by: C= \{(X_1,\ldots,X_n): T\in(t_{10,1-0.05},\infty) \} = \{(X_1,\ldots,X_n): T\in(1.812,\infty) \}
- The value of the test statistic is given by: T = \frac{0.63223-0}{\sqrt{25.289/(\sum_{i=1}^nx^2_i-n\overline{x}^2) }} =\frac{0.63223-0}{\sqrt{25.289/(60016-836^2/12 )}} = 5.296.
- The value of the test statistic is in the rejection region, hence we reject the null hypothesis of a zero correlation.
For the mean response at a given value of x, we have that \frac{\widehat{y}|x-\mathbb{E}[y|x]}{\sqrt{\widehat{\mathbb{V}}(\widehat{y}|x)}} has a student-t distribution: \frac{\widehat{y}|x-\mathbb{E}[y|x]}{\sqrt{\widehat{\mathbb{V}}(\widehat{y}|x)}} \sim t_{n-2} \,. The predicted value is given by:
\widehat{y}|x=53 = \widehat{\alpha} +\widehat{\beta}\cdot 53 = 28.205 + 0.63223\cdot 53 = 61.713. The estimated variance of the fitted mean response at x=53 is given by: \begin{aligned} \widehat{\mathbb{V}}(\widehat{y}|x=53) =& \left(\frac{1}{n} + \frac{(x-\overline{x})^2}{\sum_{i=1}^n(x_i-\overline{x})^2} \right)\widehat{\sigma^2}\\ =& \left(\frac{1}{12} + \frac{(53-836/12)^2}{60016 -836^2/12 } \right) 25.289 = 6.0657. \end{aligned} Thus, the 95% confidence interval for the mean of y given that x=53 is given by: \begin{aligned} \widehat{y}-t_{10,1-0.05/2}\cdot\sqrt{\widehat{\mathbb{V}}(\widehat{y}|x=53)}&<\mathbb{E}[y|x=53]<\widehat{y}+t_{10,1-0.05/2}\cdot\sqrt{\widehat{\mathbb{V}}(\widehat{y}|x=53)}\\ 61.713-2.228\cdot\sqrt{6.0657}&<\mathbb{E}[y|x=53]<61.713+2.228\cdot\sqrt{6.0657}\\ 56.2&<\mathbb{E}[y|x=53]<67.2 \end{aligned} Thus the 95% confidence interval for the mean of y given x=53 is (56.2, 67.2).
- We test the following hypothesis: H_0: \rho=0.75 \hskip3mm \text{v.s.} \hskip3mm H_1:\rho\neq0.75
- The test statistic is given by: T=\frac{Z_r-z_\rho}{\sqrt{\frac{1}{n-3}}} \sim N(0 , 1)
- The critical region is given by: C = \{(X_1,\ldots,X_n):T\in \{(-\infty, -z_{1-\alpha/2}) \cup (z_{1-\alpha/2} , \infty)\} \}
- The value of the test statistic is given by: \frac{Z_r-z_\rho}{\sqrt{\frac{1}{9}}} = 3(z_r-z_\rho) = 3( 1.2880 - 0.97296 ) = 0.94512 where \begin{aligned} z_r =& \frac{1}{2} \log \left( \frac{1+0.85860}{1-0.85860}\right)=1.2880 \\ z_\rho & = \frac{1}{2} \log\left(\frac{1+0.75}{1-0.75}\right) =0.97296\\ r=&\frac{\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum_{i=1}^n(x_i-\overline{x})^2 \sum_{i=1}^n(y_i-\overline{y})^2 }}\\ =& \frac{1122}{\sqrt{(\sum_{i=1}^ny^2_i-n\overline{y}^2)(\sum_{i=1}^nx_i^2-n\overline{x}^2)}}\\ =&\frac{1122}{\sqrt{ 962.25\cdot 1774.667}} = 0.85860 \end{aligned}
- We have that \Phi(0.95)=0.82894 (equivalently, z_{0.82894}=0.95). Thus, the p-value is given by 2\cdot(1-0.82894)=0.34212. The value of the test statistic is not in the critical region unless the level of significance exceeds 0.34212 (which it normally does not). Hence, for reasonable levels of significance we do not reject the null hypothesis.
The proportion of the variability explained by the model is given by: \begin{aligned} R^2 =& \frac{\text{SSM}}{\text{SST}} = 1-\frac{\text{SSE}}{\text{SST}}\\ =& 1-\frac{\sum_{i=1}^n(y_i-\widehat{y}_i)^2}{\sum_{i=1}^n(y_i-\overline{y}_i)^2}\\ =& 1-\frac{\sum_{i=1}^ny^2_i-n\overline{y}^2 - \frac{\left(\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y})\right)^2}{\sum_{i=1}^nx_i^2-n\overline{x}^2}}{\sum_{i=1}^ny^2_i-n\overline{y}_i^2}\\ =& \frac{\left(\sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y})\right)^2}{(\sum_{i=1}^ny^2_i-n\overline{y}_i^2)(\sum_{i=1}^nx_i^2-n\overline{x}^2)}\\ =& \frac{1122^2}{ 962.25\cdot 1774.667} = 0.737193. \end{aligned} Hence, a large proportion of the variability of Y is explained by X.
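The calculations for this question can likewise be reproduced in R from the summary statistics (a sketch; the object names are arbitrary):

```r
n <- 12
sx <- 836; sy <- 867; sxx <- 60016; syy <- 63603; Sxy <- 1122

Sxx <- sxx - sx^2 / n
Syy <- syy - sy^2 / n

beta_hat  <- Sxy / Sxx                                  # 0.63223
alpha_hat <- sy / n - beta_hat * sx / n                 # 28.205

sigma2_hat <- (Syy - Sxy^2 / Sxx) / (n - 2)             # 25.289
(n - 2) * sigma2_hat / qchisq(c(0.95, 0.05), n - 2)     # 90% CI for sigma^2

t_slope <- beta_hat / sqrt(sigma2_hat / Sxx)            # 5.296

y53  <- alpha_hat + beta_hat * 53                       # 61.713
se53 <- sqrt(sigma2_hat * (1 / n + (53 - sx / n)^2 / Sxx))
y53 + c(-1, 1) * qt(0.975, n - 2) * se53                # (56.2, 67.2)

r <- Sxy / sqrt(Sxx * Syy)                              # 0.8586
z_stat <- (atanh(r) - atanh(0.75)) * sqrt(n - 3)        # 0.945
r^2                                                     # 0.737
```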
The completed ANOVA table is given below:
| Source | D.F. | Sum of Squares | Mean Squares | F-Ratio |
|---|---|---|---|---|
| Regression | 1 | 639.5-475.6=163.9 | 163.9 | \frac{163.9}{8.2}=19.99 |
| Error | 58 | 8.2 \times 58=475.6 | 8.2 | |
| Total | 59 | 639.5 | | |

A simple linear regression problem:
Since we know that \widehat{\beta}=r\frac{s_{y}}{s_{x}}, then r=\widehat{\beta}\frac{s_{x}}{s_{y}}=7.445(2.004/21.56)=69.2\%. where s_{x},s_{y} are sample standard deviations. Alternatively, you can use the fact that R^{2}=r^{2}, so that from 4. below, r^{2}=0.4794\Longrightarrow r=+\sqrt{0.4794}=69.2\%. You take the positive square root because of the positive sign of the coefficient of EPS.
Given EPS=2, we have: \widehat{STKPRICE}=25.044+7.445\left( 2\right) =39.934. A 95% confidence interval of this estimate is given by: \begin{aligned} & \left( \widehat{\alpha }+\widehat{\beta}x_{0}\right) \pm t_{1-\alpha/2,n-2}\times s\sqrt{\left( \frac{1}{n}+\frac{\left( \overline{x}-x_{0}\right) ^{2}}{\left( n-1\right) s_{x}^{2}}\right) } \\ &= \left( 39.934\right) \pm \underbrace{t_{1-0.025,46}}_{=2.012896}\times \sqrt{247}\sqrt{\left( \frac{1}{48}+\frac{\left( 2.338-2\right) ^{2}}{\left( 47\right) \left( 2.004^{2}\right) }\right) } \\ &= 39.934\pm 4.636=\left( 35.298,44.570\right) . \end{aligned} where s_{x}^{2} is the sample variance of X.
A 95% confidence interval for \beta is: \begin{aligned} \widehat{\beta}\pm t_{1-\alpha /2,n-2}\cdot \text{SE}( \widehat{\beta} ) &= 7.445\pm 2.0147\times \frac{\sqrt{247}}{2.004\sqrt{47}} \\ &= 7.445\pm 2.305 \\ &= \left( 5.14,9.75\right) . \end{aligned}
s=\sqrt{247}=15.716 and R^{2}=\frac{\text{SSM}}{\text{SST}}=\frac{10475}{21851}=47.94\%.
A scatter plot of the fitted values against the (standardised) residuals will give an indication of whether the variation in the errors is constant.
To test for the significance of the variable EPS, we test H_{0}:\beta=0 against H_{a}:\beta \neq 0. The test statistic is: t( \widehat{\beta} ) = \frac{\widehat{\beta}}{\text{SE}( \widehat{\beta} ) } = \frac{7.445}{1.144} = 6.508. This is larger than t_{1-\alpha /2,n-2}=2.0147 and therefore we reject the null. There is evidence to support the fact that the EPS variable is a significant predictor of stock price.
To test H_{0}:\beta =24 against H_{a}:\beta >24, the test statistic is given by: t( \widehat{\beta} ) = \frac{\widehat{\beta}-\beta_{0}}{\text{SE}(\widehat{\beta}) } = \frac{7.445-24}{1.144} = -14.47. Thus, since this test statistic is smaller than t_{1-\alpha,n-2}=t_{0.95,46}=1.676, do not reject the null hypothesis.
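A sketch reproducing these EPS/STKPRICE results in R from the regression output and the given summary statistics (the object names are arbitrary):

```r
n <- 48
xbar <- 2.338; s_x <- 2.004; s_y <- 21.56
b0 <- 25.044; b1 <- 7.445; MSE <- 247        # MSE taken from the ANOVA output

r  <- b1 * s_x / s_y                          # 0.692
R2 <- 10475 / 21851                           # 0.4794
s  <- sqrt(MSE)                               # 15.716

# 95% CI for the mean STKPRICE at EPS = 2
fit2 <- b0 + b1 * 2
se2  <- s * sqrt(1 / n + (xbar - 2)^2 / ((n - 1) * s_x^2))
fit2 + c(-1, 1) * qt(0.975, n - 2) * se2      # approx (35.3, 44.6)

# 95% CI for the slope, and the two t-tests
se_b1 <- s / (s_x * sqrt(n - 1))
b1 + c(-1, 1) * qt(0.975, n - 2) * se_b1      # approx (5.14, 9.75)
b1 / se_b1                                    # about 6.51  (H0: beta = 0)
(b1 - 24) / se_b1                             # about -14.5 (H0: beta = 24)
```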
The grand total sum is \sum x=2479+2619+2441+2677=10216 so that the grand mean is \overline{\overline{x}}=10216/40=255.4. Also, \sum x^{2}=617163+687467+597607+718973=2621210. Therefore the total sum of squares is: \begin{aligned} \text{SST} &= \sum \left( x-\overline{\overline{x}}\right) ^{2}=\sum x^{2}-N\overline{\overline{x}}^{2} \\ &= 2621210-(40)(255.4)^{2}=12043.6. \end{aligned} The sum of squares between the regions is: \begin{aligned} \text{SSM} &= \sum n_{i}\left( \overline{x}_{i.}-\overline{\overline{x}}\right)^{2} \\ &= 10\left( \left( 247.9-255.4\right) ^{2}+\left( 261.9-255.4\right)^{2}+\left( 244.1-255.4\right) ^{2}+\left( 267.7-255.4\right) ^{2}\right) \\ &= 3774.8. \end{aligned} The difference gives the sum of squares within the regions: \text{SSE}=\text{SST}-\text{SSM}=12043.6-3774.8=8268.8. The one-way ANOVA table is then summarised below:
ANOVA Table for the One-Way Layout

| Source | d.f. | Sum of Squares | Mean Square | F-Statistic |
|---|---|---|---|---|
| Between | 3 | 3774.8 | 1258.27 | \frac{1258.27}{229.69}=5.478 |
| Within | 36 | 8268.8 | 229.69 | |
| Total | 39 | 12043.6 | | |

Thus, to test the equality of the mean premiums across the regions, we test: H_{0}:\alpha _{A}=\alpha _{B}=\alpha _{C}=\alpha _{D}=0 \hskip5mm \text{(all variances are equal)} against the alternative: H_{a}:\text{at least one }\alpha \text{ is not zero} \hskip5mm \text{(all variances are equal)} using the F-test. Since F=5.478>F_{0.95}\left( 3,36\right) =2.9 (approximately), we therefore reject H_{0}. There is evidence to support a difference in the mean premiums across regions. The one-way ANOVA model assumptions are as follows: each random variable x_{ij} is observed according to the model x_{ij}=\mu +\alpha _{i}+\varepsilon _{ij},\text{ for }i=1,\ldots,I\text{, and }j=1,2,\ldots,n_{i} where \varepsilon_{ij} refers to the random error in the j^{\text{th}} observation of the i^{\text{th}} treatment which satisfies:
\mathbb{E}\left[ \varepsilon _{ij}\right] =0 and \mathbb{V}\left( \varepsilon_{ij}\right) =\sigma ^{2} for all i,j.
The \varepsilon _{ij} are independent and normally distributed (normal errors), and where \mu is the overall mean and \alpha _{i} is the effect of the i^{\text{th}} treatment with: \sum_{i=1}^{I}\alpha _{i}=0.
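Since the raw premiums are given in the question, the same analysis of variance can be reproduced directly in R (a minimal sketch):

```r
premiums <- c(229, 241, 270, 256, 241, 247, 261, 243, 272, 219,   # Region A
              261, 269, 284, 268, 249, 255, 237, 270, 269, 257,   # Region B
              253, 247, 244, 245, 221, 229, 245, 256, 232, 269,   # Region C
              279, 268, 290, 245, 281, 262, 287, 257, 262, 246)   # Region D
region <- factor(rep(c("A", "B", "C", "D"), each = 10))

summary(aov(premiums ~ region))   # F about 5.48 on (3, 36) df
```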
We have the estimated correlation coefficient: \begin{aligned} r =& \frac{s_{ms}}{\sqrt{s_{mm}\cdot s_{ss}}}\\ =& \frac{\sum ms - n\overline{m}\,\overline{s}}{\sqrt{(\sum m^2 -n\overline{m}^2) \cdot (\sum s^2 -n\overline{s}^2) }}\\ =& \frac{221,022.58 - 1136.1\cdot1934.2/10}{\sqrt{(129,853.03 -1136.1^2/10) \cdot (377,700.62 -1934.2^2/10) }} = 0.764. \end{aligned}
- We have the hypothesis: H_0:\rho=0 \hskip3mm \text{v.s.} \hskip3mm H_1:\rho>0
- The test statistic is: T=\frac{r \sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}
- The critical region is given by: C = \{(X_1,\ldots,X_n):T\in (t_{n-2,1-\alpha},\infty)\}
- The value of the test is: T=\frac{r \sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.764 \sqrt{10-2}}{\sqrt{1-0.764^2}} = 3.35
- We have t_{8,1-0.005}=3.355\approx 3.35, so the p-value is approximately 0.005. We therefore reject the null hypothesis of a zero correlation at any level of significance greater than 0.005 (which is usually the case).
Given the issue of whether mortality can be used to predict sickness, we require a plot of sickness against mortality:
There seems to be an increasing linear relationship, suggesting that mortality could be used to predict sickness.
We have the estimates: \begin{aligned} \widehat{\beta} =& \frac{s_{ms}}{s_{mm}} = \frac{\sum ms - n\overline{m}\,\overline{s}}{\sum m^2 -n\overline{m}^2}\\ =& \frac{221,022.58 - 1136.1\cdot1934.2/10}{ 129,853.03 -1136.1^2/10} = 1.6371\\ \widehat{\alpha} =& \overline{s} - \widehat{\beta}\overline{m} = \frac{1934.2}{10} - 1.6371 \cdot\frac{1136.1}{10} =7.426\\ \widehat{\sigma^2} =& \frac{1}{n-2} \sum_{i=1}^{n}(y_i-\widehat{y}_i)^2 =\frac{1}{n-2}\left(s_{ss} - \frac{s^2_{ms}}{s_{mm}}\right)\\ =&\frac{1}{8}\left((\sum s^2 -n\overline{s}^2) - \frac{(\sum ms -n\overline{m}\,\overline{s})^2}{(\sum m^2 -n\overline{m}^2)}\right)\\ =&\frac{1}{8}\left(3587.656 - \frac{(1278.118)^2}{780.709}\right) =186.902\\ \mathbb{V}(\widehat{\beta}) =& \widehat{\sigma}^2/s_{mm} = 186.902/780.709 = 0.2394 \end{aligned}
- Hypothesis: H_0:\beta=2 \hskip3mm \text{v.s.} \hskip3mm H_1:\beta<2
- Test statistic: T=\frac{\widehat{\beta}-\beta}{\sqrt{\widehat{\sigma^2}/s_{xx}}} \sim t_{n-2}
- Critical region: C = \{(X_1,\ldots,X_n):T\in (-\infty,-t_{n-2,1-\alpha})\}
- Value of statistic: T=\frac{\widehat{\beta}-\beta}{\sqrt{\widehat{\sigma^2}/s_{xx}}} = \frac{1.6371-2}{\sqrt{0.2394}} = -0.74
- We have from Formulae and Tables page 163: t_{8,1-0.25}=0.7064 and t_{8,1-0.20}=0.8889. Thus, by symmetry, the p-value lies between 0.20 and 0.25. We therefore do not reject the null hypothesis at any significance level below the p-value, which covers the usual choices (e.g. 5%). Note: the exact p-value from a computer package is 0.2402.
For a region with m=115 we have the estimated value:
\widehat{s} = 7.426 + 1.6371 \cdot 115 = 195.69 with corresponding variance: \widehat{\sigma^2}\left(\frac{1}{n} + \frac{(x_0-\overline{x})^2}{s_{mm}}\right) = 186.902\left(\frac{1}{10} + \frac{(115-113.61)^2}{780.709}\right) =19.1528 The corresponding 95% confidence limits are 195.69-t_{8,1-0.025}\cdot \text{SE} (s|m=115)=195.69-2.306\cdot\sqrt{19.1528} = 185.60 and 195.69+t_{8,1-0.025}\cdot \text{SE} (s|m=115)=195.69+2.306\cdot\sqrt{19.1528} = 205.78.
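Continuing the same sketch, the fitted value and 95% confidence limits at m = 115 can be obtained as follows:

m0 <- 115
s_hat  <- alpha_hat + beta_hat * m0                               # 195.69
se_fit <- sqrt(sigma2_hat * (1 / n + (m0 - sum_m / n)^2 / s_mm))  # sqrt(19.1528)
s_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_fit                 # (185.60, 205.78)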
We have: \begin{aligned} \text{SST} &= \sum y^2 - \left(\sum y\right)^2/n = 70.8744 - 29.12^2/16 = 17.8760 \,, \\ \sum x &= 4 \cdot (1+2+3+4) = 40 \,, \quad \sum x^2 = 4 \cdot (1^2+2^2+3^2+4^2) = 120 \,, \\ s_{xx} &= \sum x^2 - \left(\sum x\right)^2/n = 120 - 40^2/16 = 20 \,, \\ \sum xy &= 1 \cdot 2.73 + 2 \cdot 6.26 + 3 \cdot 9.22 + 4 \cdot 10.91 = 86.55 \,, \\ s_{xy} &= \sum xy - \sum x\sum y/n = 86.55-40 \cdot 29.12/16 = 13.75 \,, \\ \text{SSM} &= \widehat{\beta}_1^2 \cdot s_{xx} = \left(\frac{13.75}{20}\right)^2 \cdot 20 = 9.453125 \,, \\ \text{SSE} &= \text{SST} - \text{SSM} = 17.8760 - 9.453125 = 8.422875 \,. \end{aligned}
We have: \begin{aligned} \widehat{\beta} &= \frac{s_{xy}}{s_{xx}} = \frac{13.75}{20} = 0.6875 \\ \widehat{\alpha} &= \overline{y} - \widehat{\beta}\overline{x} = (29.12-0.6875\cdot40)/16 = 0.1012 \,. \end{aligned} Thus, the fitted model is given by \widehat{y} = \widehat{\alpha} + \widehat{\beta}x = 0.1012+0.6875x.
For x=1 we have: \widehat{y} = \widehat{\alpha} + \widehat{\beta}x = 0.1012+0.6875\cdot1 = 0.7887
For x=4 we have: \widehat{y} = \widehat{\alpha} + \widehat{\beta}x = 0.1012+0.6875\cdot4 = 2.8512
We have \text{SE}(\widehat{\beta}) = \sqrt{\frac{8.4229/14}{20}}=0.1734.
i) Hypothesis: H_0: \beta=0 \hskip3mm \text{v.s.} \hskip3mm H_1: \beta\neq 0
ii) Test statistic: T=\frac{\widehat{\beta}-\beta}{\text{SE}(\widehat{\beta})} \sim t_{n-2}
iii) Critical region: C = \{(X_1,\ldots,X_n):T\in (-\infty,-t_{n-2,1-\alpha/2}) \cup (t_{n-2,1-\alpha/2},\infty) \}
iv) Value of statistic: T=\frac{\widehat{\beta}-\beta}{\text{SE}(\widehat{\beta})} = \frac{0.6875-0}{0.1734}=3.965
v) We have t_{14,1-0.001}=3.787 and t_{14,1-0.0005}=4.140, so the two-sided p-value lies between 0.1% and 0.2%. We would only fail to reject the null hypothesis at a significance level below the p-value, which is not the usual case; hence we have strong evidence against the “no linear relationship” hypothesis. Note: a computer package gives a one-sided tail probability of 0.00070481, i.e. a two-sided p-value of about 0.0014.
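These calculations can be checked in R from the sums given above (no individual y values are needed); a minimal sketch with our own variable names:

n <- 16
sum_y <- 29.12; sum_y2 <- 70.8744
sum_x <- 40;    sum_x2 <- 120
sum_xy <- 1 * 2.73 + 2 * 6.26 + 3 * 9.22 + 4 * 10.91  # 86.55
s_xx <- sum_x2 - sum_x^2 / n                # 20
s_xy <- sum_xy - sum_x * sum_y / n          # 13.75
SST  <- sum_y2 - sum_y^2 / n                # 17.876
beta_hat <- s_xy / s_xx                     # 0.6875
SSE <- SST - beta_hat^2 * s_xx              # 8.4229
se_beta <- sqrt((SSE / (n - 2)) / s_xx)     # 0.1734
t_stat  <- beta_hat / se_beta               # 3.965
2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)  # two-sided p-value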
Calculating the sums of squares in this question is done similarly to question 13. We have: \begin{aligned} \text{SST} &= 17.8760 \,, \\ \text{SSM} &= \sum n_i \left( \overline{y}_{i\cdot}-\overline{\overline{y}}\right)^{2}= 4\sum\left(\overline{y}_{i}-\overline{\overline{y}}\right)^2=9.6709 \,, \\ \text{SSE} &= \text{SST} - \text{SSM} = 17.8760-9.6709 = 8.2051 \,. \end{aligned}
\begin{aligned} \widehat{\mu} =& 29.12/16=1.82\\ \widehat{\tau}_1 =& 2.73/4-1.82 = -1.1375\\ \widehat{\tau}_2 =& 6.26/4-1.82 = -0.255\\ \widehat{\tau}_3 =& 9.22/4-1.82 = 0.485\\ \widehat{\tau}_4 =& 10.91/4-1.82= 0.9075\\ \end{aligned}
Company A: fitted value =2.73/4=0.6825
Company D: fitted value =10.91/4=2.7275
Observed F statistic is (9.6709/3)/(8.2051/12)=4.715 on (3,12) d.f..
From Formulae and Tables pages 173 and 174 we observe that F_{3,12} (4.474) = 2.5\% and F_{3,12} (5.953) = 1\%. Thus the p-value is between 0.01 and 0.025, so we have some evidence against the “no company effects” hypothesis. Note: exact p-value using computer package is 0.0213.
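The quoted exact p-value can be reproduced in R as the upper-tail probability of the F distribution at the observed statistic:

pf(4.715, df1 = 3, df2 = 12, lower.tail = FALSE)  # approximately 0.0213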
Multiple linear regression questions
In Table 3.4, the null hypothesis for TV is that, in the presence of radio ads and newspaper ads, TV ads have no effect on sales. Similarly, the null hypothesis for radio is that, in the presence of TV and newspaper ads, radio ads have no effect on sales (and there is a similar null hypothesis for newspaper). The low p-values of TV and radio suggest that the null hypotheses are false for TV and radio, while the high p-value of newspaper suggests that we cannot reject the null hypothesis for newspaper.
The fitted model is given by Y = 50 + 20 \ \mathrm{GPA} + 0.07 \ \mathrm{IQ} + 35 \ \mathrm{Gender} + 0.01 \ \mathrm{GPA \times IQ} - 10 \ \mathrm{GPA \times Gender}. For males, Gender = 0, so Y = 50 + 20 \ \mathrm{GPA} + 0.07 \ \mathrm{IQ} + 0.01 \ \mathrm{GPA \times IQ}. For females, Gender = 1, so Y = 85 + 10 \ \mathrm{GPA} + 0.07 \ \mathrm{IQ} + 0.01 \ \mathrm{GPA \times IQ}.
False. For a fixed value of IQ and GPA, if GPA <3.5, males earn less on average than females.
False. For a fixed value of IQ and GPA, if GPA >3.5, females earn less on average than males.
True. For a fixed value of IQ and GPA, if GPA >3.5, males earn more on average than females.
False. See above.
I would expect the polynomial regression to have a lower training RSS than the linear regression, because its extra flexibility allows it to fit the training data, including the noise from the irreducible error \mathbb{V}(\epsilon), more closely.
I would expect the polynomial regression to have a higher test RSS, since the overfitting to the training noise would add more error than the correctly specified linear regression.
Polynomial regression has lower train RSS than the linear fit because of higher flexibility: no matter what the underlying true relationship is the more flexible model will closer follow points and reduce train RSS. An example of this behaviour is shown on Figure 2.9 from Chapter 2.
There is not enough information to tell which test RSS would be lower, since the problem states that we do not know “how far it is from linear”. If the true relationship is closer to linear than cubic, the linear regression’s test RSS could be lower; if it is closer to cubic than linear, the cubic regression’s test RSS could be lower. This is the bias-variance trade-off: it is not clear which level of flexibility will fit the test data better.
The design matrix is \boldsymbol{X} = [\boldsymbol{1}_n \hskip2mm \boldsymbol{x}] = \left[ \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \\ \end{array} \right]
The matrix \boldsymbol{X}^\top \boldsymbol{X} is \hskip-5mm \boldsymbol{X}^\top \boldsymbol{X} = \left[ \begin{array}{cccc} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_n \\ \end{array} \right] \left[ \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \\ \end{array} \right] = \left[ \begin{array}{cc} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x^2_i \\ \end{array} \right] = n\left[ \begin{array}{cc} 1 & \overline{x} \\ \overline{x} & \frac{1}{n}\sum_{i=1}^{n} x^2_i \\ \end{array} \right]
The matrix \boldsymbol{X}^\top \boldsymbol{y} is \boldsymbol{X}^\top \boldsymbol{y} = \left[ \begin{array}{cccc} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_n \\ \end{array} \right] \left[ \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \\ \end{array} \right] = \left[ \begin{array}{c} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_iy_i \\ \end{array} \right]
Note: the inverse of a 2\times 2 matrix is given by: M^{-1} = \left[ \begin{array}{cc} a & b \\ c & d \\ \end{array} \right]^{-1} = \frac{1}{\det(M)}\cdot \left[ \begin{array}{cc} d & -b \\ -c & a \\ \end{array} \right]= \frac{1}{ad-bc}\cdot \left[ \begin{array}{cc} d & -b \\ -c & a \\ \end{array} \right] Using this and the result from 2. we have: (\boldsymbol{X}^\top \boldsymbol{X})^{-1} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - n^2 \overline{x}^2}\cdot \left[ \begin{array}{cc} \sum_{i=1}^{n} x_i^2 & -n\overline{x} \\ -n\overline{x} & n \\ \end{array} \right]= \frac{1}{s_{xx}}\cdot \left[ \begin{array}{cc} \frac{1}{n}\sum_{i=1}^{n} x_i^2 & -\overline{x} \\ -\overline{x} & 1 \\ \end{array} \right]
Using the result of 3. and 4. we have: \begin{aligned} \widehat{\boldsymbol{\beta}} =& (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}= \frac{1}{s_{xx}}\cdot \left[ \begin{array}{cc} \frac{1}{n}\sum_{i=1}^{n} x_i^2 & -\overline{x} \\ -\overline{x} & 1 \\ \end{array} \right] \left[ \begin{array}{c} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_iy_i \\ \end{array} \right] \\ =& \frac{1}{s_{xx}} \left[ \begin{array}{c} \frac{1}{n}\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \overline{x}\sum_{i=1}^{n} x_iy_i \\ \sum_{i=1}^{n} x_iy_i - \overline{x}\sum_{i=1}^{n} y_i \\ \end{array} \right] = \frac{1}{s_{xx}}\left[ \begin{array}{c} \overline{y}\sum_{i=1}^{n} x_i^2 - \overline{x}\sum_{i=1}^{n} x_iy_i \\ \sum_{i=1}^{n} x_iy_i - n\overline{x}\,\overline{y} \\ \end{array} \right] \\ =& \frac{1}{s_{xx}}\left[ \begin{array}{c} \overline{y}\left(\sum_{i=1}^{n} x_i^2-n\overline{x}^2\right) - \overline{x}\left(\sum_{i=1}^{n} x_iy_i - n\overline{x}\,\overline{y}\right) \\ \sum_{i=1}^{n} x_iy_i - n\overline{x}\,\overline{y} \\ \end{array} \right] = \left[ \begin{array}{c} \overline{y}-\frac{s_{xy}}{s_{xx}} \overline{x}\\ \frac{s_{xy}}{s_{xx}}\\ \end{array} \right] \end{aligned}
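As a quick numerical check, the matrix formula can be compared with lm() on simulated data; this is only a sketch and the simulated data below are arbitrary, not from any question:

set.seed(123)
x <- rnorm(20)
y <- 1 + 3 * x + rnorm(20)
X <- cbind(1, x)                  # design matrix [1_n  x]
solve(t(X) %*% X) %*% t(X) %*% y  # (X'X)^{-1} X'y
coef(lm(y ~ x))                   # should agree with the line above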
Statement (E) is correct. Note that statement (A) is incorrect because, if food sales increase by one, the expected profit increases by \widehat{\beta}_1\cdot 10 (note the difference in the scale of profit (in thousands) and food sales (in ten thousands)). Similarly, (B), (C) and (D) are incorrect.
Statement (D) is correct. We have n=25 observations, p=3+1=4 parameters (three explanatory variables and the constant), \text{SST}=666.98, and \text{SSM}=610.48. Thus we have: \begin{aligned} \text{SSE} &= \text{SST} - \text{SSM} = 666.98 - 610.48 = 56.5 \\ R^2_a &= 1 - \frac{\text{SSE}/(n-p)}{\text{SST}/(n-1)} = 1-\frac{56.5/(25-4)}{666.98/(25-1)} = 1-\frac{56.5/21}{666.98/24} = 0.903 \,. \end{aligned}
Statement (D) is correct. \begin{aligned} R^2 &\overset{*}{=} \frac{\text{SSM}}{\text{SST}} \overset{**}{=} \frac{\text{SST}-\text{SSE}}{\text{SST}} \hskip3mm \text{I and II correct}\\ &\overset{*}{=} \frac{\text{SSM}}{\text{SST}} \overset{**}{=} \frac{\text{SSM}}{\text{SSM}+\text{SSE}} \neq \frac{\text{SSM}}{\text{SSE}} \hskip3mm \text{because SSM$>0$, III incorrect} \end{aligned} * using definition of R^2 and ** using SST=SSM+SSE.
Statement (B) is correct. R^2_a = 1- \frac{\text{SSE}/(n-p)}{\text{SST}/(n-1)} = 1-\frac{8525.3/42}{21851.4/47} = 0.563
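Both adjusted R^2 calculations can be reproduced with a small helper function; a sketch (the function name is ours, and the degrees of freedom are those used above):

adj_r2 <- function(SSE, SST, df_res, df_tot) 1 - (SSE / df_res) / (SST / df_tot)
adj_r2(SSE = 56.5,   SST = 666.98,  df_res = 21, df_tot = 24)  # 0.903 (previous question)
adj_r2(SSE = 8525.3, SST = 21851.4, df_res = 42, df_tot = 47)  # 0.563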
Statement (D) is correct. Let \boldsymbol{C}=(\boldsymbol{X}^\top \boldsymbol{X})^{-1} and c_{33} the third diagonal element of the matrix \boldsymbol{C}. We have: \text{SE}\left(\widehat{\beta}_2\right) = \sqrt{c_{33}\cdot s^2} = \sqrt{0.102446 \cdot 30106} = 55.535928
Statement (C) is correct. We have: \widehat{\boldsymbol{\beta}} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\boldsymbol{X}^\top \boldsymbol{y} In order to find the estimate of the parameter related to x_3 (having graduated from college) we need the fourth row (note \beta_1 corresponds to the constant) of the matrix (\boldsymbol{X}^\top \boldsymbol{X})^{-1} and multiply it with the vector \boldsymbol{X}^\top \boldsymbol{y}. We have: \widehat{\beta}_3 = \left[ \begin{array}{cccc} -0.026804 & -0.000091 & 0.023971 & 0.083184 \\ \end{array} \right] \left[ \begin{array}{c} 9,558 \\ 4,880,937 \\ 7,396 \\ 6,552 \\ \end{array} \right] = 21.953 Note that y is in hundreds of dollars, so having graduated from college adds 21.953\cdot100=2,195.3 dollars to the amount paid for a car.
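The matrix product can be verified in R from the quoted row and vector:

row4 <- c(-0.026804, -0.000091, 0.023971, 0.083184)  # fourth row of (X'X)^{-1}
Xty  <- c(9558, 4880937, 7396, 6552)                 # X'y
sum(row4 * Xty)                                      # approximately 21.95 (hundreds of dollars)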
Statement (A) is correct. We have that the distribution of \widehat{\beta}_k for k=1,\ldots,p is given by: \frac{\widehat{\beta}_k-\beta_k}{\text{SE}\left(\widehat{\beta}_k\right)} \sim t_{n-p} We have p=5, and n=212. Note, n-p is large, thus the standard normal approximation for the student-t is appropriate (Formulae and Tables page 163 only shows a table for degrees of freedom up to 120 and \infty = standard normal). We have z_{1-0.05/2}=1.96. This provides the well-known rule of thumb that the absolute value of the T value should be larger than 2 for parameter estimates to be significant (|T|>2). This is the case for all parameters.
Statement (D) is correct. \begin{aligned} \text{LIFE\_EXP }=&\text{ 48.24 }+\text{ 0.79 GNP }+\text{ 0.154 URBAN\%}\\ =&\text{ 48.24 }+\text{ 0.79 }\cdot 3+\text{ 0.154 } \cdot 60\\ =&\text{ 59.85 } \end{aligned}
Statement (C) is correct.
Can be done by the scatterplot, but a QQ-plot is better.
Can be done by the scatterplot, but R^2 is a better method.
Is the correct one; we need both the errors and the corresponding values of the endogenous variable.
Holds by construction of the LS estimator, so it does not need to be tested.
Errors should be independent of X not Y.
KNN question
- \begin{aligned} \text{EPE}_k(x_0) &= \mathbb{E}[(Y-\widehat{f}(x_0))^2|X=x_0] \\ &= \mathbb{E}[(\epsilon + f(x_0) - \mathbb{E}(\widehat{f}(x_0)) + \mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0) )^2 | X = x_0] \\ &= \mathbb{E}[\epsilon^2 + (f(x_0) - \mathbb{E}(\widehat{f}(x_0)))^2 + (\mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0) )^2 + 2\epsilon(f(x_0) - \widehat{f}(x_0)) + \\ & 2(\mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0) )(f(x_0) - \mathbb{E}(\widehat{f}(x_0))) ] \\ &= \mathbb{E}[\epsilon^2 + (f(x_0) - \mathbb{E}(\widehat{f}(x_0)))^2 + (\mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0) )^2] \\ &= \sigma^2 + \left[f(x_0) - \frac{1}{k}\sum_{l \in N(x_0)} f(x_{(l)}) \right]^2 + \frac{\sigma^2}{k} \end{aligned} The cross terms vanish in expectation because \epsilon is independent zero-mean noise, and: \begin{aligned} \mathbb{E}\left[(\mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0) )(f(x_0) - \mathbb{E}(\widehat{f}(x_0)))\right] &= (f(x_0) - \mathbb{E}(\widehat{f}(x_0)))\,\mathbb{E}\left[\mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0)\right] = 0 \,, \end{aligned} since f(x_0) - \mathbb{E}(\widehat{f}(x_0)) is a constant for fixed X=x_0 and \mathbb{E}\left[\mathbb{E}(\widehat{f}(x_0)) - \widehat{f}(x_0)\right] = 0.
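A small Monte Carlo simulation can illustrate this decomposition; the following is only a sketch, and all numerical settings (f(x) = \sin(2x), \sigma = 0.5, n = 50, k = 5, x_0 = 0.3) are arbitrary choices rather than part of the question:

set.seed(1)
f <- function(x) sin(2 * x)     # assumed true regression function
sigma <- 0.5; n <- 50; k <- 5; x0 <- 0.3
preds <- replicate(5000, {
  x <- runif(n, -1, 1)
  y <- f(x) + rnorm(n, sd = sigma)
  nn <- order(abs(x - x0))[1:k]  # indices of the k nearest neighbours of x0
  mean(y[nn])                    # kNN prediction of f(x0)
})
bias2    <- (f(x0) - mean(preds))^2
variance <- var(preds)           # close to sigma^2 / k when the neighbours lie near x0
sigma^2 + bias2 + variance       # Monte Carlo estimate of EPE_k(x0)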
Applied Questions
Please install the package and load the data by running the following commands first.
install.packages("ISLR2")
library(ISLR2)
fit <- lm(mpg ~ horsepower, data = Auto)
summary(fit)
Call: lm(formula = mpg ~ horsepower, data = Auto) Residuals: Min 1Q Median 3Q Max -13.5710 -3.2592 -0.3435 2.7630 16.9240 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 39.935861 0.717499 55.66 <2e-16 *** horsepower -0.157845 0.006446 -24.49 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 4.906 on 390 degrees of freedom Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049 F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
Yes
Very significant (p-value < 2\times 10^{-16})
Negative
predict(fit, newdata = data.frame(horsepower = c(98)), interval = "confidence")
fit lwr upr 1 24.46708 23.97308 24.96108
predict(fit, newdata = data.frame(horsepower = c(98)), interval = "prediction")
fit lwr upr 1 24.46708 14.8094 34.12476
plot(Auto$horsepower, Auto$mpg)
abline(a = fit$coefficients[1], b = fit$coefficients[2])
par(mfrow = c(2, 2))
plot(fit)
There appears to be some trend in the residuals, indicating a linear fit is not appropriate.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
summary(lm(y ~ x + 0))
Call: lm(formula = y ~ x + 0) Residuals: Min 1Q Median 3Q Max -1.9154 -0.6472 -0.1771 0.5056 2.3109 Coefficients: Estimate Std. Error t value Pr(>|t|) x 1.9939 0.1065 18.73 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9586 on 99 degrees of freedom Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776 F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The estimated coefficient (1.9939) is fairly close to the true value of 2.
summary(lm(x ~ y + 0))
Call: lm(formula = x ~ y + 0) Residuals: Min 1Q Median 3Q Max -0.8699 -0.2368 0.1030 0.2858 0.8938 Coefficients: Estimate Std. Error t value Pr(>|t|) y 0.39111 0.02089 18.73 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.4246 on 99 degrees of freedom Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776 F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The estimate (0.391) is some way from the naively expected value of 0.5 (the reciprocal of the true slope), and 0.5 does not fall inside its 95% confidence interval.
The estimate in (a) is about 5 times the estimate in (b). The t-statistics, however, are identical.
See: \begin{aligned} t &= \frac{\sum_{i} x_i y_i}{\sum_{j} x_j^2} \times \sqrt{\frac{(n-1)\sum_{j} x_j^2}{\sum_i (y_i - x_i \widehat{\beta})^2}} \\ &= \frac{\sqrt{n-1} \sum_i x_i y_i}{\sum_j x_j^2} \times \sqrt{\frac{\sum_j x_j^2}{\sum_i (y_i - x_i \widehat{\beta})^2}} \\ &= \frac{\sqrt{n-1} \sum_i x_i y_i}{\sqrt{\sum_j x_j^2}} \times \sqrt{\frac{1}{\sum_i (y_i - x_i \widehat{\beta})^2}} \\ &= \frac{\sqrt{n-1} \sum_i x_i y_i}{\sqrt{\sum_j x_j^2}} \times \sqrt{\frac{1}{\sum_i y_i^2 - 2y_i x_i \widehat{\beta} + x_i^2 \widehat{\beta}^2}} \\ &= \frac{\sqrt{n-1} \sum_i x_i y_i}{\sqrt{\sum_j x_j^2}} \times \sqrt{\frac{1}{\sum_i y_i^2 -2y_i x_i \frac{\sum_j x_j y_j}{\sum_k x_k^2} + x_i^2 (\frac{\sum_j x_j y_j}{\sum_k x_k^2})^2}} \\ &= \frac{\sqrt{n-1} \sum_i x_i y_i}{\sqrt{(\sum_i y_i^2)(\sum_j x_j^2) -2(\sum_i x_i y_i)^2 + (\sum_i x_i y_i)^2}} \\ &= \frac{\sqrt{n-1} \sum_i x_i y_i}{\sqrt{(\sum_i y_i^2)(\sum_j x^2_j) - (\sum_i x_i y_i)^2}} \end{aligned}
In R, this is written as
(sqrt(100 - 1) * sum(x * y)) / sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)
[1] 18.72593
This returns the same value as the t-statistic.
Due to the symmetry of x and y in the formula above, regressing x onto y gives the same expression. Hence the t-statistic is the same.
fit <- lm(y ~ x)
fit2 <- lm(x ~ y)
summary(fit)
Call: lm(formula = y ~ x) Residuals: Min 1Q Median 3Q Max -1.8768 -0.6138 -0.1395 0.5394 2.3462 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.03769 0.09699 -0.389 0.698 x 1.99894 0.10773 18.556 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9628 on 98 degrees of freedom Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762 F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
summary(fit2)
Call: lm(formula = x ~ y) Residuals: Min 1Q Median 3Q Max -0.90848 -0.28101 0.06274 0.24570 0.85736 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.03880 0.04266 0.91 0.365 y 0.38942 0.02099 18.56 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.4249 on 98 degrees of freedom Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762 F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16