Below are the solutions to these exercises on Multiple Regression (part 3).

```
data(state)
state77 <- as.data.frame(state.x77)
names(state77)[4] <- "Life.Exp"
names(state77)[6] <- "HS.Grad"

####################
#                  #
#    Exercise 1    #
#                  #
####################
#a.
library(car)
m1 <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)
avPlots(m1)
```

```
#Note that the slope of the line is positive in the HS.Grad plot, and negative in the Murder plot, as expected.
#b.
avPlots(m1, id.method = list("mahal"), id.n = 2)
```
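A useful property of added-variable plots is that the slope of the line in each panel equals that predictor's coefficient in the full multiple regression. A quick hand check of this for Murder (a sketch that repeats the `state77`/`m1` setup from above so it runs standalone):

```r
# Setup repeated from above so this snippet runs on its own
state77 <- as.data.frame(state.x77)
names(state77)[c(4, 6)] <- c("Life.Exp", "HS.Grad")
m1 <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

# Regress both the response and Murder on the remaining predictor (HS.Grad),
# then fit a line through the two sets of residuals: its slope is the
# Murder coefficient from the full model.
e_y <- resid(lm(Life.Exp ~ HS.Grad, data = state77))  # Life.Exp, adjusted for HS.Grad
e_x <- resid(lm(Murder ~ HS.Grad, data = state77))    # Murder, adjusted for HS.Grad
coef(lm(e_y ~ e_x))[2]  # identical to coef(m1)["Murder"]
```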

```
####################
#                  #
#    Exercise 2    #
#                  #
####################
#a.
with(state77, avPlot(lm(Life.Exp ~ HS.Grad + Murder + Illiteracy), variable = Illiteracy))
```

```
#Note that the slope is positive, contrary to what is expected.
#b.
avPlots(lm(Life.Exp ~ ., data = state77), terms = ~ Population + Area)
```

```
####################
#                  #
#    Exercise 3    #
#                  #
####################
crPlots(lm(Life.Exp ~ HS.Grad + Murder + Income + Area, data = state77))
```

```
#We see that there seems to be a problem with linearity for Income and Area (which could be due to the outlier in the lower-right corner of both plots).

####################
#                  #
#    Exercise 4    #
#                  #
####################
ceresPlots(lm(Life.Exp ~ HS.Grad + Murder + Income + Area, data = state77))
```

```
#Here, there is not much difference from the plots in Exercise 3 (although, in general, CERES plots are "less prone to leakage of nonlinearity among the predictors").

####################
#                  #
#    Exercise 5    #
#                  #
####################
vif(lm(Life.Exp ~ ., data = state77))
```

```
## Population     Income Illiteracy     Murder    HS.Grad      Frost 
##   1.499915   1.992680   4.403151   2.616472   3.134887   2.358206 
##       Area 
##   1.789764
```
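The VIF of a predictor is 1/(1 − R²), where R² comes from regressing that predictor on all the other predictors (the response is excluded). A hand computation for Illiteracy (a sketch that repeats the `state77` setup from above so it runs standalone):

```r
# Setup repeated from above so this snippet runs on its own
state77 <- as.data.frame(state.x77)
names(state77)[c(4, 6)] <- c("Life.Exp", "HS.Grad")

# Regress Illiteracy on the other six predictors (drop the response)
r2 <- summary(lm(Illiteracy ~ . - Life.Exp, data = state77))$r.squared
1 / (1 - r2)  # matches the vif() value for Illiteracy above (about 4.40)
```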

```
#Some authors suggest that a VIF above 2.5 is cause for concern, while others use cutoffs of 4 or 10. By the strictest of these criteria, Illiteracy, Murder, and HS.Grad are the most problematic (in the presence of all the other predictors).

####################
#                  #
#    Exercise 6    #
#                  #
####################
library(lmtest)
bptest(m1)
```

```
## 
##  studentized Breusch-Pagan test
## 
## data:  m1
## BP = 2.9728, df = 2, p-value = 0.2262
```

```
#There is no evidence of heteroscedasticity (of the type that depends on a linear combination of the predictors).

####################
#                  #
#    Exercise 7    #
#                  #
####################
ncvTest(m1)
```

```
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.01065067    Df = 1     p = 0.9178026
```

```
#Note that the results differ from Exercise 6 because bptest (by default) uses studentized residuals (preferred for robustness) and assumes the error variance depends on a linear combination of the predictors, whereas ncvTest (by default) uses ordinary residuals and assumes the error variance depends on the fitted values.
#ncvTest(m1) is equivalent to:
#bptest(m1, varformula = ~ m1$fitted, studentize = FALSE, data = state77)

####################
#                  #
#    Exercise 8    #
#                  #
####################
bptest(m1, varformula = ~ I(HS.Grad^2) + I(Murder^2) + HS.Grad * Murder, data = state77)
```

```
## 
##  studentized Breusch-Pagan test
## 
## data:  m1
## BP = 6.7384, df = 5, p-value = 0.2408
```

```
#Again, there is no evidence of heteroscedasticity, now allowing the error variance to depend on quadratic and interaction terms of the predictors.
```

```
####################
#                  #
#    Exercise 9    #
#                  #
####################
#a.
ks.test(m1$residuals, "pnorm")
```

```
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  m1$residuals
## D = 0.15546, p-value = 0.1603
## alternative hypothesis: two-sided
```
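One caveat: `ks.test(x, "pnorm")` compares the residuals to a standard Normal with mean 0 and sd 1, not to a Normal with the residuals' own scale. A sketch of the comparison with estimated parameters (repeating the `m1` setup from above so it runs standalone; note that estimating the parameters from the same data makes the KS p-value only approximate, which is what the Lilliefors correction in `nortest::lillie.test` addresses):

```r
# Setup repeated from above so this snippet runs on its own
state77 <- as.data.frame(state.x77)
names(state77)[c(4, 6)] <- c("Life.Exp", "HS.Grad")
m1 <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

r <- m1$residuals
# Compare against a Normal with the residuals' estimated mean and sd
# (the resulting p-value is approximate, since the parameters were
#  estimated from the same data being tested)
ks.test(r, "pnorm", mean = mean(r), sd = sd(r))
```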

```
#There is no evidence that the residuals are not Normal.
#b.
shapiro.test(m1$residuals)
```

```
## 
##  Shapiro-Wilk normality test
## 
## data:  m1$residuals
## W = 0.96961, p-value = 0.2231
```

```
#Again, there is no evidence of nonnormality.

####################
#                  #
#    Exercise 10   #
#                  #
####################
durbinWatsonTest(m1)
```

```
##  lag Autocorrelation D-W Statistic p-value
##    1     0.04919151        1.8495   0.582
##  Alternative hypothesis: rho != 0
```

```
#There is no evidence of lag-1 autocorrelation in the residuals.
```
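The Durbin-Watson statistic itself has a simple closed form: the sum of squared successive differences of the residuals divided by their sum of squares, with values near 2 indicating no lag-1 autocorrelation. A hand computation (repeating the `m1` setup from above so it runs standalone):

```r
# Setup repeated from above so this snippet runs on its own
state77 <- as.data.frame(state.x77)
names(state77)[c(4, 6)] <- c("Life.Exp", "HS.Grad")
m1 <- lm(Life.Exp ~ HS.Grad + Murder, data = state77)

# D-W statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)
e <- resid(m1)
sum(diff(e)^2) / sum(e^2)  # about 1.85, as reported above
```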
