library( pander ) # formatting tables
library( dplyr ) # data wrangling
library( stargazer ) # regression tables
# CHANGING FROM:
#URL <- "https://raw.githubusercontent.com/DS4PS/cpp-523-fall-2019/master/labs/data/predicting-subjective-wellbeing.csv"
#dat <- read.csv( URL, stringsAsFactors=F )
# TO:
dat <- read.csv("/Users/Tyler/Desktop/thatsallfolks/TIY/Website_V1/WebsiteAssets/Data/predicting-subjective-wellbeing.csv",
                stringsAsFactors = FALSE)
head(dat)
Consider a three-variable regression of Subjective Well-Being (SWB) on Network Diversity (ND) and Perceived Social Support (PSS):
\(SWB=β_0+β_1 \cdot ND+β_2 \cdot PSS+e\)
Calculate the omitted variable bias on the Network Diversity (ND) coefficient that results from omitting the Perceived Social Support (PSS) variable from the regression.
\(SWB=b_0+b_1 \cdot ND+e\)
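The omitted variable bias equation links the two models through an auxiliary regression of the omitted variable on the included one (the standard decomposition; \(\alpha_1\) is the slope from that auxiliary regression):

\(PSS = \alpha_0 + \alpha_1 \cdot ND + e_2\)

\(b_1 = B_1 + \alpha_1 \cdot B_2 \quad \Rightarrow \quad bias = b_1 - B_1 = \alpha_1 \cdot B_2\)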
# Html viz problems
# Graph changes when knitted vs executed in console?
# Necessary to make third code chunk Eval = F
m.full  <- lm( SWB ~ ND + PSS, data=dat )
m.naive <- lm( SWB ~ ND, data=dat )
stargazer( m.naive, m.full,
type = "html", digits=2,
dep.var.caption = "DV: Subjective Well-Being",
omit.stat = c("rsq","f","ser"),
notes.label = "Standard errors in parentheses")
DV: Subjective Well-Being

| SWB | (1) | (2) |
|---|---|---|
| ND | 0.52*** | -0.05 |
| | (0.20) | (0.19) |
| PSS | | 0.32*** |
| | | (0.03) |
| Constant | 20.99*** | -2.35 |
| | (1.19) | (2.59) |
| Observations | 389 | 389 |
| Adjusted R2 | 0.02 | 0.21 |

Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01
Results from above chunk:
B1 <- -0.05 # replace with the appropriate value from the table above
b1 <- 0.52  # replace with the appropriate value from the table above
bias <- b1 - B1
bias

## [1] 0.57
Run the auxiliary regression to estimate \(\alpha_1\).
Calculate the bias using the omitted variable bias equation and show your math. You can check your results by comparing your answer to the bias calculation above.
#Html viz problems
# Eval = F /T to knit (html/text)
m.auxiliary <- lm( PSS ~ ND, data=dat )
stargazer( m.auxiliary,
type = "html", digits=2,
dep.var.caption = "DV: PSS",
omit.stat = c("rsq","f","ser"),
notes.label = "Standard errors in parentheses")
DV: PSS

| | PSS |
|---|---|
| ND | 1.81*** |
| | (0.28) |
| Constant | 73.88*** |
| | (1.69) |
| Observations | 389 |
| Adjusted R2 | 0.10 |

Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01
summary(m.auxiliary)
Call:
lm(formula = PSS ~ ND, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max
-32.322  -4.765   1.420   6.235  14.678

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  73.8780     1.6928  43.642  < 2e-16 ***
ND            1.8146     0.2802   6.476 2.86e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.98 on 387 degrees of freedom
Multiple R-squared: 0.09777, Adjusted R-squared: 0.09544
F-statistic: 41.94 on 1 and 387 DF, p-value: 2.862e-10
# x_2 = a_0 + a_1*X + e_2  (auxiliary regression)
# (b1-B1) / B1
a1 <- 1.81
B2 <- 0.32
bias <- a1*B2
bias

## [1] 0.5792
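Plugging the reported estimates into the omitted variable bias equation gives the same answer, up to rounding of the coefficients in the tables:

\(bias = \alpha_1 \cdot B_2 = 1.81 \times 0.32 = 0.58 \approx b_1 - B_1 = 0.52 - (-0.05) = 0.57\)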
How good is the naive estimation of \(\beta_1\), the impact of network diversity on our happiness, in this case?
# bias / B1 : rough measure of the magnitude of bias
(0.52 - (-0.05)) / (-0.05)

## [1] -11.4
#This is very far off from our previous estimates and our true model.
The possible increase in the standard error (as well as Network Diversity losing significance) is likely due to the original estimate of the slope being too large.
In the previous lecture we saw how the Class Size model lost significance when SES was added as a result of an increase in the standard errors. In this model Network Diversity also loses significance. Explain why.
How does our need for approval of others (contingent self-esteem) impact our happiness (SWB)?
What happens to our inferences if we estimate the impact of CSE on happiness without accounting for baseline self-esteem (RSE)?
To examine this we will regress SWB onto CSE and SWB onto RSE separately.
\(SWB=b_0+b_1 \cdot CSE+e_1\)

\(SWB=b_0+b_2 \cdot RSE+e_2\)
We can then compare these two bivariate regressions to the results in the full model:
\(SWB=\beta_0+\beta_1 \cdot CSE+\beta_2 \cdot RSE+\epsilon\)
# Does function in the console despite html viz problems
m.01 <- lm( SWB ~ CSE, data=dat )
m.02 <- lm( SWB ~ RSE, data=dat )
m.03 <- lm( SWB ~ CSE + RSE, data=dat )
stargazer( m.01, m.02, m.03,
type = "text", digits=2,
dep.var.caption = "DV: Subjective Well-Being",
omit.stat = c("rsq","f","ser"),
notes.label = "Standard errors in parentheses")
##
## ============================================================
## DV: Subjective Well-Being
## -----------------------------
## SWB
## (1) (2) (3)
## ------------------------------------------------------------
## CSE -0.11*** 0.09***
## (0.03) (0.03)
##
## RSE 0.55*** 0.61***
## (0.04) (0.04)
##
## Constant 29.29*** 1.62 -4.82*
## (1.65) (1.54) (2.68)
##
## ------------------------------------------------------------
## Observations 389 389 389
## Adjusted R2 0.02 0.36 0.37
## ============================================================
## Standard errors in parentheses *p<0.1; **p<0.05; ***p<0.01
#### Analysis
The slope of CSE flipped from negative to positive, and it is highly statistically significant in both cases. The fact that the slope changed when the control was added means that CSE is correlated with the added variable (RSE), and our standard error is now larger as well.
The control variable caused the slope estimate for Network Diversity to shift to the left toward the null hypothesis (slope=0, no impact) and as a result it lost statistical significance.
Our slope was corrected downward. The standard error typically becomes larger, and it did here, as we saw our estimate become less precise and lose significance.
What happens in this case? Why did that happen? Drawing the coefficient plot might help.
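A quick way to see why the CSE slope flipped is to run the sign logic in reverse (a back-of-the-envelope check using the coefficients from the table above). Treating the bivariate CSE model as the naive regression:

\(bias = b_1 - B_1 = -0.11 - 0.09 = -0.20\)

\(bias = \alpha_1 \cdot B_2 \;\Rightarrow\; \alpha_1 = -0.20 / 0.61 \approx -0.33\)

Since \(B_2\) (the RSE slope) is positive, a negative bias requires a negative \(\alpha_1\): CSE and RSE are negatively correlated, which is what drags the naive CSE slope below zero.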
\(bias = \alpha_1 \cdot B_2\)
| \(\alpha_1\) | \(B_2\) | Sign of Bias |
|---|---|---|
| (+) | (+) | (+) |
| (-) | (-) | (+) |
| (+) | (-) | (-) |
| (-) | (+) | (-) |
\(b_1 = B_1 + bias\)
Therefore:
If bias (+) then \(b_1 > B_1\)
If bias (-) then \(b_1 < B_1\)
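Applying these rules to the Network Diversity example above:

\(\alpha_1 = 1.81 \; (+), \quad B_2 = 0.32 \; (+) \;\Rightarrow\; bias \; (+)\)

so the naive slope overstates the true effect: \(b_1 = 0.52 > B_1 = -0.05\).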