Econometrics Midterm 1 Group 1

Published on June 2016

Solutions for Midterm 1 for Econometrics Midterm


Midterm 1
ECO 231 - Undergraduate Econometrics
Prof. Carolina Caetano

1

Material Question 1

Different states have different alcohol laws. For example, some states may require liquor
stores to close at 9pm, while other states may have different time requirements. Suppose
that we are concerned with the effects of drinking on car accidents. More specifically, we
would like to ask the following scientific question: what is the causal effect of reducing
liquor stores’ closing time in each locality by one hour on the number of accidents per
month in that locality? So you don’t have to write so much, let’s call the variable “stores’
closing time in each locality” time, and “number of accidents per month in that locality”
accidents.
(a) What is the point of this research question? In other words, who would be concerned
with this, and why?
Answer: This research question would be of interest to policymakers concerned with
the prevalence of alcohol-related car accidents, particularly since restricting or easing
restrictions on liquor store hours may prove to be a more cost-effective method of
reducing alcohol-related car accidents than other common measures, like alcohol
education programs or increased police enforcement. The policymaker would need an
estimate of how effective such a measure could be. In other words, the policymaker would
want to know the causal effect of reducing liquor stores’ closing time on car accidents.
(b) If you were looking for an observational data set to answer this question, what would
it need to have?
Answer: An observational dataset would have to contain:
1. The treatment variable, stores closing time in each locality
2. The outcome variable, the number of accidents per month in that locality
3. A rich set of variables to use as controls. For example, the average income in the
locality, the average educational level, the average road density, etc.
4. A large number of observations.


(c) If my data set has the localities’ per capita income (income), years of education (educ),
and proportion of married households (married), is the average number of cars per
household (cars) a confounder? Why?
Answer: For the average number of cars per household (cars) to be a confounder it
must satisfy three conditions:
(a) It must be associated with the treatment - This is not clear. There may be
an association between cars and time: localities with more cars per
household may tend to have different store closing times, or different liquor laws.
(b) It must be associated with the outcome variable - Yes, we would assume that there
is a strong association between cars and accidents, though its direction is not clear:
more cars may increase the likelihood of accidents, but are also likely to
lead to a reduction of driving speed due to congestion.
(c) It must not be redundant - A variable is redundant if it is predicted by the controls.
Since it is likely that average educational level, per capita income, and the
proportion of married households jointly predict the average number of vehicles
per household, the variable cars is probably redundant.
Therefore, if the controls (income, educ, married) are included, the variable cars does
not satisfy condition (c), and is not a confounder. If income, educ, and married are
not included in the regression then the variable cars probably would satisfy (c), but
still may not satisfy (a). It is not clear if cars is a confounder in this case.
(d) Suppose that the data set yielded the graph of averages on the following page. Trace
the regression line of accidents on time. (Don’t do this in the graph below. There
is one just like it in the space provided for the answer to this question.) Should a
regression line be used to describe this data? Explain your answer.
[Figure: graph of averages, accidents (vertical axis) against time (horizontal axis)]


Answer: Examining the graph of averages, we can observe a strong negative correlation between accidents and time. Since the relationship between the treatment and
the outcome variables appears to be linear, we can use a regression line to describe the
data.
[Figure: graph of averages, accidents (vertical axis) against time (horizontal axis)]
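One way to check whether a line describes the data is to compute the graph of averages and compare it with the fitted line. The sketch below does this in Python on invented numbers. The data, the coding of time as hours after noon, and the helper names graph_of_averages and regression_line are all assumptions for illustration; they are not the exam's data.

```python
from collections import defaultdict


def graph_of_averages(x, y):
    """Average of y within each distinct value of x."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    return {xi: sum(vals) / len(vals) for xi, vals in sorted(groups.items())}


def regression_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Invented data: closing time in hours after noon (8 = 8pm), accidents per month.
time = [8, 8, 9, 9, 10, 10, 11, 11]
accidents = [42, 38, 31, 29, 22, 18, 11, 9]

averages = graph_of_averages(time, accidents)
intercept, slope = regression_line(time, accidents)

# Largest vertical gap between an average and the fitted line.
max_gap = max(abs(avg - (intercept + slope * t)) for t, avg in averages.items())
print(averages, intercept, slope, max_gap)
```

When the averages sit close to the fitted line, as in these invented numbers, a line is a reasonable summary. If the averages curved away from it, the line would be a poor description.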


(e) Suppose that you estimated the regression line of the previous item, and it is
accidents = 120 − 10 · time. Comment on the following sentence: “if the government forces
liquor stores to close one hour early, there will be 10 less accidents per month per locality.”
Answer: The regression line measures changes in expected value, not exact changes. The
sign also matters: the slope is −10, so closing one hour earlier (lowering time by one)
raises the fitted value by 10 rather than lowering it. A more correct phrasing would be:
if the government forces liquor stores to close one hour early, then the expected number
of accidents per month per locality would increase by 10.
However, even the corrected sentence would likely still be wrong, and for a different
reason: correlation does not imply causation. Changes in the number of car accidents per
month may not be due only to liquor stores’ closing times. There are likely confounders
that are correlated with closing times and that also affect accidents. For example,
liquor stores may close later in urban areas than in rural
areas, while car accidents might be likelier to happen on rural roads than on urban
ones (due to worse road conditions, bad lighting, less traffic congestion, etc.).
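The urban/rural example can be made concrete with a small numerical sketch. Everything in it is invented for illustration: the four localities, the coding of time as hours after noon, the assumed causal effect of +2 accidents per extra hour, and the assumed urban shift of −30 accidents.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Invented localities: (closing time in hours after noon, urban indicator).
# Urban stores close later; the first two localities are rural.
localities = [(8, 0), (9, 0), (10, 1), (11, 1)]
TRUE_EFFECT = 2.0  # assumed causal effect of closing one hour later

time = [t for t, _ in localities]
accidents = [50 + TRUE_EFFECT * t - 30 * is_urban for t, is_urban in localities]

naive = slope(time, accidents)                 # ignores the confounder
within_rural = slope(time[:2], accidents[:2])  # holds urban fixed at 0
within_urban = slope(time[2:], accidents[2:])  # holds urban fixed at 1
print(naive, within_rural, within_urban)
```

In this constructed example the naive slope is negative even though the assumed causal effect is positive, because urban localities both close later and have fewer accidents.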
(f) Express the partialling-out formula of the slope coefficient of the regression in item (e)
in terms of the variables in the model. Interpret it.
Answer: The general partialling-out formula is given by

$$b_1 = \frac{\sum_{i=1}^{n} r_{1i}\, y_i}{\sum_{i=1}^{n} r_{1i}^2}$$

where $r_{1i}$ is the residual of the regression of $x_1$ on the other controls. More formally,

$$r_{1i} = x_{1i} - \hat{x}_{1i}$$

where

$$\hat{x}_{1i} = \hat{d}_1 + \hat{d}_2 x_{2i} + \dots + \hat{d}_k x_{ki}$$

and $\hat{d}_1, \hat{d}_2, \dots, \hat{d}_k$ are the estimated OLS coefficients of regressing $x_{1i}$ on $x_{2i}, x_{3i}, \dots, x_{ki}$.
In the regression given in part (e), no controls are included besides time, so
the predicted value of $\text{time}_i$ is given by

$$\widehat{\text{time}}_i = \hat{d}_1 = \overline{\text{time}}$$

Why? Without any controls, our best prediction for time is simply the
average. Therefore the residual $r_{1i}$ is simply the deviation of $\text{time}_i$ from the mean,

$$r_{1i} = \text{time}_i - \overline{\text{time}}$$

Plugging the residual back into the formula for $b_1$, we get the following expression:

$$b_1 = \frac{\sum_{i=1}^{n} (\text{time}_i - \overline{\text{time}}) \cdot \text{accidents}_i}{\sum_{i=1}^{n} (\text{time}_i - \overline{\text{time}})^2}$$

which is the univariate formula for $b_1$.
(g) If the model is

$$\text{accidents} = \beta_0 + \beta_1 \text{time} + \beta_2 \text{income} + \beta_3 \text{educ} + \beta_4 \text{married} + u$$

where $E[u \mid \text{time}, \text{income}, \text{educ}, \text{married}] = 0$, what is this model saying about the
world?
Answer: We can take the expectation of the model conditional on time, income,
educ and married,

$$E[\text{accidents} \mid \text{time}, \text{income}, \text{educ}, \text{married}] = E[\beta_0 + \beta_1 \text{time} + \beta_2 \text{income} + \beta_3 \text{educ} + \beta_4 \text{married} + u \mid \text{time}, \text{income}, \text{educ}, \text{married}]$$

From the rules of conditional expectation, we can rewrite the right-hand side:

$$E[\text{accidents} \mid \text{time}, \text{income}, \text{educ}, \text{married}] = \beta_0 + \beta_1 \text{time} + \beta_2 \text{income} + \beta_3 \text{educ} + \beta_4 \text{married} + E[u \mid \text{time}, \text{income}, \text{educ}, \text{married}]$$

From the model we know that $E[u \mid \text{time}, \text{income}, \text{educ}, \text{married}] = 0$, so we arrive at

$$E[\text{accidents} \mid \text{time}, \text{income}, \text{educ}, \text{married}] = \beta_0 + \beta_1 \text{time} + \beta_2 \text{income} + \beta_3 \text{educ} + \beta_4 \text{married}$$

Namely, the model says that the conditional expectation of accidents (conditional on
time, income, educ and married) is a linear function of these controls. In other
words, if we were given for a specific locality information on the liquor stores’ closing
time, the average per capita income, the average years of education and the proportion
of married households, we would be able to predict the expected number of car accidents per
month in that locality by using a linear combination of the data given to us.
(h) Interpret β0 and β1 in this model.
Answer: The coefficient β0, the intercept, is the expected value of accidents when all
controls receive a value of zero. More explicitly, it is the expected number of accidents
per month in a locality that has liquor stores that never open, zero per capita income,
zero average educational level, and zero married households. The absurdity of this
scenario means we are not particularly interested in the value of the intercept,
apart from identifying serious problems in our model.
The coefficient β1 measures how much we expect accidents to change when we increase
time by 1 unit and hold everything else constant. That is to say, we would expect
β1 more (or fewer, if β1 is negative) car accidents per month to occur in a locality if
we were to delay the closing time of liquor stores in that locality by one hour, barring
any change in average per capita income, average education, proportion of married
households or any other variable predicted by these controls.
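These two interpretations can be illustrated with a short Python sketch. The coefficient values and the example locality are hypothetical, chosen only so that the arithmetic is easy to follow; they are not estimates from any data set.

```python
# Hypothetical coefficient values, chosen only to illustrate the interpretation.
BETA = {"const": 200.0, "time": -10.0, "income": -0.5,
        "educ": -1.0, "married": -20.0}


def expected_accidents(time, income, educ, married, beta=BETA):
    """E[accidents | time, income, educ, married] under the linear model."""
    return (beta["const"] + beta["time"] * time + beta["income"] * income
            + beta["educ"] * educ + beta["married"] * married)


# With every regressor at zero, the prediction is the intercept beta_0.
at_zero = expected_accidents(0, 0, 0, 0)

# Move time by one hour and hold the other regressors fixed:
# the prediction changes by exactly beta_1.
base = expected_accidents(9, 40, 12, 0.5)
one_hour_later = expected_accidents(10, 40, 12, 0.5)
effect = one_hour_later - base
print(at_zero, base, one_hour_later, effect)
```

The difference between the two predictions does not depend on the values at which income, educ and married are held fixed, which is what linearity buys.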
(i) If the model is true, is the OLS regression a good method for discovering the value of
the coefficients? Why?
Answer: We know that OLS is well suited for estimating linear models. We have
also established in (g) that if our model is true then the conditional expectation of
accidents, given our controls, is linear. Therefore estimating the model in (g) using OLS
is valid and generally a good method.
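A small simulation can make this claim tangible. The sketch below is a simplification with a single regressor rather than the four in the model. The "true" coefficients, the error distribution, the range of time and the sample size are all assumptions made for the illustration.

```python
import random

random.seed(0)

BETA0, BETA1 = 120.0, -10.0  # assumed "true" coefficients for the simulation
n = 5000

time = [random.uniform(6, 12) for _ in range(n)]
# u is drawn independently of time, so E[u | time] = 0 holds by construction.
accidents = [BETA0 + BETA1 * t + random.gauss(0, 5) for t in time]

t_bar = sum(time) / n
a_bar = sum(accidents) / n
b1 = (sum((t - t_bar) * (a - a_bar) for t, a in zip(time, accidents))
      / sum((t - t_bar) ** 2 for t in time))
b0 = a_bar - b1 * t_bar
print(b0, b1)
```

With a sample this large the OLS estimates should land close to the assumed coefficients. If u were instead correlated with time, the same code would converge to something other than BETA1.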

2 Paper Question

This question refers to this year’s paper.
(a) Describe the ideal experiment to answer the paper’s question. Is it feasible? Why?
Answer: Take three random groups of women. Force two groups to have one more
year of education and one of those two groups to also have one more week of work
experience. Take three random groups of men and give the same treatments. All these
must be carried out without the individuals in the groups knowing that they are (or
are not) receiving the extra education and/or experience. Compute the difference in
the average wage between the group that also gets the experience and the group that
only got the extra education, for women and men separately. Taking the difference of
these differences gives us the answer to our question.
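The final computation in this design is a difference of two differences in group means. A minimal Python sketch follows. The wage numbers are invented, and the group labels and the helper experience_gain are hypothetical names introduced for the illustration.

```python
def mean(v):
    return sum(v) / len(v)


# Invented hourly wages for the two treated groups of each gender.
wages = {
    ("women", "education"): [10.0, 12.0],
    ("women", "education+experience"): [11.0, 13.0],
    ("men", "education"): [11.0, 13.0],
    ("men", "education+experience"): [11.5, 13.5],
}


def experience_gain(gender):
    """Average wage with extra education and experience, minus education only."""
    return (mean(wages[(gender, "education+experience")])
            - mean(wages[(gender, "education")]))


difference_of_differences = experience_gain("women") - experience_gain("men")
print(experience_gain("women"), experience_gain("men"), difference_of_differences)
```

Because the groups are assigned at random, each within-gender difference in means can be read as an average effect, and the difference between them compares that effect across genders.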
(b) How does the author justify his choice of data set? Do you agree with his decision?
Why?
Answer: The author justifies his choice of data set by highlighting that the data set
has the treatment variables (gender and measures of human capital including education
and experience), the outcome variable (earnings represented by the natural log of
a respondent’s hourly wage), and a rich set of variables that can be used as controls
(demographic information like race, marital status, geography). The author also notes that
the data set focuses on a cohort of men and women that is likely to have experienced
smaller earnings differences due to gender.
(c) What are the dependent and causal variables in this paper? How are they measured?
Answer: The dependent variable is log(y), the natural log of the respondent’s hourly
wage. The causal variable is ed ∗ exp, years of education multiplied by weeks of experience.
(d) What is the paper conclusion? If the conclusion is correct, should society change to
make things right? How?
Answer: The paper’s conclusion is that there are significant gender differences in how
education affects the growth of earnings on the job for labor market participants. The
author suggests that this might be due to labor market discrimination manifested as
a restriction on the occupational choice set of women.
Society should work to enact policies that encourage women to enter occupations that
have better on-the-job training and the opportunity for greater wage growth.
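(e) Explain the model implied by the equation on page 459:

$$\log(y) = \beta_0 + \beta_1 \text{ed} + \beta_2 \text{exp} + \beta_3 \text{exp}^2 + \beta_4\, \text{ed} \cdot \text{exp} + \beta_5 \text{hpw} + \beta_6 \text{married} + \beta_7 \text{south} + \beta_8 \text{city} + \beta_9 \lambda + \mu$$
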
You must state the model in full. Observe that usually the papers don’t really write
down the model explicitly. How does this model restrict reality?
Answer: The model in full is:

$$\log(y) = \beta_0 + \beta_1 \text{ed} + \beta_2 \text{exp} + \beta_3 \text{exp}^2 + \beta_4\, \text{ed} \cdot \text{exp} + \beta_5 \text{hpw} + \beta_6 \text{married} + \beta_7 \text{south} + \beta_8 \text{city} + \beta_9 \lambda + \mu$$

and $E[\mu \mid \text{ed}, \text{exp}, \text{exp}^2, \text{ed} \cdot \text{exp}, \text{hpw}, \text{married}, \text{south}, \text{city}, \lambda] = 0$.
The model says that the expected logarithm of hourly wages is a linear function of years
of education, experience, experience squared, education multiplied by experience, hours
worked per week, marital status, whether a respondent works in the south, whether
the respondent works in a city, and the respondent’s lambda, for a given value of all
these covariates. The model restricts reality in its assumption that expected log earnings
are linearly related to the regressors and that the things we do not observe are expected
to be the same for all values of the regressors.
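One consequence of the squared and interaction terms is worth writing out. Treating exp² and ed · exp as functions of ed and exp, and holding the remaining regressors (including λ) fixed, differentiating the conditional expectation gives the following. This is a sketch derived from the stated model, not a formula quoted from the paper.

```latex
\frac{\partial\, E[\log(y) \mid \cdot\,]}{\partial\, \text{ed}}
  = \beta_1 + \beta_4\, \text{exp},
\qquad
\frac{\partial\, E[\log(y) \mid \cdot\,]}{\partial\, \text{exp}}
  = \beta_2 + 2\beta_3\, \text{exp} + \beta_4\, \text{ed}
```

So the model is linear in its regressors but lets the return to experience vary with education through β4, which is why ed · exp serves as the causal variable of interest.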

(f) Explain the exact meaning of the second row on table 2 (the one that begins with
“Education”). See Table 2 reproduced in the next page.
Answer: Ceteris paribus, one more year of education increases the natural log of
hourly wage for black women by 0.057, for white women by 0.067, for black men by
0.033, and for white men by 0.032.
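Changes in the natural log of the wage are often read as approximate proportional changes. The sketch below converts the four coefficients quoted in the answer into exact percentage changes using exp(b) − 1, taking the ceteris paribus reading at face value. The dictionary and variable names are hypothetical.

```python
import math

# Education coefficients on log hourly wage, as quoted in the answer above.
education_coef = {
    "black women": 0.057,
    "white women": 0.067,
    "black men": 0.033,
    "white men": 0.032,
}

# A change of b in log(wage) multiplies the wage by exp(b),
# so the exact percentage change is 100 * (exp(b) - 1).
percent_change = {group: 100 * (math.exp(b) - 1)
                  for group, b in education_coef.items()}

for group, pct in percent_change.items():
    print(f"{group}: {pct:.2f}% higher hourly wage per extra year of education")
```

For coefficients this small the exact percentage is only slightly above 100 times the coefficient, so the shortcut of reading 0.057 as roughly 5.7 percent is close.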
