THE FUNDAMENTALS OF

Political Science
Research
Paul M. Kellstedt
Texas A&M University

Guy D. Whitten
Texas A&M University

CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, Sao Paulo, Delhi
Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9780521697880

Dedicated to
Lyman A. Kellstedt, Charmaine C. Kellstedt,
David G. Whitten, and Jo Wright-Whitten,
the best teachers we ever had
- PMK and GDW

© Paul M. Kellstedt and Guy D. Whitten 2009

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2009
Printed in the United States of America
A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data

Kellstedt, Paul M., 1968-
The fundamentals of political science research / Paul M. Kellstedt, Guy D. Whitten.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-521-87517-2 (hardback) - ISBN 978-0-521-69788-0 (pbk.)
1. Political science - Research. I. Whitten, Guy D., 1965- II. Title.
JA86.K45 2009
2008035250
320.072 - dc22
ISBN 978-0-521-87517-2 hardback
ISBN 978-0-521-69788-0 paperback
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party Internet Web sites referred to in
this publication and does not guarantee that any content on such Web sites is,
or will remain, accurate or appropriate. Information regarding prices, travel
timetables, and other factual information given in this work are correct at
the time of first printing, but Cambridge University Press does not guarantee
the accuracy of such information thereafter.

Contents

Figures
Tables
Acknowledgments

1 The Scientific Study of Politics
Overview
1.1 Political Science?
1.2 Approaching Politics Scientifically: The Search for Causal Explanations
1.3 Thinking about the World in Terms of Variables and Causal Explanations
1.4 Models of Politics
1.5 Rules of the Road to Scientific Knowledge about Politics
1.5.1 Make Your Theories Causal
1.5.2 Don't Let Data Alone Drive Your Theories
1.5.3 Consider Only Empirical Evidence
1.5.4 Avoid Normative Statements
1.5.5 Pursue Both Generality and Parsimony
1.6 A Quick Look Ahead
Concepts Introduced in This Chapter
Exercises

2 The Art of Theory Building
Overview
2.1 Good Theories Come from Good Theory-Building Strategies
2.2 Identifying Interesting Variation
2.2.1 Time-Series Example
2.2.2 Cross-Sectional Example
2.3 Learning to Use Your Knowledge
2.3.1 Moving from a Specific Event to More General Theories
2.3.2 Know Local, Think Global: Can You Drop the Proper Nouns?
2.4 Examine Previous Research
2.4.1 What Did the Previous Researchers Miss?
2.4.2 Can Their Theory Be Applied Elsewhere?
2.4.3 If We Believe Their Findings, Are There Further Implications?
2.4.4 How Might This Theory Work at Different Levels of Aggregation (Micro<==>Macro)?
2.5 Think Formally about the Causes That Lead to Variation in Your Dependent Variable
2.5.1 Utility and Expected Utility
2.5.2 The Puzzle of Turnout
2.6 Think about the Institutions: The Rules Usually Matter
2.6.1 Legislative Rules
2.6.2 The Rules Matter!
2.7 Extensions
2.8 How Do I Know If I Have a "Good" Theory?
2.8.1 Is Your Theory Causal?
2.8.2 Can You Test Your Theory on Data That You Have Not Yet Observed?
2.8.3 How General Is Your Theory?
2.8.4 How Parsimonious Is Your Theory?
2.8.5 How New Is Your Theory?
2.8.6 How Nonobvious Is Your Theory?
2.9 Conclusion
Concepts Introduced in This Chapter
Exercises

3 Evaluating Causal Relationships
Overview
3.1 Causality and Everyday Language
3.2 Four Hurdles along the Route to Establishing Causal Relationships
3.2.1 Putting It All Together - Adding Up the Answers to Our Four Questions
3.2.2 Identifying Causal Claims Is an Essential Thinking Skill
3.2.3 What Are the Consequences of Failing to Control for Other Possible Causes?
3.3 Why Is Studying Causality So Important? Three Examples from Political Science
3.3.1 Life Satisfaction and Democratic Stability
3.3.2 School Choice and Student Achievement
3.3.3 Electoral Systems and the Number of Political Parties
3.4 Why Is Studying Causality So Important? Three Examples from Everyday Life
3.4.1 Alcohol Consumption and Income
3.4.2 Treatment Choice and Breast Cancer Survival
3.4.3 Explicit Lyrics and Teen Sexual Behavior
3.5 Wrapping Up
Concepts Introduced in This Chapter
Exercises

4 Research Design
Overview
4.1 Comparison as the Key to Establishing Causal Relationships
4.2 Experimental Research Designs
4.2.1 "Random Assignment" versus "Random Sampling"
4.2.2 Are There Drawbacks to Experimental Research Designs?
4.3 Observational Studies (in Two Flavors)
4.3.1 Datum, Data, Data Set
4.3.2 Cross-Sectional Observational Studies
4.3.3 Time-Series Observational Studies
4.3.4 The Major Difficulty with Observational Studies
4.4 Summary
Concepts Introduced in This Chapter
Exercises

5 Measurement
Overview
5.1 Why Measurement Matters
5.2 Social Science Measurement: The Varying Challenges of Quantifying Humanity
5.3 Problems in Measuring Concepts of Interest
5.3.1 Conceptual Clarity
5.3.2 Reliability
5.3.3 Measurement Bias and Reliability
5.3.4 Validity
5.3.5 The Relationship between Validity and Reliability
5.4 Controversy 1: Measuring Democracy
5.5 Controversy 2: Measuring Political Tolerance
5.6 Are There Consequences to Poor Measurement?
5.7 Conclusions
Concepts Introduced in This Chapter
Exercises

6 Descriptive Statistics and Graphs
Overview
6.1 Know Your Data
6.2 What Is the Variable's Measurement Metric?
6.2.1 Categorical Variables
6.2.2 Ordinal Variables
6.2.3 Continuous Variables
6.2.4 Variable Types and Statistical Analyses
6.3 Describing Categorical Variables
6.4 Describing Continuous Variables
6.4.1 Rank Statistics
6.4.2 Moments
6.5 Limitations
Concepts Introduced in This Chapter
Exercises

7 Statistical Inference
Overview
7.1 Populations and Samples
7.2 Learning about the Population from a Sample: The Central Limit Theorem
7.2.1 The Normal Distribution
7.3 Example: Presidential Approval Ratings
7.3.1 What Kind of Sample Was That?
7.3.2 A Note on the Effects of Sample Size
7.4 A Look Ahead: Examining Relationships between Variables
Concepts Introduced in This Chapter
Exercises

8 Bivariate Hypothesis Testing
Overview
8.1 Bivariate Hypothesis Tests and Establishing Causal Relationships
8.2 Choosing the Right Bivariate Hypothesis Test
8.3 All Roads Lead to p
8.3.1 The Logic of p-Values
8.3.2 The Limitations of p-Values
8.3.3 From p-Values to Statistical Significance
8.3.4 The Null Hypothesis and p-Values
8.4 Three Bivariate Hypothesis Tests
8.4.1 Example 1: Tabular Analysis
8.4.2 Example 2: Difference of Means
8.4.3 Example 3: Correlation Coefficient
8.5 Wrapping Up
Concepts Introduced in This Chapter
Exercises

9 Bivariate Regression Models
Overview
9.1 Two-Variable Regression
9.2 Fitting a Line: Population <=> Sample
9.3 Which Line Fits Best? Estimating the Regression Line
9.4 Measuring Our Uncertainty about the OLS Regression Line
9.4.1 Goodness-of-Fit: Root Mean-Squared Error
9.4.2 Goodness-of-Fit: R-Squared Statistic
9.4.3 Is That a "Good" Goodness-of-Fit?
9.4.4 Uncertainty about Individual Components of the Sample Regression Model
9.4.5 Confidence Intervals about Parameter Estimates
9.4.6 Hypothesis Testing: Overview
9.4.7 Two-Tailed Hypothesis Tests
9.4.8 The Relationship between Confidence Intervals and Two-Tailed Hypothesis Tests
9.4.9 One-Tailed Hypothesis Tests
9.5 Assumptions, More Assumptions, and Minimal Mathematical Requirements
9.5.1 Assumptions about the Population Stochastic Component
9.5.2 Assumptions about Our Model Specification
9.5.3 Minimal Mathematical Requirements
9.5.4 How Can We Make All of These Assumptions?
Concepts Introduced in This Chapter
Exercises

10 Multiple Regression Models I: The Basics
Overview
10.1 Modeling Multivariate Reality
10.2 The Population Regression Function
10.3 From Two-Variable to Multiple Regression
10.4 What Happens When We Fail to Control for Z?
10.4.1 An Additional Minimal Mathematical Requirement in Multiple Regression
10.5 Interpreting Multiple Regression
10.6 Which Effect Is "Biggest"?
10.7 Statistical and Substantive Significance
10.8 Implications
Concepts Introduced in This Chapter
Exercises

11 Multiple Regression Models II: Crucial Extensions
Overview
11.1 Extensions of OLS
11.2 Being Smart with Dummy Independent Variables in OLS
11.2.1 Using Dummy Variables to Test Hypotheses about a Categorical Independent Variable with Only Two Values
11.2.2 Using Dummy Variables to Test Hypotheses about a Categorical Independent Variable with More Than Two Values
11.3 Testing Interactive Hypotheses with Dummy Variables
11.4 Dummy Dependent Variables
11.4.1 The Linear Probability Model
11.4.2 Binomial Logit and Binomial Probit
11.4.3 Goodness-of-Fit with Dummy Dependent Variables
11.5 Outliers and Influential Cases in OLS
11.5.1 Identifying Influential Cases
11.5.2 Dealing with Influential Cases
11.6 Multicollinearity
11.6.1 How Does Multicollinearity Happen?
11.6.2 Detecting Multicollinearity
11.6.3 Multicollinearity: A Simulated Example
11.6.4 Multicollinearity: A Real-World Example
11.6.5 Multicollinearity: What Should I Do?
11.7 Being Careful with Time Series
11.7.1 Time-Series Notation
11.7.2 Memory and Lags in Time-Series Analysis
11.7.3 Trends and the Spurious Regression Problem
11.7.4 The Differenced Dependent Variable
11.7.5 The Lagged Dependent Variable
11.8 Wrapping Up
Concepts Introduced in This Chapter

12 Multiple Regression Models III: Applications
Overview
12.1 Why Controlling for Z Matters
12.2 Example 1: The Economy and Presidential Popularity
12.3 Example 2: Politics, Economics, and Public Support for Democracy
12.4 Example 3: Competing Theories of How Politics Affects International Trade
12.5 Conclusions
Concepts Introduced in This Chapter
Exercises

Appendix A. Critical Values of χ²
Appendix B. Critical Values of t
Appendix C. The Λ Link Function for BNL Models
Appendix D. The Φ Link Function for BNP Models

Bibliography
Index

Figures

1.1 The road to scientific knowledge
1.2 From theory to hypothesis
1.3 What would you expect to see based on the theory of economic voting?
1.4 What would you expect to see based on the theory of economic voting? Two hypothetical cases
1.5 What would you expect to see based on the theory of economic voting?
1.6 What would you expect to see based on the theory of economic voting? Two hypothetical cases
2.1 Presidential approval, 1995-2005
2.2 Military spending in 2005
2.3 Gross U.S. government debt as a percentage of GDP, 1960-2004
2.4 Women as a percentage of members of parliament, 2004
3.1 The path to evaluating a causal relationship
3.2 Theoretical causes of the number of parties in legislatures
3.3 Nazi vote and the number of parties winning seats in Weimar Republic elections, 1919-1933
3.4 Number of parties winning seats in German Bundestag elections, 1949-2002
4.1 The possibly confounding effects of a healthy lifestyle on the aspirin-blood-pressure relationship
5.1 Reliability, validity, and hypothesis testing
5.2 Polity IV score for Pakistan
6.1 Pie graph of religious identification, NES 2004
6.2 Bar graph of religious identification, NES 2004
6.3 Example output from Stata's "summarize" command with "detail" option
6.4 Box-whisker plot of incumbent-party presidential vote percentage, 1880-2004
6.5 Histogram of incumbent-party presidential vote percentage, 1880-2004
6.6 Histograms of incumbent-party presidential vote percentage, 1880-2004, depicted with 2 and then 10 blocks
6.7 Kernel density plot of incumbent-party presidential vote percentage, 1880-2004
7.1 The normal probability distribution
7.2 The 68-95-99 rule
7.3 Frequency distribution of 600 rolls of a die
8.1 Box-whisker plot of Government Duration for majority and minority governments
8.2 Kernel density plot of Government Duration for majority and minority governments
8.3 Scatter plot of change in GDP and incumbent-party vote share
8.4 Scatter plot of change in GDP and incumbent-party vote share with mean-delimited quadrants
8.5 What is wrong with this table?
9.1 Scatter plot of change in GDP and incumbent-party vote share
9.2 Three possible lines
9.3 OLS regression line through scatter plot with mean-delimited quadrants
9.4 Stata results for two-variable regression model of VOTE = α + β × GROWTH
9.5 Venn diagram of variance and covariance for X and Y
10.1 Venn diagram in which X, Y, and Z are correlated
10.2 Venn diagram in which X and Y are correlated with Z, but not with each other
11.1 Stata output when we include both gender dummy variables in our model
11.2 Regression lines from the interactive model
11.3 Regression lines from the interactive model
11.4 Three different models of Bush vote
11.5 Stata lvr2plot for the model presented in Table 11.8
11.6 OLS line with scatter plot for Florida 2000
11.7 Venn diagram with multicollinearity
11.8 The growth of golf and the decline of the American family, 1947-2002
11.9 The growth of the U.S. economy and the decline of the family, 1947-2002
11.10 First differences of the number of golf courses and percentage of married families, 1947-2002
12.1 A simple causal model of the relationship between the economy and presidential popularity
12.2 A revised model of presidential popularity

Tables

4.1 Example of cross-sectional data
4.2 Example of time-series data
6.1 Frequency table for religious identification in the 2004 NES
6.2 Values of incumbent vote ranked from smallest to largest
6.3 Median incomes of the 50 states, 2004-2005
8.1 Variable types and appropriate bivariate hypothesis tests
8.2 Union households and vote in the 2004 U.S. presidential election
8.3 Gender and vote in the 2004 U.S. presidential election: Hypothetical scenario
8.4 Gender and vote in the 2004 U.S. presidential election: Expectations for hypothetical scenario if there were no relationship
8.5 Gender and vote in the 2004 U.S. presidential election
8.6 Gender and vote in the 2004 U.S. presidential election: Calculating the expected cell values if gender and presidential vote are unrelated
8.7 Gender and vote in the 2004 U.S. presidential election
8.8 Gender and vote in the 2004 U.S. presidential election
8.9 Government type and government duration
8.10 Contributions of individual election years to the covariance calculation
8.11 Covariance table for economic growth and incumbent-party presidential vote, 1880-2004
8.12 Incumbent reelection rates in U.S. congressional elections, 1964-2006
9.1 Measures of total residuals for three different lines
10.1 Three regression models of U.S. presidential elections
10.2 Bias in β1 when the true population model is Yi = α + β1Xi + β2Zi + ui but we leave out Z
11.1 Two models of the effects of gender and income on Hillary Clinton Thermometer scores
11.2 Religious Identification in the 1996 NES
11.3 The same model of religion and income on Hillary Clinton Thermometer scores with different reference categories
11.4 The effects of gender and feelings toward the women's movement on Hillary Clinton Thermometer scores
11.5 The effects of partisanship and performance evaluations on votes for Bush in 2004
11.6 The effects of partisanship and performance evaluations on votes for Bush in 2004: Three different types of models
11.7 Classification table from LPM of the effects of partisanship and performance evaluations on votes for Bush in 2004
11.8 Votes for Gore and Buchanan in Florida counties in the 2000 U.S. presidential election
11.9 The five largest (absolute-value) DFBETA scores for β from the model presented in Table 11.8
11.10 Votes for Gore and Buchanan in Florida counties in the 2000 U.S. presidential election
11.11 Random draws of increasing size from a population with substantial multicollinearity
11.12 Pairwise correlations between independent variables
11.13 Model results from random draws of increasing size from the 2004 NES
11.14 Golf and the decline of the family, 1947-2002
11.15 GDP and the decline of the family, 1947-2002
12.1 Excerpts from the table of MacKuen, Erikson, and Stimson on the relationship between the economy and presidential popularity
12.2 Excerpts from the table of Evans and Whitefield on the relationship between the economy and support for democracy
12.3 Excerpts from the table of Morrow, Siverson, and Tabares on the political causes of international trade

Acknowledgments

An inevitable part of the production of a book like this is an accumulation of
massive intellectual debts. We have been overwhelmed by both the quality
and quantity of help that we have received from our professional (and even
personal) contacts as we have gone through every stage of this project.
This book arose out of more than 20 years of combined teaching
experience at Brown University, the University of California, Los Angeles,
the University of Essex, the University of Minnesota, and Texas A&M
University. We tried out most of the examples in this book on numerous
classes of students before we refined them into their present state. We thus
owe a debt to every student who raised his or her hand or showed us a
furrowed brow as we worked our way through these attempts to explain
the complicated processes of scientifically studying politics.
More immediately, this project came out of separate and skeptical
conversations that each author had with Ed Parsons during his visit to Texas
A&M in the spring of 2006. Without Ed's perfect balance of candor and
encouragement, this book would not have been started. At every stage in
the process he has helped us immensely. He obtained three sets of superbly
helpful reviews and seemed always to know the right times to be in and out
of touch as we worked our way through them. It has been a tremendous
pleasure to work with Ed on the book.
Throughout the process of writing this book, we got a steady stream
of support, understanding, and patience from Christine, Deb, Abigail, and
Elizabeth. We thank them for putting up with our crazy hours and for
helping us to keep things in perspective as we worked on this project.
For both authors, the lines separating family, friends, and professional
colleagues are pretty blurry. We relied on our combined networks quite
heavily at every stage in the production of this book. Early in the process
of putting the manuscript together, we received sage advice from Jeff Gill
about textbook writing for social scientists and how to handle early
versions of our chapters. Our fathers, Lyman A. "Bud" Kellstedt and David
G. Whitten, provided their own unique and valuable perspectives on early
drafts of the book. In separate but related ongoing conversations, John
Transue and Alan M. Brookhart engaged us in lengthy debates about
the nature of experiments, quasi-experiments, and observational studies.
Other colleagues and friends provided input that also improved this book,
including Harold Clarke, Geoffrey Evans, John Jackson, Marisa Kellam,
Eric Lawrence, Christine Lipsmeyer, Evan Parker-Stephen, David Peterson,
James Rogers, Randy Stevenson, Georg Vanberg, Rilla Whitten, and Jenifer
Whitten-Woodring.
Despite all of this help, we remain solely responsible for any deficiencies
that persist in the book. We look forward to hearing about them from you
so that we can make future editions of this book better.
Throughout the process of writing this book, we have been mindful
of how our thinking has been shaped by our teachers at a variety of levels.
We are indebted to them in ways that are difficult to express. In particular,
Guy Whitten thanks the following, all from his days at the University of
Rochester: Larry M. Bartels, Richard Niemi, G. Bingham Powell, Lynda
Powell, William H. Riker, and David Weimer. Paul Kellstedt thanks Al
Reynolds and Bob Terbog of Calvin College; Michael Lewis-Beck, Vicki
Hesli, and Jack Wright at the University of Iowa; and Jim Stimson and John
Freeman at the University of Minnesota.
Although we have learned much from the aforementioned professors,
we owe our largest debt to our parents: Lyman A. "Bud" Kellstedt,
Charmaine C. Kellstedt, David G. Whitten, and Jo Wright-Whitten. We dedicate
this book to the four of them - the best teachers we ever had.

THE FUNDAMENTALS OF POLITICAL SCIENCE RESEARCH

1 The Scientific Study of Politics

OVERVIEW

Most political science students are interested in the substance of politics
and not in its methodology. We begin with a discussion of the goals of
this book and why a scientific approach to the study of politics is more
interesting and desirable than a "just-the-facts" approach. In this chapter
we provide an overview of what it means to study politics scientifically.
We begin with an introduction to how we move from causal theories to
scientific knowledge, and a key part of this process is thinking about the
world in terms of models in which the concepts of interest become variables
that are causally linked together by theories. We then introduce the goals
and standards of political science research that will be our rules of the road
to keep in mind throughout this book. The chapter concludes with a brief
overview of the structure of this book.

Doubt is the beginning, not the end, of wisdom.
- Chinese proverb

1.1 POLITICAL SCIENCE?

"Which party do you support?" "When are you going to run for office?"
These are questions that students often hear after announcing that they
are taking courses in political science. Although many political scientists
are avid partisans, and some political scientists have even run for elected
offices or have advised elected officials, for the most part this is not the
focus of modern political science. Instead, political science is about the
scientific study of political phenomena. Perhaps like you, a great many of
today's political scientists were attracted to this discipline as undergraduates
because of intense interests in a particular issue or candidate. Although we
are often drawn into political science based on political passions, the most
respected political science research today is conducted in a fashion that
makes it impossible to tell the personal political views of the writer.
Many people taking their first political science research course are
surprised to find out how much science and, in particular, how much math
are involved. We would like to encourage the students who find themselves
in this position to hang in there with us - even if your answer to this
encouragement is "but I'm only taking this class because they require it to
graduate, and I'll never use any of this stuff again." Even if you never run a
regression model after you graduate, having made your way through these
materials should help you in a number of important ways. We have
written this book with the following three goals in mind:

• To help you consume academic political science research in your other
  courses. One of the signs that a field of research is becoming scientific
  is the development of a common technical language. We aim to make
  the common technical language of political science accessible to you.
• To help you become a better consumer of information. In political
  science and many other areas of scientific and popular communication,
  claims about causal relationships are frequently made. We want you
  to be better able to evaluate such claims critically.
• To start you on the road to becoming a producer of scientific research
  on politics. This is obviously the most ambitious of our goals. In our
  teaching we often have found that once skeptical students get
  comfortable with the basic tools of political science, their skepticism
  turns into curiosity and enthusiasm.

To see the value of this approach, consider an alternative way of
learning about politics, one in which political science courses would focus on
"just the facts" of politics. Under this alternative way, for example, a
course offered in 1995 on the politics of the European Union (EU) would
have taught students that there were 15 member nations who participated
in governing the EU through a particular set of institutional arrangements
that had a particular set of rules. An obvious problem with this alternative
way is that courses in which lists of facts are the only material would
probably be pretty boring. An even bigger problem, though, is that the political
world is constantly changing. In 2008 the EU is made up of 27 member
nations and has some new governing institutions and rules that are different
from what they were in 1995. Students who took a facts-only course on the
EU back in 1995 would find themselves lost in trying to understand the EU
of 2008. By contrast, a theoretical approach to politics helps us to better
understand why changes have come about and their likely impact on EU
politics.


In this chapter we provide an overview of what it means to study
politics scientifically. We begin this discussion with an introduction to how
we move from causal theories to scientific knowledge. A key part of this
process is thinking about the world in terms of models in which the concepts
of interest become variables¹ that are causally linked together by theories.
We then introduce the goals and standards of political science research that
will be our rules of the road to keep in mind throughout this book. We
conclude this chapter with a brief overview of the structure of this book.

1.2 APPROACHING POLITICS SCIENTIFICALLY: THE SEARCH
FOR CAUSAL EXPLANATIONS

I've said, I don't know whether it's addictive. I'm not a doctor. I'm not a
scientist.
- Bob Dole, in a conversation with Katie Couric about tobacco during the
1996 U.S. presidential campaign

The question of "how do we know what we know" is, at its heart, a
philosophical question. Scientists are lumped into different disciplines that
develop standards for evaluating evidence. A core part of being a scientist
and taking a scientific approach to studying the phenomena that interest
you is always being willing to consider new evidence and, on the basis of
that new evidence, change what you thought you knew to be true. This
willingness to always consider new evidence is counterbalanced by a stern
approach to the evaluation of new evidence that permeates the scientific
approach. This is certainly true of the way that political scientists approach
politics.
So what do political scientists do and what makes them scientists? A
basic answer to this question is that, like other scientists, political scientists
develop and test theories. A theory is a tentative conjecture about the
causes of some phenomenon of interest. Once a theory has been developed,
we can restate it into one or more testable hypotheses. A hypothesis is a
theory-based statement about a relationship that we expect to observe. For
every hypothesis there is a corresponding null hypothesis. A null hypothesis
is also a theory-based statement but it is about what we would expect to
observe if our theory was incorrect. Hypothesis testing is a process in which
scientists evaluate systematically collected evidence to make a judgment of
whether the evidence favors their hypothesis or favors the corresponding
null hypothesis. If a hypothesis survives a series of rigorous tests, scientists
start to gain confidence in that hypothesis rather than in the null hypothesis,
and thus they also gain confidence in the theory from which they generated
their hypothesis.

¹ When we introduce an important new term in this book, that term appears in boldface
type. We discuss variables at great length later in this and other chapters. For now, a good
working definition is that a variable is something that varies. An example of a variable is
voter turnout; researchers usually measure it as the percentage of voting-eligible persons
in a geographically defined area who cast a vote in a particular election.
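
Footnote 1 describes voter turnout as the percentage of voting-eligible persons in an area who cast a vote. As a minimal sketch of how that working definition becomes a measured variable, the Python snippet below computes turnout for two areas. The function name, the area names, and all of the counts are hypothetical and chosen only for illustration; they do not come from the book.

```python
def turnout_percentage(votes_cast: int, voting_eligible: int) -> float:
    """Return voter turnout as a percentage of voting-eligible persons."""
    if voting_eligible <= 0:
        raise ValueError("voting_eligible must be a positive count")
    return 100.0 * votes_cast / voting_eligible


# Hypothetical areas: (votes cast, voting-eligible persons).
areas = {
    "County A": (41_000, 82_000),
    "County B": (30_000, 40_000),
}

# Turnout takes a different value in each area, which is what makes it a variable.
turnout = {name: turnout_percentage(v, e) for name, (v, e) in areas.items()}

if __name__ == "__main__":
    for name, pct in turnout.items():
        print(f"{name}: {pct:.1f}% turnout")
```

Because the computed value differs from one area to the next, turnout fits the footnote's working definition of a variable as something that varies.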
[Figure 1.1. The road to scientific knowledge: Causal theory -> Hypothesis ->
Empirical test -> Evaluation of hypothesis -> Evaluation of causal theory ->
Scientific knowledge]

Figure 1.1 presents a stylized schematic view of the path from theories
to hypotheses to scientific knowledge.² At the top of the figure, we begin
with a causal theory to explain our phenomenon of interest. We then derive
one or more hypotheses about what our theory leads us to expect when we
measure our concepts of interest (which we call variables as subsequently
discussed) in the real world. In the third step, we conduct empirical tests of
our hypotheses.³ From what we find, we evaluate our hypotheses relative to
corresponding null hypotheses. Next, from the results of our
hypothesis tests, we evaluate our causal theory. In light of our evaluation
of our theory, we then think about how, if at all, we should revise what we
consider to be scientific knowledge concerning our phenomenon of interest.

² In practice, the development of scientific knowledge is frequently much messier than this
step-by-step diagram. We show more of the complexity of this approach in later chapters.
³ By "empirical" we simply mean "based on observations of the real world."

A core part of the scientific process is skepticism. On hearing of a
new theory, other scientists will challenge this theory and devise further
tests. Although this process can occasionally become quite combative, it is
a necessary component in the development of scientific knowledge. Indeed,
a core component of scientific knowledge is that, as confident as we are
in a particular theory, we remain open to the possibility that there is still
a test out there that will provide evidence that makes us lose confidence in
that theory.
It is important to underscore here the nature of the testing that scientists
carry out. One way of explaining this is to say that scientists are not
like lawyers in the way that they approach evidence. Lawyers work for
a particular client, advocate a particular point of view (like "guilt" or
"innocence"), and then accumulate evidence with a goal of proving their
case to a judge or jury. This goal of proving a desired result determines
their approach to evidence. When faced with evidence that conflicts with
their case, lawyers attempt to ignore or discredit such evidence. When
faced with evidence that supports their case, lawyers try to emphasize the
applicability of the supportive evidence. In many ways, the scientific and
legal approaches to evidence couldn't be further apart. Scientific confidence
in a theory is achieved only after hypotheses derived from that theory have
run a gantlet of tough tests. At the beginning of a trial, lawyers develop
a strategy to prove their case. In contrast, at the beginning of a research
project, scientists will think long and hard about the most rigorous tests
that they can conduct. A scientist's theory is never proven because scientists
are always willing to consider new evidence.
The process of hypothesis testing reflects how hard scientists are on
their own theories. As scientists evaluate systematically collected evidence to
make a judgment of whether the evidence favors their hypothesis or favors
the corresponding null hypothesis, they always favor the null hypothesis.
Statistical techniques allow scientists to make probability-based statements
about the empirical evidence that they have collected. You might think that,
if the evidence was 50-50 between their hypothesis and the corresponding
null hypothesis, the scientists would tend to give the nod to the hypothesis
(from their theory) over the null hypothesis. In practice, though, this is
not the case. Even when the hypothesis has an 80-20 edge over the null
hypothesis, most scientists will still favor the null hypothesis. Why? Because
scientists are very worried about the possibility of falsely rejecting the null
hypothesis and therefore making claims that others ultimately will show to
be wrong.
Once a theory has become established as a part of scientific knowledge
in a field of study, researchers can build upon the foundation that
this theory provides. Thomas Kuhn wrote about these processes in his
famous book The Structure of Scientific Revolutions. According to Kuhn,
scientific fields go through cycles of accumulating knowledge based on a
set of shared assumptions and commonly accepted theories about the way
that the world works. Together, these shared assumptions and accepted
theories form what we call a paradigm. Once researchers in a scientific
field have widely accepted a paradigm, they can pursue increasingly
technical questions that make sense only because of the work that has come
beforehand. This state of research under an accepted paradigm is referred
to as normal science. When a major problem is found with the accepted
theories and assumptions of a scientific field, that field will go through a
revolutionary period during which new theories and assumptions replace
the old paradigm to establish a new paradigm. One of the more famous of
these scientific revolutions occurred during the 16th century when the field
of astronomy was forced to abandon its assumption that the Earth was the
center of the known universe. This was an assumption that had informed
theories about planetary movement for thousands of years. In the book
On Revolutions of the Heavenly Bodies, Nicolaus Copernicus presented his
theory that the Sun was the center of the known universe. Although this
radical theory met many challenges, an increasing body of evidence
convinced astronomers that Copernicus had it right. In the aftermath of this
paradigm shift, researchers developed new assumptions and theories that
established a new paradigm, and the affected fields of study entered into
new periods of normal scientific research.
It may seem hard to imagine that the field of political science has gone
through anything that can compare with the experiences of astronomers in
the 16th century. Indeed, Kuhn and other scholars who study the evolution
of scientific fields of research have a lively and ongoing debate about
where the social sciences, like political science, are in terms of their
development. The more skeptical participants in this debate argue that political
science is not sufficiently mature to have a paradigm, much less a paradigm
shift. If we put aside this somewhat esoteric debate about paradigms and
paradigm shifts, we can see an important example of the evolution of
scientific knowledge about politics from the study of public opinion in the
United States.
In the 1940s the study of public opinion through mass surveys was in
its infancy. Prior to that time, political scientists and sociologists assumed
that U.S. voters were heavily influenced by presidential campaigns - and,
in particular, by campaign advertising - as they made up their minds about
the candidates. To better understand how these processes worked, a team
of researchers from Columbia University set up an in-depth study of public
opinion in Erie County, Ohio, during the 1940 presidential election.
Their study involved interviewing the same individuals at multiple time
periods across the course of the campaign. Much to the researchers'
surprise, they found that voters were remarkably consistent from interview to
interview in terms of their vote intentions. Instead of being influenced by
particular events of the campaign, most of the voters surveyed had made up
their minds about how they would cast their ballots long before the
campaigning had even begun. The resulting book by Paul Lazarsfeld, Bernard
Berelson, and Hazel Gaudet, titled The People's Choice, changed the way
that scholars thought about public opinion and political behavior in the
United States. If political campaigns were not central to vote choice,
scholars were forced to ask themselves what was critical to determining how
people voted.
At first other scholars were skeptical of the findings of the Erie
County study, but as the revised theories of politics of Lazarsfeld et al. were
evaluated in other studies, the field of public opinion underwent a change
that looks very much like what Thomas Kuhn calls a "paradigm shift." In
the aftermath of this finding, new theories were developed to attempt to
explain the origins of voters' long-lasting attachments to political parties in
the United States. An example of an influential study that was carried out
under this shifted paradigm is Richard Niemi and Kent Jennings's seminal
book from 1974, The Political Character of Adolescence: The Influence
of Families and Schools. As the title indicates, Niemi and Jennings studied
the attachments of schoolchildren to political parties. Under the pre-Erie
County paradigm of public opinion, this study would not have made much
sense. But once researchers had found that voters' partisan attachments
were quite stable over time, studying them at the early ages at which they
form became a reasonable scientific enterprise. You can see evidence of
this paradigm at work in current studies of party identification and debates
about its stability.

1.3 THINKING ABOUT THE WORLD IN TERMS OF VARIABLES
AND CAUSAL EXPLANATIONS

So how do political scientists develop theories about politics? A key element
of this is that they order their thoughts about the political world in terms
of concepts that scientists call variables and causal relationships between
variables. This type of mental exercise is just a more rigorous way of
expressing ideas about politics that we hear on a daily basis. You should
think of each variable in terms of its label and its values. The variable label
is a description of what the variable is, and the variable values are the
denominations in which the variable occurs. So, if we're talking about the
variable that reflects an individual's age, we could simply label this variable
"Age" and some of the denominations in which this variable occurs would
be years, days, or even hours.
It is easier to understand the process of turning concepts into variables
by using an example of an entire theory. For instance, if we're thinking
about U.S. presidential elections, a commonly expressed idea is that the
incumbent president will fare better when the economy is relatively healthy.
If we restate this in terms of a political science theory, the state of the
economy becomes the independent variable, and the outcome of presidential
elections becomes the dependent variable. One way of keeping the lingo of
theories straight is to remember that the value of the "dependent" variable
"depends" on the value of the "independent" variable. Recall that a theory
is a tentative conjecture about the causes of some phenomenon of interest.
In other words, a theory is a conjecture that the independent variable is
causally related to the dependent variable; according to our theory, change
in the value of the independent variable causes change in the value of the
dependent variable.
This is a good opportunity to pause and try to come up with your own
causal statement in terms of an independent and dependent variable; try
filling in the following blanks with some political variables:

__________________________ causes _________________________

Sometimes it's easier to phrase causal propositions more specifically in
terms of the values of the variables that you have in mind. For instance,

higher _________________ causes lower __________________

or

higher _________________ causes higher __________________

Once you learn to think about the world in terms of variables you will be
able to produce an almost endless slew of causal theories. In Chapter 4 we
will discuss at length how we design research to evaluate the causal claims
in theories, but one way to initially evaluate a particular theory is to think
about the causal explanation behind it. The causal explanation behind a
theory is the answer to the question, "why do you think that this
independent variable is causally related to this dependent variable?" If the answer
is reasonable, then the theory has possibilities. In addition, if the answer is
original and thought provoking, then you may really be onto something.
Let's return now to our working example in which the state of the economy
is the independent variable and the outcome of presidential elections
is our dependent variable. The causal explanation for this theory is that
we believe that the state of the economy is causally related to the outcome
of presidential elections because voters hold the president responsible for
management of the national economy. As a result, when the economy has
been performing well, more voters will vote for the incumbent. When the
economy is performing poorly, fewer voters will support the incumbent
candidate. If we put this in terms of the preceding fill-in-the-blank exercise,
we could write

economic performance causes presidential election outcomes,

or, more specifically, we could write

higher economic performance causes higher incumbent vote.

For now we'll refer to this theory, which has been widely advanced and
tested by political scientists, as "the theory of economic voting."
To test the theory of economic voting in U.S. presidential elections, we
need to derive from it one or more testable hypotheses. Figure 1.2 provides
a schematic diagram of the relationship between a theory and one of its
hypotheses. At the top of this diagram are the components of the causal
theory. As we move from the top part of this diagram (Causal theory) to
the bottom part (Hypothesis), we are moving from a general statement
about how we think the world works to a more specific statement about a
relationship that we expect to find when we go out in the real world and
measure (or operationalize) our variables.4

Figure 1.2. From theory to hypothesis. (Schematic: at the level of the causal
theory, Independent variable (concept) -> Dependent variable (concept); each
concept is linked by operationalization to its measured counterpart at the
level of the hypothesis, Independent variable (measured) -> Dependent
variable (measured).)

At the theory level at the top of Figure 1.2, our variables do not need to
be explicitly defined. With the economic voting example, the independent
variable, "Economic Performance," can be thought of as a concept that
ranges from very strong to very poor. The dependent variable, "Incumbent
Vote," can be thought of as a concept that ranges from very high to very
low. Our causal theory is that a stronger economic performance causes the
incumbent vote to be higher.
Because there are many ways in which we can measure each of our
two variables, there are many different hypotheses that we can test to find
out how well our theory holds up to real-world data. We can measure
economic performance in a variety of ways. These measures include inflation,
unemployment, real economic growth, and many others. "Incumbent Vote"
may seem pretty straightforward to measure, but here there are also a
number of choices that we need to make. For instance, what do we do in the
cases in which the incumbent president is not running again? Or what about
elections in which a third-party candidate runs? Measurement (or
operationalization) of concepts is an important part of the scientific process. We
will discuss this in greater detail in Chapter 5, which is devoted entirely to
variable measurement. For now, we imagine that we are operationalizing
economic performance with real economic growth, as defined by official
U.S. government measures of the one-year rate of inflation-adjusted
economic growth at the time of the election. We operationalize our dependent
variable as the percentage of the popular vote, as reported in official
election results, for the party that controlled the presidency at the time of the
election.

4 Throughout this book we will use the terms "measure" and "operationalize"
interchangeably. It is fairly common practice in the current political science
literature to use the term "operationalize."

Figure 1.3. What would you expect to see based on the theory of economic voting?
(Empty axes: horizontal, "One-Year Real Economic Growth," from -20 to 20;
vertical, "Incumbent-Party Vote Percentage.")

Figure 1.3 shows the axes of the graph that we could produce if we
collected the measures of these two variables. We could place each U.S.
presidential election on the graph in Figure 1.3 by identifying the point that
corresponds to the value of both "One-Year Real Economic Growth" (the
horizontal, or x, axis) and "Incumbent-Party Vote Percentage" (the vertical,
or y, axis). For instance, if these values were (respectively) 0 and 50, the
position for that election year would be exactly in the center of the graph. Based
on our theory, what would you expect to see if we collected these measures
for all elections? Remember that our theory is that a stronger economic
performance causes the incumbent vote to be higher. And we can restate
this theory in reverse such that a weaker economic performance causes the
incumbent vote to be lower. So, what would this lead us to expect to see if
we plotted real-world data onto Figure 1.3? To get this answer right, let's
make sure that we know our way around this graph. If we move from left
to right on the horizontal axis, which is labeled "One-Year Real Economic
Growth," what is going on in real-world terms? We can see that, at the far
left end of the horizontal axis, the value is -20. This would mean that the
U.S. economy had shrunk by 20% over the past year, which would represent
a very poor performance (to say the least). As we move to the right on this
axis, each point represents a better economic performance up to the point
where we see a value of +20, indicating that the real economy has grown
by 20% over the past year. The vertical axis depicts values of
"Incumbent-Party Vote Percentage." Moving upward on this axis represents an
increasing share of the popular vote for the incumbent party, whereas moving
downward represents a decreasing share of the popular vote.
Now think about these two axes together in terms of what we would
expect to see based on the theory of economic voting. In thinking through
these matters, we should always start with our independent variable. This
is because our theory states that the value of the independent variable
exerts a causal influence on the value of the dependent variable. So, if we
start with a very low value of economic performance - let's say -15 on
the horizontal axis - what does our theory lead us to expect in terms of
values for the incumbent vote, the dependent variable? We would also
expect the value of the dependent variable to be very low. This case would
then be expected to be in the lower-left-hand corner of Figure 1.3. Now
imagine a case in which economic performance was quite strong at +15.
Under these circumstances, our theory would lead us to expect that the
incumbent-vote percentage would also be quite high. Such a case would
be in the upper-right-hand corner of our graph. Figure 1.4 shows two
such hypothetical points plotted on the same graph as Figure 1.3. If we
draw a line between these two points, this line would slope upward from
the lower left to the upper right. We describe such a line as having a
positive slope. We can therefore hypothesize that the relationship between
the variable labeled "One-Year Real Economic Growth" and the variable
labeled "Incumbent-Party Vote Percentage" will be a positive relationship.
A positive relationship is one for which higher values of the independent
variable coincide with higher values of the dependent variable.

Figure 1.4. What would you expect to see based on the theory of economic voting?
Two hypothetical cases. (Same axes as Figure 1.3.)

Let's consider a different operationalization of our independent
variable. Instead of economic growth, let's use "Unemployment Percentage"
as our operationalization of economic performance. We haven't changed
our theory, but we need to rethink our hypothesis with this new
measurement or operationalization. The best way to do so is to draw a picture like
Figure 1.3 but with the changed independent variable on the horizontal
axis. This is what we have in Figure 1.5. As we move from left to right
on the horizontal axis in Figure 1.5, the percentage of the members of the
workforce who are unemployed goes up. What does this mean in terms
of economic performance? Rising unemployment is generally considered a
poorer economic performance whereas decreasing unemployment is
considered a better economic performance. Based on our theory, what should we
expect to see in terms of incumbent vote percentage when unemployment
is high? What about when unemployment is low?

Figure 1.5. What would you expect to see based on the theory of economic voting?
(Empty axes: horizontal, "Unemployment Percentage," from 0 to 100;
vertical, "Incumbent-Party Vote Percentage.")

Figure 1.6. What would you expect to see based on the theory of economic voting?
Two hypothetical cases. (Same axes as Figure 1.5.)

Figure 1.6 shows two such hypothetical points plotted on our graph
of unemployment and incumbent vote from Figure 1.5. The point in the
upper-left-hand corner represents our expected vote percentage when
unemployment equals zero. Under these circumstances, our theory of
economic voting leads us to expect that the incumbent party will do very well.
The point in the lower-right-hand corner represents our expected vote
percentage when unemployment is very high. Under these circumstances our
theory of economic voting leads us to expect that the incumbent party
will do very poorly. If we draw a line between these two points, this line
would slope downward from the upper left to the lower right. We describe
such a line as having a negative slope. We can therefore hypothesize that
the relationship between the variable labeled "Unemployment Percentage"
and the variable labeled "Incumbent-Party Vote Percentage" will be a
negative relationship. A negative relationship is one for which higher values
of the independent variable coincide with lower values of the dependent
variable.
In this example we have seen that the same theory can lead to a
hypothesis of a positive or a negative relationship. The operationalization of
the independent and the dependent variables determines the direction of
the hypothesized relationship. It is often very helpful to draw a picture like
Figure 1.3 or 1.5 to translate our theories into hypotheses. Once we have
such a figure with the axes properly labeled, we can determine what our
expected value of our dependent variable should be if we observe both a
high and a low value of the independent variable. And once we have placed
the two resulting points on our figure, we can tell whether our hypothesized
relationship is positive or negative.
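
The two-point reasoning described above can be sketched in a few lines of Python. This is only an illustration: the helper names `slope` and `direction` and the coordinates of the four hypothetical cases are assumptions made for the example, not values read from the figures.

```python
def slope(low_case, high_case):
    """Slope of the line through two (x, y) cases; x is the independent variable."""
    (x_low, y_low), (x_high, y_high) = low_case, high_case
    if x_low == x_high:
        raise ValueError("the two cases must differ on the independent variable")
    return (y_high - y_low) / (x_high - x_low)


def direction(value):
    """Name the hypothesized relationship implied by a slope."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "no relationship"


# Hypothetical expectations: (independent variable, incumbent-party vote percentage).
growth_cases = ((-15.0, 20.0), (15.0, 80.0))      # one-year real economic growth
unemployment_cases = ((0.0, 90.0), (90.0, 10.0))  # unemployment percentage

for label, (low, high) in (("growth", growth_cases),
                           ("unemployment", unemployment_cases)):
    print(label, direction(slope(low, high)))
```

Run as written, this prints "growth positive" and "unemployment negative": the same theory yields hypotheses with opposite signs, depending only on how economic performance is operationalized.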
Once we have figured out our hypothesized relationship, we can
collect data from real-world cases and see how well these data reflect our
expectations of a positive or negative relationship. This is a very
important step that we can carry out fairly easily in the case of the theory of
economic voting. Once we collect all of the data on economic performance
and election outcomes, we will, however, still be a long way from
confirming the theory that economic performance causes presidential election
outcomes. Even if a graph like Figure 1.3 produces compelling visual
evidence, we will need to see more rigorous evidence than that. Chapters 8-12
focus on the evaluation of hypotheses by use of statistics. The basic logic
of statistical hypothesis testing is that we assess the probability that the
relationship we find could be due to random chance. The stronger the
evidence that such a relationship could not be due to random chance, the more
confident we would be in our hypothesis. The stronger the evidence that
such a relationship could be due to random chance, the more confident we
would be in the corresponding null hypothesis. This in turn reflects on our
theory.
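
One way to make the logic of "could this be due to random chance?" concrete is a permutation test, sketched below in Python. It is chosen here only for its simplicity and is not necessarily the procedure developed in the later chapters. The growth and vote figures are made-up numbers, and the names `correlation` and `permutation_p_value`, the seed, and the number of shuffles are all assumptions. The idea is that, if the relationship were due to chance alone, randomly re-pairing vote percentages with growth rates should often produce a relationship at least as strong as the one observed.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, shuffles=5000, seed=0):
    """Share of random re-pairings whose correlation is at least the observed one."""
    rng = random.Random(seed)
    observed = correlation(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        if correlation(xs, shuffled) >= observed:
            hits += 1
    return hits / shuffles


# Made-up data for eight hypothetical elections (not real results).
growth = [-2.0, -0.5, 0.5, 1.5, 2.5, 3.0, 4.0, 5.5]
vote = [44.0, 46.5, 49.0, 48.0, 52.0, 53.5, 55.0, 58.0]

observed_r = correlation(growth, vote)
p_value = permutation_p_value(growth, vote)
print(f"observed correlation: {observed_r:.2f}")
print(f"share of shuffles at least as strong: {p_value:.4f}")
```

The comparison is one-sided because the hypothesis is directional (a positive relationship). A small share means that chance alone rarely reproduces the observed pattern; a large share would leave us with the null hypothesis.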
We also, at this point, need to be cautious about claiming that we
have "confirmed" our theory, because social scientific phenomena (such as
elections) are usually complex and cannot be explained completely with
a single independent variable. Take a minute or two to think about what
other variables, aside from economic performance, you believe might be
causally related to U.S. presidential election outcomes. If you can come up
with at least one, you are on your way to thinking like a political scientist.
Because there are usually other variables that matter, we can continue to
think about our theories two variables at a time, but we need to qualify our
expectations to account for other variables. We will spend Chapters 3 and
4 expanding on these important issues.

1.4 MODELS OF POLITICS

When we think about the phenomena that we want to better understand
as dependent variables and develop theories about the independent
variables that causally influence them, we are constructing theoretical models.
Political scientist James Rogers provides an excellent analogy between
models and maps to explain how these abstractions from reality are useful to
us as we try to understand the political world:

... the very unrealism of a model, if properly constructed, is what makes
it useful. The models developed below are intended to serve much the
same function as a street map of a city. If one compares a map of a city
to the real topography of that city, it is certain that what is represented
in the map is a highly unrealistic portrayal of what the city actually looks
like. The map utterly distorts what is really there and leaves out numerous
details about what a particular area looks like. But it is precisely because
the map distorts reality - because it abstracts away from a host of details
about what is really there - that it is a useful tool. A map that attempted
to portray the full details of a particular area would be too cluttered to
be useful in finding a particular location or would be too large to be
conveniently stored. (2006, p. 276, emphasis in original)

The essential point is that models are simplifications. Whether or not they
are useful to us depends on what we are trying to accomplish with the
particular model. One of the remarkable aspects of models is that they
are often more useful to us when they are inaccurate than when they are
accurate. The process of thinking about the failure of a model to explain
one or more cases can generate a new causal theory. Glaring inaccuracies
often point us in the direction of fruitful theoretical progress.

1.5 RULES OF THE ROAD TO SCIENTIFIC KNOWLEDGE
ABOUT POLITICS

In the chapters that follow, we will focus on particular tools of political
science research. As we do this, try to keep in mind our larger purpose -
trying to advance the state of scientific knowledge about politics. As
scientists, we have a number of basic rules that should never be far from our
thinking:

Make your theories causal.
Don't let data alone drive your theories.
Consider only empirical evidence.
Avoid normative statements.
Pursue both generality and parsimony.

1.5.1 Make Your Theories Causal

All of Chapter 3 deals with the issue of causality and, specifically, how we
identify causal relationships. When political scientists construct theories,

it is critical that they always think in terms of the causal processes that
drive the phenomena in which they are interested. For us to develop a
better understanding of the political world, we need to think in terms of
causes and not mere covariation. The term covariation is used to describe
a situation in which two variables vary together (or covary). If we imagine
two variables, A and B, then we would say that A and B covary if it is
the case that, when we observe higher values of variable A, we generally
also observe higher values of variable B. We would also say that A and B
covary if it is the case that, when we observe higher values of variable A,
we generally also observe lower values of variable B.5 It is easy to assume
that when we observe covariation we are also observing causality, but it is
important not to fall into this trap.

5 A closely related term is correlation. For now we use these two terms
interchangeably. In Chapter 8, you will see that there are precise statistical
measures of covariance and correlation that are closely related to each other
but produce different numbers for the same data.

1.5.2 Don't Let Data Alone Drive Your Theories

This rule of the road is closely linked to the first. A longer way of stating
it is "try to develop theories before examining the data on which you will
perform your tests." The importance of this rule is best illustrated by a silly
example. Suppose that we are looking at data on the murder rate (number
of murders per 1000 people) in the city of Houston, Texas, by months of
the year. This is our dependent variable, and we want to explain why it
is higher in some months and lower in others. If we were to take as many
different independent variables as possible and simply see whether they
had a relationship with our dependent variable, one variable that we might
find to strongly covary with the murder rate is the amount of money spent
per capita on ice cream. If we perform some verbal gymnastics, we might
develop a "theory" about how heightened blood sugar levels in people who
eat too much ice cream lead to murderous patterns of behavior. Of course,
if we think about it further, we might realize that both ice cream sales and
the number of murders committed go up when temperatures rise. Do we
have a causally plausible explanation for why temperatures and murder
rates might be causally related? It is pretty well known that people's
tempers tend to fray when the temperature is higher. People also spend a lot
more time outside during hotter weather, and these two factors might
combine to produce a plausible relationship between temperatures and murder
rates.
What this rather silly example illustrates is that we don't want our
theories to be crafted based entirely on observations from real-world data.
We are likely to be somewhat familiar with empirical patterns relating to
the dependent variables for which we are developing causal theories. This
is normal; we wouldn't be able to develop theories about phenomena about
which we know nothing. But we need to be careful about how much we let
what we see guide our development of our theories. One of the best ways
to do this is to think about the underlying causal process as we develop
our theories and to let this have much more influence on our thinking than
patterns that we might have observed.

1.5.3 Consider Only Empirical Evidence

As we previously outlined, we need to always remain open to the possibility
that new evidence will come along that will decrease our confidence in even
a well-established theory. A closely related rule of the road is that, as
scientists, we want to base what we know on what we see from empirical
evidence, which, as we have said, is simply "evidence based on observing
the real world." Strong logical arguments are a good start in favor of
a theory, but before we can be convinced, we need to see results from
rigorous hypothesis tests.6
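
The ice cream example from the rule against letting data alone drive theories can be mimicked with simulated data. The Python sketch below invents monthly temperatures and lets both ice cream spending and the murder rate depend on temperature but not on each other. Every number, coefficient, and function name in it is a hypothetical choice made for illustration; none of it is data from Houston. The two outcomes should covary strongly, yet little covariation should remain once the part of each that is predictable from temperature is removed.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    fitted_slope = sxy / sxx
    return [(y - mean_y) - fitted_slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(42)

# Simulated months: temperature drives both outcomes; neither outcome
# has any effect on the other.
temperature = [rng.uniform(5.0, 35.0) for _ in range(600)]
ice_cream = [2.0 + 0.30 * t + rng.gauss(0.0, 1.0) for t in temperature]
murder_rate = [1.0 + 0.05 * t + rng.gauss(0.0, 0.3) for t in temperature]

raw = correlation(ice_cream, murder_rate)
controlled = correlation(residuals(ice_cream, temperature),
                         residuals(murder_rate, temperature))
print(f"ice cream vs. murder rate: {raw:.2f}")
print(f"same, after accounting for temperature: {controlled:.2f}")
```

Nothing in the simulated data tells us which variable to account for. It is the causal reasoning about temperature, not the observed covariation, that suggests the comparison.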

6 It is worth noting that some political scientists use data drawn from
experimental settings to test their hypotheses. There is some debate about
whether such data are, strictly speaking, empirical or not. We discuss
political science experiments and their limitations in Chapter 4. In recent
years some political scientists have also made clever use of simulated data
to gain leverage on their phenomena of interest, and the empirical nature of
such data can certainly be debated. In the context of this textbook we are
not interested in weighing in on these debates about exactly what is and is
not empirical data. Instead, we suggest that one should always consider the
overall quality of data on which hypothesis tests have been performed when
evaluating causal claims.

1.5.4 Avoid Normative Statements

Normative statements are statements about how the world ought to be.
Whereas politicians make and break their political careers with normative
statements, political scientists need to avoid them at all costs. Most
political scientists care about political issues and have opinions about how the
world ought to be. On its own, this is not a problem. But when normative
preferences about how the world "should" be structured creep into their
scientific work, the results can become highly problematic. The best way
to avoid such problems is to conduct research and report your findings in
such a fashion that it is impossible for the reader to tell what your values
or your normative preferences about the world are.
This does not mean that good political science research cannot be used
to change the world. To the contrary, advances in our scientific knowledge
about phenomena enable policy makers to bring about changes in an effective manner. For instance, if we want to rid the world of wars (normative),
we need to understand the systematic dynamics of the international system
that produce wars in the first place (empirical and causal). If we want to
rid America of homelessness (normative), we need to understand the pathways into and out of being homeless (empirical and causal). If we want
to help our favored candidate win elections (normative), we need to understand what characteristics make people vote the way they do (empirical
and causal).
1.5.5 Pursue Both Generality and Parsimony

Our final rule of the road is that we should always pursue generality and
parsimony. These two goals can come into conflict. By "generality," we
mean that we want our theories to be applied to as general a class of
phenomena as possible. For instance, a theory that explains the causes of a
phenomenon in only one country is less useful than a theory that explains
the same phenomenon across multiple countries. Additionally, the more
simple or parsimonious a theory is, the more appealing it becomes.7
In the real world, however, we often face trade-offs between generality
and parsimony. This is the case because, to make a theory apply more
generally, we need to add caveats. The more caveats that we add to a
theory, the less parsimonious it becomes.

7 The term "parsimonious" is often used in a relative sense. So, if we are comparing two
theories, the theory that is simpler would be the more parsimonious. Indeed, this rule of
the road might be phrased "pursue both generality and simplicity." We use the words
"parsimony" and "parsimonious" because they are widely used to describe theories.

A QUICK LOOK AHEAD

You now know the rules of the road. As we go through the next 11 chapters,
you will acquire an increasingly complicated set of tools for developing and
testing scientific theories about politics, so it is crucial that, at every step
along the way, you keep these rules in the back of your mind. The rest of this
book can be divided into three different sections. The first section, which
includes this chapter through Chapter 4, is focused on the development
of theories and research designs to study causal relationships about politics. In Chapter 2, "The Art of Theory Building," we discuss a range of
strategies for developing theories about political phenomena. In Chapter 3,
"Evaluating Causal Relationships," we provide a detailed explanation of
the logic for evaluating causal claims about relationships between an independent variable, which we call "X," and a dependent variable, which
we call "Y." In Chapter 4, "Research Design," we discuss the research
strategies that political scientists use to investigate causal relationships.
In the second section of this book, we expand on the basic tools that
political scientists need to test their theories. Chapter 5, "Measurement," is
a detailed discussion of how we measure (or operationalize) our variables.
Chapter 6, "Descriptive Statistics and Graphs," introduces a set of tools
that can be used to summarize the characteristics of variables one at a
time. Chapter 7, "Statistical Inference," is an introduction to the logic of
statistical hypothesis testing. In Chapter 8, "Bivariate Hypothesis Testing,"
we begin to apply the lessons from Chapter 7 to a series of empirical tests
of the relationship between pairs of variables.
The third and final section of this book introduces the critical concepts of the regression model. Chapter 9, "Bivariate Regression Models,"
introduces the two-variable regression model as an extension of the concepts from Chapter 8. In Chapter 10, "Multiple Regression Models I: The
Basics," we introduce the multivariate-regression model, with which researchers are able to look at the effects of independent variable X on dependent variable Y while controlling for the effects of other independent variables. Chapter 11, "Multiple Regression Models II: Crucial Extensions,"
and Chapter 12, "Multiple Regression Models III: Applications," provide
in-depth discussions of and advice for commonly encountered research scenarios involving multivariate-regression models.

CONCEPTS INTRODUCED IN THIS CHAPTER

causal
correlation
covary (or covariation)
dependent variable
empirical
hypothesis
hypothesis testing
independent variable
normal science
normative statements

null hypothesis
operationalize
paradigm
paradigm shift
parsimonious
theoretical models
theory
variable
variable label
variable values


EXERCISES

1. Think about something in the political world that you would like to better
understand. Try to think about this as a variable with high and low values.
This is your dependent variable at the conceptual level. Now think about what
might cause the values of your dependent variable to be higher or lower. Try
to phrase this in terms of an independent variable, also at the conceptual level.
Write a paragraph about these two variables and your theory about why they
are causally related to each other.
2. Identify something in the world that you would like to see happen (normative).
What scientific knowledge (empirical and causal) would help you to pursue
this goal?
3. The 1992 U.S. presidential election, in which challenger Bill Clinton defeated
incumbent George H. W. Bush, has often been remembered as the "It's the
economy, stupid," election. How can we restate the causal statement that
embodies this conventional wisdom - "Clinton beat Bush because the economy
had performed poorly" - into a more general theoretical statement?
For Exercises 4 and 5, consider the following statement about the world:
"If you care about economic success in a country, you should also care about
the people's political rights in that country. In a society in which people have
more political rights, the victims of corrupt business practices will work through
the system to get things corrected. As a result, countries in which people have
more political rights will have less corruption. In countries in which there is
less corruption, there will be more economic investment and more economic
success."
4. Identify at least two causal claims that have been made in the preceding statement. For each causal claim, identify which variable is the independent variable
and which variable is the dependent variable. These causal claims should be
stated in terms of one of the following types of phrases in which the first blank
should be filled by the independent variable and the second blank should be
filled by the dependent variable:
___________________________ causes ___________________________
higher _____________________ causes lower ___________________
higher ___________________ causes higher ___________________

5. Draw a graph like Figure 1.3 for each of the causal claims that you identified
in Exercise 4. For each of your figures, do the following: Start on the left-hand
side of the horizontal axis of the figure. This should represent a low value of the
independent variable. What value of the dependent variable would you expect
to find for such a case? Put a dot on your figure that represents this expected
location. Now do the same for a case with a high value of the independent
variable. Draw a line that connects these two points and write a couple of
sentences that describe this picture.

6. Find an article in a political science journal that contains a model of politics.
Provide the citation to the article, and answer the following questions:
(a) What is the dependent variable?
(b) What is one of the independent variables?
(c) What is the causal theory that connects the independent variable to the
dependent variable?
(d) Does this seem reasonable?

The Art of Theory Building

OVERVIEW

In this chapter we discuss the art of theory building. Unfortunately there is
no magical formula or cookbook for developing good theories about politics.
But there are strategies that will help you to develop good theories. We
discuss these strategies in this chapter.

2.1 GOOD THEORIES COME FROM GOOD THEORY-BUILDING
STRATEGIES

In Chapter 1 we discussed the role of theories in developing scientific knowledge. From that discussion, it is clear that a "good" theory is one that, after
going through the rigors of the evaluation process, makes a contribution to
scientific knowledge. In other words, a good theory is one that changes the
way that we think about some aspect of the political world. We also know
from our discussion of the rules of the road that we want our theories to
be causal, empirical, nonnormative, general, and parsimonious. This is a
tall order, and a logical question to ask at this point is "How do I come up
with such a theory?"
Unfortunately, there is neither an easy answer nor a single answer.
Instead, what we can offer you is a set of strategies. "Strategies?" you may
ask. Imagine that you were given the following assignment: "Go out and
get struck by lightning."1 There is no cut-and-dried formula that will show
you how to get struck by lightning, but certainly there are actions that you
can take that will make it more likely. The first step is to look at a weather
map and find an area where there is thunderstorm activity, and if you were
to go to such an area, you would increase your likelihood of getting struck.
You would be even more likely to get struck by lightning if, once in the
area of thunderstorms, you climbed to the top of a tall barren hill. But you
would be still more likely to get struck if you carried with you a nine iron
and, once on top of the barren hill, in the middle of a thunderstorm, you
held that nine iron up to the sky. The point here is that, although there
are no magical formulae that make the development of a good theory (or
getting hit by lightning) a certain event, there are strategies that you can
follow to increase the likelihood of it happening.

1 Our lawyers have asked us to make clear that this is an illustrative analogy and that we
are in no way encouraging you to go out and try to get struck by lightning.

2.2 IDENTIFYING INTERESTING VARIATION

A useful first step in theory building is to think about phenomena that vary
and to focus on general patterns. Because theories are designed to explain
variation in the dependent variable, identifying some variation that is of
interest to you is a good jumping-off point. In Chapter 4 we present a
discussion of two of the most common research designs - cross-sectional
and time-series observational studies - in some detail. For now it is useful
to give a brief description of each in terms of the types of variation in
the dependent variable. These should help clarify the types of variation to
consider as you begin to think about potential research ideas.
When we think about measuring our dependent variable, the first
things that we need to identify are the time and spatial dimensions over
which we would like to measure this variable. The time dimension identifies the point or points in time at which we would like to measure our
variable. Depending on what we are measuring, typical time increments for
political science data are annual, quarterly, monthly, or weekly measures.
The spatial dimension identifies the units that we want to measure. There
is a lot of variability in terms of the spatial units in political science data. If
we are looking at survey data, the spatial unit will be the individual people
who answered the survey (known as survey respondents). If we are looking at data on U.S. state governments, the typical spatial unit will be the
50 U.S. states. Data from international relations and comparative politics
often take nations as their spatial units. Throughout this book, we think
about measuring our dependent variable such that one of these two dimensions will be static (or constant). This means that our measures of our dependent variable will be of one of two types. The first is a time-series measure,
in which the spatial dimension is the same for all cases and the dependent
variable is measured at multiple points in time. The second is a cross-sectional measure, in which the time dimension is the same for all cases and
the dependent variable is measured for multiple spatial units. Although it is
possible for us to measure the same variable across both time and space, we
strongly recommend thinking in terms of variation across only one of these

two dimensions as you attempt to develop a theory about what causes this
variation.2 Let's consider an example of each type of dependent variable.

Figure 2.1. Presidential approval, 1995-2005. (Horizontal axis: Year/Month.)

Figure 2.2. Military spending in 2005.
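
The difference between the two types of measure can be made concrete with a small sketch. The Python snippet below is only an illustration under stated assumptions: the country names, years, and values are invented, and the helper functions `time_series` and `cross_section` are hypothetical names of our own, not part of any library or of the data behind the figures.

```python
# Hypothetical observations: (spatial_unit, time_period, value).
# All names and numbers are invented for illustration only.
observations = [
    ("Country A", 2004, 2.1),
    ("Country A", 2005, 2.3),
    ("Country B", 2004, 4.0),
    ("Country B", 2005, 3.8),
    ("Country C", 2005, 1.2),
]


def time_series(data, unit):
    """Hold the spatial unit constant and let time vary."""
    return sorted((t, v) for (u, t, v) in data if u == unit)


def cross_section(data, period):
    """Hold the time period constant and let the spatial unit vary."""
    return sorted((u, v) for (u, t, v) in data if t == period)


series_a = time_series(observations, "Country A")
section_2005 = cross_section(observations, 2005)

print(series_a)       # one spatial unit, several points in time
print(section_2005)   # one point in time, several spatial units
```

In each case one dimension is held static, which is the sense in which a measure varies across only one of the two dimensions.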

2.2.1 Time-Series Example

In Figure 2.1 we see the average monthly level of U.S. presidential approval
displayed from 1995 to 2005. We can tell that this variable is measured as
a time series because the spatial unit is the same (the United States), but the
variable has been measured at multiple points in time (each month). This
measure is comparable across the cases; for each month we are looking at
the average percentage of people who reported that they approved of the
job that the president was doing. Once we have a measure like this that
is comparable across cases, we can start to think about what independent
variable might cause the level of the dependent variable to be higher or
lower.
If you just had a mental alarm bell go off telling you that we seemed
to be violating one of our rules of the road from Chapter 1, then congratulations - you are doing a good job paying attention. Our second rule of
the road is "don't let data alone drive your theories." Remember that we
also can phrase this rule as "try to develop theories before examining the
data on which you will perform your tests." So what this means is that we
might develop a theory about U.S. presidential approval using Figure 2.1,
but we would want to test that theory by using a different set of data that
may or may not contain the data depicted in Figure 2.1.

2 As we mentioned in Chapter 1, we will eventually theorize about multiple independent
variables simultaneously causing the same dependent variable to vary. Confining variation in the dependent variable to a single dimension helps to make such multivariate
considerations tractable.

2.2.2 Cross-Sectional Example
In Figure 2.2 we see military spending as a percentage of gross domestic
product (GDP) in 2005 for 24 randomly selected nations. We can tell
that this variable is measured cross sectionally, because it varies across
spatial units (nations) but does not vary across time (it is measured for
the year 2005 for each case). When we measure variables across spatial
units like this, we have to be careful to choose appropriate measures that
are comparable across spatial units. To better understand this, imagine
that we had measured our dependent variable as the amount of money
that each nation spent on its military. The problem would be that country
currencies - the Afghan afghani, the Armenian dram, and Austrian euro - do not take on the same value. We would need to know the currency
exchange rates in order to make these comparable across nations. Using
currency exchange rates, we would be able to convert the absolute amounts
of money that each nation had spent into a common measure. We could
think of this particular measure as an operationalization of the concept of
relative military "might." This would be a perfectly reasonable dependent
variable for theories about what makes one nation more powerful than
another. Why, you might ask, would we want to measure military spending
as a percentage of GDP? The answer is that this comparison is our attempt
to measure the percentage of the total budgetary effort available that a
nation is putting into its armed forces. Some nations have larger economies
than others, and this measure allows us to answer the question of how much
of their total economic activity each nation is putting toward its military.
We can theorize about what would cause a nation to put more or less of its
available economic resources toward military spending.
Of course, as we discussed in the previous subsection, we would not
want to develop our theory by using data from these 24 cases and then test
it by using only the same set of cases.
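
The two measures discussed in this subsection can be compared with a short worked example. The sketch below is an illustration only: the two nations, their spending and GDP figures, and the exchange rates are all invented, and the function names are hypothetical labels of our own rather than anything taken from the data behind Figure 2.2.

```python
# Hypothetical figures, invented purely for illustration; not real data.
# Each entry: (military spending, GDP, local currency units per U.S. dollar),
# with spending and GDP expressed in the nation's own currency.
nations = {
    "Nation A": (50_000.0, 1_000_000.0, 50.0),
    "Nation B": (3_000.0, 200_000.0, 2.0),
}


def spending_in_dollars(spending_local, units_per_dollar):
    """Convert local-currency spending into a common currency (absolute amount)."""
    return spending_local / units_per_dollar


def spending_as_percent_of_gdp(spending_local, gdp_local):
    """Share of economic activity devoted to the military; no exchange rate needed."""
    return 100.0 * spending_local / gdp_local


results = {
    name: (spending_in_dollars(spend, rate), spending_as_percent_of_gdp(spend, gdp))
    for name, (spend, gdp, rate) in nations.items()
}

for name, (dollars, percent) in results.items():
    print(f"{name}: {dollars:.0f} dollars, {percent:.1f}% of GDP")
```

With these invented numbers, Nation B spends more in absolute terms, whereas Nation A devotes a larger share of its economy to its military. The two measures operationalize different concepts, so they can rank the same nations differently.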


2.3 LEARNING TO USE YOUR KNOWLEDGE

One of the common problems that people have when trying to develop a
theory about a phenomenon of interest is that they can't get past a particular
political event in time or a particular place about which they know a lot. It
is helpful to know some specifics about politics, but it is also important to
be able to distance yourself from the specifics of one case and to think more
broadly about the underlying causal process. To use an analogy, it's fine
to know something about trees, but we want to theorize about the forest.
Remember, one of our rules of the road is to try to make our theories
general.

2.3.1 Moving from a Specific Event to More General Theories

For an example of this, return to Figure 2.1. What is the first thing that
you think most people notice when they look at Figure 2.1? Once they
have figured out what the dimensions are in this figure (U.S. presidential
approval over time), many people look at the fall of 2001 and notice the
sharp increase in presidential approval that followed the terrorist attacks on
the United States on September 11, 2001. This is a period of recent history
about which many people have detailed memories. In particular, they might
remember how the nation rallied around President Bush in the aftermath
of these attacks. There are few people who would doubt that there was a
causal linkage between these terrorist attacks and the subsequent spike in
presidential approval.
At first glance, this particular incident might strike us as a unique
event from which general theoretical insights cannot be drawn. After all,
terrorist attacks on U.S. soil are rare events, and attacks of this magnitude
are even more rare. The challenge to the scientific mind when we have strong
confidence about a causal relationship in one specific incident is to push
the core concepts around in what we might call thought experiments: How
might a less-effective terrorist attack affect public opinion? How might
other types of international incidents shape public opinion? Do we think
that terrorist attacks lead to similar reactions in public opinion toward
leaders in other nations? Each of these questions is posed in general terms,
taking the specific events of this one incident as a jumping-off point. The
answers to these more general questions should lead us to general theories
about the causal impact of international incidents on public opinion.
In the 1970s John Mueller moved from the specifics of particular
international incidents and their influence on public opinion towards a
general theory of what causes rallies (or short-term increases) in public
opinion.3 Mueller developed a theory that presidential popularity would
increase in the short term any time that there was international conflict.
Mueller thought that this would occur because, in the face of international
conflict, people would tend to put their partisan differences and other
critiques that they may have of the president's handling of his job aside
and support him as the commander in chief of the nation. In Mueller's
statistical analysis of time-series data on presidential approval, he found
that there was substantial support for his hypothesis that international
conflicts would raise presidential approval rates, and this in turn gave him
confidence in his theory of public opinion rallies.
2.3.2 Know Local, Think Global: Can You Drop the Proper Nouns?


Physicists don't have theories that apply only in France, and neither should
we. Yet many political scientists write articles with one particular geographic context in mind. Among these, the articles that have the greatest
impact are those that advance general theories from which the proper nouns
have been removed.4 An excellent example of this is Michael Lewis-Beck's
article titled "Who's the Chef?" Lewis-Beck, like many observers of French
politics, had observed the particularly colorful period from 1986 to 1988
during which the president was a socialist named François Mitterrand and
the prime minister was Jacques Chirac, a right-wing politician from the
Gaullist RPR party. The height of this political melodrama occurred when
both leaders showed up to international summits of world leaders claiming
to be the rightful representative of the French Republic. This led to a famous photo of the leaders of the G-7 group of nations that contained eight
people.5 Although many people saw this as just another colorful anecdote
about the ever-changing nature of the power relationship between presidents and prime ministers in Fifth Republic France, Lewis-Beck moved
from the specifics of such events to develop and test a general theory about
political control and public opinion.

3 See Mueller (1973).
4 By "proper nouns," we mean specific names of people or countries. But this logic can and
should be pushed further to include specific dates, as we subsequently argue.

His theory was that changing the political control of the economy
would cause public opinion to shift in terms of who was held accountable
for the economy. In France, during times of unified political control of the
top offices, the president is dominant, and thus according to Lewis-Beck's
theory the president should be held accountable for economic outcomes.
However, during periods of divided control, Lewis-Beck's theory leads to
the expectation that the prime minister, because of his or her control of
economic management during such periods, should be held accountable
for economic outcomes. Through careful analysis of time-series data on
political control and economic accountability, Lewis-Beck found that his
theory was indeed supported.
Although the results of this study are important for advancing our understanding of French politics, the theoretical contribution made by Lewis-Beck was much greater because he couched it in general terms and without
proper nouns. We also can use this logic to move from an understanding
of a specific event to general theories that explain variation across multiple
events. For example, although it might be tempting to think that every U.S.
presidential election is entirely unique - with different candidates (proper
names) and different historical circumstances - the better scientific theory
does not explain only the outcome of the 2008 U.S. presidential election, but
of U.S. presidential elections in general. That is, instead of asking "Why did
Bush beat Kerry in the 2004 election?" we should ask either "What causes
incumbent success rates in U.S. presidential elections?" or "What causes
Republican candidates to fare better or worse than Democratic candidates
in U.S. presidential elections?"

2.4 EXAMINE PREVIOUS RESEARCH

Once you have identified an area in which you want to conduct research,
it is often useful to look at what other work has been done that is related
to your areas of interest. As we discussed in Chapter 1, part of taking a
scientific approach is to be skeptical of research findings, whether they are
our own or those of other researchers. By taking a skeptical look at the

research of others, we can develop new research ideas of our own and thus
develop new theories.
We therefore suggest looking at research that seems interesting to you
and, as you examine what has been done, keep the following list of questions
in mind:
• What (if any) other causes of the dependent variable did the previous
researchers miss?
• Can their theory be applied elsewhere?
• If we believe their findings, are there further implications?
• How might this theory work at different levels of aggregation
(micro ⇔ macro)?

2.4.1 What Did the Previous Researchers Miss?

Any time that we read the work of others, the first thing that we should
do is break down their theory or theories in terms of the independent and
dependent variables that they claim are causally related to each other. 6 Once
we have done this, we should think about whether the causal arguments
that other researchers have advanced seem reasonable. (In Chapter 3 we
present a detailed four-step process for doing this.) We should also be in the
habit of coming up with other independent variables that we think might
be causally related to the same dependent variable. Going through this type
of mental exercise can lead to new theories that are worth pursuing.

2.4.2 Can Their Theory Be Applied Elsewhere?

When we read about the empirical research that others have conducted, we
should be sure that we understand which specific cases they were studying
when they tested their theory. We should then proceed with a mental exercise in which we think about what we might find if we applied the same
theory to other cases. In doing so, we will probably identify some cases for
which we expect to get the same results, as well as other cases for which we
might have different expectations. Of course, we would have to carry out
our own empirical research to know whether our speculation along these
lines is correct, but replicating research can lead to interesting findings. The
most useful theoretical development comes when we can identify systematic patterns in the types of cases that will fit and those that will not fit the
established theory. These systematic patterns are additional variables that
determine whether a theory will work across an expanded set of cases. In
this way we can think about developing new theories that will subsume the
original established theory.

5 The G-7, now the G-8 with the inclusion of Russia, is an annual summit meeting of the
heads of government from the world's most powerful nations.
6 We cannot overstate the importance of this endeavor. We understand that this can be a
difficult task for a beginning student, but it gets easier with practice. A good way to start
this process is to look at the figures or tables in an article and ask yourself, "What is the
dependent variable here?"

2.4.3 If We Believe Their Findings, Are There Further Implications?

Beginning researchers often find themselves intimidated when they read
convincing accounts of the research carried out by more established scholars. After all, how can we ever expect to produce such innovative theories and find such convincingly supportive results from extensive empirical
tests? Instead of being intimidated by such works, we need to learn to view
them as opportunities - opportunities to carry their logic further and think
about what other implications might be out there. If, for example, another
researcher has produced a convincing theory about how voters behave, we
could ask how this new understanding might alter the behavior of strategic
politicians who understand that voters behave in this fashion.
One of the best examples of this type of research extension in political science comes from our previous example of John Mueller's research
on rallies in presidential popularity. Because Mueller had found such convincingly supportive evidence of this "rally 'round the flag effect" in his
empirical testing, other researchers were able to think through the strategic
consequences of this phenomenon. This led to a new body of research on
a phenomenon called "diversionary use of force." The idea of this new research is that, because strategic politicians will be aware that international
conflicts temporarily increase presidential popularity, they will choose to
generate international conflicts at times when they need such a boost.

2.4.4 How Might This Theory Work at Different Levels of Aggregation
(Micro ⇔ Macro)?

As a final way to use the research of others to generate new theories we
suggest considering how a theory might work differently at varying levels
of aggregation. In political science research, the lowest level of aggregation
is usually at the level of individual people in studies of public opinion. As
we saw in Subsection 2.4.3, when we find a trend in terms of individual-level behavior, we can develop new theoretical insights by thinking about
how strategic politicians might take advantage of such trends. Sometimes
it is possible to gain these insights by simply changing the level of aggregation. As we have seen, political scientists have often studied trends in
public opinion by examining data measured at the national level over time.
This type of study is referred to as the study of macro politics. When we
find trends in public opinion at higher (macro) levels of aggregation, it is
always an interesting thought exercise to consider what types of patterns of
individual-level or "micro-" level behavior are driving these aggregate-level
findings.
As an example of this, return to the rally 'round the flag example
and change the level of aggregation. We have evidence that, when there
are international conflicts, public opinion toward the president becomes
more positive. What types of individual-level forces might be driving this
observed aggregate-level trend? It might be the case that there is a uniform
shift across all types of individuals in their feelings about the president. It
might also be the case that the shift is less uniform. Perhaps individuals
who dislike the president's policy positions on domestic events are willing
to put these differences aside in the face of international conflicts, whereas
the opinions of the people who were already supporters of the president
remain unchanged. Thinking about the individual-level dynamics that drive
aggregate observations can be a fruitful source of new causal theories.
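
The point that different individual-level dynamics can lie behind the same aggregate trend can be sketched numerically. The snippet below is an illustration under loud assumptions: the ten respondents, their 0-100 approval scores, and the sizes of the shifts are all invented, and `aggregate` is a hypothetical helper of our own, not a measure used in any of the studies discussed.

```python
from statistics import mean

# Hypothetical approval scores (0-100) before an international conflict.
# All numbers are invented for illustration only.
supporters_before = [80, 80, 80, 80, 80]
opponents_before = [30, 30, 30, 30, 30]


def aggregate(supporters, opponents):
    """Macro-level measure: average approval across all respondents."""
    return mean(supporters + opponents)


# Scenario 1: a uniform shift, in which every respondent moves up by 10 points.
uniform_after = (
    [s + 10 for s in supporters_before],
    [o + 10 for o in opponents_before],
)

# Scenario 2: supporters are unchanged, and opponents move up by 20 points.
concentrated_after = (
    supporters_before,
    [o + 20 for o in opponents_before],
)

baseline = aggregate(supporters_before, opponents_before)
uniform_shift = aggregate(*uniform_after) - baseline
concentrated_shift = aggregate(*concentrated_after) - baseline

print(uniform_shift, concentrated_shift)
```

Under these invented numbers the two scenarios produce an identical aggregate rally, so macro-level data alone could not tell them apart. Distinguishing them would require individual-level data.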

2.5 THINK FORMALLY ABOUT THE CAUSES THAT LEAD TO
VARIATION IN YOUR DEPENDENT VARIABLE

Thus far in this book we have discussed thinking about the political world
in an organized, systematic fashion. By now, we hope that you are starting
to think about politics in terms of independent variables and dependent
variables and are developing theories about the causal relationships between them. The theories that we have considered thus far have come from
thinking rigorously about the phenomena that we want to explain and deducing plausible causal explanations. One extension of this type of rigorous
thinking is labeled "formal theory" or "rational choice."7
The formal-theory approach to social science phenomena starts out
with a fairly basic set of assumptions about human behavior and then uses
game theory and other mathematical tools to build models of phenomena
of interest. We can summarize these assumptions about human behavior by
saying that formal theorists assume that all individuals are rational utility
maximizers - that they attempt to maximize their self-interest. Individuals are faced with a variety of choices in political interactions, and those
choices carry with them different consequences - some desirable, others
undesirable. By thinking through the incentives faced by individuals, users
of this approach begin with the strategic foundations of the decisions that
individuals face. Formal theorists then deduce theoretical expectations of
what individuals will do given their preferences and the strategic environment that they confront.

7 The terms "formal theory" and "rational choice" have been used fairly interchangeably
to describe the application of game theory and other formal mathematical tools to puzzles
of human behavior. We have a slight preference for the term "formal theory" because it is
a more overarching term describing the enterprise of using these tools, whereas "rational
choice" describes the most critical assumption that this approach makes.

That sounds like a mouthful, we know. Let's begin with a simple
example: If human beings are self-interested, then (by definition) members
of a legislature are self-interested. This assumption suggests that members
will place a high premium on reelection. Why is that? Because, first and
foremost, a politician must be in office if she is going to achieve her political
goals. And from this simple deduction flows a whole set of hypotheses about
congressional organization and behavior. 8

8 See Mayhew (1974) and Fiorina (1989).

This approach to studying politics is a mathematically rigorous attempt
to think through what it would be like to be in the place of different actors
involved in a situation in which they have to choose how to act. In essence,
formal theory is a lot like the saying that we should not judge a person
until we have walked a mile in his or her shoes. We use the tools of formal
theory to try to put ourselves in the position of imagining that we are in
someone else's shoes and thinking about the different choices that he or she
has to make. In the following subsections we introduce the basic tools for
doing this by using an expected utility approach and then provide a famous
example of how researchers used this framework to develop theories about
why people vote.

2.5.1 Utility and Expected Utility

Think about the choice that you have made to read this chapter of this
book. What are your expected benefits and what are the costs that you
expect to incur? One benefit may be that you are genuinely curious about
how we build theories of politics. Another expected benefit may be that your
professor is likely to test you on this material, and you expect that you will
perform better if you have read this chapter. There are, no doubt, also costs
to reading this book. What else might you be doing with your time? This
is the way that formal theorists approach the world.

Formal theorists think about the world in terms of the outcome of
a collection of individual-level decisions about what to do. In thinking
about an individual's choices of actions, formal theorists put everything in
terms of utility. Utility is an intentionally vague quantity. The utility from
a particular action is equal to the sum of all benefits minus the sum of all
costs from that action. If we consider an action Y, we can summarize the
utility from Y for individual i with the following formula:

$$U_i(Y) = \sum B_i(Y) - \sum C_i(Y),$$

where $U_i(Y)$ is the utility for individual i from action Y, $\sum B_i(Y)$ is the sum
of the benefits $B_i$ from action Y for individual i, and $\sum C_i(Y)$ is the sum of
the costs $C_i$ from action Y for individual i. When choosing among a set of
possible actions (including the decision not to act), a rational individual
will choose that action that maximizes their utility. To put this formally,

given a set of choices $Y = Y_1, Y_2, Y_3, \ldots, Y_n$,
individual i will choose $Y_a$ such that $U_i(Y_a) > U_i(Y_b) \;\; \forall \, b \neq a$,

which translates into, "given a set of choices of action $Y_1$ through $Y_n$,
individual i will choose that action ($Y_a$) such that the utility to individual i
from that action is greater than the utility to individual i from any action
($Y_b$) for all ($\forall$) actions b not equal to a." In more straightforward terms,
we could translate this into the individual choosing that action which he
deems best for himself.
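
To make the choice rule concrete, here is a minimal sketch in Python of the utility formula and of picking the utility-maximizing action. The action names and every benefit and cost number are hypothetical values invented for illustration; nothing here comes from data, and the function names are our own.

```python
from typing import Dict, List, Tuple


def utility(benefits: List[float], costs: List[float]) -> float:
    """U_i(Y): the sum of benefits minus the sum of costs for one action."""
    return sum(benefits) - sum(costs)


def best_action(actions: Dict[str, Tuple[List[float], List[float]]]) -> str:
    """Return the name of the action with the highest utility."""
    return max(actions, key=lambda name: utility(*actions[name]))


# Hypothetical benefits and costs for one student choosing how to spend an evening.
choices = {
    "read chapter": ([5.0, 3.0], [4.0]),  # curiosity, better exam score; time spent
    "watch film": ([3.0], [1.0]),         # enjoyment; ticket price
    "do nothing": ([], []),               # not acting is also a choice
}

for name, (benefits, costs) in choices.items():
    print(name, utility(benefits, costs))
print("chosen:", best_action(choices))
```

With these made-up numbers, reading the chapter has the highest utility. Note that the formal statement assumes one strictly best action; in a tie, Python's max simply returns the first maximal action it encounters.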
At this point, it is reasonable to look around the real world and think
about exceptions. Is this really the way that the world works? What about
altruism? During the summer of 2006, the world's second-richest man,
Warren Buffett, agreed to donate more than 30 billion dollars to the Bill and
Melinda Gates Foundation. Could this possibly have been rational
utility-maximizing behavior? What about suicide bombers? The answer to these
types of questions shows both the flexibility and a potential problem of the
concept of utility. Note that, in the preceding formulae, there is always a
subscripted i under each of the referenced utility components ($U_i$, $B_i$, $C_i$).
This is because different individuals have different evaluations of the
benefits ($B_i$) and costs ($C_i$) associated with a particular action. When the critic
of this approach says, "How can this possibly be utility-maximizing behavior?"
the formal theorist responds, "Because this is just an individual with
an unusual utility structure."

Think of it another way. Criticizing formal theory because it takes
preferences as "given" - that is, as predetermined, rather than the focus
of inquiry - strikes us as beside the point. Other parts of political science
can and should study preference formation; think about political psychology
and the study of public opinion. What formal theory does, and does
well, is to say, "Okay, once an individual has her preferences - regardless
of where they came from - how do those preferences interact with
strategic opportunities and incentives to produce political outcomes?" That
formal theory takes those preferences as given does not mean that the
preference-formation process is unimportant. It merely means that formal
theory is here to explain a different portion of social reality.
From a scientific perspective, this is fairly unsettling. As we discussed
in Chapter 1, we want to build scientific knowledge based on real-world
observation. How do we observe people's utilities? Although we can ask
people questions about what they like and don't like, and even their
perceptions of costs and benefits, we can never truly observe utilities. Instead,
the assumption of utility maximization is just that - an assumption. This
assumption is, however, a fairly robust assumption, and we can do a lot
if we are willing to make it and move forward while keeping the potential
problems in the back of our minds.

Another potentially troubling aspect of the rational-actor
utility-maximizing assumption that you may have thought of is the assumption
of complete information. In other words, what if we don't know exactly
what the costs and benefits will be from a particular action? In the
preceding formulae, we were operating under the assumption of complete
information, for which we knew exactly what would be the costs, benefits,
and thus the utility from each possible action. When we relax this
assumption, we move our discussion from utility to expected utility. This
is a pretty straightforward transformation in which we put expectations in
front of all utilities. So, under incomplete information, for an individual
action Y,

$$E[U_i(Y)] = \sum E[B_i(Y)] - \sum E[C_i(Y)],$$

and a rational actor will maximize his expected utility thus:

given a set of choices $Y = Y_1, Y_2, Y_3, \ldots, Y_n$,
individual i will choose $Y_a$ such that $E[U_i(Y_a)] > E[U_i(Y_b)] \;\; \forall \, b \neq a$.
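
As a rough illustration of the move from utility to expected utility, the following Python sketch treats each uncertain benefit or cost as a list of (probability, value) pairs and takes expectations before subtracting. The representation, the function names, and all of the probabilities and values are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# An uncertain benefit or cost: (probability, value) pairs whose probabilities sum to 1.
Lottery = List[Tuple[float, float]]


def expected_value(lottery: Lottery) -> float:
    """E[x] for one uncertain benefit or cost."""
    total_probability = sum(p for p, _ in lottery)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in lottery)


def expected_utility(benefits: List[Lottery], costs: List[Lottery]) -> float:
    """E[U_i(Y)]: sum of expected benefits minus sum of expected costs."""
    return (sum(expected_value(b) for b in benefits)
            - sum(expected_value(c) for c in costs))


# Hypothetical: the chapter appears on the exam with probability 0.75.
options: Dict[str, Tuple[List[Lottery], List[Lottery]]] = {
    "read chapter": ([[(0.75, 8.0), (0.25, 0.0)]], [[(1.0, 4.0)]]),
    "skip chapter": ([[(1.0, 1.0)]], [[(1.0, 0.0)]]),
}

best = max(options, key=lambda name: expected_utility(*options[name]))
print("chosen:", best)
```

Under these invented numbers the expected utility of reading is 6 - 4 = 2 and that of skipping is 1, so the rational actor reads. Lowering the exam probability far enough would reverse the choice, which is the sense in which beliefs about uncertain outcomes drive behavior in this framework.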
2.5.2 The Puzzle of Turnout

One of the oldest and most enduring applications of formal theory to politics is
known as the "paradox of voting." William Riker and Peter Ordeshook
set out the core arguments surrounding this application in their influential
1968 article in the American Political Science Review titled "A Theory of
the Calculus of Voting." Their paper was written to weigh in on a lively
debate over the rationality of voting. In Riker and Ordeshook's notation
(with subscripts added), the expected utility of voting was summarized as

$$R_i = (B_i P_i) - C_i,$$

where $R_i$ is the reward that an individual receives from voting, $B_i$ is the
differential benefit that an individual voter receives "from the success of his
more preferred candidate over his less preferred one" (Riker and Ordeshook
1968, p. 25), $P_i$ is the probability that that voter will cast the deciding vote
that makes her preferred candidate the winner, and $C_i$ is the sum of the
costs that the voter incurs from voting. 9 If $R_i$ is positive, the individual
votes; otherwise, she abstains.

9 For simplicity in this example, consider an election in which there are only two candidates
competing. Adding more candidates makes the calculation of $B_i$ more complicated, but
does not change the basic result of this model.

We'll work our way through the right-hand side of this formula and
think about the likely values of each term in this equation for an
individual eligible voter in a U.S. presidential election. The term $B_i$ is likely
to be greater than zero for most eligible voters in most U.S. presidential
elections. The reasons for this vary widely from policy preferences to gut
feelings about the relative character traits of the different candidates. Note,
however, that the $B_i$ term is multiplied by the $P_i$ term. What is the likely
value of $P_i$? Most observers of elections would argue that $P_i$ is extremely
small and effectively equal to zero for every voter in most elections. In the
case of a U.S. presidential election, for one vote to be decisive, that voter
must live in a state in which the popular vote total would be exactly tied,
and this must be a presidential election for which that particular state would
swing the outcome in the Electoral College to either candidate. Because $P_i$
is effectively equal to zero, the entire term $(B_i P_i)$ is effectively equal to zero.

What about the costs of voting, $C_i$? Voting takes time for all voters.
Even if a voter lives right next door to the polling place, she has to take some
time to walk next door, perhaps stand in a line, and cast her ballot. The
well-worn phrase "time is money" certainly applies here. Even if the voter
in question is not working at the time that she votes, she could be doing
something other than voting. Thus it is pretty clear that $C_i$ is greater than
zero. If $C_i$ is greater than zero and $(B_i P_i)$ is effectively equal to zero, then
$R_i$ must be negative. How then, do we explain the millions of people that
vote in U.S. presidential elections, or, indeed, elections around the world? Is
this evidence that people are truly not rational? Or, perhaps, is it evidence
that millions of people systematically overestimate $P_i$? Influential political
economy scholars, including Anthony Downs and Gordon Tullock,
posed these questions in the early years of formal theoretical analyses of
politics.

Riker and Ordeshook's answer was that there must be some other
benefit to voting that is not captured by the term $(B_i P_i)$. They proposed
that the voting equation should be

$$R_i = (B_i P_i) - C_i + D_i,$$

where $D_i$ is the satisfaction that individuals feel from participating in the
democratic process, regardless of the impact of their participation on the
final outcome of the election. Riker and Ordeshook argued that $D_i$ could
be made up of a variety of different efficacious feelings about the political
system, ranging from fulfilling one's duties as a citizen to standing up and
being counted.

Think of the contribution that Riker and Ordeshook made to political
science, and that, more broadly, formal theory makes to political science, in
the following way: Riker and Ordeshook's theory leads us to wonder why
any individual will vote. And yet, empirically, we notice that close to half
of the adult population votes in any given presidential election in recent
history. What formal theory accomplishes for us is that it helps us to focus
in on exactly why people do bother, rather than to assert, normatively, that
people should. 10
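
The arithmetic behind the turnout puzzle can be sketched in a few lines of Python. The values chosen below for B, P, C, and D are purely hypothetical and only illustrate the orders of magnitude discussed in the text; they are not estimates from Riker and Ordeshook or anyone else.

```python
def reward(b: float, p: float, c: float, d: float = 0.0) -> float:
    """R_i = (B_i * P_i) - C_i + D_i; leaving d at 0 gives the original equation."""
    return b * p - c + d


def turns_out(b: float, p: float, c: float, d: float = 0.0) -> bool:
    """The individual votes only if the reward from voting is positive."""
    return reward(b, p, c, d) > 0


# Hypothetical voter: large stake in the outcome, tiny chance of being decisive.
B, P, C = 1000.0, 1e-8, 1.0

print(turns_out(B, P, C))          # without D, the cost outweighs B * P
print(turns_out(B, P, C, d=2.0))   # a sense of civic duty larger than C flips the decision
```

In this sketch the term B * P is so small that the decision is governed almost entirely by whether D exceeds C, which is one way of seeing why the D term carries so much of the explanatory weight in the revised equation.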

10 Of course, Riker and Ordeshook did not have the final word in 1968. In fact, the debate
over the rationality of turnout has been at the core of the debate over the usefulness
of formal theory in general. In their 1994 book titled Pathologies of Rational Choice
Theory, Donald Green and Ian Shapiro made this the first point of attack in their critique of
the role that formal theory plays in political science. One of Green and Shapiro's major
criticisms of this part of political science was that the linkages between formal theory
and empirical hypothesis tests were too weak. In reaction to these and other critics, the
National Science Foundation launched a new program titled "Empirical Implications of
Theoretical Models" (EITM) that was designed to strengthen the linkage between formal
theory and empirical hypothesis testing.

2.6 THINK ABOUT THE INSTITUTIONS: THE RULES
USUALLY MATTER

One rich source for theoretical insights comes from thinking about
institutional arrangements and the influence that they have in shaping political
behavior and outcomes. In other words, take some time to think about the
rules under which the political game is played. To fully understand these
rules and their impact, we need to think through some counterfactual
scenarios in which we imagine how outcomes would be altered if there were
different rules in place. This type of exercise can lead to some valuable
theoretical insights. In the subsections that follow, we consider two examples
of thinking about the impact of institutions.

2.6.1 Legislative Rules

Considering the rules of the political game has yielded theoretical insights
into the study of legislatures and other governmental decision-making
bodies. This has typically involved thinking about the preference
orderings of expected utility-maximizing actors. For example, let's imagine a
legislature made up of three individual members, X, Y, and Z. 11 The task
in front of X, Y, and Z is to choose between three alternatives A, B, and C.
The preference orderings for these three rational individuals are as follows:

X: ABC,
Y: BCA,
Z: CAB.
An additional assumption that is made under these circumstances is
that the preferences of rational individuals are transitive. This means that
if individual X likes A better than B and B better than C, then, for X's
preferences to be transitive, he or she must also like A better than C. Why
is this an important assumption to make? Consider the alternative. What
if X liked A better than B and B better than C, but liked C better than
A? Under these circumstances, it would be impossible to discuss what X
wants in a meaningful fashion because X's preferences would produce an
infinite cycle. To put this another way, no matter which of the three choices
X chose, there would always be some other choice that X prefers. Under
these circumstances, X could not make a rational choice.

In this scenario, what would the group prefer? This is not an easy
question to answer. If they each voted for their first choice, each alternative
would receive one vote. If these three individuals vote between pairs of
alternatives, and they vote according to their preferences, we would observe
the following results:

A vs. B, X&Z vs. Y, A wins;
B vs. C, X&Y vs. Z, B wins;
C vs. A, Y&Z vs. X, C wins.

Which of these three alternatives does the group collectively prefer? This
is an impossible question to answer because the group's preferences cycle
across the three alternatives. Another way of describing this group's
preferences is to say that they are intransitive (despite the fact that, as you can
see, each individual's preferences are transitive).

11 We know that, in practice, legislatures tend to have many more members. Starting with
this type of miniature-scaled legislature makes formal considerations much easier to carry
out. Once we have arrived at conclusions based on calculations made on such a small
scale, it is important to consider whether the conclusions that we have drawn would apply
to more realistically larger-scaled scenarios.
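
The pairwise contests among X, Y, and Z can be checked mechanically. The short Python sketch below encodes each member's ordering as a string (best alternative first), tallies sincere majority votes for each pair, and then looks for an alternative that beats every other one. The encoding and the function names are our own illustrative choices, not notation from the literature.

```python
from typing import Dict, Optional, Sequence

# Preference orderings from the text, most preferred alternative first.
preferences: Dict[str, str] = {"X": "ABC", "Y": "BCA", "Z": "CAB"}


def prefers(order: str, a: str, b: str) -> bool:
    """True if a voter with this ordering ranks a above b."""
    return order.index(a) < order.index(b)


def pairwise_winner(prefs: Dict[str, str], a: str, b: str) -> str:
    """Majority winner between a and b when everyone votes sincerely."""
    votes_for_a = sum(prefers(order, a, b) for order in prefs.values())
    votes_for_b = len(prefs) - votes_for_a
    # With three voters and strict orderings there can be no tie.
    return a if votes_for_a > votes_for_b else b


def condorcet_winner(prefs: Dict[str, str], alternatives: Sequence[str]) -> Optional[str]:
    """The alternative that beats every other one head to head, or None."""
    for a in alternatives:
        if all(pairwise_winner(prefs, a, b) == a for b in alternatives if b != a):
            return a
    return None


for a, b in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{a} vs. {b}: {pairwise_winner(preferences, a, b)} wins")
print("alternative that beats all others:", condorcet_winner(preferences, "ABC"))
```

Because each ordering is a strict ranking, every individual's preferences are transitive by construction, yet no alternative defeats both of the others. That is the group-level cycle described above.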

This result should be fairly troubling to people who are concerned
with the fairness of democratic elections. One of the often-stated goals
of elections is to "let the people speak." Yet, as we have just seen, it
is possible that, even when the people involved are all rational actors,
their collective preferences may not be rational. Under such circumstances,
a lot of the normative concepts concerning the role of elections simply
break down. This finding is at the heart of Arrow's theorem, which was
developed by Kenneth Arrow in his 1951 book titled Social Choice and
Individual Values. At the time of its publication, political scientists largely
ignored this book. As formal theory became more popular in political
science, Arrow's mathematical approach to these issues became increasingly
recognized. In 1982 William Riker popularized Arrow's theorem in his
book Liberalism Against Populism, in which he presented a more accessible
version of Arrow's theorem and bolstered a number of Arrow's claims
through mathematical expositions.

2.6.2 The Rules Matter!

Continuing to work with our example of three individuals, X, Y, and
Z, with the previously described preferences, now imagine that the three
individuals will choose among the alternatives in two different rounds of
votes between pairs of choices. In the first round of voting, two of the
alternatives will be pitted against each other. In the second round of voting,
the alternative that won the first vote will be pitted against the alternative
that was not among the choices in the first round. The winner of the second
round of voting is the overall winning choice.

In our initial consideration of this scenario, we will assume that X, Y,
and Z will vote according to their preferences. What if X got to decide on
the order in which the alternatives got chosen? We know that X's preference
ordering is ABC. Can X set things up so that A will win? What if X made
the following rules:

1st round: B vs. C;
2nd round: 1st round winner vs. A.

What would happen under these rules? We know that both X and Y prefer
B to C, so B would win the first round and then would be paired against
A in the second round. We also know that X and Z prefer A to B, so
alternative A would win and X would be happy with this outcome.

Does voting like this occur in the real world? Actually, the answer
is "yes." This form of pairwise voting among alternatives is the way that
legislatures typically conduct their voting. If we think of individuals X, Y,
and Z as being members of a legislature, we can see that whoever controls
the ordering of the voting (the rules) has substantial power. To explore these
issues further, let's examine the situation of individual Y. Remember that
Y's preference ordering is BCA. So Y would be particularly unhappy about
the outcome of the voting according to X's rules, because it resulted in Y's
least-favorite outcome. But remember that, for our initial consideration,
we assumed that X, Y, and Z will vote according to their preferences. If
we relax this assumption, what might Y do? In the first round of voting,
Y could cast a strategic vote for C against B. If both X and Z continued
to vote (sincerely) according to their preferences, then C would win the
first round. Because we know that both Y and Z prefer C to A, C would
win the second round and would be the chosen alternative. Under these
circumstances, Y would be better off because Y prefers alternative C to A.

From the perspective of members of a legislature, it is clearly better
to control the rules than to vote strategically to try to obtain a better
outcome. When legislators face reelection, one of the common tactics of
their opponents is to point to specific votes in which the incumbent appears
to have voted contrary to the preferences of his constituents. It would seem
reasonable to expect that legislator Y comes from a district with the same
or similar preferences to those of Y. By casting a strategic vote for C over
B, Y was able to obtain a better outcome but created an opportunity for an
electoral challenger to tell voters that Y had voted against the preferences
of his district.

In Congressmen in Committees, Richard Fenno's classic study of the
U.S. House of Representatives, one of the findings was that the Rules
Committee - along with the Ways and Means and the Appropriations
Committees - was one of the most requested committee assignments from the
individual members of Congress. At first glance, the latter two committees
make sense as prominent committees and, indeed, receive much attention
in the popular media. By contrast, the Rules Committee very rarely gets any
media attention. Members of Congress certainly understand and appreciate
the fact that the rules matter, and formal theoretical thought exercises like
the preceding one help us to see why this is the case.
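
The two-round voting procedure in the three-member legislature of X, Y, and Z can be simulated directly. The following Python sketch is one possible encoding under our own assumptions: orderings are strings with the best alternative first, an agenda is a first-round pair plus the alternative held back for round two, and a strategic ballot is supplied as an explicit override that must name one of the two alternatives on offer in that round. None of the function names come from the formal-theory literature.

```python
from typing import Dict, Optional, Tuple

# Preference orderings from the text, most preferred alternative first.
preferences: Dict[str, str] = {"X": "ABC", "Y": "BCA", "Z": "CAB"}


def sincere_choice(order: str, a: str, b: str) -> str:
    """The alternative this voter truly prefers between a and b."""
    return a if order.index(a) < order.index(b) else b


def run_agenda(
    prefs: Dict[str, str],
    first_pair: Tuple[str, str],
    held_back: str,
    strategic: Optional[Dict[Tuple[str, int], str]] = None,
) -> str:
    """Winner of a two-round vote; `strategic` maps (voter, round) to a forced ballot."""
    strategic = strategic or {}

    def contest(a: str, b: str, round_number: int) -> str:
        ballots = [
            strategic.get((voter, round_number), sincere_choice(order, a, b))
            for voter, order in prefs.items()
        ]
        return a if ballots.count(a) > ballots.count(b) else b

    first_winner = contest(first_pair[0], first_pair[1], 1)
    return contest(first_winner, held_back, 2)


# X's agenda with sincere voting: B vs. C first, then the winner vs. A.
print(run_agenda(preferences, ("B", "C"), "A"))
# Same agenda, but Y votes strategically for C in round 1.
print(run_agenda(preferences, ("B", "C"), "A", strategic={("Y", 1): "C"}))
```

Under sincere voting, whichever alternative is held back for the second round wins in this example, so each member could secure his or her first choice by controlling the agenda. Y's strategic first-round vote changes the outcome of X's agenda from A to C, which matches the reasoning in the text.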

2.7 EXTENSIONS

These examples truly represent just the beginning of the uses of formal
theory in political science. We have not even introduced two of the more
important aspects of formal theory - spatial models and game theory - that
are beyond the scope of this discussion. In ways that mirror applications
in microeconomics, political scientists have used spatial models to study
phenomena such as the placement of political parties along the ideological
spectrum, much as economists have used spatial models to study the
location of firms in a market. Likewise, game theory utilizes a highly structured
sequence of moves by different players to show how any particular actor's
utility depends not only on her own choices, but also on the choices made
by the other actors. It is easy to see hints about how game theory works
in the preceding simple three-actor, two-stage voting examples: X's best
vote in the first stage likely depends on which alternative Y and Z choose
to support, and vice versa. Game theory, then, highlights how the strategic
choices made in politics are interdependent.

2.8 HOW DO I KNOW IF I HAVE A "GOOD" THEORY?

Once you have gone through some or all of the suggested courses of action
for building a theory, a reasonable question to ask is, "How do I know if
I have a 'good' theory?" Unfortunately there is not a single succinct way
of answering this question. Instead, we suggest that you answer a set of
questions about your theory and consider your honest answers to these
questions as you try to evaluate the overall quality of your theory. You will
notice that some of these questions come directly from the "rules of the
road" that we developed in Chapter 1:

• Is your theory causal?
• Can you test your theory on data that you have not yet observed?
• How general is your theory?
• How parsimonious is your theory?
• How new is your theory?
• How nonobvious is your theory?

2.8.1 Is Your Theory Causal?

Remember that our first rule of the road to scientific knowledge about
politics is "Make your theories causal." If your answer to the question "Is
your theory causal?" is anything other than "yes," then you need to go
back to the drawing board until the answer is an emphatic "yes."

As scientists studying politics, we want to know why things happen
the way that they happen. As such, we will not be satisfied with mere
correlations and we demand causal explanations. We know from Chapter 1
that one way initially to evaluate a particular theory is to think about
the causal explanation behind it. The causal explanation behind a theory
is the answer to the question "Why do you think that this independent
variable is causally related to this dependent variable?" If the answer is
reasonable, then you can answer this first question with a "yes."

2.8.2 Can You Test Your Theory on Data That You Have Not Yet Observed?

Our second rule of the road is "Don't let data alone drive your theories,"
which we restated in a slightly longer form as "Try to develop theories
before examining the data on which you will perform your tests." If you
have derived your theory from considering a set of empirical data, you
need to be careful not to have observed all of the data on which you can
test your theory. This can be a somewhat gray area, and only you know
whether your theory is entirely data driven and whether you observed all
of your testing data before you developed your theory.

2.8.3 How General Is Your Theory?

We could rephrase this question for evaluating your theory as "How widely
does your theory apply?" To the extent that your theory is not limited to
one particular time period or to one particular spatial unit, it is more
general. Answers to this question vary along a continuum - it's not the end
of the world to have a fairly specific theory, but, all else being equal, a more
general theory is more desirable.

2.8.4 How Parsimonious Is Your Theory?

As with the question in the preceding subsection, answers to this question
also vary along a continuum. In fact, it is often the case that we face a
trade-off between parsimony and generality. In other words, to make a theory
more general, we often have to give up parsimony, and to make a theory
more parsimonious, we often have to give up generality. The important
thing with both of these desirable aspects of a theory is that we have them
in mind as we evaluate our theory. If we can make our theory more general
or more parsimonious without sacrificing the other, we should do so.

2.8.5 How New Is Your Theory?

At first it might seem that this is a pretty straightforward question to answer.
The problem is that we cannot know about all of the work that has been
done before our own work in any particular area of research. It also is often
the case that we may think our theory is really new, and luckily we have not
been able to find any other work that has put forward the same theory on
the same political phenomenon. But then we discover a similar theory on
a related phenomenon. There is no simple answer to this question. Rather,
our scholarly peers usually answer this question of newness for us when
they evaluate our work.


2.8.6 How Nonobvious Is Your Theory?

As with the question "How new is your theory?" the question "How nonobvious
is your theory?" is best answered by our scholarly peers. If, when they
are presented with your theory, they hit themselves in the head and say,
"Wow, I never thought about it like that, but it makes a lot of sense!" then
you have scored very well on this question.

Both of these last two questions illustrate an important part of the role
of theory development in any science. It makes sense to think about theories
as being like products and scientific fields as being very much like markets
in which these products are bought and sold. Like other entrepreneurs in
the marketplace, scientific entrepreneurs will succeed to the extent that their
theories (products) are new and exciting (nonobvious). But, what makes
a theory "new and exciting" is very much dependent on what has come
before it.

2.9 CONCLUSION

We have presented a series of different strategies for developing theories of
politics. Each of these strategies involves some type of thought exercise in
which we arrange and rearrange our knowledge about the political world
in hopes that doing so will lead to new causal theories. You have, we're
certain, noticed that there is no simple formula for generating a new theory
and hopefully, as a result, appreciate our description of theory building as
an "art" in the chapter's title. Theoretical developments come from many
places, and being critically immersed in the ongoing literature that studies
your phenomenon of choice is a good place to start.

CONCEPTS INTRODUCED IN THIS CHAPTER

complete information
cross-sectional measure
expected utility
formal theory
incomplete information
intransitive
preference orderings
rational choice
rational utility maximizers
spatial dimension
strategic vote
time dimension
time-series measure
transitive
utility

Figure 2.3. Gross U.S. government debt as a percentage of GDP, 1960-2004.

Figure 2.4. Women as a percentage of members of parliament, 2004.

EXERCISES

1. Figure 2.3 shows gross U.S. government debt as a percentage of GDP from
1960 to 2004. Can you think of a theory about what causes this variable to be
higher or lower?
2. Figure 2.4 shows the percentage of a nation's members of parliament who were
women for 20 randomly selected nations in 2004. Can you think of a theory
about what causes this variable to be higher or lower?

44

The Art of Theory Building

3. Think about a political event with which you are familiar and follow these
instructions:
(a) Write a short description of the evento
(b) What is your understanding of why this event happened the way that it
happened?
(c) Moving from local to global; Reformulate your answer to part (b) into a
general causal theory without proper nouns.



Evaluating Causal Relationships

4. Find a polítical science ¡oumal article of interest to you, and of which your
instructor approves, answer the questions, and follow the instructions:
(a) What is the main dependent variable in the article?
(b) What is the main independent variable in the article?
(c) Briefly describe the causal theory that connects the independent and
dependent variables.
(d) Can you think of another independent variable that is not mentioned in
the article that might be causally related to the dependent variable? Briefly
explain why that variable might be causally related to the dependent
variable.
5. Imagine that the way in which the U.S. House of Representatives is elected was
changed from the current single-member district system to a system of national
proportional representation in whichany party that obtained at least 3% of
the vote nationally would get a proportionate share of the seats in the House.
How many and what types of parties would you expect to see represented in the
House of Representatives under this different electoral system? What theories of
politics can you come up with from thinking about this hypothetical scenario?

OVERVIEW

-

-

-~ ,-

Modem political science fundamentally revolves around establishing
whether there are causal relationships between important concepts. This
is rarely straightforward and serves as the basis for almost aU scientific controversies. How do we know, for example, ti economic development causes
democratization, or if democratizatian causes economic development, or
both, or neither? To speak more generally, ti we wish to know whether
some X ~ Y, we need to cross four causal hurdles: (1) ls there a credible
causal mechanism that connects X to Y? (2) Can we eliminate the possibility
that Y causes X? (3) ls there covariation between X and Y? (4) ls there sorne
Z related to both X and Y that makes the observed relationship between
X and Y spurious? Many people, especially those in the media, make the
mistake that crossing just the third causal hurdle - observing that X and Y
covary - is tantamount to crossing all four. In short, finding a relationship is
not the same as finding a causal relationship, and causality is what we care
about as political scientists.

6. Applying formal theory to something in which you are interested. Think about
something in the political world that you would like to better understand. Try
to think about the individual-level decisions that play a role in deciding the
outcome of this phenomenon. What are the expected benefits and costs that
the individual who is making this decision must weigh?

I would rather discover one causal law than be King of Persia.
- Democritus (quoted in Pearl 2000)

3.1 CAUSALITY AND EVERYDAY LANGUAGE

Like that of most sciences, the discipline of political science fundamentally
revolves around evaluating causal claims. Our theories - which may be right
or may be wrong - typically specify that some independent variable causes
some dependent variable. We then endeavor to find appropriate empirical
evidence to evaluate the degree to which this theory is or is not supported.
But how do we go about evaluating causal claims? In this chapter and the
next, we discuss some principles for doing this. We focus on the logic of
causality and on several criteria for establishing with some confidence the
degree to which a causal connection exists between two variables. Then,
in Chapter 4, we discuss various ways to design research that help us to
investigate causal claims. As we pursue answers to questions about causal
relationships, keep our "rules of the road" from Chapter 1 in your mind,
in particular the admonition to consider only empirical evidence along the
way.
It is important to recognize a distinction between the nature of most
scientific theories and the way the world seems to be ordered. Most of our
theories are limited to descriptions of relationships between a single cause
(the independent variable) and a single effect (the dependent variable). Such
theories, in this sense, are very simplistic representations of reality, and
necessarily so. In fact, as we noted at the end of Chapter 1, theories of this
sort are laudable in one respect: They are parsimonious, the equivalent of
bite-sized, digestible pieces of information. We cannot emphasize strongly
enough that almost all of our theories about social and political phenomena
are bivariate - that is, involving just two variables.
But social reality is not bivariate; it is multivariate, in the sense that
any interesting dependent variable is caused by more than one factor. So
although our theories describe the proposed relationship between some
cause and some effect, we always have to keep in the forefront of our
minds that the phenomenon we are trying to explain surely has many other
possible causes. And when it comes time to design research to test our
theoretical ideas - which is the topic of Chapter 4 - we have to try to
account for, or "control for," those other causes. If we don't, then our
causal inferences about whether our pet theory is right - whether X causes
Y - may very well be wrong.¹ In this chapter we lay out some practical
principles for demonstrating that, indeed, some X does cause Y. You also
can apply these criteria when evaluating the causal claims made by others -
be they a journalist, a candidate for office, a political scientist, a fellow
classmate, a friend, or just about anyone else.
Nearly everyone, nearly every day, uses the language of causality - some
of the time formally, but far more often in a very informal manner. Whenever
we speak of how some event changes the course of subsequent events, we
invoke causal reasoning. Even the word "because" implies that a causal
process is in operation.² Yet, despite the ubiquitous use of the words
"because," "affects," "impacts," "causes," and "causality," the meanings of
these words are not exactly clear. Philosophers of science have long had
vigorous debates over competing formulations of "causality."³

1 Throughout this book, in the text as well as in the figures, we will use arrows as a
shorthand for "causality." For example, the text "X → Y" should be read as "X causes
Y." Oftentimes, especially in figures, these arrows will have question marks over them,
indicating that the existence of a causal connection between the concepts is uncertain.
2 This example was suggested to us by Brady (2002).
Although our goal here is not to wade too deeply into these debates,
there is one feature of the discussions about causality that deserves brief
mention. Most of the philosophy of science debates originate from the
world of the physical sciences. The notions of causality that come to mind in
these disciplines are mostly deterministic - that is, if some cause occurs, then
the effect will occur with certainty. In contrast, though, the world of human
interactions is probabilistic - increases in X are associated with increases
(or decreases) in the probability of Y occurring, but those probabilities are
not certainties. Whereas physical laws like Newton's laws of motion are
deterministic - think of the law of gravity here - social science more closely
resembles probabilistic causation like that in Darwin's theory of natural
selection, in which random mutations make an organism more or less fit to
survive and reproduce.⁴ However, in reviewing three prominent attempts
within the philosophy of science to elaborate on the probabilistic nature of
causality, the philosopher Wesley Salmon (1993, p. 137) notes that "In the
vast philosophical literature on causality [probabilistic notions of causality]
are largely ignored." But in political science, our conceptions of causality
must be probabilistic in nature. When we theorize, for example, that an
individual's level of wealth causes her opinions on optimal tax policy, we
do not at all mean that every wealthy person will want lower taxes, and
every poor person will prefer higher taxes. Consider what would happen if
we found a single rich person who favors high taxes or a single poor person
who favors low taxes. One case alone does not decrease our confidence
in the theory. In political science there will always be exceptions because
human beings are not deterministic robots whose behaviors conform to
lawlike statements. In other sciences in which the subjects of study are more
robotic, it may make more sense to speak of laws that describe behavior.
Consider the study of planetary orbits, in which scientists can precisely
predict the movement of celestial bodies hundreds of years in advance. The
political world, in contrast, is extremely difficult to predict. As a result,
most of the time we are happy to be able to make probabilistic statements
about causal relationships.
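
To make the idea of a probabilistic causal claim concrete, the following minimal sketch simulates the wealth and tax-opinion example. Everything numeric in it is an assumption chosen for illustration (the function name, the 70% and 30% probabilities, the even split between wealthy and poor respondents); none of it comes from real survey data.

```python
import random


def simulate_tax_opinions(n=10_000, p_rich=0.7, p_poor=0.3, seed=0):
    """Simulate n hypothetical people whose wealth shifts, but does not
    determine, the chance that they favor lower taxes.

    Returns (share of wealthy favoring lower taxes,
             share of poor favoring lower taxes,
             number of people who run against the general tendency).
    """
    rng = random.Random(seed)
    rich_total = rich_lower = poor_total = poor_lower = 0
    for _ in range(n):
        wealthy = rng.random() < 0.5           # assumed: half the sample is wealthy
        p = p_rich if wealthy else p_poor      # wealth changes a probability only
        favors_lower = rng.random() < p
        if wealthy:
            rich_total += 1
            rich_lower += favors_lower
        else:
            poor_total += 1
            poor_lower += favors_lower
    # Exceptions: wealthy people who favor higher taxes, poor people who favor lower.
    exceptions = (rich_total - rich_lower) + poor_lower
    return rich_lower / rich_total, poor_lower / poor_total, exceptions


rich_share, poor_share, exceptions = simulate_tax_opinions()
print(f"wealthy favoring lower taxes: {rich_share:.2f}")
print(f"poor favoring lower taxes:    {poor_share:.2f}")
print(f"individual exceptions:        {exceptions}")
```

In a run of this sketch the relationship between wealth and opinion is plainly visible in the two shares, yet thousands of simulated individuals are exceptions to it. That is the sense in which a single contrary case does not count against a probabilistic theory.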

3 You can find an excellent account of the vigor of these debates in a 2003 book by David
Edmonds and John Eidinow titled Wittgenstein's Poker: The Story of a Ten-Minute
Argument Between Two Great Philosophers.
4 We borrow the helpful comparison of probabilistic social science to Darwinian natural
selection from Brady (2004).

What all of this boils down to is that the entire notion of what it means
for something "to cause" something else is far from a settled matter. Should
social scientists abandon all hope of finding causal connections? Not at all.
What it means is that we should proceed cautiously and with an open mind,
rather than in some hyperformulaic fashion.

3.2 FOUR HURDLES ALONG THE ROUTE TO ESTABLISHING
CAUSAL RELATIONSHIPS

If we wish to investigate whether some independent variable, which we
will call X, "causes" some dependent variable, which we will call Y, what
procedures must we follow before we can express our degree of confidence
that a causal relationship does or does not exist? Finding some sort of
covariation (or, equivalently, correlation) between X and Y is not sufficient
for such a conclusion.
We encourage you to bear in mind that establishing causal relationships
between variables is not at all akin to hunting for DNA evidence
like some episode from a television crime drama. Social reality does not
lend itself to such simple, cut-and-dried answers. In light of the preceding
discussion about the nature of causality itself, consider what follows
to be guidelines as to what constitutes "best practice" in political science.
With any theory about a causal relationship between X and Y, we should
carefully consider the answers to the following four questions:

1. Is there a credible causal mechanism that connects X to Y?
2. Could Y cause X?
3. Is there covariation between X and Y?
4. Is there some confounding variable Z that is related to both X and Y
   and makes the observed association between X and Y spurious?

First, we must consider whether it is credible to claim that X could
cause Y. To do this, we need to go through a thought exercise in which we
evaluate the mechanics of how X would cause Y. In other words, what is
it specifically about having more (or less) of X that will in all probability
lead to more (or less) of Y? In effect, this hurdle represents an effort to
answer the "how" and "why" questions about causal relationships. The
more outlandish these mechanics would have to be, the less confident we
are that our theory has cleared this first hurdle. Failure to clear this first
hurdle is a very serious matter: either our theory needs
to be thrown out altogether, or we need to revise it after some careful
rethinking of the underlying mechanisms through which it works. It is
worth proceeding to the second question only once we have a "yes" answer
to this question.


Second, and perhaps with greater difficulty, we must ask whether it
is possible (or even likely) that Y might cause X. As you will learn from
the discussion of the various strategies for assessing causal connections in
Chapter 4, this poses thorny problems for some forms of social science
research, but is less problematic for others. Occasionally, this causal hurdle
can be crossed logically. For example, when considering whether a person's
gender (X) causes him or her to have particular attitudes about abortion
policy (Y), it is a rock-solid certainty that the reverse-causal scenario can
be dismissed: A person's attitudes about abortion do not "cause" them
to be male or female. If our theory does not clear this particular hurdle, the
race is not lost. Under these circumstances, we should proceed to the next
question, while keeping in mind the possibility that our causal arrow might
be reversed.
Throughout our consideration of the first two causal hurdles, we were
concerned with only two variables, X and Y. The third causal hurdle can
involve a third variable Z, and the fourth hurdle always does. Often it is
the case that there are several Z variables.
For the third causal hurdle, we must consider whether X and Y
covary (or, equivalently, whether they are correlated or associated).
Generally speaking, for X to cause Y, there must be some form of measurable
association between X and Y, such as "more of X is associated with more
of Y," or "more of X is associated with less of Y." Demonstrating a simple
bivariate connection between two variables is a straightforward matter, and
we will cover it in Chapter 8. Of course, you may be familiar with the
dictum "Correlation does not prove causality," and we wholeheartedly agree.
It is worth noting, though, that correlation is normally an essential
component of causality. But be careful. It is possible for a causal relationship to
exist between X and Y even if there is no bivariate association between X
and Y. Thus, even if we fail to clear this hurdle, we should not throw out
our causal claim entirely. Instead, we should consider the possibility that
there exists some confounding variable Z that we need to "control for"
before we see a relationship between X and Y. Whether or not we find a
bivariate relationship between X and Y, we should proceed to our fourth
and final hurdle.
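
The claim that a causal relationship can exist without any bivariate association is easier to accept after seeing one constructed. The sketch below is a simulation under assumptions of our own choosing: X truly raises Y, but a third variable Z raises X and lowers Y by just enough to cancel the raw association. The coefficients, sample size, and function names are illustrative, and "controlling for Z" is done here by correlating what is left of X and Y after a straight-line fit on Z, which is one simple way to do it rather than the method of any later chapter.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(outcome, control):
    """What is left of `outcome` after a least-squares line fit on `control`."""
    n = len(outcome)
    mean_o, mean_c = sum(outcome) / n, sum(control) / n
    slope = (sum((c - mean_c) * (o - mean_o) for c, o in zip(control, outcome))
             / sum((c - mean_c) ** 2 for c in control))
    return [o - mean_o - slope * (c - mean_c) for o, c in zip(outcome, control)]


def masked_effect(n=5000, seed=1):
    """Return (raw correlation of X and Y, correlation after controlling for Z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]          # Z pushes X up
    y = [xi - 2 * zi + rng.gauss(0, 1)              # X raises Y, Z lowers Y
         for xi, zi in zip(x, z)]
    raw = pearson(x, y)
    controlled = pearson(residuals(x, z), residuals(y, z))
    return raw, controlled


raw, controlled = masked_effect()
print(f"raw correlation of X and Y:      {raw:.2f}")
print(f"after controlling for Z:         {controlled:.2f}")
```

With these assumed coefficients the raw correlation comes out close to zero while the controlled correlation is clearly positive, which is why failing the third hurdle is a reason to think about Z rather than a reason to abandon the claim.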
Fourth, in establishing causal connections between X and Y, we must
face up to the reality that, as we noted at the outset of this chapter, we live in
a world in which most of the interesting dependent variables are caused by
more than one - often many more than one - independent variable. What
problems does this pose for social science? It means that, when trying to
establish whether a particular X causes a particular Y, we need to "control
for" the effects of other causes of Y (and we call those other causes Z). If
we fail to control for the effects of Z, we are quite likely to misunderstand
the relationship between X and Y and make the wrong inference about
whether X causes Y. This is the most serious mistake a social scientist can
make. If we find that X and Y are correlated, but that, when we control
for the effects of Z on both X and Y, the association between X and Y
disappears, then the relationship between X and Y is said to be spurious.
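
A spurious relationship can also be built by hand. In the following sketch, which rests entirely on made-up numbers, Z is a yes-or-no condition that raises both X and Y, while X has no effect on Y at all. "Controlling for Z" is done in the most literal way available: by looking at the X-Y correlation separately among cases where Z is absent and cases where Z is present. The function names and effect sizes are assumptions for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spurious_demo(n=6000, seed=2):
    """Return (overall X-Y correlation, {Z level: X-Y correlation within it})."""
    rng = random.Random(seed)
    z = [rng.random() < 0.5 for _ in range(n)]          # confounder: absent/present
    x = [3.0 * zi + rng.gauss(0, 1) for zi in z]        # Z raises X
    y = [3.0 * zi + rng.gauss(0, 1) for zi in z]        # Z raises Y; X never enters
    overall = pearson(x, y)
    within = {}
    for level in (False, True):                         # hold Z constant
        xs = [xi for xi, zi in zip(x, z) if zi == level]
        ys = [yi for yi, zi in zip(y, z) if zi == level]
        within[level] = pearson(xs, ys)
    return overall, within


overall, within = spurious_demo()
print(f"overall correlation:       {overall:.2f}")
print(f"correlation when Z absent:  {within[False]:.2f}")
print(f"correlation when Z present: {within[True]:.2f}")
```

The overall correlation is strong, but it vanishes once Z is held constant, which is the pattern the text calls spurious.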
3.2.1 Putting It All Together - Adding Up the Answers to Our
Four Questions

As we have just seen, evaluating a theoretical claim that
X causes Y is a complicated process. Taken one at a time, each of the
four questions in the introduction to this section can be difficult to answer
with great clarity. But the challenge of evaluating a claim that X causes Y
involves summing across all four of these questions to determine our overall
confidence about whether X causes Y. To understand this, think about the
analogy that we have been using by calling these questions "hurdles." In
track events that feature hurdles, runners must do their best to try to clear
each hurdle as they make their way toward the finish line. Occasionally
even the most experienced hurdler will knock over a hurdle. Although this
slows them down and diminishes their chances of winning the race, all
is not lost. If we think about putting a theory through the four hurdles
posed by the preceding questions, there is no doubt our confidence will
be greatest when we are able to answer all four questions the right way
("yes," "no," "yes," "no") and without reservation. As we described in the
introduction to this section, failure to clear the first hurdle should make us
stop and rethink our theory. This is also the case if we find our relationship
to be spurious. For the second and third hurdles, however, failure to clear
them completely does not mean that we should discard the causal claim in
question. Figure 3.1 provides a summary of this process. In the subsections
that follow, we will go through the process described in Figure 3.1 with a
series of examples.
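
The way the four answers add up can be written out as a small decision procedure. The function below is only a sketch of our reading of the rules just described: the function name, the use of the strings "yes", "no", and "maybe" as answers, and the wording of each note are assumptions made for illustration, not part of any standard tool.

```python
def evaluate_causal_claim(mechanism, reverse, covariation, confounder):
    """Walk a claim that X causes Y through the four hurdles.

    Each argument is the answer ("yes", "no", or "maybe") to one question:
      mechanism   - Is there a credible causal mechanism linking X to Y?
      reverse     - Could Y cause X?
      covariation - Is there covariation between X and Y?
      confounder  - Is there a Z that makes the X-Y association spurious?
    Returns a list of notes, one per hurdle reached.
    """
    if mechanism != "yes":
        # Failing the first hurdle ends the evaluation.
        return ["Hurdle 1 not cleared: stop and reformulate the theory."]
    notes = ["Hurdle 1 cleared: there is a credible mechanism."]

    if reverse == "no":
        notes.append("Hurdle 2 cleared: Y does not cause X.")
    else:
        notes.append("Hurdle 2 not cleared: proceed with caution; "
                     "the causal arrow might be reversed.")

    if covariation == "yes":
        notes.append("Hurdle 3 cleared: X and Y covary.")
    else:
        notes.append("Hurdle 3 not cleared: think about confounding "
                     "variables that may be masking the relationship.")

    if confounder == "no":
        notes.append("Hurdle 4 cleared: summarize the findings.")
    elif confounder == "yes":
        notes.append("Hurdle 4 not cleared: the relationship is spurious; "
                     "stop and reformulate the causal explanation.")
    else:
        notes.append("Hurdle 4 unresolved: control for confounding "
                     "variables until the answer is yes or no.")
    return notes


# The ideal pattern of answers is "yes," "no," "yes," "no."
for note in evaluate_causal_claim("yes", "no", "yes", "no"):
    print(note)
```

Calling the function with other patterns, for example `evaluate_causal_claim("yes", "yes", "yes", "maybe")`, shows how a claim can survive the first three questions and still leave work to do on the fourth.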
3.2.2 Identifying Causal Claims Is an Essential Thinking Skill

We want to emphasize that the logic just presented does not apply merely
to political science research examples. Whenever you see a story in the
news, or hear a speech by a candidate for public office, or, yes, read a
research article in a political science class, it is almost always the case that
some form of causal claim is embedded in the story, speech, or article.
Sometimes those causal claims are explicit - indented and italicized so that
you just can't miss them. Quite often, though, they are harder to spot, and
most of the time not because the speaker or writer is trying to confuse you.
What we want to emphasize is that spotting and identifying causal claims
is a thinking skill. It does not come naturally to most people, but it can be
practiced.

[Figure 3.1. The path to evaluating a causal relationship. Flowchart:
(1) Is there a credible causal mechanism linking X to Y? If no, stop and
reformulate your theory until the answer is "yes." (2) Could Y cause X? If
yes, proceed with caution to hurdle 3. (3) Is there covariation between X
and Y? If no, think about confounding variables before moving to hurdle 4.
(4) Is there a confounding variable Z that makes the association between X
and Y spurious? If yes, stop and reformulate your causal explanation; if
maybe, control for confounding variables until your answer is "yes" or "no";
if no, proceed with confidence and summarize your findings.]

Take a common example from a political campaign: A candidate for
president or prime minister who is running for reelection asserts that the
voters should give him or her another term in office because the national
economy is performing well. (Or, if economic performance is poor, the
challenger will claim that the voters should replace the poor economic
management team of the incumbent with the challenger's party.) There are
perhaps two related causal claims embedded in a candidate's appeal to be
reelected on the basis of the economy's performance. First, the candidate
may be saying that the economic performance is better than it would be if
the voters had chosen the other candidate in the last election. Second, the
incumbent may be claiming that economic performance will be better in the
future if he or she is reelected than it will be if the opposing candidate wins.

Because the second claim is about an unpredictable future, let's set it
aside for the moment; it is interesting speculation, but there's no doubt that
it is just speculation. Focus, then, on the first claim: that the economy is
performing well because of the administration's economic policies. Is such
a causal claim credible? For us to make such a judgment, we need to focus
on our four causal hurdles. To start, we need to evaluate whether there
is a credible causal mechanism that connects X (the administration and its
policies) with Y (economic performance). Thinking through the mechanics
of how this causal relationship would work is pretty straightforward.
Presidents and prime ministers have a wide array of economic policy-making
tools at their disposal. It seems pretty reasonable that the use of these tools
could cause the economy to fare better or worse. The answer to the first
question is "yes." Second, is it possible that Y causes X? This would mean
that the current economy caused the administration's economic policies. In
this case, we would need to figure out whether policy enactments did or
did not precede good economic performance. Because there are so many
different policies being enacted at various points in time, this would be a
difficult question to answer. To be conservative, let's say that the answer to
the second question is "yes." To answer our third question we need to figure
out whether X (the administration and its policies) is associated with Y
(economic performance). Presumably it is, and presumably this relationship
is such that the economic performance improved after the administration's
policies were put in place - right? - or else the candidate would not be
making the claim. But, if we wanted to evaluate such a claim, we would have
to choose an indicator of economic performance and make a comparison
across time. For now, let's say that the answer to the third question is "yes."
So far, the politician's claim of a causal relationship is doing pretty well
(because our answers are "yes," "yes," and "yes"). But the fourth causal
hurdle is where the candidate (and we) can get tripped up. Is there some
other force, Z, that is related to both X and Y and renders the relationship
between X and Y spurious? Think about this one: Can we think of any
other reasons, besides administration policies, why the economy might be
performing well? Of course we can. It would be patently silly to assert
that the sole cause of strong economic performance is government policy.
Innovation in the private sector (Z), for example, might (by happenstance)
coincide with government policy changes (X) and be strongly related to
prosperity (Y). Without some further analysis, we have ample reason to
be skeptical of such a candidate's claim that the economy is prosperous
because of the administration's sound policies. Unfortunately for the
politician, the answer to the fourth question is "maybe." We're going to have
to see some more evidence before we, as scientists, are going to believe her
causal claim.


We could rather easily think of a host of other factors that might fit the
description of a confounding variable. But let's be careful before we dismiss
the politician as a charlatan. Does this mean that the candidate is wrong,
and that we know that administration policies did not cause prosperity?
Absolutely not. All we have done in this simple thinking exercise is to
recognize that the dependent variable of interest (Y), the health of the economy,
is certainly a function of many things, one of which may or may not be the
administration's economic policies (X). To know that the administration
produced the prosperity, we would need to control for other possible causes
of prosperity, and we haven't done that. Therefore we should conclude that
it is possible that the candidate's claim is appropriate. But it has not yet
been empirically supported, because alternative explanations have not yet
been ruled out. Identifying the underlying causal claim, in this case, helps us
to be skeptical of the self-interested claims of political actors. A candidate's
job, of course, is not to evaluate causal claims carefully; it is to get votes.
But evaluating the credibility of a candidate's often-implicit causal claims
is important if we, the voters, do not want to be led astray by vote-hungry
politicians.
An important part of taking a scientific approach to the study of politics
is that we turn the same skeptical logic loose on scholarly claims about
causal relationships. Before we can evaluate a causal theory, we need to
consider how well the available evidence answers each of the four questions
about X, Y, and Z. Once we have answered each of these four questions,
one at a time, we then think about the overall level of confidence that we
have in the claim that X causes Y.
3.2.3 What Are the Consequences of Failing to Control for Other
Possible Causes?

When it comes to any causal claim, as we have just noted, the fourth causal
hurdle often trips us up, and not just for evaluating political rhetoric or
stories in the news media. This is true for scrutinizing scientific research as
well. In fact, a substantial portion of disagreements between scholars boils
down to this fourth causal hurdle. When one scholar is evaluating another's
work, perhaps the most frequent objection is that the researcher "failed to
control for" some potentially important cause of the dependent variable.
What happens when we fail to control for some plausible other cause
of our dependent variable of interest? Quite simply, it means that we have
failed to cross our fourth causal hurdle. So long as a credible case can be
made that some uncontrolled-for Z might be related to both X and Y,
we cannot conclude with full confidence that X indeed causes Y. Because
the main goal of science is to establish whether causal connections between
variables exist, failing to control for other causes of Y is a potentially
serious problem.
One of the themes of this book is that statistical analysis should not
be disconnected from issues of research design - such as controlling for
as many causes of the dependent variable as possible. When we discuss
multiple regression (in Chapters 10 and 11), which is the most common
statistical technique that political scientists use in their research, the entire
point of those chapters is to learn how to control for other possible causes
of the dependent variable. We will see that failures of research design,
such as failing to control for all relevant causes of the dependent variable,
have statistical implications, and the implications are always bad. Failures
of research design produce problems for statistical analysis, but hold this
thought. What is important to realize for now is that good research design
will make statistical analysis more credible, whereas poor research design
will make it harder for any statistical analysis to be conclusive about causal
connections.
3.3 WHY IS STUDYING CAUSALITY SO IMPORTANT? THREE
EXAMPLES FROM POLITICAL SCIENCE

Our emphasis on causal connections should be clear. We turn now to several
active controversies within the discipline of political science, showing how
debates about causality lie at the heart of precisely the kinds of controversies
that got you (and most of us) interested in politics in the first place.
3.3.1 Life Satisfaction and Democratic Stability

One of the enduring controversies in political science is the relationship
between life satisfaction in the mass public and the stability of democratic
institutions. Life satisfaction, of course, can mean many different things, but
for the current discussion let us consider it as varying along a continuum,
from the public's being highly unsatisfied with day-to-day life to being
highly satisfied. What, if anything, is the causal connection between the
two concepts?
Political scientist Ronald Inglehart (1988) argues that life satisfaction
(X) causes democratic system stability (Y). If we think through the first
of the four questions for establishing causal relationships, we can see that
there is a credible causal mechanism that connects X to Y - if people in a
democratic nation are more satisfied with their lives, they will be less likely
to want to overthrow their government. The answer to our first question
is "yes." Moving on to our second question: Is it possible that democratic
stability (Y) is what causes life satisfaction (X)? Certainly it is. It is very easy
to conceive of a causal mechanism in which citizens take careful note of the
political system when they consider how happy they are and that citizens
living in stable democracies are apt to look back on a history of government
stability - that is, a recent history without violent revolutions - and feel a
sense of safety and happiness as a result. The answer to our second question
is "yes." We now turn to the third question. Using an impressive amount
of data from a wide variety of developed democracies, Inglehart and his
colleagues have shown that there is, indeed, an association between average
life satisfaction in the public and the length of uninterrupted democratic
governance. That is, countries with higher average levels of life satisfaction
have enjoyed longer uninterrupted periods of democratic stability.
Conversely, countries with lower levels of life satisfaction have had shorter
periods of democratic stability and more revolutionary upheaval. The
answer to our third question is "yes." With respect to the fourth question, it
is easy to imagine a myriad of other factors (Z's) that lead to democratic
stability, and whether Inglehart has done an adequate job of controlling
for those other factors is the subject of considerable scholarly debate. The
answer to our fourth question is "maybe." Inglehart's theory has
satisfactorily answered questions 1 and 3, but it is the answers to questions 2 and
4 that have given skeptics substantial reasons to doubt his causal claim.

3.3.2 School Choice and Student Achievement
In recent years, during which there has been considerable concern about
the performance of public elementary and secondary schools, the
possibility of the government issuing vouchers to allow families to send children
to private schools has become highly controversial. Setting aside the normative
issues about whether "school choice" is either inherently desirable or
instead something that will by its nature drain the public schools, there is
lurking in the background an important empirical and causal issue: Does
the type of school a child attends (X) affect student performance (Y)? It
can be argued that, as researchers cannot demonstrate that school-choice
programs improve student performance, the programs lose a substantial
portion of their appeal.
Clearly, the first question for establishing causal relationships is easy
enough to answer, because a credible (if not airtight) argument can be made
that children will receive an education that better prepares them for
standardized tests in private schools, which typically have smaller class sizes
and fewer layers of bureaucracy. The answer to our first question is "yes."
In this example, the second hurdle is pretty easy to clear - how could
test-score results (Y) cause the type of school (X)? The answer to our second
question is "no."


Let's move to the third question of whether there is covariation
between X and Y. At first glance, this would seem like an entirely
straightforward matter. Find a city or state where there is a school-choice program;
compare the scores on standardized tests of students in the public
schools with those of students in the private schools; then draw a conclusion. Is this
comparison useful? Suppose we were to compare scores on a standardized
math test among eighth-graders in Metropolis, USA, some of whom went
to private schools by way of a school-choice program and others who
remained in Metropolis's public schools. And suppose we find that, indeed,
the average math test score among students who participate in the choice
program is higher than that of those who remained in the public schools.
In this hypothetical case, the answer to our third question is "yes." Our
theory is looking pretty good. So far, all of the answers have supported
it. Does this mean that the choice program caused their test scores to be
higher?
It is a tempting conclusion to leap to, isn't it? It sounds like a classic
case of comparing apples to apples, so to speak. But let's try to stick to our
four questions. We have already, in our hypothetical example, conceded
that the type of school (X) is associated with test scores (Y).
The fourth question is the only one remaining, and it is, in this case,
a difficult question to answer. Can you think of another cause (Z) that is
related to whether or not a student enrolls in the choice program (X) that
will also be related to the standardized test score? Yes. In this case, the level
of parental involvement (Z) could surely affect both X and Y and might
make the association that we see between X and Y spurious. Parents who
are actively involved in their child's education (Z) are more likely to be
aware of a school-choice program in their district and are more likely to
pursue that option (X). Similarly, parents with high levels of involvement
in their child's education (Z) are more likely to have children who perform
well on standardized tests (Y); such parents read to their children more,
help them with homework, and stress the importance of education in a
child's life.
In this case, the Z variable we have identified produces what is called
a selection effect - a situation in which a systematic force causes only a
nonrandom subset of eligible targets to participate in a program. In any
substantive area in which we are trying to evaluate the effectiveness of
a government policy, it is critical to compare participants in the program
with nonparticipants in a rigorous fashion. If we find systematic differences
between participants and nonparticipants - as we surely would in a
school-choice program - then it becomes crucial to try to control
for those forces when evaluating the program's effectiveness. In the
school-choice example considered here, what seemed like a simple apples-to-apples
comparison really turned out to be an apples-to-giraffes comparison. At the
very least, the answer to our fourth question is "maybe."⁵

[Figure 3.2. Theoretical causes of the number of parties in legislatures.
Diagram labels: Political culture; Electoral system; Number of political
parties.]

Let's be extremely careful here. Does this mean that school-choice
programs do not help to improve student performance on tests? Not at all.
What our four questions do is remind us that, sometimes, the temptingly
easy conclusion needs additional scrutiny before we embrace it. In Chapter 4, we will talk about some research designs that can help to ameliorate
situations precisely like this one.
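
The selection effect in the school-choice example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the probabilities, the score formula, and every name in the code are hypothetical and are not estimates from any real program. Parental involvement (Z) raises both the chance of enrolling (X) and test scores (Y), while the program itself is deliberately given no effect at all.

```python
import random
from statistics import mean

def simulate_students(n=20000, seed=0):
    """Simulate students whose enrollment depends on parental involvement.

    All numbers are invented for illustration; the program has zero true effect.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        involved = rng.random() < 0.5                             # Z
        enrolled = rng.random() < (0.7 if involved else 0.2)      # X depends on Z
        score = 60 + (15 if involved else 0) + rng.gauss(0, 10)   # Y depends on Z only
        students.append((involved, enrolled, score))
    return students

def score_gap(students):
    """Mean score of enrolled students minus mean score of non-enrolled students."""
    enrolled = [score for _, x, score in students if x]
    others = [score for _, x, score in students if not x]
    return mean(enrolled) - mean(others)

students = simulate_students()
naive_gap = score_gap(students)                                   # ignores Z
gap_involved = score_gap([r for r in students if r[0]])           # Z held at "involved"
gap_uninvolved = score_gap([r for r in students if not r[0]])     # Z held at "uninvolved"

print(f"naive gap: {naive_gap:.1f}")
print(f"gap among involved parents: {gap_involved:.1f}")
print(f"gap among uninvolved parents: {gap_uninvolved:.1f}")
```

In this made-up setting the naive comparison shows a sizable gap in favor of participants, yet the gap shrinks toward zero once students are compared only with others whose parents are similarly involved. That is the apples-to-giraffes problem in miniature.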

3.3.3 Electoral Systems and the Number of Political Parties

Political science has a long tradition of examining the impact of institutional
arrangements on political outcomes. One prominent example of this type of
research has focused on the influence of electoral systems on the number of
political parties in legislatures. Figure 3.2 depicts a theoretical model of the
number of parties that will be represented in a legislature. The first theory is
that, the more societal divisions there are that shape a political culture, the
more political parties there will be in the legislature. The second theory, on
which we focus in this subsection, is that, if we hold constant the political
culture of the area that the legislature represents, the more disproportional
the electoral system is in translating votes into seats (X), the fewer political
parties will be represented in the legislature (Y).
5 In addition to parental involvement, there are other possible selection mechanisms at
work. A private school involved in a school-choice program, for example, might choose
to use test scores as a criterion for admission.

The term "disproportional" in this theory is expressed in terms of the
translation of votes into seats for political parties. A perfectly proportional
system would be one in which the percentage of votes cast for each party
was exactly equal to the percentage of seats awarded to that party in the
legislature as a result of the election. In practice, perfectly proportional
electoral systems are never found; in fact, electoral systems differ substantially in terms of how close they come to this ideal. Turning to our four
hurdles, the causal mechanism behind the theory of electoral systems and
the number of parties is driven by the organizational incentives politicians
face when deciding whether to form new political parties or work within
established parties to contest elections. Disproportionate electoral systems
tend to reward the largest parties and greatly penalize the smaller parties in
terms of translating votes into seats. Thus, the more disproportionate the
electoral system, the greater will be the tendency for politicians who are
competing for legislative seats to band together, resulting in fewer political
parties in the legislature. If you believe this, then the answer to our first
question is "yes."
To better understand this theory of the influence of electoral institutions, consider the U.S. House of Representatives. A quick review of the
history of party membership in the U.S. House indicates that, with few exceptions, two political parties have held all or most of the seats. According
to this theory, this is the case because the U.S. House of Representatives
is elected by use of a set of rules that produce disproportionate outcomes
in terms of the translation of votes to seats. That system is known as a
"single-member district plurality" system. The entire country is divided
into electoral districts, and on election day whichever candidate receives
the most votes (a plurality) is elected to represent that district. When the
votes and seats are tallied up at the national level, results tend disproportionately to favor the parties with the most votes. For example, in the 1992
U.S. House elections, Democratic Party candidates received 49.95% of the
votes cast and 59.31% of the seats. In that same election, Republican Party
candidates received 44.75% of the votes cast and 40.46% of the seats; all
other parties together received 5.3% of the votes cast and only one seat (or
0.2% of the available seats).
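
The 1992 figures can be turned into a rough measure of disproportionality. The sketch below computes each party's seat share minus its vote share, and also the Gallagher least-squares index. That index is a summary measure brought in here for illustration; it is not one the text itself uses. Lumping "all other parties" into a single bloc is a further simplifying assumption that affects the index value.

```python
from math import sqrt

# Vote and seat shares (percent) for the 1992 U.S. House elections, as quoted in the text.
# Assumption: "all other parties" are treated as one bloc.
shares_1992 = {
    "Democratic": (49.95, 59.31),
    "Republican": (44.75, 40.46),
    "Other": (5.30, 0.20),
}

def seat_bonus(shares):
    """Seat share minus vote share per party (positive means over-represented)."""
    return {party: seats - votes for party, (votes, seats) in shares.items()}

def gallagher_index(shares):
    """Least-squares disproportionality: sqrt(0.5 * sum((votes - seats)^2))."""
    return sqrt(0.5 * sum((votes - seats) ** 2 for votes, seats in shares.values()))

bonuses = seat_bonus(shares_1992)
index = gallagher_index(shares_1992)

for party, bonus in bonuses.items():
    print(f"{party}: {bonus:+.2f} points")
print(f"Gallagher index: {index:.2f}")
```

The largest party gains roughly nine points of seat share over its vote share, while the smaller groupings lose. This is the pattern the theory expects from a single-member district plurality system.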
One of the most proportional electoral systems in history was that
of the Weimar Republic in Germany between World War I and World
War II. Under the Weimar Republic electoral system, Germany was divided into 13 electoral regions, and seats in the national legislature were
awarded to any party that managed to get 60,000 or more votes in any one
of the electoral regions. Given that the number of voters who turned out
in Weimar Republic elections was never less than 28 million, politicians
had very little legal incentive to band together to contest elections. Consistent with the theory, the Reichstag had many different political parties
throughout the time of the Weimar Republic. Some scholars have suggested
that, because the politicians were divided into so many different political
parties in the legislature, they were unable to band together to counter the
rising strength and popularity of the Nazi Party. The Nazi Party, in contrast
with many of the other parties in Germany at the time, was willing and able
to hold together its politicians through coercive means. Figure 3.3 shows
the number of parties in the Reichstag and the percentage of votes for the
Nazi Party across the period of the Weimar Republic.

Figure 3.3. Nazi vote and the number of parties winning seats in Weimar Republic
elections, 1919-1933. (Plot: year on the horizontal axis, 1920 to 1935; series: number
of parties winning seats, % vote for the Nazi Party.)

In the aftermath of World War II, the constitution of West Germany
was designed with the Weimar Republic experience very much in mind.
Although the electoral system continued to be a form of proportional representation, one major change was that parties that got less than 5% of the
vote nationally were not given seats in the national legislature. 6 If we look at
Figure 3.4, we can see an interesting pattern. In the first election after World
War II, 11 political parties won seats in the Bundestag. After this, though,
fewer and fewer political parties were represented, with only four parties in
the Bundestag throughout the 1960s and 1970s. In the 1980s, the political
culture of what was then West Germany began to change. The Green Party
cleared the 5% threshold and was represented in the Bundestag. In 1990
East and West Germany were reunified. In each of the elections since then,
six parties have been represented in the Bundestag. The additional party is
the Party of Democratic Socialism, which is a left-wing party created out
of the remains of the East German Communist Party.

Figure 3.4. Number of parties winning seats in German Bundestag elections, 1949-2002.
(Plot: year on the horizontal axis, 1950 to 2000.)

Consider the evidence from Germany in terms of our four questions.
In the preceding subsections, we have a reasonably credible mechanism
(incentives faced by office-seeking politicians) of how X (type of electoral
system) causes Y (the number of political parties), and thus we answer our
first question. We do not have any evidence from this case that the number
of political parties caused the electoral system, so we can be confident that
the answer to our second question is "no." Figures 3.3 and 3.4 certainly
seem to indicate that there was covariation between the electoral system
and the number of parties. 7 When the more disproportionate electoral
system of post-World War II Germany was put into effect, the number of
political parties went down substantially. On the basis of this evidence we
can conclude preliminarily that the answer to our third question is "yes."
We have not yet conducted an extensive search for confounding variables
(Z) that may be related to both the electoral system (X) and the number
of political parties (Y). But it is difficult to imagine such a variable. So,
on the basis of our consideration of our theories so far, the answer to our
fourth question is "no." Taken together, this theory has done very well as
we have subjected it to the four causal hurdles. But, as we will learn later,
we should base our answer to question 3 on more evidence than what we
have examined thus far.

6 The West German electoral system, which is now the electoral system for reunified
Germany, is one of the most complicated electoral systems in the world. It is possible
for a party to gain seats despite failing to clear the 5% threshold provided that they win
district-level seats. For a nice overview of this system, we recommend Gallagher, Laver,
and Mair (2006).
7 Beginning in Chapter 8 we will discuss more systematic ways in which to use statistical
techniques to evaluate empirical evidence of relationships between variables.

3.4 WHY IS STUDYING CAUSALITY SO IMPORTANT? THREE
EXAMPLES FROM EVERYDAY LIFE

Causal claims are not limited to social science research like those previously
discussed. There are times when causal claims in politics, the news, or just
everyday life are downright humorous. Learning the intellectual habit of
sifting through an argument to find the embedded causal argument can be
useful.

3.4.1 Alcohol Consumption and Income

When you're in the checkout line at the grocery store, do you ever pick
up the tabloids and scan them for the latest news about alien abductions
and celebrity breakups? If so, you might remember this gem in the tabloid
magazine Weekly World News of May 14, 2002, under the screaming
headline "Want to be loaded? Then get loaded!":
Do you want to be rich and successful, like Donald Trump, Bill Gates
or Oprah Winfrey? Then belly up to the bar and drink your way to
wealth! ... when it comes to raking in the dough, boozers leave teetotalers
in the dust, with all but the heaviest drinkers earning more.
Perhaps it's easy to believe that such a claim would appear in a tabloid
magazine. At least, in this case, the causal claim is right there in the title of
the article. Think about it for a moment: Given the fact that the consumers
of tabloids do not have a bevy of data at their disposal about both alcohol
consumption and adult earnings, what kind of evaluation can we make
about such a causal claim? Think about our causal hurdles. Perhaps -
perhaps! - we can cross the third hurdle by finding that it is true that alcohol
consumption (X) is associated with higher earnings (Y). What about our
second hurdle? Is it possible that earnings cause alcohol consumption? It
is, at least in the sense that individuals with higher incomes have more
discretionary dollars to spend on anything - including alcohol - that they
like, and that, in contrast, people with lower levels of income have a natural
ceiling on how much they can spend on alcohol. So maybe the causal
arrow runs the other way after all, though not in the pernicious way that
the tabloid suggests. The fourth causal hurdle - trying to think of possible
confounding variables that might be related to both X and Y - is also simple
in this case: People who work in corporate America and have business
dinners with clients are more likely to consume alcohol, and they are also
apt to make higher salaries than those who are not in the corporate world.
But, most egregiously, the first causal hurdle trips us up. As much as avid
producers and consumers of alcoholic beverages might like to convince
people that higher levels of drinking will translate to higher income, can you
think of a credible causal mechanism that connects alcohol consumption
to income?
3.4.2 Treatment Choice and Breast Cancer Survival

In 2006 the National Breast Cancer Foundation forecast that 211,000
women and 1600 men in the United States would be told by their doctors
that they have breast cancer. One of the most painful situations that some
patients have to face occurs when they have to choose a treatment option
to pursue.
Two treatment strategies strike different balances between the desirability of aggressively treating the cancer and keeping the procedures as
minimally invasive as possible. The first strategy, called a radical mastectomy, represents an effort to try to purge the entire body of cancer by
removing the entire breast. In effect, this strategy acknowledges that the
breast can produce cancer, so the choice is to remove it. Because of this,
it is maximally invasive to a patient's body, and, understandably, few patients find it appealing. The second strategy, called a lumpectomy, is a
more localized surgery in which the cancerous tumor is removed from the
patient's breast, but as much of the breast as possible is left intact. Certainly this treatment is less invasive, less aggressive, and less distasteful to
most breast cancer patients. It also carries with it the risk that some cancer
cells might be missed during the surgery and, as a result, left behind in the
body.
Which treatment option should a patient choose? Obviously, patients
might consider a myriad of factors when faced with such a choice, but
among them might be the expected survival rates for each treatment option. They might expect that patients who choose radical mastectomies, on
average, live longer than patients who choose lumpectomies, for the simple
fact that the lumpectomy procedure carries with it the risk that some cancer
cells will be missed and left behind in the breast tissue, a risk that the radical
mastectomy, by its very nature, avoids.
Perplexingly, there tends to be no association between breast cancer
treatment choice (X) and posttreatment longevity (Y). That is, patients
who undergo either procedure have roughly the same 1-year and 5-year
survival rates. (This is to say, the third causal hurdle has not been crossed.)
Does this mean that radical mastectomies are unnecessarily invasive and
should not be considered a good treatment option?


Again, consider the rest of our causal hurdles. From the preceding
discussion of treatments, it is pretty clear that we can clear the first hurdle: there is a credible causal mechanism between X and Y. The second hurdle,
that longevity (Y) might cause treatment (X), is obviously not possible. But
the fourth causal hurdle is crucial for evaluating this relationship. Can we
imagine any factors (Z) that might be related to both treatment choice (X)
and longevity (Y)? Certainly we can. Not all cancers are detected at the same
stage of advancement. Some are caught early, whereas others are detected
only when the cancer has spread considerably. Thus the severity of cancer
at detection (Z) might affect both treatment choice and longevity. Patients
whose cancers are diagnosed in early stages (Z), when the tumors are
small, may be more likely to choose lumpectomies (X) and are more likely
to survive (Y). Conversely, patients whose cancer is spotted in advanced
stages (Z) may have almost no choice but to get the more radical treatment
(X), and their prospects for long-term survival (Y) are less bright.
Like the school-choice example discussed earlier, this is a case in which
a third variable - severity of the disease when it is detected - operates as a
selection mechanism that makes the comparison between individuals with
different values of the independent variable - treatment - extremely difficult.
Although it might be tempting simply to examine the bivariate
relationship between treatment and survival rates and conclude that the
treatments don't produce different results, that conclusion might be exactly
wrong. Why? Because the patients who get radical mastectomies are systematically different from those who get lumpectomies. Simple comparisons in
such a case can produce incorrect inferences about causal effects. 8
Note that this is one of those somewhat unusual situations in which
we believe that X may indeed cause Y in spite of the fact that we did not
successfully clear the third causal hurdle - that is, that there is no bivariate
association between X and Y. This supports our view that once a theory
has successfully cleared the first hurdle (meaning that there is a credible
causal mechanism) it should be put through all of the remaining three
causal hurdles.
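
A small worked example shows how a selection mechanism can hide a real difference. The counts below are entirely hypothetical: they were invented so that the arithmetic is easy to follow, they are not medical data, and the names in the code are illustrative. Stage at detection (Z) drives both treatment choice (X) and survival (Y).

```python
# Hypothetical counts, invented for illustration only (not medical data).
# Keys: (stage at detection, treatment); values: (patients, five-year survivors).
counts = {
    ("early", "lumpectomy"): (800, 720),
    ("early", "mastectomy"): (200, 190),
    ("late", "lumpectomy"): (200, 100),
    ("late", "mastectomy"): (800, 630),
}

def survival_rate(treatment, stage=None):
    """Survival rate for a treatment, overall or within one stage of detection."""
    patients = survivors = 0
    for (s, t), (n, alive) in counts.items():
        if t == treatment and (stage is None or s == stage):
            patients += n
            survivors += alive
    return survivors / patients

for stage in (None, "early", "late"):
    label = stage or "all stages"
    lump = survival_rate("lumpectomy", stage)
    mast = survival_rate("mastectomy", stage)
    print(f"{label}: lumpectomy {lump:.3f}, mastectomy {mast:.3f}")
```

With these made-up counts the two treatments have identical overall survival rates, so the bivariate comparison shows no association. Within each stage of detection, however, the rates differ, because late-stage patients are concentrated in one treatment group.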

8 It should be obvious that we are not oncologists with particular expertise in this very
sensitive area. We are not in any way advocating or disparaging a particular treatment
for breast cancer. Rather, we think it's important to show how thinking rigorously about
causality can lead us to look past surface relationships and dig deeper to find better
answers.

3.4.3 Explicit Lyrics and Teen Sexual Behavior

What is the role of popular culture in determining teen behavior? Is it
the case that the explicit sexual content that saturates so much of today's
culture causes teenagers (especially) to be sexually active at an earlier age?
Or is it the case that popular culture is merely a mirror, reflecting back to
us who we truly are? A study reported by the Associated Press in 2006,
titled "Sexual lyrics prompt teens to have sex," takes a rather clear position
on this issue:
Teens whose iPods are full of music with raunchy, sexual lyrics start having
sex sooner than those who prefer other songs, a study found .... Teens
who said they listened to lots of music with degrading sexual messages
were almost twice as likely to start having intercourse or other sexual
activities within the following two years as were teens who listened to
little or no sexually degrading music. Among heavy listeners, 51 percent
started having sex within two years, versus 29 percent of those who said
they listened to little or no sexually degrading music.
So the third causal hurdle - whether X (music listening) and Y (sexual
behavior) are related - has been cleared. And, for the moment, let's
dismiss the reverse-causal scenario (question 2) that a teen's sexual behavior
causes them to listen to particular kinds of music.
But focus on the fourth causal question. Surely explicit lyrics cannot be
the sole factor that causes teens to be sexually active. (And it's worth noting
that no person in the article makes the claim that it is the only cause.) Are
there other factors that might be related to both music-listening habits and
sexual behaviors? According to the article, the research "tried to account
for other factors that could affect teens' sexual behavior, including parental
permissiveness, and still found explicit lyrics had a strong influence." Surely,
parental permissiveness (Z) could be related to both music listening (X) and
sexual behavior (Y), and the finding that the X-Y connection survived such
a control is helpful. But are there still other possible causes in addition to
parental permissiveness? Certainly, and critics mentioned in the article are
quick to point them out. Could peer pressure (Z) be related to both X and
Y? Absolutely. What about self-esteem? Again, yes. Failing to account for
those possible causes - and any others that you might think of that can be
related to both X and Y - can cause us to make the wrong inference about
whether exposure to lyrics causes sexual behavior.
With respect to the first question - the existence of a credible causal
mechanism - the article quotes a psychologist who sees a logical connection:
The brain's impulse-control center undergoes "major construction" during the teen years at the same time that an interest in sex starts to
blossom .... Add sexually arousing lyrics and "it's not that surprising that
a kid with a heavier diet of that ... would be at greater risk for sexual
behavior."
Other psychologists might disagree, of course. But the failure to control
for all other confounding variables that might be related to the independent
and dependent variables is more than enough ammunition to allow a savvy
record-company executive to cast doubts on such a study.
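
The two percentages in the Associated Press excerpt can be compared directly. The short sketch below computes a risk ratio and a risk difference from them. The variable names are illustrative, and the calculation describes only the bivariate association (the third hurdle); it says nothing about the other three.

```python
# Shares reported in the Associated Press article quoted in this subsection.
heavy_listeners = 0.51   # started having sex within two years
light_listeners = 0.29   # little or no sexually degrading music

risk_ratio = heavy_listeners / light_listeners
risk_difference = heavy_listeners - light_listeners

print(f"risk ratio: {risk_ratio:.2f}")
print(f"risk difference: {risk_difference * 100:.0f} percentage points")
```

A ratio of roughly 1.76 is what the article rounds to "almost twice as likely."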


WRAPPING UP

Learning the thinking skills required to evaluate causal claims as conclusively as possible requires practice. They are intellectual habits that, like a
good knife, will sharpen with use.
Translating these thinking skills into actively designing new research
that helps to address causal questions is the subject of Chapter 4. All of the
"research designs" that you will learn in that chapter are strongly linked
to issues of evaluating causal claims. Keeping the lessons of this chapter
in mind as we move forward is essential to making you a better consumer
of information, as well as edging you forward toward being a producer of
research.

CONCEPTS INTRODUCED IN THIS CHAPTER

bivariate
confounding variable
deterministic
multivariate

probabilistic
selection effect
spurious

EXERCISES

1. Think back to a history class in which you learned about the "causes" of
a particular historical event (for instance, the Great Depression, the French
Revolution, or World War I). How well does each causal claim perform when
you try to answer the four questions for establishing causal relationships?
2. Go to your local newspaper's web site (if it has one; if not, pick the web site
of any media outlet you visit frequently). In the site's "Search" box, type the
words "research cause" (without quotes). (Hint: You may need to limit the
search time frame, depending on the site you visit.) From the search results,
find two articles that make claims about causal relationships. Print them out,
and include a brief synopsis of the causal claim embedded in the article.
3. For each of the following examples, imagine that some researcher has found the
reported pattern of covariation between X and Y. Can you think of a variable
Z that might make the relationship between X and Y spurious?
(a) The more firefighters (X) that go to a house fire, the greater property
damage that occurs (Y).
(b) The more money spent by an incumbent member of Congress's campaign
(X), the lower their percentage of vote (Y).
(c) The more children in a community that participate in a Head Start program (X), the greater percentage of students that demonstrate kindergarten
readiness (Y).
(d) The higher the salaries of Presbyterian ministers (X), the higher the price
of rum in Havana (Y).
4. For each of the following pairs of independent and dependent variables, write
about both a probabilistic and a deterministic relationship to describe the likely
relationship:
(a) A person's education (X) and voter turnout (Y).
(b) A nation's economic health (X) and political revolution (Y).
(c) Candidate height (X) and election outcome (Y).
5. Take a look at the codebook for the data set "BES 2005 Subset" and write
about your answers to the following items:
(a) Develop a causal theory about the relationship between an independent
variable (X) and a dependent variable (Y) from this data set. Is there a
credible causal mechanism that connects X to Y? Explain your answer.
(b) Could Y cause X? Explain your answer.
(c) What other variables (Z) would you like to control for in your tests of this
theory?



Research Design

OVERVIEW

Given our focus on causality, what research strategies do political scientists
use to investigate causal relationships? Generally speaking, the controlled
experiment is the foundation for scientific research. And some political scientists use experiments in their work. However, owing to the nature of our
subject matter, most political scientists adopt one of two types of "observational" research designs that are intended to mimic experiments. The
cross-sectional observational study focuses on variation across individual
units (like people or countries). The time-series observational study focuses
on variation in aggregate quantities (like presidential popularity) over time.
What is an "experiment" and why is it so useful? How do observational
studies try to mimic experimental designs? Most important, what are the
strengths and weaknesses of each of these three research designs in establishing causal relationships between concepts? That is, how does each
one help us to get across the four causal hurdles identified in Chapter 3?
Relatedly, we introduce issues relating to the selection of samples of cases
to study in which we are not able to study the entire population of cases to
which our theory applies. This is a subject that will feature prominently in
many of the subsequent chapters.

COMPARISON AS THE KEY TO ESTABLISHING
CAUSAL RELATIONSHIPS

So far, you have learned that political scientists care about causal relationships. You have learned that most phenomena we are interested in
explaining have multiple causes, but our theories typically deal with only
one of them while ignoring the others. In some of the research examples
in the previous chapters, we have noted that the multivariate nature of the
world can make our first glances misleading. In the breast cancer example,
at first it did not appear that any kind of relationship (let alone a causal
relationship) existed between treatment choice and patient longevity. In the
school-choice example, it first appeared that a relationship (and perhaps a
causal one) did exist between participation in the program and test scores.
But, we argued, in both cases those first glances were potentially quite
misleading.
Why? Because what appeared to be the straightforward comparisons
between two groups - patients who chose one treatment compared with
patients who chose another, or eighth-graders in one school compared with
eighth-graders in another school - ended up being far from simple. On some
very important factors, our different groupings for our independent variable X were far from equal. That is, patients who chose different treatment
options (X) had differing levels of the disease when it was discovered (Z),
which also affected their longevity (Y). And students in different school
programs (X) had parents who had systematically different levels of involvement in their children's education (Z), which also affected test scores
(Y). As convincing as those bivariate comparisons might have been, they
would likely be misleading.
Comparisons are at the heart of science. If we are evaluating a theory
about the relationship between some X and some Y, the scientist's job is to
do everything possible to make sure that no other influences (Z) interfere
with the comparisons that we will rely on to make our inferences about a
possible causal relationship between X and Y.
The obstacles to causal inference that we described in Chapter 3 are
substantial, but surmountable. We don't know whether, in reality, X causes
Y. We may be armed with a theory that suggests that X does, indeed, cause
Y, but theories can be (and often are) wrong or incomplete. So how do
scientists generally, and political scientists in particular, go about testing
whether X causes Y? There are several strategies, or research designs, that
researchers can use toward that end. The goal of all types of research
designs is to help us evaluate how well a theory fares as it makes its
way over the four causal hurdles - that is, to answer as conclusively as is
possible the question about whether X causes Y. In the next two sections
we focus on the two strategies that political scientists use most commonly
and effectively: experiments and observational studies.

4.2 EXPERIMENTAL RESEARCH DESIGNS

Suppose that you were the CEO of a pharmaceutical company, and your
scientific team tells you that they have just discovered a new drug that
will help lower blood pressure. The pharmacists tell you that they have
successfully tested the drug on rats and developed a dosage regimen that
they expect will be effective on people. However, the drug has yet to be
tested on people.
And it is important to add here that the causal claim has a particular
directional component to it; that is, increased (not decreased) amounts of
the drug are alleged to lower (not raise) blood pressure.
How would researchers in the physical sciences and medicine evaluate
whether this new and promising drug works on humans? Note the focus
on causality here. In more "causal" language, how can we find out whether
taking the drug (X) will cause patients to have lower blood pressure (Y)?
As the introduction to this chapter highlights, we will need a comparison
of some kind, and we will want that comparison to isolate any potentially
different effects that the drug has on a patient's blood pressure. It is very
important, and not at all surprising, to realize that patients may have high
or low blood pressure for a variety of reasons (Z's) that have nothing to
do with our new drug - varying exercise habits, varying diets, and varying
genetic predispositions can all cause blood pressure to be high or low. So
how can we establish whether, among these other influences (Z), our new
drug (X) also causes a patient's blood pressure (Y) to fall?
The standard answer to this question in the physical and medical
sciences is that we would need to conduct an experiment. Because the word
"experiment" has such common usage, its scientific meaning is frequently
misunderstood. An experiment is not simply any kind of analysis that is
quantitative in nature; neither is it exclusively the domain of laboratories
and white-coated scientists with pocket protectors. We define an experiment
as follows: An experiment is a research design in which the researcher both
controls and randomly assigns values of the independent variable to the
participants.
Notice the twin components of the definition of the experiment: that
the researcher both controls values of the independent variable - or X, as we
have called it - and randomly assigns those values to the participants
in the experiment. Together, these two features form a complete definition
of an experiment, which means that there are no other essential features of
an experiment besides these two.
What does it mean to say that a researcher "controls" the value of the
independent variable that the participants receive? It means, most important, that the values of the independent variable that the participants receive
are not determined either by the participants themselves or by nature. In
our example of our blood-pressure drug, this requirement means that we
cannot compare people who, by their own choice, already take the drug
with those who do not (in this case the choice of whether or not to take the
drug is a Z variable that may exert an influence on Y separate from X). It
means that we, the researchers, have to decide which of our experimental
participants will take the drug and which ones will not.
But the definition of an experiment has one other essential component
as well: We, the researchers, must not only control the values of the independent variable, but we must also assign those values to participants
randomly. In the context of our drug-testing example, this means that we
must toss coins, draw numbers out of a hat, use a random-number generator, or some other such mechanism to ensure that our participants are
divided into a treatment group (who will receive our drug) and a control
group (who will not receive the drug, but will instead presumably receive a
placebo).
What's the big deal here? Why is randomly assigning subjects to treatment groups important? What scientific benefits arise from the random
assignment of people to treatment groups? To see why this is crucial,
recall that we have emphasized that all science is about comparisons and
also that every interesting phenomenon worth exploring - every interesting dependent variable - is caused by many factors, not just one. Random
assignment to treatment groups ensures that the comparison we make between the treatment group and the control group is as pure as possible
and that some other cause of the dependent variable (Z) will not pollute
that comparison. By first taking a group of participants and then randomly
splitting them into two groups on the basis of a coin flip, what we have
ensured is that the participants will not be systematically different from one
another. Indeed, in the aggregate - and provided that the participant pool
is reasonably large - randomly assigning participants to treatment groups
ensures that the groups, as a whole, are identical. If the two groups are
identical, save for the coin flip, then we can be certain that any differences
we observe in the groups must be because of the independent variable that
we have assigned to them.
Return to our drug-trial example. An experiment involving our new
blood-pressure drug would involve finding a group of people - however
obtained - and then randomly assigning them to receive either the new
drug or a placebo. We fully realize that there are other causes of low and
high blood pressure and that our experiment does not negate those factors.
In fact, our experiment will have nothing whatsoever to say about those
other causes. What it will do, and do well, is to determine whether our drug
has an effect on blood pressure.
Contrast the comparison that results from an experiment with a comparison that arises from a nonexperiment. (Be patient. We'll talk all about
nonexperimental designs in the next section.) Suppose that the makers of a
particular brand of aspirin wanted to test the claim that people who take
their aspirin have lower blood pressure than people who don't take their
aspirin. Let's even assume that they conduct a random-sample survey of
adults, and that people answer the survey truthfully about their aspirin intake and blood pressure. If there is an elevated rate of high blood pressure
in the nonaspirin group compared with that of the aspirin takers, does that
mean that aspirin caused - see that word again - people to have lower
blood pressure? No.¹ Why not? Because aspirin-takers and non-aspirin-takers might be systematically different. What does that mean? People who
take a daily aspirin are more likely to be more health conscious than non-aspirin-takers. In this instance, the level of health consciousness could be
an important Z variable. These individuals likely exercise more and eat a
healthier diet. Both of these things, of course, are probably associated with
lower blood pressure. What this means is that the comparison between
aspirin-takers and non-aspirin-takers is potentially misleading because it is
confounded by other factors like diet and exercise. So is the lower blood
pressure the result of the aspirin, or is it the result of the better diet and
increased exercise that aspirin-takers also benefit from? Because this particular nonexperimental research design does not answer that question, it does
not clear our fourth causal hurdle. It is impossible to know whether it was
the aspirin that caused the lower blood pressure. In this nonexperimental
design just described, because there are other factors that influence blood
pressure - and, critically, because these factors are also related to whether
or not people take aspirin - it is very difficult to say conclusively that the
independent variable (aspirin intake) causes the dependent variable (blood
pressure). Figure 4.1 shows this graphically.

[Figure 4.1. The possibly confounding effects of a healthy lifestyle on the aspirin-blood pressure relationship. Diagram nodes: aspirin regimen, healthy lifestyle, blood pressure.]
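
The confounding problem in the aspirin example can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, the probabilities, and the 10-point effect of health consciousness are invented numbers, and aspirin is assumed to have no effect on blood pressure at all. Under those assumptions, a naive comparison should still show aspirin-takers with lower blood pressure, while the gap should largely disappear once people are compared within the same level of Z.

```python
import random
from statistics import mean


def simulate_survey(n=20000, seed=1):
    """Simulate survey respondents. Aspirin has NO true effect (an assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        health_conscious = rng.random() < 0.5            # Z (hypothetical)
        p_aspirin = 0.7 if health_conscious else 0.2     # Z influences X
        takes_aspirin = rng.random() < p_aspirin         # X
        # Z lowers blood pressure by 10 points; aspirin contributes nothing.
        blood_pressure = 130 - 10 * health_conscious + rng.gauss(0, 5)  # Y
        rows.append((health_conscious, takes_aspirin, blood_pressure))
    return rows


def aspirin_gap(rows):
    """Mean blood pressure of aspirin-takers minus that of non-takers."""
    takers = [bp for _, aspirin, bp in rows if aspirin]
    others = [bp for _, aspirin, bp in rows if not aspirin]
    return mean(takers) - mean(others)


rows = simulate_survey()
naive_gap = aspirin_gap(rows)
gap_health_conscious = aspirin_gap([r for r in rows if r[0]])
gap_not_health_conscious = aspirin_gap([r for r in rows if not r[0]])

print(f"naive gap (all respondents):      {naive_gap:.2f}")
print(f"gap among the health conscious:   {gap_health_conscious:.2f}")
print(f"gap among everyone else:          {gap_not_health_conscious:.2f}")
```

The naive gap here comes entirely from who chooses to take aspirin, which is the sense in which the observational comparison is confounded.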
Here is where experiments differ so drastically from any other kind
of research design. What experimental research designs accomplish by way
of random assignment to treatment groups, then, is to decontaminate the
comparison between the treatment and control group of all other influences.
Before any stimulus (like a drug or placebo) is administered, all of the
participants are in the same pool. Researchers divide them by using some
random factor like a coin flip, and that difference is the only difference
between the two groups.

¹ Technically, of course, aspirin may or may not cause changes in blood pressure. But even if it does, the evidence just described does not prove it.

Think of it another way. The way that the confounding variables in
Figure 4.1 are correlated with the independent variable is highly improbable
in an experiment. Why? Because if X is determined by randomness, like
a coin flip, then (by the very definition of randomness) it is exceedingly
unlikely to be correlated with anything (including confounding variables Z).
When researchers control and assign values of X randomly, the comparison
between the different groups will not be affected by the fact that other
factors certainly do cause Y, the dependent variable.
Connect this back to our discussion from Chapter 3 about how researchers attempt to establish whether some X causes Y. As we will see,
experiments are not the only method that helps researchers cross the four
causal hurdles, but they are uniquely capable of accomplishing that task.
Consider each hurdle in turn. First, we should evaluate whether there is
a credible causal mechanism before we decide to run the experiment. It is
worth noting that the crossing of this causal hurdle is neither easier nor
harder in experiments than in nonexperiments. Coming up with a credible
causal scenario that links X to Y heightens our dependence on theory, not
on data or research design.
Second, in an experiment, it is impossible for Y to cause X - the second
causal hurdle - for two reasons. First, assigning X occurs in time before
Y is measured, which makes it impossible for Y to cause X. In addition,
as previously noted, if X is generated by randomness alone, then nothing
(including Y) can cause it.
Establishing, third, whether X and Y are correlated is easily done in
any research design (as we will see in Chapter 8). What about our fourth
causal hurdle? Is there some Z that is related to both X and Y that makes
the association between X and Y spurious? Experiments are uniquely well
equipped to help us answer this question definitively. An experiment does
not, in any way, eliminate the possibility that a variety of other variables
(that we call Z) might also affect Y (as well as X). What the experiment
does, through the process of randomly assigning subjects to different values
of X, is to equate the treatment and control groups on all possible factors.
On every possible variable, whether or not it is related to X, or to Y, or
to both, or to neither, the treatment and control groups should, in theory,
be identical. That makes the comparison between the two values of X
unpolluted by any possible Z variables because we expect the groups to be
equivalent on all values of Z.

In our example of the new drug and blood pressure, by experimentally
assigning our participants to treatment (drug) and control (placebo) groups,
we do not deny that diet and exercise, for example, might affect blood
pressure; we merely neutralize those influences. How? Say that some of
our participants are triathletes, who likely have low blood pressure, and
others are couch potatoes, who likely have high blood pressure. Randomly
assigning all participants - triathletes and couch potatoes alike - to the
drug and placebo groups will neutralize the effects of their exercise (or lack
thereof) on the aggregate blood-pressure statistics for the two groups. Why?
Because, by randomly assigning all participants to the drug or the placebo
group, we would expect, on average, half of the triathletes to be in the drug
group and half in the placebo group. Likewise, we would expect half of
our couch potatoes to be in the treatment group, and half in the placebo
group. Thus, when the treatment and control groups' rates of high blood
pressure are compared, the effects of different amounts of exercise will not
mislead us into thinking that the drug does (or does not) have an effect.
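
The expectation that roughly half of the triathletes land in each group can be checked with a quick simulation. This is a minimal sketch under invented assumptions: a pool of 1,000 triathletes and 1,000 couch potatoes, and a hypothetical helper `randomly_assign` that shuffles the pool and splits it in half. It is one way to implement random assignment, not the only one.

```python
import random


def randomly_assign(participants, seed=0):
    """Shuffle the participant pool and split it into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = participants[:]      # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant pool: group sizes are made up for illustration.
participants = ["triathlete"] * 1000 + ["couch potato"] * 1000
drug_group, placebo_group = randomly_assign(participants)

share_drug = drug_group.count("triathlete") / len(drug_group)
share_placebo = placebo_group.count("triathlete") / len(placebo_group)

print(f"triathlete share in drug group:    {share_drug:.3f}")
print(f"triathlete share in placebo group: {share_placebo:.3f}")
```

With a pool this large, both shares should sit close to one half, so differences in exercise habits are spread about evenly across the two groups rather than concentrated in one of them.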
Remarkably, the experimental ability to control for the effects of outside variables (Z) applies to all possible confounding variables, regardless
of whether we, the researchers, are aware of them. Let's make the example downright preposterous. Let's say that, 20 years from now, another
team of scientists discovers that having attached (as opposed to detached)
earlobes causes people to have low blood pressure. Does that possibility
threaten the inference that we draw from our experiment about our drug
and blood pressure today? No, not at all. Why not? Because, whether or
not we are aware of it, the random assignment of participants to treatment
groups means that, whether we are paying attention to it or not, we would
expect our treatment and control groups to have equal numbers of people
with attached earlobes, and for both groups to have equal numbers of people with detached earlobes. The key element of an experimental research
design - randomly assigning subjects to different values of X, the independent variable - controls for every Z in the universe, whether or not we are
aware of that Z.
Together, all of this means that experiments bring with them a particularly strong confidence in the causal inferences drawn from the analysis.
In scientific parlance, this is called internal validity. If a research design
produces high levels of confidence in the conclusions about causality, it is
said to have high internal validity. Conversely, research designs that do not
allow for particularly definitive conclusions about whether X causes Y are
said to have low degrees of internal validity.


4.2.1 "Random Assignment" versus "Random Sampling"
It is critical that you do not confuse the experimental process of randomly
assigning people to treatment groups, on the one hand, with the process
of randomly sampling people for participation, on the other hand. They
are entirely different, and in fact have nothing more in common than that
six-letter word "random." They are, however, quite often confused for
one another. Random assignment to treatment groups occurs when the
participants for an experiment are assigned randomly to one of several
possible values of X, the independent variable; importantly, this definition
says nothing at all about how the subjects were selected for participation.
But random sampling is, at its very heart, about how researchers select
people for participation in a study - they are selected at random, that is,
every member of the underlying population has an equal probability of
being selected. (This is common in survey research, for example.) Mixing
up these two critical concepts will produce a good bit of confusion.
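
The two procedures can be set side by side in code. The sketch below is only an illustration: the population of 10,000 labeled people, the sample of 500, and the 200 "volunteers" standing in for a sample of convenience are all invented for the example.

```python
import random

rng = random.Random(7)

# Hypothetical population of 10,000 people.
population = [f"person_{i}" for i in range(10_000)]

# Random SAMPLING: decides WHO is in the study. Every member of the
# population has an equal probability of being selected.
survey_sample = rng.sample(population, k=500)

# Random ASSIGNMENT: decides WHICH VALUE OF X each participant receives.
# The participants here were not sampled at random; they are simply the
# first 200 people, standing in for a sample of convenience.
volunteers = population[:200]
assignment = {
    person: rng.choice(["treatment", "control"]) for person in volunteers
}

n_treatment = sum(1 for group in assignment.values() if group == "treatment")
print(f"randomly sampled respondents: {len(survey_sample)}")
print(f"volunteers assigned to treatment: {n_treatment} of {len(assignment)}")
```

A study can use either procedure without the other: the survey sample above involves no assignment at all, and the volunteers are randomly assigned without having been randomly sampled.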
4.2.2 Are There Drawbacks to Experimental Research Designs?

Experiments, as we have seen, have a unique ability to get social scientists
across our hurdles needed to establish whether X causes Y. But that does not
mean they are without their disadvantages. Many of these disadvantages
are related to the differences between medical and physical sciences, on the
one hand, and the social sciences, on the other. We now discuss four such
drawbacks to experimentation.
First, especially in the social sciences, not every independent variable
(X) is controllable and subject to experimental manipulation. Suppose, for
example, that we wish to study the effects of gender on political participation. Do men contribute more money, vote more, volunteer more in
campaigns, than women? There are a variety of nonexperimental ways to
study this relationship, but it is impossible to experimentally manipulate a
subject's gender. Recall that the definition of an experiment is that the researcher both controls and randomly assigns the values of the independent
variable. In this case, the presumed cause (the independent variable) is a
person's gender. Compared with drugs versus placebos, assigning a participant's gender is another matter entirely. It is, to put it mildly, impossible.
People show up at an experiment either male or female, and it is not within
the experimenter's power to "randomly assign" a participant to be male or
female.
This is true in many, many political science examples. There are simply myriads of substantive problems that are impossible to study in an
experimental fashion. How does a person's partisanship (X) affect his issue
opinions (Y)? How does a person's income level (X) affect her campaign
contributions (Y)? How does a country's level of democratization (X) affect its openness to international trade (Y)? How does the level of military
spending in India (X) affect the level of military spending in Pakistan (Y) -
and, for that matter, vice versa? How does media coverage (X) in an election campaign influence voters' priorities (Y)? In each of these examples
that intrigue social scientists, the independent variable is simply not subject
to experimental manipulation. Social scientists cannot, in any meaningful
sense, "assign" people a party identification or an income, "assign" a country a level of democratization or level of military spending, or "assign" a
campaign-specific, long-term amount of media coverage. These variables
simply exist in nature, and we cannot control exposure to them and randomly assign different values to different cases (that is, individual people
or countries).
A second potential disadvantage of experimental research designs is
that experiments often suffer from low degrees of external validity. We
have noted that the key strength of experiments is that they typically have
high levels of internal validity. That is, we can be quite confident that the
conclusions about causality reached in the analysis are not confounded
by other variables. External validity, in a sense, is the other side of the
coin, as it represents the degree to which we can be confident that the
results of our analysis apply not only to the participants in the study,
but also to the population more broadly construed. Recall that there is
nothing whatsoever in our definition of an experiment that describes how
researchers recruit or select people to participate in the experiment. To
reiterate: It is absolutely not the case that experiments require a random
sample of the target population. Indeed, it is extremely rare for experiments
to draw a random sample from a population.² In drug-trial experiments,
for example, it is common to place advertisements in newspapers or on the
radio to invite participation, usually involving some form of compensation
to the participants. Clearly, people who see and respond to advertisements
like this are not a random sample of the population of interest, which is
typically thought of as all potential recipients of the drug. Similarly, when
professors "recruit" people from their (or their colleagues') classes, the
participants are not a random sample of any population.³ The participant
pool in this case represents what we would call a sample of convenience,
which is to say, "this is more or less the group of people we could beg,
coerce, entice, or cajole to participate."

² Since 1990 or so, however, there has been a growing movement in the field of survey research - which has always used random samples of the population - to use computers in the interviewing process that includes experimental randomization of variations in survey questions, in a technique called a "survey experiment." Such designs are intended to reap the benefits of both random assignment to treatment groups, and hence have high internal validity, as well as the benefits of a random sample, and hence have high external validity. See Piazza, Sniderman, and Tetlock (1990) and Sniderman and Piazza (1993).

With a sample of convenience, it is simply unclear how, if at all, the
results of the experiment generalize to a broader population. As we will
learn in Chapter 7, this is a critical issue in the social sciences. Because
most experiments make use of such samples of convenience, with any single experiment, it is difficult to know whether the results of that analysis
are in any way typical of what we would find in a different sample. With
experimental designs, then, scientists learn about how their results apply
to a broader population through the process of replication, in which researchers implement the same procedures repeatedly in identical form to
see if the relationships hold in a consistent fashion.
Experimental research designs, at times, can be plagued with a third
disadvantage, namely that they carry special ethical dilemmas for the researcher. Ethical issues about the treatment of human participants occur
frequently with medical experiments, of course. If we wished to study experimentally the effects of different types of cancer treatments on survival
rates, this would require obtaining a sample of patients with cancer and
then randomly assigning the patients to differing treatment regimens. This
is typically not considered acceptable medical practice. In such high-stakes
medical situations, most individuals value making these decisions themselves, in consultation with their doctor, and would not relinquish the
important decisions about their treatment to a random-number generator.
Ethical situations arise less frequently, and typically less dramatically,
in social science experimentation, but they do arise on occasion. During
the behavioral revolution in psychology in the 1960s, several famous experiments conducted at universities produced vigorous ethical debates. Psychologist Stanley Milgram conducted experiments on how easily he could
make individuals obey an authority figure. In this case, the dependent variable was the willingness of the participant to administer what he believed
to be a shock to another participant, who was in fact an employee of
Milgram's. (The ruse was that Milgram told the participant that he was
testing how negative reinforcement - electric shocks - affected the "learning" of the "student.") The independent variable was the degree to which
Milgram conveyed his status as an authority figure. In other words, the X
that Milgram manipulated was the degree to which he presented himself as
an authority who must be obeyed. For some participants, Milgram wore a
white lab coat and informed them that he was a professor at Yale University.
For others, he dressed more casually and never mentioned his institutional
affiliation. The dependent variable, then, was how strong the (fake) shocks
would be before the subject simply refused to go on. At the highest extreme,
the instrument that delivered the "shock" said "450 volts, XXX." The results of the experiment were fascinating because, to his surprise, Milgram
found that the great majority of his participants were willing to administer
even these extreme shocks to the "learners." But scientific review boards
consider such experiments unethical today, because the experiment created
a great degree of emotional distress among the true participants.

³ Think about that for a moment. Experiments in undergraduate psychology or political science classes are not a random sample of 18- to 22-year-olds, or even a random sample of undergraduate students, or even a random sample of students from your college or university. Your psychology class is populated with people more interested in the social sciences than in the physical sciences or engineering or the humanities.

A fourth potential drawback of experimental research designs is that,
when interpreting the results of an experiment, we sometimes make mistakes of emphasis. If an experiment produces a finding that some X does
indeed cause Y, that does not mean that that particular X is the most
prominent cause of Y. As we have emphasized repeatedly, a variety of
independent variables are causally related to every interesting dependent
variable in the social sciences. Experimental research designs often do not
help to sort out which causes of the dependent variable have the largest
effects and which ones have smaller effects.
4.3 OBSERVATIONAL STUDIES (IN TWO FLAVORS)

Taken together, the drawbacks of experiments mean that, for any given political science research situation, implementing an experiment often proves
to be unworkable, and sometimes downright impossible. As a result, experimentation is not the most common research design used by political science
researchers. In some subfields, such as political psychology - which, as the
name implies, studies the cognitive and emotional underpinnings of political decision making - experimentation is quite common. And it is becoming
more common in the study of public opinion and electoral competition. But
the experiment, for many researchers and for varying reasons, remains a
tool that is not applicable to many of the phenomena that we seek to study.
Does this mean that researchers have to shrug their shoulders and
abandon their search for causal connections before they even begin? Not
at all. But what options do scholars have when they cannot control exposure to different values of the independent variables? In such cases, the
only choice is to take the world as it already exists and make the comparison between either individual units - like people, political parties, or
countries - or between an aggregated quantity that varies over time. These
represent two variants of what is most commonly called an observational
study. Observational studies are not experiments, but they seek to emulate them. They are known as observational studies because, unlike the
controlled and somewhat artificial nature of most experiments, in these
research designs, researchers simply take reality as it is and "observe" it,
attempting to sort out causal connections without the benefit of randomly
assigning participants to treatment groups. Instead, different values of the
independent variable already exist in the world, and what scientists do is
observe them and then evaluate their theoretical claims by putting them
through the same four causal hurdles to discover whether X causes Y.
This leads to the definition of an observational study: An observational
study is a research design in which the researcher does not have control
over values of the independent variable, which occur naturally. However,
it is necessary that there be some degree of variability on the independent
variable between cases, as well as variation in the dependent variable.
Because there is no random assignment to treatment groups, as in experiments, some scholars claim that it is impossible to speak of causality
in observational studies, and therefore sometimes refer to them as correlational studies. Along with most political scientists, we do not share this
view. Certainly experiments produce higher degrees of confidence about
causal matters than do observational studies. However, in observational
studies, if sufficient attention is paid to accounting for all of the other
possible causes of the dependent variable that are suggested by current understanding, then we can make informed evaluations of our confidence
that the independent variable does cause the dependent variable.
Observational studies, as this discussion implies, face exactly the same
four causal hurdles as do experiments. (Recall that those hurdles are present
in any research design.) So how, in observational studies, do we cross
these hurdles? The first causal hurdle - focusing on a credible mechanism
connecting X and Y - is identical in experimental and observational studies.
In an observational study, however, crossing the second causal hurdle - can we rule out the possibility that Y causes X? - can sometimes
be problematic. For example, do countries with higher levels of economic
development (X) have, as a consequence, more stable democratic regimes
(Y)? Crossing the second causal hurdle, in this case, is a rather dicey matter.
It is clearly plausible that having a stable democratic government makes
economic prosperity more likely, which is the reverse-causal scenario.
After all, investors are probably more comfortable taking risks with their
money in democratic regimes than in autocratic ones. Those risks, in turn,
likely produce greater degrees of economic prosperity. It is possible, of
course, that X and Y are mutually reinforcing - that is, X causes Y and Y
causes X.


The third hurdle - do X and Y covary - is no more difficult for an
observational study than for an experiment. (The techniques for examining
relationships between two variables are straightforward, and you will learn
them in Chapter 8.) But, unlike in an experimental setting, if we fail to find
covariation between X and Y in an observational setting, we should still
proceed to the fourth hurdle because the possibility remains that we will
find covariation between X and Y when we control for some variable
Z (think back to the breast cancer example).
The most pointed comparison between experiments and observational
studies, though, occurs with respect to the fourth causal hurdle. The near-magic that happens in experiments because of random assignment to treatment groups - which enables researchers to know that no other factors
interfere in the relationship between X and Y - is not present in an observational study. So, in an observational study, the comparison between
groups with different values of the independent variable may very well be
polluted by other factors, interfering with our ability to make conclusive
statements about whether X causes Y.
Within observational studies, there are two pure types - cross-sectional
observational studies, which focus on variation between spatial units for
a single time unit, and time-series observational studies, which focus on
variation within a single spatial unit over multiple time units. There are,
in addition, hybrid designs, but for sake of simplicity we will focus on the
pure types.⁴ Before we get into the two types of observational studies, we
need to provide a brief introduction to observational data.

4.3.1 Datum, Data, Data Set
The word "data" is one of the most grammatically misused words in the
English language. Why? Because most people use this word as though it
were a singular word when it is, in fact, plural. Any time you read "the data
is," you have found a grammatical error. Instead, when describing data,
the phrasing should be "the data are." Get used to it: You are now one of
the foot soldiers in the crusade to get people to use this word appropriately.
It will be a long and uphill battle.
The singular form of the word data is "datum." Together, a collection
of datum produces data or a "data set." We define observational data sets
by the variables that they contain and the spatial and time units over which
they are measured. Political scientists use data measured on a variety of
different spatial units. For instance, in survey research, the spatial unit is
the individual survey respondent. In comparative U.S. state government
studies, the spatial unit is the U.S. state. In international relations, the
spatial unit is often the nation. Commonly studied time units are months,
quarters, and years. It is also common to refer to the spatial and time units
that define data sets as the data set dimensions.
Two of the most common types of data sets correspond directly to the
two types of observational studies that we just introduced. For cross-sectional
observational studies, researchers analyze cross-sectional data to determine
whether the third causal hurdle has been cleared. For instance, Table 4.1
presents a cross-sectional data set in which the time unit is the year 1972
and the spatial unit is nations. These data could be used to test the theory
that unemployment percentage (X) → government debt as a percentage of
gross national product (Y).
For time-series observational studies, time-series data are needed to
determine whether the third causal hurdle has been cleared. These data
contain measures of X and Y across time for a single spatial unit. For
instance, Table 4.2 displays a time-series data set in which the spatial unit
is the United States and the time unit is months. We could use these data to
test the theory that inflation (X) → presidential approval (Y). In a data set,
researchers analyze only those data or data points that contain measured
values for both the independent variable (X) and the dependent variable
(Y) to determine whether the third causal hurdle has been cleared.

Table 4.1. Example of cross-sectional data
[One row per nation, with columns for government debt as a percentage of GNP and unemployment rate; the numeric entries are not legible in this copy.]
Nations: Finland, Denmark, United States, Spain, Sweden, Belgium, Japan, New Zealand, Ireland, Italy, Portugal, Norway, Netherlands, Germany, Canada, Greece, France, Switzerland, United Kingdom, Australia

Table 4.2. Example of time-series data
Month      Presidential approval   Inflation
2002.01    83.7                    1.14
2002.02    82.0                    1.14
2002.03    79.8                    1.48
2002.04    76.2                    1.64
2002.05    76.3                    1.18
2002.06    73.4                    1.07
2002.07    71.6                    1.46
2002.08    66.5                    1.80
2002.09    67.2                    1.51
2002.10    65.3                    2.03
2002.11    65.5                    2.20
2002.12    62.8                    2.38

⁴ The classic statements of observational studies appeared in 1963 in Donald Campbell and Julian Stanley's seminal work Experimental and Quasi-experimental Designs for Research.

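
As a rough illustration of checking the third causal hurdle with time-series data, the sketch below computes a Pearson correlation between the monthly inflation and presidential approval values of Table 4.2. The choice of a correlation coefficient and the helper name `pearson_r` are our assumptions for illustration; the techniques for examining relationships between two variables are the subject of Chapter 8.

```python
from math import sqrt

# Monthly values for the United States, transcribed from Table 4.2.
months = ["2002.01", "2002.02", "2002.03", "2002.04", "2002.05", "2002.06",
          "2002.07", "2002.08", "2002.09", "2002.10", "2002.11", "2002.12"]
approval = [83.7, 82.0, 79.8, 76.2, 76.3, 73.4,
            71.6, 66.5, 67.2, 65.3, 65.5, 62.8]    # Y
inflation = [1.14, 1.14, 1.48, 1.64, 1.18, 1.07,
             1.46, 1.80, 1.51, 2.03, 2.20, 2.38]   # X


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(inflation, approval)
print(f"correlation between inflation and approval: {r:.2f}")
```

A negative coefficient would indicate that months with higher inflation tended to have lower approval. It would say nothing about the fourth hurdle: both series also trend over the year, so some Z could easily lie behind the association.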
4.3.2 Cross-Sectional Observational Studies

As the name implies, a cross-sectional observational study examines a cross
section of social reality, focusing on variation between individual spatial
units - again, like citizens, elected officials, voting districts, or countries - and explaining the variation in the dependent variable across them.
For example, what, if anything, is the connection between the preferences of the voters from a district (X) and a representative's voting behavior
(Y)? In a cross-sectional observational study, the strategy that a researcher
would pursue in answering this question involves comparing the aggregated
preferences of voters from a variety of districts (X) with the voting records
of the representatives (Y). Such an analysis, of course, would have to be
observational, instead of experimental, because this particular X is not at
all subject to experimental manipulation. Such an analysis might take place
within the confines of a single legislative session, for a variety of practical
purposes (such as the absence of turnover in seats, which is an obviously
complicating factor).
Bear in mind, of course, that observational studies have to cross the
same four causal hurdles as do experiments. And we have noted that,
unlike experiments, with their random assignment to treatment groups,
observational studies will often get stuck on our fourth hurdle. That might
indeed be the case here. Assuming the other three hurdles can be cleared,
consider the possibility that there are confounding variables that cause Y
and are also correlated with X, which make the X-Y connection spurious.
(Can you think of any such factors?) How do cross-sectional observational
studies deal with this critical issue? The answer is that, in most cases, this
can be accomplished through a series of rather straightforward statistical
controls. In particular, in Chapter 10, you will learn the most common
social science research tool for "controlling for" other possible causes of
Y, namely the multivariate regression model. What you will learn there
is that multivariate regression can allow researchers to see how, if at all,
controlling for another variable (like Z) affects the relationship between X
and Y.
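
To give a sense of what "controlling for" Z can look like, here is a minimal sketch using simulated data and ordinary least squares with two predictors. Everything in it is an assumption made for illustration: the data are invented so that Z causes both X and Y while X has no effect on Y, and the helper names `bivariate_slope` and `slopes_controlling` are hypothetical. It is not the presentation of Chapter 10, only a preview of the idea.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists (cov(a, a) is the variance)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def bivariate_slope(x, y):
    """Slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)


def slopes_controlling(x, z, y):
    """Slopes on x and z from regressing y on both (2x2 normal equations)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (sxy * szz - szy * sxz) / det
    b_z = (szy * sxx - sxy * sxz) / det
    return b_x, b_z


rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]           # confounder
x = [zi + rng.gauss(0, 1) for zi in z]            # X depends on Z
y = [2 * zi + rng.gauss(0, 1) for zi in z]        # Y depends on Z only, not X

naive = bivariate_slope(x, y)
b_x, b_z = slopes_controlling(x, z, y)
print(f"slope on X, ignoring Z:        {naive:.2f}")
print(f"slope on X, controlling for Z: {b_x:.2f}")
print(f"slope on Z:                    {b_z:.2f}")
```

In this constructed case the apparent relationship between X and Y should shrink toward zero once Z is included, which is the pattern one would expect from a spurious X-Y connection.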
4.3.3 Time-Series Observational Studies

The other major variant of observational studies is the time-series observational study, which has, at its heart, a comparison over time within a single
spatial unit. Unlike in the cross-sectional variety, which examines relationships between variables across individual units typically at a single time
point, in the time-series observational study political scientists typically
examine the variation within one spatial unit over time.5
For example, how, if at all, do changes in media coverage about the
economy (X) affect public concern about the economy (Y)?6 To be a bit
more specific, when the media spend more time talking about the potential
problem of inflation, does the public show more concern about inflation,
and when the media spend less time on the subject of inflation, does public
concern about inflation wane? We can measure these variables in aggregate
terms that vary over time. For example, how many stories about inflation
make it onto the nightly news in a given month? It is almost certain that
that quantity will not be the same each and every month. And how much
concern does the public show (through opinion polls, for example) about
inflation in a given month? Again, the percentage of people who identify
inflation as a pressing problem will almost certainly vary from month to
month.
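A minimal sketch of what such a monthly time series might look like once the two variables are measured. The numbers are invented purely for illustration (they come from no real newscast or poll), and the names `stories`, `concern_pct`, and `pearson_r` are hypothetical choices of ours. The point is only the layout: one spatial unit, with one value of X and one value of Y per time unit.

```python
import math

# Hypothetical monthly data for ONE spatial unit (say, a single country).
# Each position is one time unit (a month); all values are invented.
stories = [4, 9, 15, 11, 6, 3]                     # X: inflation stories on the nightly news
concern_pct = [10.0, 14.0, 21.0, 18.0, 12.0, 9.0]  # Y: % naming inflation a pressing problem


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


r_over_time = pearson_r(stories, concern_pct)
print(f"Correlation of X and Y across months: {r_over_time:.2f}")
```

In this made-up series the two variables rise and fall together, so the correlation is strongly positive. A strong over-time association of this kind shows only that X and Y covary. It does not by itself rule out other variables that move both of them.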
Of course, as with its cross-sectional cousin, the time-series observational study will require us to focus hard on that fourth causal hurdle. Are
there any other variables (Z) that are related to the varying volume of news
coverage about inflation (X) and public concern about inflation (Y)? (The
third exercise at the end of this chapter will ask for your thoughts on this
subject.) If we can identify any other possible causes of why the public is
sometimes more concerned about inflation, and why they are sometimes
less concerned about it, then we will need to control for those factors in
our analysis.

5 The spatial units analyzed in time-series observational studies are usually aggregated.
6 See Iyengar and Kinder (1987).

4.3.4 The Major Difficulty with Observational Studies
We noted that experimental research designs carry some drawbacks with
them. So, too, do observational studies. Here, we focus only on one, but
it is a big one. As the preceding examples demonstrate, when we need to
control for the other possible causes of Y to cross the fourth causal hurdle,
we need to control for all of them, not just one.7 But how do we know
whether we have controlled for all of the other possible causes of Y? In
many cases, we don't know that for certain. We need to try, of course, to
control statistically for all other possible causes that we can, which involves
carefully considering the previous research on the subject and gathering as
much data on those other causes as is possible. But in many cases, we will
simply be unable to do this perfectly.
What all of this means, in our view, is that observational analysis must
be a bit more tentative in its pronouncements about causality. Indeed, if
we have done the very best we can to control for as many causes of Y as
we can, then the most sensible conclusion we can reach, in many cases, is
that X causes Y. But in practice, our conclusions are rarely definitive, and subsequent
research can modify them. That can be frustrating, we know, for students
to come to grips with - and it can be frustrating for researchers, too. But
the fact that conclusive answers are difficult to come by should only make
us work harder to identify other causes of Y.

4.4 SUMMARY

For almost every phenomenon of interest to political scientists, there is
more than one form of research design that they could implement to address
questions of causal relationships. Before starting a project, researchers need
to decide whether to use experimental or observational methods; and if
they opt for the latter, as is common, they have to decide what type of
observational study to use. And sometimes researchers choose more than
one type of design.
Different research designs help shed light on different questions. Focus,
for the moment, on a simple matter like preferences for a more liberal
or conservative government policy. Cross-sectional and time-series approaches are both useful in this respect. They simply address different types
of substantive questions. Cross-sectional approaches look to see why some
individuals prefer more liberal government policies, and why some other individuals prefer more conservative government policies. That is a perfectly
worthwhile undertaking for a political scientist: What causes some people
to be liberals and others to be conservatives? But consider the time-series
approach, which focuses on why the public as an aggregated whole prefers
a more liberal or a more conservative government at different points in
time. That is simply a different question. Neither approach is inherently
better or worse than the other, but they both shed light on different aspects
of social reality. Which design researchers should choose depends on what
type of question they intend to ask and answer.

7 As we will see in Chapter 10, technically we need to control only for the factors that might
affect Y and are also related to X. In practice, though, that is a very difficult distinction
to make.

CONCEPTS INTRODUCED IN THIS CHAPTER

aggregate
control group
correlational studies
cross-sectional observational studies
data
data points
data set dimensions
datum
experiment
external validity
internal validity
observational study
random assignment to treatment groups
random sampling
replication
research designs
sample of convenience
spatial units
time units
time-series observational studies
treatment group

EXERCISES

1. Consider the following proposed relationships between an independent and
a dependent variable. In each case, would it be realistic for a researcher to
perform an experiment to test the theory? If yes, briefly describe what would
be randomly assigned in the experiment; if not, briefly explain why not.
(a) An individual's level of religiosity (X) and his or her preferences for different political candidates (Y)
(b) Exposure to negative political news (X) and political apathy (Y)
(c) Military service (X) and attitudes toward foreign policy (Y)
(d) A speaker's personal characteristics (X) and persuasiveness (Y)
2. Consider the relationship between education level (X) and voting turnout (Y).
How would the design of a cross-sectional observational study differ from that
of a time-series observational study?
3. In the section on time-series observational studies, we introduced the idea of
how varying levels of media coverage of inflation (X) might cause variation in
public concern about inflation (Y). Can you think of any relevant Z variables
that we will need to control for, statistically, in such an analysis, to be confident
that the relationship between X and Y is causal?
4. In the previous chapter (specifically, Sections 3.2 and 3.3), we gave examples
of research problems. For each of these examples, identify the spatial unit(s)
and time unit(s). For each, say whether the study was an experiment, a cross-sectional observational study, or a time-series observational study.
5. Table 4.1 presents data for a test of a theory by use of a cross-sectional observational study. If this same theory were tested by use of a time-series observational
study, what would the data table look like?
6. Compare the two designs for testing the preceding theory. Across the two
forms of observational studies, what are the Z variables for which you want to
control?
7. Table 4.2 presents data for a test of a theory by use of a time-series observational
study. If this same theory were tested by use of a cross-sectional observational
study, what would the data table look like?
8. Compare the two designs for testing the preceding theory. Across the two
forms of observational studies, what are the Z variables for which you want to
control?



Measurement

OVERVIEW

Although what political scientists care about is discovering whether causal
relationships exist between concepts, what we actually examine is statistical associations between variables. Therefore it is critical that we have
a clear understanding of the concepts that we care about so we can measure them in a valid and reliable way. In this chapter we focus on several
examples from the political science literature, such as the concept of political tolerance. We know that political tolerance and intolerance is a "real"
thing - that it exists to varying degrees in the hearts and minds of people.
But how do we go about measuring it? What are the implications of poor
measurement?

I know it when I see it.
- Associate Justice of the United States Supreme Court Potter Stewart, in
an attempt to define "obscenity" in a concurring opinion in Jacobellis v.
Ohio (1964)

These go to eleven.
- Nigel Tufnel (played by Christopher Guest), describing the volume
knob on his amplifier, in the movie This Is Spinal Tap

5.1 WHY MEASUREMENT MATTERS
We have emphasized the role of theory in political science. That is, we care
about causal relationships between concepts that interest us as political
scientists. At this point, you are hopefully starting to develop theories of
your own about politics. If these original theories are in line with the rules
of the road that we laid out in Chapter 1, they will be causal, general, and
parsimonious. They may even be elegant and clever.
But at this point, it is worth pausing and thinking about what a theory really is and is not. To help us in this process, take a look back at
Figure 1.2. A theory, as we have said, is merely a conjecture about the
possible causal relationship between two or more concepts. As scientists,
we must always resist the temptation to view our theories as somehow
supported until we have evaluated evidence from the real world, and until we have done everything we can with empirical evidence to evaluate
how well our theory does through the four causal hurdles we identified
in Chapter 3. In other words, we cannot evaluate a theory until we have
gone through the rest of the process depicted in Figure 1.2. This chapter
deals with operationalization, or the movement of variables from the rather
abstract conceptual level to the very real measured level. We can conduct
hypothesis tests and make reasonable evaluations of our theories only after we have gone carefully through this important process with all of our
variables.
If our theories are statements about relationships between concepts,
when we look for evidence to test our theories, we are immediately confronted with the reality that we do not actually observe those concepts.
Many of the concepts that we care about in political science, as we will
see shortly, are inherently elusive and downright impossible to observe
empirically in a direct way, and sometimes incredibly difficult to measure
quantitatively.

In this chapter, we describe the problem of measurement and the importance of measuring the concepts we are interested in as precisely as
possible. During this process, you will learn some thinking skills about
evaluating the measurement strategies of scholarship that you read, as well
as learn about creating measures of your own.
We begin with a section on measurement in the social sciences generally. We focus on examples from economics and psychology, two social
sciences that are at rather different levels of agreement about the measurement of their major variables. In political science, we have a complete range
of variables in terms of the levels of agreement about their measurement.
In the remaining sections we discuss the core concepts of measurement and
give some examples from political science research. Throughout our discussion of measurement, we focus on the measurements of variables that take
on a numeric range of values we feel comfortable treating the way that we
normally treat numeric values. In Chapter 6 we will discuss this further and
focus on some variable types that can take different types of nonnumeric
values.

5.2 SOCIAL SCIENCE MEASUREMENT: THE VARYING CHALLENGES
OF QUANTIFYING HUMANITY

Measurement is a "problem" in all sciences - from the physical sciences
of physics and chemistry to the social sciences of economics, political science, psychology, and the rest. But in the physical sciences, the problem of
measurement is often reduced to a problem of instrumentation, in which
scientists develop well-specified protocols for measuring, say, the amount
of gas released in a chemical reaction or the amount of light given off by
a star. The social sciences, by contrast, are younger sciences, and scientific
consensus on how to measure our important concepts is rare. Perhaps more
crucial, though, is the fact that the social sciences deal with an inherently
difficult-to-predict subject matter: human beings.
The problem of measurement exists in all of the social sciences. It
would be wrong, though, to say that it is equally problematic in all of
the social science disciplines. Some disciplines pay little heed to issues of
measurement, whereas others are mired nearly constantly in measurement
controversies and difficulties.
Consider the subject matter in much research in economics: dollars (or
euros, or yen, or what have you). If the concept of interest is "economic
output" (or "GDP"), which is commonly defined as the total sum of goods
and services produced by labor and property in a given time period, then
it is a relatively straightforward matter to obtain an empirical observation
that is consistent with the concept of interest.1 Such measures will not be
controversial among the vast majority of scholars. To the contrary, once
economists agree on a measure of economic output, they can move on to
the next (and more interesting) step in the scientific process - to argue about
what forces cause greater or less growth in economic output. (That's where
the agreement among economists ends.)
Not every concept in economics is measured with such ease, however.
Many economists are concerned with poverty: Why are some individuals
poor whereas others are not? What forces cause poverty to rise or fall over
time? Despite the fact that we all know that poverty is a very real thing,
measuring who is poor and who is not poor turns out to be a bit tricky.
The federal government defines the concept of poverty as "a set of income
cutoffs adjusted for household size, the age of the head of the household,
and the number of children under age 18."2 The intent of the cutoffs is to
describe "minimally decent levels of consumption."3 There are difficulties
in obtaining empirical observations of poverty, though. Among them, consider the reality that most Western democracies (including the United States)
have welfare states that provide transfer payments - in the form of cash
payments, food stamps, or services like subsidized health care - to their
citizens below some income threshold. Such programs, of course, are designed to minimize or eliminate the problems that afflict the poor. When
economists seek to measure a person's income level to determine whether
or not he is poor, should they use a "pretransfer" definition of income - a person's or family's income level before receiving any transfer payments
from the government - or a "posttransfer" definition? Either choice carries some negative consequences. Choosing a pretransfer definition of income gives a sense of how much the private sector of the economy is
failing. On the other hand, a posttransfer definition gives a sense of how
much welfare-state programs are falling short and how people are actually living. As the Baby Boom generation in the United States continues
to age, though, more and more people are retiring from work. Using a
pretransfer measure of poverty means that researchers will not consider
Social Security payments - this country's largest source of transfer payments by far - and therefore the (pretransfer) poverty rate should grow
rather steadily over the next few decades, regardless of the health of the
overall economy. This might not accurately represent what we mean by
"poverty" (Danziger and Gottschalk 1983).
If, owing to their subject matter, economists rarely (but occasionally)
have measurement obstacles, the opposite end of the spectrum would be
the discipline of psychology. The subject matter of psychology - human
behavior, cognition, and emotion - is rife with concepts that are extremely
difficult to measure. Consider a few examples. We all know that the concept
of "depression" is a real thing; some individuals are depressed, and others
are not. Some individuals who are depressed today will not be depressed as
time passes, and some who are not depressed today will become depressed.
Yet how is it possible to assess scientifically whether a person is or is not
depressed?4 Why does it matter if we measure depression accurately? Recall
the scientific stakes described at the beginning of this chapter: If we don't
measure depression well, how can we know whether remedies like clinical
therapy or chemical antidepressants are effective?5 Psychology deals with a
variety of other concepts that are notoriously slippery, such as the clinical
focus on "anxiety," or the social-psychological focus on concepts such
as "stereotyping" or "prejudice" (which are also of concern to political
scientists).

1 For details about how the federal government measures GDP, see http://www.bea.gov.
2 See http://www.census.gov/hhes/www/poverty/poverty.html.
3 Note a problem right off the bat: What is "minimally decent"? Do you suspect that what
qualified as "minimally decent" in 1950 or 1975 would be considered minimally decent
today? This immediately raises issues of how sensible it is to compare the poverty rates
from the past with those of today. If the floor of what is considered minimally decent
continues to rise, then the comparison is problematic at best, and meaningless at worst.
4 Since 1952, the American Psychiatric Association has published the Diagnostic and
Statistical Manual of Mental Disorders, now in its fourth edition (called DSM-IV), which
diagnoses depression by focusing on four sets of symptoms that indicate depression: mood,
behavioral symptoms such as withdrawal, cognitive symptoms such as concentration, and
somatic symptoms such as insomnia.
Political science, in our view, lies somewhere between the extremes
of economics and psychology in terms of how frequently we encounter
serious measurement problems. Some subfields in political science operate
relatively free of measurement problems. The study of political economy - which examines the relationship between the economy and political forces
such as government policy, elections, and consumer confidence - has much
the same feel as economics, for obvious reasons. Other subfields encounter
measurement problems regularly. The subfield of political psychology -
which studies the way that individual citizens interact with the political
world - shares much of the same subject matter as social psychology, and
hence, because of its focus on the attitudes and feelings of people, it shares
much of social psychology's measurement troubles.
Consider the following list of critically important concepts in the discipline of political science that have sticky measurement issues:
• Judicial activism: In the United States, the role of the judiciary in the
policy-making process has always been controversial. Some view the
federal courts as the protectors of important civil liberties, whereas
others view the courts as a threat to democracy, because judges are
not elected. How is it possible to identify an "activist judge" or an
"activist decision"?6
• Congressional roll-call liberalism: With each successive session of the
U.S. Congress, commentators often compare the level of liberalism
and conservatism of the present Congress with that of its most recent
predecessors. How do we know if the Congress is becoming more or
less liberal over time (Poole and Rosenthal 1997)?
• Political legitimacy: How can analysts distinguish between a "legitimate" and an "illegitimate" government? The key conceptual issue is more or less "how citizens evaluate governmental authority"
(Weatherford 1992). Some view it positively, others quite negatively. Is
legitimacy something that can objectively be determined, or is it an inherently subjective property among citizens?
• Political sophistication: Some citizens know more about politics and
are better able to process political information than other citizens
who seem to know little and care less about political affairs. How
do we distinguish politically sophisticated citizens from the politically
unsophisticated ones? Moreover, how can we tell if a society's level of
political sophistication is rising or falling over time (Luskin 1987)?
• Social capital: Some societies are characterized by relatively high levels of interconnectedness, with dense networks of relationships that
make the population cohesive. Other societies, in contrast, are characterized by high degrees of isolation and distrustfulness. How can we
measure what social scientists call social capital in a way that enables
us to compare one society's level of connectedness with another's or
one society's level of connectedness at varying points in time (Putnam
2000)?
In Sections 5.4 and 5.5, we describe the measurement controversies surrounding two other concepts that are important to political science - democracy and political tolerance. But first, in the next section, we describe some key issues that political scientists need to grapple with when
measuring their concepts of interest.

5.3 PROBLEMS IN MEASURING CONCEPTS OF INTEREST

We can summarize the problems of measuring concepts of interest in preparation for hypothesis testing as follows: First, you need to make sure that
you have conceptual clarity. Next, settle on a reasonable level of measurement. Finally, ensure that your measure is both valid and reliable. After
you repeat this process for each variable in your theory, you are ready to
test your hypothesis.
Unfortunately, there is no clear map to follow as we go through these
steps with our variables. Some variables are very easy to measure, whereas
others, because of the nature of what we are trying to measure, will always
be elusive. As we will see, debates over issues of measurement are at the
core of many interesting fields of study in political science.

5 In fact, the effectiveness of clinical "talk" therapy is a matter of some contention among
psychologists. See "Married with Problems? Therapy May Not Help," The New York
Times, April 19, 2005.
6 In this particular case, there could even be a disagreement over the conceptual definition
of "activist." What a conservative and a liberal would consider to be "activist" might
produce no agreement at all. See "Activist, Schmactivist," The New York Times, August
15, 2004, for a journalistic account of this issue.

5.3.1 Conceptual Clarity
The first step in measuring any phenomenon of interest to political scientists
is to have a clear sense of what the concept is that we are trying to measure.
In some cases, like the ones we subsequently discuss, this is an exceedingly
revealing and difficult task. It requires considerably disciplined thought to
ferret out precisely what we mean by the concepts about which we are
theorizing. But even in some seemingly easy examples, this is more difficult
than might appear at first glance.
Consider a survey in which we needed to measure a person's income.
That would seem easy enough. Once we draw our sample of adults, why
not just ask each respondent, "What is your income?" and offer a range of
values, perhaps in increments of $10,000 or so, on which respondents could
place themselves. What could be the problem with such a measure? Imagine
a 19-year-old college student whose parents are very wealthy, but who has
never worked herself, answering such a question. How much income has
that person earned in the last year? Zero. In such a circumstance, this is the
true answer to such a question. But it is not a particularly valid measure of
her income. We likely want a measure of income that reflects the fact that
her parents earn a good deal of money, which affords her the luxury of not
having to work her way through school as many other students do. That
measure should place the daughter of wealthy parents ahead of a relatively
poor student who works 40 hours a week and carries a full load just to pay
her tuition. Therefore, we might reconsider our seemingly simple question
and ask instead, "What is the total amount of income earned in the most
recently completed tax year by you and any other adults in your household,
including all sources of income?" This measure puts the nonworking child
of wealthy parents ahead of the student from the less-well-off family. And,
for most social science purposes, this is the measure of "income" that we
would find most theoretically useful.7
At this point, it is worth highlighting that the best measure of income - as well as that of most other concepts - depends on what our theoretical
objectives are. The best measure of something as simple as a respondent's
income depends on what we intend to relate that measure to in our hypothesis testing.

5.3.2 Reliability
An operational measure of a concept is said to be reliable to the extent
that it is repeatable or consistent; that is, applying the same measurement
rules to the same case or observation will produce identical results. An
unreliable measure, by contrast, would produce inconsistent results for the
same observation. For obvious reasons, all scientists want their measures
to be reliable.

Perhaps the most simple example to help you understand this is your
bathroom scale. Say you step up on the scale one morning and the scale
tells you that you weigh 150 pounds. You step down off the scale and
it returns to zero. But have you ever not trusted that scale reading, and
thought to yourself, "Maybe if I hop back up on the scale, I'll get a number
I like better?" That is a reliability check. If you (immediately) step back on
the scale, and it tells you that you now weigh 146 pounds, your scale is
unreliable, because repeated measures of the same case - your body at that
particular point in time - produced different results.
To take our bathroom scale example to the extreme, we should not
confuse over-time variability with unreliability. If you wake up 1 week later
and weigh 157 instead of 150, that does not necessarily mean that your scale
is unreliable (though that might be true). Perhaps you substituted french
fries for salads at dinner in the intervening week, and perhaps you exercised
less vigorously or less often.
Reliability is often an important issue when scholars need to code
events or text for quantitative analysis. For example, if a researcher was
trying to code the text of news coverage that was favorable or unfavorable
toward a candidate for office, he would develop some specific coding rules
to apply to the text - in effect, to count certain references as either "pro"
or "con" with respect to the candidate. Suppose that, for the coding, the
researcher employs a group of students to code the text - a practice that
is common in political research. A reliable set of coding rules would imply
that, when one student applies the rules to the text, the results would be the
same as when another student takes the rules and applies them to the same
text. An unreliable set of coding rules would imply the opposite, namely,
that when two different coders try to apply the same rules to the same news
articles, they reach different conclusions.8 The same issues arise when one
codes things such as events by using newspaper coverage.9
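As a rough illustration of the coding example above, the following Python sketch compares the codes that two hypothetical student coders assign to the same ten news stories. The data are invented, and the two statistics shown (simple percent agreement and Cohen's kappa, which discounts the agreement expected by chance) are just one common way to summarize intercoder reliability, not a method prescribed in this chapter. The function names are our own.

```python
from collections import Counter

# Hypothetical codes assigned by two student coders to the same ten news stories.
coder_a = ["pro", "pro", "con", "con", "pro", "con", "pro", "con", "pro", "pro"]
coder_b = ["pro", "pro", "con", "pro", "pro", "con", "con", "con", "pro", "pro"]


def percent_agreement(a, b):
    """Share of cases on which the two coders assigned the same code."""
    return sum(1 for s, t in zip(a, b) if s == t) / len(a)


def cohens_kappa(a, b):
    """Agreement beyond what chance alone would produce, rescaled so 1 is perfect."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a = Counter(a)
    counts_b = Counter(b)
    # Chance agreement: probability both coders pick the same category at random,
    # given how often each coder used each category.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in set(a) | set(b))
    return (observed - expected) / (1 - expected)


agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percent agreement: {agreement:.2f}")  # 0.80
print(f"Cohen's kappa:     {kappa:.2f}")      # 0.58
```

Here the coders agree on 8 of 10 stories, but because each uses "pro" 60% of the time, about half of that agreement could arise by chance. A perfectly reliable set of coding rules, applied carefully, would push both numbers toward 1.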

7 The same issues would arise in assessing the income of retired people who no longer
participate in the workforce.
8 Of course, it is possible that the coding scheme is perfectly reliable, but the coders themselves are not.
9 There are a variety of tools for assessing reliability, many of which are beyond the scope
of this discussion.

5.3.3 Measurement Bias and Reliability
One of the concerns that comes up with any measurement technique is
measurement bias, which is the systematic overreporting or underreporting
of values for a variable. Although measurement bias is a serious problem
for anyone who wants to know the "true" values of variables for particular cases, it is less of a problem than you might think for theory-testing
purposes. To better understand this, imagine that we have to choose between two different operationalizations of the same variable. Operationalization A is biased but reliable, and Operationalization B is unbiased but
unreliable. For theory-testing purposes we would greatly prefer the biased
but reliable Operationalization A.
You will be better able to see why this is the case once you have an
understanding of statistical hypothesis testing from Chapters 8 and beyond.
For now, though, keep in mind that as we test our theories we are looking
for general patterns between two variables. For instance, with higher values
of X do we tend to see higher values of Y, or with higher values of X do
we tend to see lower values of Y? If the measurement of X was biased
upward, the same general pattern of association with Y would be visible.
But if the measurement of X was unreliable, it would obscure the underlying
relationship between X and Y.
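The contrast between the two operationalizations can be sketched with a small simulation. Everything here is an assumption made for illustration: the "true" X, the size of the bias (a constant 5 units), and the size of the random measurement error are invented, and `pearson_r` is our own helper. The sketch simply checks how strongly each measured version of X is associated with Y.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is repeatable


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


n = 5000
true_x = [random.gauss(0, 1) for _ in range(n)]   # the concept as it "really" is
y = [xi + random.gauss(0, 1) for xi in true_x]    # Y truly depends on X

# Operationalization A: biased but reliable (every score is 5 units too high).
measure_a = [xi + 5 for xi in true_x]
# Operationalization B: unbiased but unreliable (large random error, mean zero).
measure_b = [xi + random.gauss(0, 2) for xi in true_x]

r_true = pearson_r(true_x, y)
r_a = pearson_r(measure_a, y)
r_b = pearson_r(measure_b, y)

print(f"true X vs Y:          {r_true:.2f}")
print(f"biased, reliable A:   {r_a:.2f}")
print(f"unbiased, unreliable B: {r_b:.2f}")
```

Because a constant upward bias shifts every case by the same amount, the association between A and Y matches that of the true X. The random error in B, by contrast, should noticeably weaken the observed association, which is the sense in which unreliability obscures the underlying relationship.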
5.3.4 Validity

The most important feature of a measure is that it is valid. A valid measure
accurately represents the concept that it is supposed to measure, whereas
an invalid measure measures something other than what was originally
intended. All of this might sound a bit circular, we realize.
Perhaps it is useful to think of some important concepts that represent thorny measurement examples in the social sciences. In both social
psychology and political science, the study of the concept of prejudice has
been particularly important. Among individuals, the level of prejudice can
vary, from vanishingly small amounts to very high levels. Measuring prejudice can be important in social-psychological terms, so we can try to
determine what factors cause some people to be prejudiced whereas others
are not. In political science, in particular, we are often interested in what the
attitudinal and behavioral consequences of prejudice might be. Assuming
that some form of truth serum is unavailable, how can we obtain a quantitative measure of prejudice that can tell us who harbors large amounts
of prejudice, who harbors some, and who harbors none? It would be easy
enough to ask respondents to a survey if they were prejudiced or not. For
example, we could ask respondents: "With respect to people who have a
different race or ethnicity than you, would you say that you are extremely
prejudiced, somewhat prejudiced, mildly prejudiced, or not at all prejudiced
toward them?" But we would have clear reasons to doubt the validity of
their answers - whether their measured responses accurately reflected their
true levels of prejudice.
There are a variety of ways to assess a measure's validity, though
it is critical to note that all of them are theoretical and subject to large
degrees of disagreement. There is no simple formula to check for a measure's
validity on a scale of 0 to 100, unfortunately. Instead, we rely on several
overlapping ways to determine a measure's validity. First, and most simply,
we can examine a measure's face validity. When examining a measurement
strategy, we can first ask whether or not, on its face, the measure appears to
be measuring what it purports to be measuring. This is face validity.
Second, and a bit more advanced, we can scrutinize a measure's content validity.
What is the concept to be measured? What are all of the essential elements
to that concept and the features that define it? And have you excluded all
of the things that are not it? For example, the concept of democracy surely
contains the element of "elections," but it also must incorporate more than
mere elections, because elections are held in places like North Korea, which
we know to be nondemocratic. What else must be in a valid measure of
democracy? (More on this notion later on.) Basically, content validation
is a rigorous process that forces the researcher to come up with a list of
all of the critical elements that, as a group, define the concept we wish to
measure. Finally, we can examine a measure's construct validity: the degree
to which the measure is related to other measures that theory requires them
to be related to. That is, if we have a theory that connects democratization
and economic development, then a measure of democracy that is related
to a measure of economic development (as our theory requires) serves
simultaneously to confirm the theory and also to validate the measure of
democracy. Of course, one difficulty with this approach is what happens
when the expected association is not present. Is it because our measure
of democracy is invalid or because the theory is misguided? There is no
conclusive way to tell.


The Relationship between Validity and Reliability

What is the connection between validity and reliability? Is it possible to have a valid but unreliable measure? And is it possible to have a reliable but invalid measure? With respect to the second question, some scientific debate exists; there are some who believe that it is possible to have a reliable but invalid measure. In our view, that is possible in abstract terms. But because we are interested in measuring concepts in the interest of evaluating causal theories, we believe that, in all practical terms, any conceivable measures that are reliable but invalid will not be useful in evaluating causal theories.
Similarly, it is theoretically possible to have valid but unreliable measures. But those measures also will be problematic for evaluating causal theories, because we will have no confidence in the hypothesis tests that we conduct. We present the relationship between reliability and validity in Figure 5.1, where we show that, if a measure is unreliable, there is little point in evaluating its validity. Once we have established that a measure is reliable, we can assess its validity, and only reliable and valid measures are useful for evaluating causal theories.

Figure 5.1. Reliability, validity, and hypothesis testing. [Flowchart beginning with the question "Reliability?", with three outcomes: "Unreliable measures lead to unreliable hypothesis tests."; "An invalid measure, despite reliability, cannot be used meaningfully in hypothesis testing."; "Measures that are both reliable and valid can be used to test hypotheses."]
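
The order of the two checks in Figure 5.1 can be sketched in a few lines of Python. This is only an illustration of the logic described above; the function name and the wording of the returned messages are our own assumptions, not part of the figure.

```python
def assess_measure(reliable: bool, valid: bool) -> str:
    """Apply the two checks in the order described in the text (hypothetical helper)."""
    if not reliable:
        # Reliability comes first: for an unreliable measure there is
        # little point in evaluating validity at all.
        return "unreliable: hypothesis tests would be unreliable"
    if not valid:
        return "reliable but invalid: not meaningful for hypothesis testing"
    return "reliable and valid: usable for hypothesis testing"


for reliable, valid in [(False, True), (True, False), (True, True)]:
    print(reliable, valid, "->", assess_measure(reliable, valid))
```

Note that the validity argument is never consulted when the measure is unreliable, which mirrors the claim that validity is only worth assessing once reliability has been established.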


CONTROVERSY 1: MEASURING DEMOCRACY

Although we might be tempted to think of democracy as being similar to pregnancy - that is, a country either is or is not a democracy in much the same way that a woman either is or is not pregnant - on a bit of additional thought, we are probably better off thinking of democracy as a continuum. That is, there can be varying degrees to which a government is democratic. Furthermore, within democracies, some countries are more democratic than others, and a country can become more or less democratic


that make a government more or less democratic? Political philosopher Robert Dahl (1971) persuasively argued that there are two core attributes to a democracy: "contestation" and "participation." That is, according to Dahl, democracies have competitive elections to choose leaders and broadly inclusive rules for and rates of participation.
Several groups of political scientists have attempted to measure democracy systematically in recent decades.¹¹ The best known - though by no means universally accepted - of these is the Polity IV measure.¹² The project measures democracy with annual scores ranging from -10 (strongly autocratic) to +10 (strongly democratic) for every country on Earth from 1800 to 2004.¹³ In these researchers' operationalization, democracy has four components:
1. Regulation of executive recruitment
2. Competitiveness of executive recruitment
3. Openness of executive recruitment
4. Constraints on chief executive

For each of these dimensions, experts rate each country on a particular scale. For example, the first criterion, "regulation of executive recruitment," allows for the following possible values:
• +3 = regular competition between recognized groups
• +2 = transitional competition
• +1 = factional or restricted patterns of competition
• 0 = no competition
Countries that have regular elections between groups that are more than ethnic rivals will have higher scores. By similar procedures, the scholars associated with the project score the other dimensions that comprise their democracy scale.
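
According to the project's notes, a country's Polity score for a given year is its democracy score minus its autocracy score, each taken from a separate 10-point scale. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the assumption that each component runs from 0 to 10 is ours, based on the example of a country scoring 10 on democracy and 0 on autocracy.

```python
def polity_score(democracy: int, autocracy: int) -> int:
    """Net Polity score: democracy score minus autocracy score.

    Assumes each component lies between 0 and 10 (our reading of the
    project's description, not a quotation of its coding rules).
    """
    for name, value in (("democracy", democracy), ("autocracy", autocracy)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score must be between 0 and 10, got {value}")
    return democracy - autocracy


print(polity_score(10, 0))   # most democratic end of the scale
print(polity_score(0, 10))   # most autocratic end of the scale
```

Under these assumptions the net score necessarily falls between -10 and +10, which matches the range reported for the measure.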
Figure 5.2 presents the Polity score for Pakistan from 1947 (when India and Pakistan were partitioned) through 2003.¹⁴ Remember that higher scores represent points in time when Pakistan was more democratic, and lower scores represent times when Pakistan was more autocratic. There has been, as you can see, enormous variation in the democratic experience in Pakistan, which has been ruled by the military for over half of the country's existence. In fact, Pakistan's president in 2003, Pervez Musharraf, seized power in a military coup in October 1999. The steep decline in the most recent portion of the trend line represents the severe restrictions on Islamic political parties that the Musharraf regime has imposed since cooperating with the U.S. War on Terror.

Figure 5.2. Polity IV score for Pakistan.

The Polity measure is rich in historical detail, as is obvious from Figure 5.2. The coding rules are transparent and clear, and the amount of raw information that goes into a country's score for any given year is impressive. And yet it is fair to criticize the Polity measure for including only one part of Dahl's definition of democracy. The Polity measure contains rich information about what Dahl calls "contestation" - whether a country has broadly open contests to decide on its leadership. But the measure is much less rich when it comes to gauging a country's level of what Dahl calls "participation" - the degree to which citizens are engaged in political processes and activities. This may be understandable, in part, because of the impressive time scope of the study. After all, in 1800 (when the Polity time series begins), very few countries had broad electoral participation. Since the end of World War II, broadly democratic participation has spread rapidly across the globe. But if the world is becoming a more democratic place, owing to expansion of suffrage, our measures of democracy ought to incorporate that reality. The Polity IV measure, despite its considerable strengths, does not fully encompass what it means, conceptually, to be more or less democratic.

10 This position, though, is controversial within political science. For an interesting discussion about whether researchers should measure democracy as a binary concept or a continuous one, see Elkins (2000).
11 For a useful review and comparison of these various measures, see Munck and Verkuilen (2002).
12 The project's web site, which provides access to a vast array of country-specific over-time data, is http://www.cidcm.umd.edu/inscr/polity.
13 They derive the scores on this scale from two separate 10-point scales, one for democracy and the other for autocracy. A country's Polity score for that year is its democracy score minus its autocracy score; thus, a country that received a 10 on the democracy scale and a 0 on the autocracy scale would have a net Polity score of 10 for that year.
14 Source: http://www.cidcm.umd.edu/inscr/polity/pak2.htm.

CONTROVERSY 2: MEASURING POLITICAL TOLERANCE

We know that some continuum exists in which, on the one end, some individuals are extremely "tolerant" and, on the other end, other individuals are extremely "intolerant." In other words, political tolerance and intolerance, at the conceptual level, are real things. Some individuals have more tolerance and others have less. It is easy to imagine why political scientists would be interested in political tolerance and intolerance. Are there systematic factors that cause some people to be tolerant and others to be intolerant?
Measuring political tolerance, on the other hand, is far from easy. Tolerance is not like cholesterol, for which a simple blood test can tell us how much of the good and how much of the bad we have inside of us. The naive approach to measuring political tolerance - conducting a survey and asking people directly "Are you tolerant or intolerant?" - seems silly right off the bat. Any such survey question would surely produce extremely high rates of "tolerance," because presumably very few people - even intolerant people - think of themselves as intolerant. Even those who are aware of their own intolerance are unlikely to admit that fact to a pollster. Given this situation, how have political scientists tackled this problem?
During the 1950s, when the spread of Soviet communism represented the biggest threat to America, Samuel Stouffer (1955) conducted a series of opinion surveys to measure how people reacted to the Red Scare. He asked national samples of Americans whether they would be willing to extend certain civil liberties - like being allowed to teach in a public school, to be free from having phones tapped, and the like - to certain unpopular groups like communists, socialists, and atheists. He found that a variety of people were, by these measures, intolerant; they were not willing to grant these civil liberties to members of those groups. The precise amount of intolerance varied, depending on the target group and the activity mentioned in the scenarios, but intolerance was substantial - at least 70% of respondents gave the intolerant response. Stouffer found that the best predictor of an individual's level of tolerance was how much formal education he or she had received; people with more education emerged as more tolerant, and people with less education were less tolerant. In the 1970s, when the Red Scare was subsiding somewhat, a new group of researchers asked the identical questions to a new sample of Americans. They found that the levels of intolerance had dropped considerably over the 20-odd years - in only one scenario did intolerance exceed 60%, and in the majority of scenarios it was below 50% - leading some to speculate that political intolerance was waning.
However, also in the late 1970s, a different group of researchers led by political scientist John Sullivan questioned the validity of the Stouffer measures and hence questioned the conclusions that Stouffer reached. The concept of political tolerance, wrote Sullivan, Pierson, and Marcus (1979), "presupposes opposition." That is, unless a survey respondent actively opposed communists, socialists, and atheists, the issue of tolerance or intolerance simply does not arise. By way of example, consider asking such questions of an atheist. Is an atheist who agrees that atheists should be allowed to teach in public schools politically tolerant? Sullivan and his colleagues thought not.
They proposed a new set of survey-based questions that were, in their view, more consistent with a conceptual understanding of tolerance. If, as they defined it, tolerance presupposes opposition, then researchers need to find out whom the survey respondent opposes; assuming that the respondent might oppose a particular group is not a good idea. They identified a variety of groups active in politics at the time - including racist groups, both pro- and anti-abortion groups, and even the Symbionese Liberation Army - and asked respondents which one they disliked the most. They followed this up with questions that looked very much like the Stouffer items, only directed at the respondent's own disliked groups instead of the ones Stouffer had picked out for them.
Among other findings, two stood out. First, the levels of intolerance were strikingly high. As many as 66% of Americans were willing to forbid members of their least-liked group from holding rallies, and fully 71% were willing to have the government ban the group altogether. Second, under this new conceptualization and measurement of tolerance, they found that an individual's perception of the threatening nature of the target group, and not their level of education, was the primary predictor of intolerance. In other words, individuals who found their target group to be particularly threatening were most likely to be intolerant, whereas those who found their most-disliked group to be less threatening were more tolerant. Education did not directly affect tolerance either way. In this sense, measuring an important concept differently produced rather different substantive findings about causes and effects.¹⁵

15 But see Gibson 1992.

It is important that you see the connection to valid measurement here. Sullivan and his colleagues argued that Stouffer's survey questions were not valid measures of tolerance because the question wording did not accurately capture what it meant, in the abstract, to be intolerant (specifically, opposition). Creating measures of tolerance and intolerance that more truthfully mirrored the concept of interest produced significantly different findings about the persistence of intolerance, as well as about the factors that cause individuals to be tolerant or intolerant.

ARE THERE CONSEQUENCES TO POOR MEASUREMENT?

What happens when we fail to measure the key concepts in our theory in a way that is both valid and reliable? Refer back to Figure 1.2, which highlights the distinction between the abstract concepts of theoretical interest and the variables we observe in the real world. If the variables that we observe in the real world do not do a good job of mirroring the abstract concepts, then that affects our ability to evaluate conclusively a theory's empirical support. That is, how can we know if our theory is supported if we have done a poor job measuring the key concepts that we observe? If our empirical analysis is based on measures that do not capture the essence of the abstract concepts in our theory, then we are unlikely to have any confidence in the findings themselves.

CONCLUSIONS

How we measure the concepts that we care about matters. As we can see from the preceding examples, different measurement strategies can and sometimes do produce different conclusions about causal relationships.
One of the take-home points of this chapter should be that measurement cannot take place in a theoretical vacuum. The theoretical purpose of the scholarly enterprise must inform the process of how we measure what we measure. For example, recall our previous discussion about the various ways to measure poverty. How we want to measure this concept depends on what our objective is. In the process of measuring poverty, if our theoretical aim is to evaluate the effectiveness of different policies at combating poverty, we would have different measurement issues than would scholars whose theoretical aim is to study how being poor influences a person's political attitudes. In the former case, we would give strong consideration to pretransfer measures of poverty, whereas in the latter example, posttransfer measures would likely be more applicable.


CONCEPTS INTRODUCED IN THIS CHAPTER

construct validity
content validity
face validity
measurement
measurement bias
operationalization
reliable
valid

EXERCISES

1. Suppose that a researcher wanted to measure the federal government's efforts to make the education of its citizens a priority. The researcher proposed to count the government's budget for education as a percentage of the total GDP and use that as the measure of the government's commitment to education. In terms of validity, what are the strengths and weaknesses of such a measure?
2. Suppose that a researcher wanted to create a measure of media coverage of a candidate for office, and therefore created a set of coding rules to code words in newspaper articles as either "pro" or "con" toward the candidate. Instead of hiring students to implement these rules, however, the researcher used a computer to code the text, by counting the frequency with which certain words were mentioned in a series of articles. What would be the reliability of such a computer-driven measurement strategy, and why?
3. For each of the following concepts, identify whether there would, in measuring the concept, likely be a problem of measurement bias, invalidity, unreliability, or none of the above:
(a) Measuring the concept of the public's approval of the president by using a series of survey results asking respondents whether they approve or disapprove of the president's job performance.
(b) Measuring the concept of political corruption as the percentage of politicians in a country in a year who are convicted of corrupt practices.
(c) Measuring the concept of democracy in each nation of the world by reading their constitution and seeing if it claims that the nation is "democratic."
4. Download a codebook for a political science data set in which you are interested.
(a) Describe the data set and the purpose for which it was assembled.
(b) What are the time and space dimensions of the data set?
(c) Read the details of how one of the variables in which you are interested was coded. Write your answers to the following questions: Does this seem like a reliable method of operationalizing this variable? How might the reliability of this operationalization be improved?
(d) Assess the various elements of the validity for this variable operationalization. How might the validity of this operationalization be improved?
5. If you did not yet do Chapter 3, Exercise 5, do so now. For the theory that you developed, evaluate the measurement of both the independent and dependent variables. Write about the reliability and the various aspects of validity for each measure. Can you think of a better way to operationalize these variables to test your theory?

Descriptive Statistics and Graphs

OVERVIEW
Descriptive statistics and descriptive graphs are what they sound like - they are tools that describe variables. These tools are valuable, because they can summarize a tremendous amount of information in a succinct fashion. In this chapter we discuss some of the most commonly used descriptive statistics and graphs, how we should interpret them, how we should use them, and their limitations.

KNOW YOUR DATA

In Chapter 5 we discussed the measurement of variables. A lot of thought and effort goes into the measurement of individual variables. Once measurement has been conducted, it is important for the researcher to get a good idea of the types of values that the individual variables take on before moving to testing for causal connections between two or more variables. What do "typical" values for a variable look like? How tightly clustered (or widely dispersed) are these values?
Before proceeding to test for theorized relationships between two or more variables, it is essential to understand the properties and characteristics of each variable. To put it differently, we want to learn something about what the values of each variable "look like." How do we accomplish this? One possibility is to list all of the observed values of a measured variable. For example, the following are the percentages of popular votes for major party candidates that went to the candidate of the party of the sitting president during U.S. presidential elections from 1880 to 2004:¹ 50.22, 49.846, 50.414, 48.268, 47.76, 53.171, 60.006, 54.483, 54.708, 51.682, 36.119, 58.244, 58.82, 40.841, 62.458, 54.999, 53.774, 52.37, 44.595, 57.764, 49.913, 61.344, 49.596, 61.789, 48.948, 44.697, 59.17, 53.902, 46.545, 54.736, 50.265, 51.2. We can see from this example that, once we get beyond a small number of observations, a listing of values becomes unwieldy. We will get lost in the trees and have no idea of the overall shape of the forest. For this reason, we turn to descriptive statistics and descriptive graphs, to take what would be a large amount of information and reduce it to bite-size chunks that summarize that information.
Descriptive statistics and graphs are useful tools for helping researchers to get to know their data before they move to testing causal hypotheses. They are also sometimes helpful when writing about one's research. You have to make the decision of whether or not to present descriptive statistics and/or graphs in the body of a paper on a case-by-case basis. It is scientifically important, however, that this information be made available to consumers of your research in some way.²
One major way to distinguish among variables is the measurement metric. A variable's measurement metric is the type of values that the variable takes on, and we discuss this in detail in the next section by describing three different variable types. We then explain that, despite the imperfect nature of the distinctions among these three variable types, we are forced to choose between two broad classifications of variables - categorical or continuous - when we describe them. The rest of this chapter discusses strategies for describing categorical and continuous variables.

1 This measure is constructed so that it is comparable across time. Because independent or third-party candidates have occasionally contested elections, we focus on only those votes for the two major parties. Also, because we want to test the theory of economic voting, we need to have a measure of support for incumbents. In elections in which the sitting president is not running for reelection, there is still reason to expect that their party will be held accountable for economic performances.
2 Many researchers will present this information in an appendix unless there is something particularly noteworthy about the characteristics of one or more of their variables.

WHAT IS THE VARIABLE'S MEASUREMENT METRIC?

There are no hard and fast rules for describing variables, but a major initial juncture that we encounter involves the metric in which we measure each variable. Remember from Chapter 1 that we can think of each variable in terms of its label and its values. The label is the description of the variable - such as "Gender of survey respondent" - and its values are the denominations in which the variable occurs - such as "Male" or "Female." For treatment in most statistical analyses, we are forced to divide our variables into two types according to the metric in which the values of the variable occur: categorical or continuous. In reality, variables come in at least three different metric types, and there are a lot of variables that do not neatly fit into just one of these classifications. To help you to better understand each of these variable types, we will go through each with an example. All of the examples that we are using in these initial descriptions come from survey research, but the same basic principles of measurement metric hold regardless of the type of data being analyzed.
6.2.1 Categorical Variables

Categorical variables are variables for which cases have values that are either different or the same as the values for other cases, but about which we cannot make any universally holding ranking distinctions. If we consider a variable that we might label "Religious Identification," some values for this variable are "Catholic," "Muslim," "nonreligious," and so on. Although these values are clearly different from each other, we cannot make universally holding ranking distinctions across them. More casually, with categorical variables like this one, it is not possible to rank order the categories from least to greatest: The value "Muslim" is neither greater nor less than "nonreligious" (and so on), for example. Instead, we are left knowing that cases with the same value for this variable are the same, whereas those cases with different values are different. The term "categorical" expresses the essence of this variable type; we can put individual cases into categories based on their values, but we cannot go any further in terms of ranking or otherwise ordering these values.
6.2.2 Ordinal Variables

Like categorical variables, ordinal variables are also variables for which cases have values that are either different or the same as the values for other cases. The distinction between ordinal and categorical variables is that we can make universally holding ranking distinctions across the values for ordinal variables. For instance, consider the variable labeled "Retrospective Family Financial Situation" that has commonly been used as an independent variable in individual-level economic voting studies. In the 2004 National Election Study (NES), researchers created this variable by first asking respondents to answer the following question: "We are interested in how people are getting along financially these days. Would you say that you (and your family living here) are better off or worse off than you were a year ago?" Researchers then asked respondents who answered "Better" or "Worse": "Much [better/worse] or somewhat [better/worse]?" The resulting variable was then coded as follows:

1. much better
2. somewhat better
3. same
4. somewhat worse
5. much worse

This variable is pretty clearly an ordinal variable because as we go from the top to the bottom of the list we are moving from better to worse evaluations of how individuals (and their families with whom they live) have been faring financially in the past year.
As another example, consider the variable labeled "Party Identification." In the 2004 NES researchers created this variable by using each respondent's answer to the question, "Generally speaking, do you usually think of yourself as a Republican, a Democrat, an independent, or what?"³ which we can code as taking on the following values:

1. Republican
2. Independent
3. Democrat

If all cases that take on the value "Independent" represent individuals whose views lie somewhere between "Republican" and "Democrat," we can call "Party Identification" an ordinal variable. If this is not the case, then this variable is a categorical variable.

An important characteristic that ordinal variables do not have is equal unit differences. A variable has equal unit differences if a one-unit increase in the value of that variable always means the same thing. If we return to the examples from the previous section, we can rank order the five categories of Retrospective Family Financial Situation from 1 for the best situation to 5 for the worst situation. But we may not feel very confident working with these assigned values the way that we typically work with numbers. In other words, can we say that the difference between "somewhat worse" and "same" (4 - 3) is the same as the difference between "much worse" and "somewhat worse" (5 - 4)? What about saying that the difference between "much worse" and "same" (5 - 3) is twice the difference between "somewhat better" and "much better" (2 - 1)? If the answer to both questions is "yes," then Retrospective Family Financial Situation is a continuous variable.

3 Almost all U.S. respondents put themselves into one of the first three categories. For instance, in 2004, 1,128 of the 1,212 respondents (93.1%) to the postelection NES responded that they were a Republican, Democrat, or an independent. For our purposes, we will ignore the "or what" cases. Note that researchers usually present partisan identification across seven values ranging from "Strong Republican" to "Strong Democrat" based on follow-up questions that ask respondents to further characterize their positions.

If we ask the same questions about Party Identification, we should be somewhat skeptical. We can rank order the three categories of Party Identification, but we cannot with great confidence assign "Republican" a value of 1, "Independent" a value of 2, and "Democrat" a value of 3 and work with these values in the way that we typically work with numbers. We cannot say that the difference between an "Independent" and a "Republican" (2 - 1) is the same as the difference between a "Democrat" and an "Independent" (3 - 2) - despite the fact that both 3 - 2 and 2 - 1 = 1. Certainly, we cannot say that the difference between a "Democrat" and a "Republican" (3 - 1) is twice the difference between an "Independent" and a "Republican" (2 - 1) - despite the fact that 2 is twice as big as 1.
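
One way to see why arithmetic on ordinal codes deserves caution is to recode the categories with a different set of numbers that preserves the ranking. The sketch below uses the 1 to 5 codes for Retrospective Family Financial Situation from the text; the alternative codes are hypothetical values that we made up purely for illustration.

```python
# Codes as listed in the text for Retrospective Family Financial Situation.
codes = {"much better": 1, "somewhat better": 2, "same": 3,
         "somewhat worse": 4, "much worse": 5}

# Hypothetical alternative codes: same ordering, different spacing.
alt_codes = {"much better": 1, "somewhat better": 2, "same": 5,
             "somewhat worse": 8, "much worse": 9}


def ranking(coding):
    """Return the categories ordered from lowest to highest code."""
    return sorted(coding, key=coding.get)


def gap(coding, higher, lower):
    """Numeric difference between the codes of two categories."""
    return coding[higher] - coding[lower]


same_ranking = ranking(codes) == ranking(alt_codes)
gap_original = gap(codes, "somewhat worse", "same")
gap_alternative = gap(alt_codes, "somewhat worse", "same")

print("Same ranking under both codings:", same_ranking)
print("Gap between 'somewhat worse' and 'same':", gap_original, "vs", gap_alternative)
```

Both codings carry identical ordinal information, yet the size of a "one-step" difference depends entirely on which numbers we happened to assign. Only if we are willing to defend one particular spacing can we treat the differences themselves as meaningful.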
The metric in which we measure a variable has equal unit differences if
a ~ne-unit increase in the value of that variable indicates the same amount of
change across all values of that variable. Continaoas variables are variables
that do have equal unit differences.4 Ima 'ine for instance a variabl abeled
. 'Age in Years." A one-unit increase in this variable always indicates an
iñdividual who is 1 year older; this is troe when we are talkiQg about a case
wTi:ha value of 21 just as it is when we are talking about a case with a value

can always repeat our analyses under a different assumption and see
how robust our conclusions are to our
choices.
Number
With aH of this in mind, we
Category'
ofeases
Pereent·
present separate discussions of the
Protestant
672
56.14
process of describing a variable's variCatholic
292
24.39
ation for categorical and continuous
Jewish
35
2.92
variables.
A variable's variation is the
Other
17
1.42
distribution of values that it takes
None
181
15.12
across the cases for which we measure it. It is important that we have a
strong knowledge of the variation in each of our variables before we can
translate our theory into hypotheses, assess whether there is covariation
between two variables (causal burdle 3 from Chapter 3), and think ahout
whether or not there might exist a third variable that makes aoy observed
covariation betweeo our independent and dependent variables spurious
(hurdle 4). As we just outlined, <!,escrip.tive statistics.;a9fll2hs¿are uJ>efui summaries of the variation foriifcliJu'al variables. ~nother way in
which we describe distributions of variables is through measures of central
~ndency"Measures oE central tendency tel! us about typical values faL--;
particular variable .
Table 6.1. Frequency table for
religious identification in the
2004 NES

.'

a

Variable Types and StatisticaJ. Analyses

As we saw in the preceding subsections, variables do not always neatly fit ..
into the three categories. When we move to the vast, majority of statistic~l
analyses we must decide betweeq treating each of our variables as thou h
it is categorical or as thoug it is continuous. For sorne variables, this is
a very straightforward choice. Hpwever, for others, this is a very difficult
choice. If we treat an ordinal variable as though it is categorical, we ,~lre
~cting as though we know less about the values of this variable than we
really know. On the other hand, treating an ordinal variable as though it
is a continuous variable means that we are assuming that it has equal unit
differences. Either way, it is critical that we be aware of our decisions. We
4 We sometimes caH these variables "tnterval variables." A further distinction you will

encounter with continuous variables is whether they have a substantively meaningful zero
poin~. We usuaHy describe variables rpat have this characteristic as "ratio" variables.

6.3 DESCRIBING CATEGORICAL VARIABLES

With categorical variables, we want to understand the frequency with which
each value of the variable occurs in our data. The simplest way of seeing this
is to produce a frequency table in which the values of the categorical variable
are displayed down one column and the frequency with which each value occurs
(in absolute number of cases and/or in percentage terms) is displayed in
another column(s). Table 6.1 shows such a table for the variable "Religious
Identification" from the NES survey measured during the 2004 national
elections in the United States.
The only measure of central tendency that is appropriate for a categorical
variable is the mode, which we define as the most frequently occurring
value. In Table 6.1, the mode of the distribution is "Protestant," because
there are more Protestants than there are members of any other single
category.
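
As a rough illustration of how a frequency table and a mode might be produced in software, the following Python sketch tabulates a small list of responses. The responses are hypothetical and invented purely for illustration; they are not the NES data behind Table 6.1, and the variable names are our own.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration (NOT the NES data).
responses = ["Protestant", "Catholic", "Protestant", "None", "Jewish",
             "Protestant", "Catholic", "Other", "None", "Protestant"]

counts = Counter(responses)
total = sum(counts.values())

# Frequency table: one row per category, with the number of cases
# and the percentage of all cases.
frequency_table = [(category, n, 100 * n / total)
                   for category, n in counts.most_common()]
for category, n, pct in frequency_table:
    print(f"{category:<12}{n:>5}{pct:>8.2f}")

# The mode is the most frequently occurring category.
mode = counts.most_common(1)[0][0]
print("Mode:", mode)
```

Note that nothing here requires the categories to be ordered or numeric, which is why the mode remains usable for a categorical variable when a mean or median would not be.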
A typical way in which we present frequency data is in a pie graph such
as Figure 6.1. Pie graphs are useful for seeing the percentage of cases that
fall into particular categories. Bar graphs, such as Figure 6.2, are another
graphical way to illustrate frequencies of categorical variables. It is worth


Descriptive Statistics and Graphs

Figure 6.1. Pie graph of religious identification, NES 2004.
[Pie graph; legend: Protestant, Catholic, Jewish, Other, None]

[Figure 6.3: output from Stata's "summarize" command with the "detail"
option for the incumbent-party vote percentage]

      Percentiles    Smallest
 1%     36.119        36.119
 5%     40.841        40.841
10%     44.697        44.595       Obs                  32
25%     49.272        44.697       Sum of Wgt.          32

50%     52.026                     Mean           52.27022
                     Largest       Std. Dev.      6.071435
75%    56.3815        60.006
90%     60.006        61.344       Variance       36.86232
95%     61.789        61.789       Skewness      -.4182924
99%     62.458        62.458       Kurtosis       3.178354


6.4 DESCRIBING CONTINUOUS VARIABLES

The statistics and graphs for describing continuous variables are
considerably more complicated than those for categorical variables. This is
because continuous variables are more mathematically complex than categorical
variables. With continuous variables, we want to know about the central
tendency and the spread or variation of the values around the central
tendency. With continuous variables we also want to be on the lookout for
outliers. Outliers are cases for which the value of the variable is extremely
high or low relative to the rest of the values for that variable. When we
encounter an outlier, we want to make sure that such a case is real and not
created by some kind of error.

Most statistical software programs have a command for getting a battery of
descriptive statistics on continuous variables. Figure 6.3 shows the
output from Stata's "summarize" command with the "detail" option for
the percentage of the major party vote won by the incumbent party in
every U.S. presidential election between 1880 and 2004. The statistics on the
left-hand side (the first three columns on the left) of the computer printout
are what we call rank statistics, and the statistics on the right-hand side (the
two columns on the right-hand side) are known as the statistical moments.
Although both rank statistics and statistical moments are intended to
describe the variation of continuous variables, they do so in slightly different
ways and are thus quite useful together for getting a complete picture of
the variation for a single variable.

Figure 6.2. Bar graph of religious identification, NES 2004.

6.4.1 Rank Statistics


The calculation of rank statistics begins with the ranking of the values of a
continuous variable from smallest to largest, followed by the identification
of crucial junctures along the way. Once we have our cases ranked, the
midpoint as we count through our cases is known as the median case.
Remember that earlier in the chapter we defined the variable in Figure 6.3
as the percentage of popular votes for major-party candidates that went to
the candidate from the party of the sitting president during U.S. presidential
elections from 1880 to 2004. We will call this variable "Incumbent Vote"
for short. To calculate rank statistics for this variable, we need to first put
the cases in order from the smallest to the largest observed value. This
ordering is shown in Table 6.2. With rank statistics we measure the central
tendency as the median value of the variable. The median value is the value
of the case that sits at the exact center of our cases when we rank them
from the smallest to the largest observed values. When we have an even
number of cases, as we do in Table 6.2, we average the value of the two
centermost ranked cases to obtain the median value (in our example we
calculate the median as (51.682 + 52.37)/2 = 52.026). This is also known as the
value of the variable at the 50% rank. In a similar way, we can talk about
the value of the variable at any other percentage rank in which we have
an interest. Other ranks that are often of interest are the 25% and 75%
ranks, which are also known as the first and third "quartile ranks" for
a distribution. The difference between the variable value at the 25% and
the 75% ranks is known as the "interquartile range" or "IQR" of the
variable. In our example variable, the 25% value is 49.272 and the 75%
value is 56.3815. This makes the IQR = 56.3815 - 49.272 = 7.1095. In
the language of rank statistics, the median value for a variable is a measure
of its central tendency, whereas the IQR is a measure of the dispersion, or
spread, of values.
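
The rank statistics quoted above can be reproduced with a short Python sketch. The data are the 32 Incumbent Vote values from Table 6.2. The function `value_at_rank` is our own illustrative helper, and its tie rule (average the two neighboring cases when the rank falls exactly between them, otherwise take the next case up) is an assumption chosen because it reproduces the 25%, 50%, and 75% values given in the text; other software may use a different interpolation rule and return slightly different quartiles.

```python
# Incumbent Vote values from Table 6.2, ranked from smallest to largest.
incumbent_vote = [
    36.119, 40.841, 44.595, 44.697, 46.545, 47.76, 48.268, 48.948,
    49.596, 49.846, 49.913, 50.22, 50.265, 50.414, 51.2, 51.682,
    52.37, 53.171, 53.774, 53.902, 54.483, 54.708, 54.736, 54.999,
    57.764, 58.244, 58.82, 59.17, 60.006, 61.344, 61.789, 62.458,
]


def value_at_rank(values, percent):
    """Value at a whole-number percentage rank (assumed tie rule, see text)."""
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    ranked = sorted(values)
    position, remainder = divmod(len(ranked) * percent, 100)
    if remainder == 0:
        # The rank falls exactly between two cases: average them.
        return (ranked[position - 1] + ranked[position]) / 2
    # Otherwise the rank falls inside a case: take the next case up.
    return ranked[position]


median = value_at_rank(incumbent_vote, 50)
first_quartile = value_at_rank(incumbent_vote, 25)
third_quartile = value_at_rank(incumbent_vote, 75)
iqr = third_quartile - first_quartile

print(f"median = {median:.3f}")
print(f"25% value = {first_quartile:.3f}, 75% value = {third_quartile:.4f}")
print(f"IQR = {iqr:.4f}")
```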
With rank statistics, we also want to look at the smallest and largest
values to identify outliers. Remember that we defined outliers at the
beginning of this section as "cases for which the value of the variable is
extremely high or low relative to the rest of the values for that variable."
If we look at the highest values in Table 6.2, we can see that there aren't
really any cases that fit this description. Although there are certainly some
values that are a lot higher than the median value and the 75% value, they
aren't "extremely" higher than the rest of the values. Instead, there seems
to be a fairly even progression from the 75% value up to the highest value.
The story at the other range of values in Table 6.2 is a little different. We
can see that the two lowest values are pretty far from each other and the
rest of the low values. The value of 36.119 in 1920 seems to meet our
definition of an outlier. The value of 40.841 in 1932 is also a borderline
case. Whenever we see outliers, we should begin by checking whether we have
measured the values for these cases accurately. Sometimes we find that
outliers are the result of errors when entering data. In this case, a check of
our data set reveals that the outlier case occurred in 1920 when the
incumbent-party candidate received only 36.119% of the votes cast for the two
major parties. A further check of our data indicates that this was indeed a
correct measure of this variable for 1920. 5
Figure 6.4 presents a box-whisker plot of the rank statistics for our
presidential vote variable. This plot displays the distribution of the
variable along the vertical dimension. If we start at the center of the box in
Figure 6.4, we see the median value (or 50% rank value) of our variable
represented as the slight gap in the center of the box. The other two ends
of the box show the values of the 25% rank and the 75% rank of our
variable. The ends of the whiskers show the lowest and highest nonoutlier
values of our variable. Each statistical program has its own rules for
dealing with outliers, so it is important to know whether your box-whisker
plot is or is not set up to display outliers. These settings are usually
adjustable within the statistical program. The calculation of whether an
individual case is or is not an outlier in this box-whisker plot is fairly
standard. This calculation starts with the IQR for the variable. Any case is
defined as an outlier if its value is more than 1.5 times the IQR higher than
the 75% value or more than 1.5 times the IQR lower than the 25% value. For
Figure 6.4 we have set things up so that the plot displays the outliers, and
we can see one such value at the bottom of our figure. As we already know
from Table 6.2, this is the value of 36.119 from the 1920 election.

Table 6.2. Values of incumbent vote ranked from smallest to largest

Rank   Year    Value
   1   1920   36.119
   2   1932   40.841
   3   1952   44.595
   4   1980   44.697
   5   1992   46.545
   6   1896   47.76
   7   1892   48.268
   8   1976   48.948
   9   1968   49.596
  10   1884   49.846
  11   1960   49.913
  12   1880   50.22
  13   2000   50.265
  14   1888   50.414
  15   2004   51.2
  16   1916   51.682
  17   1948   52.37
  18   1900   53.171
  19   1944   53.774
  20   1988   53.902
  21   1908   54.483
  22   1912   54.708
  23   1996   54.736
  24   1940   54.999
  25   1956   57.764
  26   1924   58.244
  27   1928   58.82
  28   1984   59.17
  29   1904   60.006
  30   1964   61.344
  31   1972   61.789
  32   1936   62.458
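
The 1.5 x IQR outlier rule can be checked with a few lines of Python. This is only a sketch of the rule as stated in the text: it uses the 25% and 75% values quoted there, and it inspects just the four smallest and four largest values from Table 6.2, because every other case lies between them and so cannot fall outside the cutoffs. The names `lower_fence` and `upper_fence` are our own labels for the two cutoffs.

```python
# 25% and 75% values for Incumbent Vote, as quoted in the text.
q1, q3 = 49.272, 56.3815
iqr = q3 - q1

# Cutoffs: a case is an outlier if it lies more than 1.5 * IQR
# below the 25% value or above the 75% value.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The four smallest and four largest values from Table 6.2.
extremes = [36.119, 40.841, 44.595, 44.697, 60.006, 61.344, 61.789, 62.458]
outliers = [v for v in extremes if v < lower_fence or v > upper_fence]

print(f"lower cutoff = {lower_fence:.3f}, upper cutoff = {upper_fence:.3f}")
print("outliers:", outliers)
```

Under this rule the 1920 value falls below the lower cutoff, whereas the 1932 value of 40.841 sits just above it, which is consistent with describing 1932 as a borderline case rather than an outlier.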


5 An obvious question is "Why was 1920 such a low value?" This was the first
presidential election in the aftermath of World War I, during a period when
there was a lot of economic and political turmoil. The election in 1932 was
at the very beginning of the large economic downturn known as "the Great
Depression," so it makes sense that the party of the incumbent president
would not have done very well during this election.

Figure 6.4. Box-whisker plot of incumbent-party presidential vote percentage,
1880-2004.
[Box-whisker plot; labeled points: highest nonoutlier, low outlier]

6.4.2 Moments

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n},$$

where $\bar{Y}$, known as "Y-bar," indicates the mean of Y, which is equal to
the sum of all values of Y across individual cases of Y, $Y_i$, divided by the
total number of cases, n. 6 Although everyone is familiar with mean or average
values, not everyone is familiar with the two characteristics of the mean
value that make it particularly attractive to people who use statistics. The
first is known as the "zero-sum property":

$$\sum_{i=1}^{n} (Y_i - \bar{Y}) = 0,$$

which means the sum of the differences between each Y value, $Y_i$, and the
mean value of Y, $\bar{Y}$, is equal to zero. The second desirable
characteristic of the mean value is known as the "least-squares property":

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 < \sum_{i=1}^{n} (Y_i - c)^2 \quad \forall\, c \neq \bar{Y},$$

which means that the sum of the squared differences between each Y value and
the mean is smaller than the sum of the squared differences between each Y
value and any other number c. Because of these two properties, the mean value
is also referred to as the expected value of a variable. Think of it this
way: If someone were to ask you to guess what the value for an individual case
is without giving you any more information than the mean value, based on these
two properties of the mean, the mean value would be the best guess.

The next statistical moment for a variable is the variance. We represent
and calculate the variance as follows:

$$\mathrm{var}(Y) = \mathrm{var}_Y = s_Y^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1},$$

which means that the variance of Y is equal to the sum of the squared
differences between each Y value and its mean divided by the number of
cases minus one. 7 If we look through this formula, what would happen
if we had no variation on Y at all ($Y_i = \bar{Y}\ \forall\, i$)? In this
case, variance would be equal to zero. But as individual cases are spread
further and further from the mean, this calculation would increase. This is
the logic of variance: It conveys the spread of the data around the mean. A
more intuitive measure of variance is the standard deviation:

$$\mathrm{sd}(Y) = \mathrm{sd}_Y = s_Y = \sqrt{\mathrm{var}(Y)} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}.$$

Roughly speaking, this is the average difference between values of Y
($Y_i$) and the mean of Y ($\bar{Y}$). At first glance, this may not be
apparent. But the important thing to understand about this formula is that the
purpose of squaring each difference from the mean and then taking the square
root of the resulting sum of squared deviations is to keep the negative and
positive deviations from canceling each other out. 8

6 To understand formulae like this, it is helpful to read through each of the
pieces of the formula and translate them into words, as we have done here.
7 The "minus one" in this equation is an adjustment that is made to account
for the number of "degrees of freedom" with which this calculation was made.
We will discuss degrees of freedom in Chapter 8.
8 An alternative method that would produce a very similar calculation would
be to calculate the average value of the absolute value of each difference
from the mean: $\frac{\sum_{i=1}^{n} |Y_i - \bar{Y}|}{n}$.
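
The formulas for the mean, variance, and standard deviation, along with the zero-sum and least-squares properties, can be checked numerically with a short Python sketch. It uses the 32 Incumbent Vote values from Table 6.2; the helper `sum_of_squares` and the comparison values of c are our own illustrative choices, and trying a handful of values of c is a spot check of the least-squares property, not a proof.

```python
import math

# Incumbent Vote values from Table 6.2.
incumbent_vote = [
    36.119, 40.841, 44.595, 44.697, 46.545, 47.76, 48.268, 48.948,
    49.596, 49.846, 49.913, 50.22, 50.265, 50.414, 51.2, 51.682,
    52.37, 53.171, 53.774, 53.902, 54.483, 54.708, 54.736, 54.999,
    57.764, 58.244, 58.82, 59.17, 60.006, 61.344, 61.789, 62.458,
]

n = len(incumbent_vote)
mean = sum(incumbent_vote) / n

# Zero-sum property: deviations from the mean add up to zero
# (up to floating-point rounding).
sum_of_deviations = sum(y - mean for y in incumbent_vote)


def sum_of_squares(c):
    """Sum of squared differences between each value and the number c."""
    return sum((y - c) ** 2 for y in incumbent_vote)


# Least-squares property: any other c gives a larger sum of squares.
other_guesses = (50.0, 52.0, 52.5, 55.0)
least_squares_holds = all(
    sum_of_squares(mean) < sum_of_squares(c) for c in other_guesses
)

# Variance divides by n - 1; the standard deviation is its square root.
variance = sum_of_squares(mean) / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.5f}")
print(f"sum of deviations = {sum_of_deviations:.10f}")
print(f"least-squares property holds for the guesses tried: {least_squares_holds}")
print(f"variance = {variance:.5f}, standard deviation = {std_dev:.6f}")
```

The mean, variance, and standard deviation computed this way should agree, up to rounding, with the values reported in the Stata output in Figure 6.3.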


The variance and the standard deviation give us a numerical summary
of the distribution of cases around the mean value for a variable. 9 We can
also visually depict distributions. The idea of visually depicting distributions
is to produce a two-dimensional figure in which the horizontal dimension
(x axis) displays the values of the variable and the vertical dimension
(y axis) displays the relative frequency of cases. One of the most popular
visual depictions of a variable's distribution is the histogram, such
as Figure 6.5. One problem with histograms is that we (or the computer
program with which we are working) must choose how many rectangular
blocks (called "bins") are depicted in our histogram. Changing the number
of blocks in a histogram can change our impression of the distribution of
the variable being depicted. Figure 6.6 shows the same variable as in
Figure 6.5 with 2 and then 10 blocks. Although we generate both of the
graphs in Figure 6.6 from the same data, they are fairly different from each
other.

Figure 6.5. Histogram of incumbent-party presidential vote percentage, 1880-2004.

Figure 6.6. Histograms of incumbent-party presidential vote percentage, 1880-2004,
depicted with 2 and then 10 blocks.
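
To see concretely how the number of blocks changes the picture, the following Python sketch counts how many Incumbent Vote cases fall into each block when the range of the data is cut into 2 and then 10 equal-width blocks. The function `bin_counts` is our own illustrative helper, and equal-width blocks spanning the minimum to the maximum are an assumption; a statistical package may place its block boundaries differently, so the counts need not match Figure 6.6 exactly.

```python
# Incumbent Vote values from Table 6.2.
incumbent_vote = [
    36.119, 40.841, 44.595, 44.697, 46.545, 47.76, 48.268, 48.948,
    49.596, 49.846, 49.913, 50.22, 50.265, 50.414, 51.2, 51.682,
    52.37, 53.171, 53.774, 53.902, 54.483, 54.708, 54.736, 54.999,
    57.764, 58.244, 58.82, 59.17, 60.006, 61.344, 61.789, 62.458,
]


def bin_counts(values, n_bins):
    """Count cases in n_bins equal-width blocks from the minimum to the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last block.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


two_blocks = bin_counts(incumbent_vote, 2)
ten_blocks = bin_counts(incumbent_vote, 10)
print("2 blocks: ", two_blocks)
print("10 blocks:", ten_blocks)
```

With 2 blocks, all we learn is that most cases sit in the upper half of the range; with 10 blocks, the gap between the two lowest values and the rest of the data becomes visible.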

Figure 6.7. Kernel density plot of incumbent-party presidential vote
percentage, 1880-2004.

9 The skewness and the kurtosis of a variable convey further aspects of the
distribution of a variable. The skewness calculation indicates the symmetry
of the distribution around the mean. If the data are symmetrically
distributed around the mean, then this statistic will equal zero. If skewness
is negative, this indicates that the values below the mean stretch further
from it than the values above (a longer lower tail); if skewness is positive,
this indicates that the values above the mean stretch further from it than
the values below (a longer upper tail). The kurtosis indicates the steepness
of the statistical distribution. Positive kurtosis values indicate very steep
distributions, or a concentration of values close to the mean value, whereas
negative kurtosis values indicate a flatter distribution, or more cases
further from the mean value. Both skewness and kurtosis, with kurtosis
measured relative to the normal distribution, are measures that equal zero
for the normal distribution, which we will discuss in Chapter 7. (Stata's
"summarize" output, as in Figure 6.3, reports kurtosis on a scale on which
the normal distribution equals 3.)

6.5 LIMITATIONS

The tools that we have presented in this chapter are helpful for providing
a first look at data, one variable at a time. Taking a look at your data with
these tools will help you to better know your data and make fewer mistakes
in the long run. It is important, however, to note that we cannot test
causal theories with a single variable. After all, as we have noted, a theory
is a tentative statement about the possible causal relationship between two
variables. Because we have discussed how to describe only a single variable,
we have not yet begun to subject our causal theories to appropriate
tests.

CONCEPTS INTRODUCED IN THIS CHAPTER

categorical variables
central tendency
continuous variables
dispersion
equal unit differences
expected value
histogram
kernel density plot
kurtosis
least-squares property
mean value
median value
measurement metric
mode
ordinal variables
outliers
rank statistics
skewness
standard deviation
statistical moments
variance
variation
zero-sum property

Table 6.3. Median incomes of the 50 states, 2004-2005

State             Income
Alabama           37,602
Alaska            56,398
Arizona           46,279
Arkansas          36,406
California        51,312
Colorado          61,618
Connecticut       66,889
Delaware          60,445
Florida           42,440
Georgia           44,140
Hawaii            58,864
Idaho             45,009
Illinois          48,008
Indiana           43,091
Iowa              46,671
Kansas            42,233
Kentucky          36,760
Louisiana         37,442
Maine             43,317
Maryland          59,762
Massachusetts     64,888
Michigan          44,801
Minnesota         66,098
Mississippi       34,396
Missouri          43,266
Montana           36,202
Nebraska          46,687
Nevada            48,496
New Hampshire     57,850
New Jersey        60,246
New Mexico        39,916
New York          46,659
North Carolina    41,820
North Dakota      41,362
Ohio              44,349
Oklahoma          39,292
Oregon            43,262
Pennsylvania      45,941
Rhode Island      49,611
South Carolina    40,107
South Dakota      42,816
Tennessee         39,376
Texas             42,102
Utah              63,693
Vermont           49,808
Virginia          62,383
Washington        51,119
West Virginia     35,467
Wisconsin         45,956
Wyoming           46,817

Source: http://www.census.gov/hhes/www/income/income05/statemhi2.html.
Accessed January 11, 2007.

EXERCISES

1. Collecting and describing a categorical variable. Find data for a categorical
variable in which you are interested. Get those data into a format that can be
read by the statistical software that you are using. Produce a frequency table
and describe what you see.

2. Collecting and describing a continuous variable. Find data for a continuous
variable in which you are interested. Get those data into a format that can be
read by the statistical software that you are using. Produce a table of
descriptive statistics and either a histogram or a kernel density plot.
Describe what you have found out from doing this.

3. In Table 6.1, why would it be problematic to calculate the mean value of the
variable "Religious Identification"?

4. Moving from mathematical formulae to textual statements. Write a sentence
that conveys what is going on in each of the following equations:

(a) $Y = 3 \quad \forall\, X_i = 2$,

(b) $Y_{\mathrm{total}} = \sum_{i=1}^{n} Y_i = n\bar{Y}$.

5. Computing means and standard deviations. Table 6.3 contains the median
income for each of the 50 U.S. states for the years 2004-2005. What is the
mean of this distribution, and what is its standard deviation? Show all of your
work.




Statistical Inference

OVERVIEW

As researchers begin to consider possible tests of their theoretical
propositions, they must make a series of important decisions. In this chapter
we provide a discussion of choices of population and sample and inferences
from samples about populations. We introduce this topic by using examples
familiar to political science students - namely, the "plus-or-minus" error
figures in presidential horse-race polls, showing where such figures come from
and how they illustrate the principles of building bridges between samples
we know about with certainty and the underlying population of interest.

How dare we speak of the laws of chance? Is not chance the antithesis of
all law?

- Bertrand Russell

7.1 POPULATIONS AND SAMPLES

In Chapter 6, we discussed how to use descriptive statistics to summarize
large amounts of information about a single variable. In particular, you
learned how to characterize a distribution by computing measures of central
tendency (like the mean) and measures of dispersion (like the standard
deviation). For example, you can implement these formulae to characterize
the distribution of income in the United States, or, for that matter, the
scores of a midterm examination your professor may have just handed
back.

But it is time to draw a critical distinction between two types of data
sets that social scientists might use. The first type is data about the
population - that is, data for every possible relevant case. In your
experience, the example of population data that might come to mind first is
that of the U.S. Census, an attempt by the U.S. government to gather some
critical bits of data about the entire U.S. population once every 10 years. 1
It is a relatively rare occurrence that social scientists will make use of
data pertaining to the entire population. 2

The second type of data is drawn from a sample - a subset of cases
that are drawn from an underlying population. Because of the proliferation
of public-opinion polls today, many of you might assume that the word
"sample" implies a random sample. 3 It does not. Researchers may draw a
sample of data on the basis of randomness - meaning that each member
of the population has an equal probability of being selected in the sample.
But samples may also be nonrandom, which we refer to as samples of
convenience. The vast majority of all analyses undertaken by social scientists
is done on sample data, not population data.

Why make this distinction? Even though the vast majority of social
science data sets are based on samples, not the population, it is critical
to note that we are not interested in the properties of the sample per se;
we are interested in the sample only insofar as it helps us to learn about
the underlying population. This process is called statistical inference
because we use what we know to be true about one thing (the sample) to infer
what is likely to be true about another thing (the population).

There are implications for using sample data to learn about a population.
First and foremost is that this process of statistical inference involves,
at its core, some degree of uncertainty. That notion is relatively
straightforward: Any time that we wish to learn something general based on
something specific, we are going to encounter some degree of uncertainty. In
this chapter, we discuss this process of statistical inference, including the
tools that social scientists use to learn about the population that they are
interested in by using samples of data.

1 The Bureau of the Census's web site is http://www.census.gov.
2 But we try to make inferences about some population of interest, and it is
up to the researcher to define explicitly what that population of interest is.
Sometimes, as in the case of the U.S. Census, the relevant population - all
U.S. residents - is easy to understand. Other times, it is a bit less obvious.
Consider a preelection survey, in which the researcher needs to decide whether
the population of interest is all adult citizens, or likely voters, or
something else.
3 When we discussed research design in Chapter 4, we distinguished between the
experimental notion of random assignment to treatment groups, on the one hand,
and random sampling, on the other. See Chapter 4 if you need a refresher on
this difference.

7.2 LEARNING ABOUT THE POPULATION FROM A SAMPLE:
THE CENTRAL LIMIT THEOREM

The reasons that social scientists rely on sample data instead of on
population data - in spite of the fact that we care about the results in the
population instead of in the sample - are easy to understand. Consider
an election campaign, in which the media, the public, and the politicians
involved all want a sense of which candidates the public favors and by how
much. Is it practical to take a census in such circumstances? Of course not.
The adult population in the United States is approximately 200 million
people, and it is an understatement to say that we can't interview each and
every one of these individuals. We simply don't have the time or the money
to do that. There is a reason why the U.S. government conducts a census
only once every 10 years. 4

Of course, anyone familiar with the ubiquitous public-opinion polls
knows that scholars and news organizations conduct surveys on a sample
of Americans routinely and use the results of these surveys to generalize
about the people as a whole. When you think about it, it seems a little
audacious to think that you can interview perhaps as few as 1000 people
and then use the results of those interviews to generalize to the beliefs and
opinions of the entire 200 million. How is that possible?

The answer lies in a fundamental result from statistics called the central
limit theorem, which Dutch statistician Henk Tijms (2007) calls "the

-

utlon.

7.2.1 The Normal Distribution

To say that a particular distribution is "normal" is not to say that it is
"typical" or "desirable" or "good." A distribution that is not "normal"
is not something odd like the "deviant" or "abnormal" distribution. It is
worth emphasizing, as well, that normal distributions are not necessarily
common in the real world. Yet, as we will see, they are incredibly useful in
the world of statistics.

The normal distribution is often called a "bell curve" in common
language. It is shown in Figure 7.1 and has several special properties. First,
it is symmetrical about its mean, 5 such that the mode, median, and mean
are the same. Second, the normal distribution has a predictable area under
the curve within specified distances of the mean. Starting from the mean
and going one standard deviation in each direction will capture 68% of
the area under the curve. Going one additional standard deviation in each
direction will capture a shade over 95% of the total area under the curve. 6
And going a third standard deviation in each direction will capture more
than 99% of the total area under the curve. This is commonly referred to
as the 68-95-99 rule and is illustrated in Figure 7.2. You should bear in
mind that this is a special feature of the normal distribution and does not
apply to any other-shaped distribution. What do the normal distribution
and the 68-95-99 rule have to do with the process of learning about
population characteristics based on a sample?

Figure 7.1. The normal probability distribution.
[Horizontal axis: standard deviations from mean, -4 to 4]

4 You might not be aware that, even though the federal government conducts
only one census per 10 years, it conducts sample surveys with great frequency
in an attempt to measure population characteristics such as economic activity.
5 Equivalently, but a bit more formally, we can characterize the distribution
by its mean and variance (or standard deviation) - which implies that its
skewness and kurtosis are both equal to zero.
6 To get exactly 95% of the area under the curve, we would actually go 1.96,
not 2, standard deviations in each direction from the mean. Nevertheless, the
rule of two is a handy rule of thumb for many statistical calculations.
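
The areas behind the 68-95-99 rule can be computed directly. The following Python sketch relies on a standard fact about the normal distribution that is not stated in the text: the share of the area within k standard deviations of the mean equals erf(k / sqrt(2)), where erf is the error function available in Python's `math` module. The function name `area_within` is our own.

```python
import math


def area_within(k):
    """Share of the area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 1.96):
    print(f"within {k} standard deviations: {100 * area_within(k):.2f}%")
```

The output shows why two standard deviations capture "a shade over" 95% of the area, and why 1.96 is the multiplier that captures 95% almost exactly.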

Figure 7.2. The 68-95-99 rule.
[Horizontal axis: standard deviations from mean]

A distribution of actual scores in a sample - called a frequency
distribution, to represent the frequency of each value of a particular
variable - on any variable might be shaped normally, or it might not be.
Consider the frequency distribution of 600 rolls of a six-sided (and
unbiased) die, presented in Figure 7.3. Note something about Figure 7.3 right
off the bat: That frequency distribution does not even remotely resemble a
normal distribution. 7 If we roll a fair six-sided die 600 times, how many
1's, 2's, etc., should we see? On average, 100 each, right? That's pretty
close to what we see in the figure, but only pretty close. Purely because of
chance, we rolled a couple too many 1's, for example, and a couple too few 6's.

What can we say about this sample of 600 rolls of the die? And, more
to the point, from these 600 rolls of the die, what can we say about the
underlying population of all rolls of a fair six-sided die? Before we answer
the second question, which will require some inference, let's answer the
first, which we can answer with certainty. We can calculate the mean of
these rolls of dice in the straightforward way that we learned in Chapter 6:
Add up all of the "scores" - that is, the 1's, 2's, and so on - and divide by
the total number of rolls, which in this case is 600. That will lead to the
following calculation:

$$\bar{Y} = \frac{(1 \times 106) + (2 \times 98) + (3 \times 97) + (4 \times 101) + (5 \times 104) + (6 \times 94)}{600} = 3.47.$$

Following the formula for the mean, for our 600 rolls of the die, in the
numerator we must add up all of the 1's (106 of them), all of the 2's (98 of
them), and so on, and then divide by 600 to produce our result of 3.47.
We can also calculate the standard deviation of this distribution:

$$s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}} = \sqrt{\frac{1753.40}{599}} = 1.71.$$

Looking at the numerator for the formula for the standard deviation that
we learned in Chapter 6, we see that $\sum (Y_i - \bar{Y})^2$ indicates that,
for each observation (a 1, 2, 3, 4, 5, or 6) we subtract its value from the
mean (3.47), then square that difference, then add up all 600 squared
deviations from the mean, which produces a numerator of 1753.40 beneath the
square-root sign. Dividing that amount by 599 (that is, n - 1), then taking
the square root, produces a standard deviation of 1.71.

As we noted, the sample mean is 3.47, but what should we have
expected the mean to be? If we had exactly 100 rolls of each side of the die,
the mean would have been 3.50, so our sample mean is a bit lower than
we would have expected. But then again, we can see that we rolled a few
"too many" 1's and a few "too few" 6's, so the fact that our mean is a bit
below 3.50 makes sense.
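
The mean and standard deviation of the 600 rolls can be recomputed from the frequency counts given in the calculation above. The Python sketch below works from those counts rather than from 600 separate observations, which gives the same result because each squared deviation is simply weighted by how often that face came up; the dictionary name `roll_counts` is our own.

```python
import math

# Number of times each face came up in the 600 rolls (from the text).
roll_counts = {1: 106, 2: 98, 3: 97, 4: 101, 5: 104, 6: 94}

n = sum(roll_counts.values())

# Mean: add up all of the scores and divide by the number of rolls.
mean = sum(face * count for face, count in roll_counts.items()) / n

# Numerator of the standard deviation: squared deviations from the mean,
# each weighted by the number of rolls showing that face.
sum_sq_dev = sum(count * (face - mean) ** 2
                 for face, count in roll_counts.items())

std_dev = math.sqrt(sum_sq_dev / (n - 1))

print(f"n = {n}, mean = {mean:.2f}")
print(f"sum of squared deviations = {sum_sq_dev:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```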
Figure 7.3. Frequency distribution of 600 rolls of a die.
[Horizontal axis: value]

7 In fact, the distribution in the figure very closely resembles a uniform or
flat distribution.

What would happen, though, if we rolled that same die another 600
times? What would the mean value of those rolls be? We can't say for
certain, of course. Perhaps we would come up with another sample mean
of 3.47, or perhaps it would be a bit above 3.50, or perhaps the mean would
hit 3.50 on the nose. Suppose that we rolled the die 600 times like this not
once, and not twice, but an infinite number of times. Let's be clear: We do
not mean an infinite number of rolls, we mean rolling the die 600 times for
an infinite number of times. That distinction is critical. We are imagining
that we are taking a sample of 600, not once, but an infinite number of
times. We can refer to a hypothetical distribution of sample means, such as
this, as a sampling distribution. It is hypothetical because scientists almost
never actually draw more than one sample from an underlying population
at one given point in time.

If we followed this procedure, we could take those sample means and
plot them. Some would be above 3.50, some below, some right on it. Here
is the key outcome, though: The sampling distribution would be normally
shaped, even though the underlying frequency distribution is clearly not
normally shaped.
That is the insight of the central limit theorem. If we can envision an
infinite number of random samples and plot our sample means to each of
these random samples, those sample means would be distributed normally.
Furthermore, the mean of the sampling distribution would be equal to
the true population mean. The standard deviation of the sampling
distribution is

$$\sigma_{\bar{Y}} = \frac{s_Y}{\sqrt{n}},$$

where n is the sample size. The standard deviation of the sampling
distribution of sample means, which is known as the standard error of the mean
(or simply "standard error"), is simply equal to the sample standard deviation
divided by the square root of the sample size. In the preceding die-rolling
example, the standard error of the mean is

$$\sigma_{\bar{Y}} = \frac{1.71}{\sqrt{600}} = 0.07.$$
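
Although we can never roll the die 600 times "an infinite number of times," a computer can approximate the idea. The Python sketch below is a simulation under stated assumptions: it uses 2,000 repeated samples as a finite stand-in for infinitely many, an arbitrary random seed so that the run is repeatable, and Python's pseudo-random number generator in place of a physical die. All variable names are our own.

```python
import math
import random
import statistics

random.seed(7)        # arbitrary seed, chosen only so the run is repeatable
sample_size = 600     # rolls per sample, as in the text
n_samples = 2000      # finite stand-in for "an infinite number of times"

# Draw many samples of 600 rolls of a fair die and record each sample mean.
sample_means = [
    statistics.fmean(random.choices(range(1, 7), k=sample_size))
    for _ in range(n_samples)
]

# The center and spread of the simulated sampling distribution.
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Standard error from the single sample in the text: s / sqrt(n).
standard_error = 1.71 / math.sqrt(sample_size)

print(f"mean of the sample means = {mean_of_means:.3f}")
print(f"standard deviation of the sample means = {sd_of_means:.3f}")
print(f"standard error from one sample = {standard_error:.3f}")
```

The sample means cluster near 3.50, and their standard deviation lands close to the 0.07 that the formula produces from a single sample of 600, which is the practical payoff of the theorem: we do not need the repeated samples to know how much sample means vary.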

Recall that our goal here is to learn what we can about the underlying
population based on what we know with certainty about our sample.
We know that the mean of our sample of 600 rolls of the die is 3.47 and
its standard deviation is 1.71. From those characteristics, we can imagine
that, if we rolled that die 600 times an infinite number of times, the
resulting sampling distribution would have a standard deviation of 0.07.
Our best approximation of the population mean is 3.47, because that is
the result that our sample generated. 8 But we realize that our sample of
600 might be different from the true population mean by a little bit, either
too high or too low. What we can do, then, is use our knowledge that the
sampling distribution is shaped normally and invoke the 68-95-99 rule
to create a confidence interval about the likely location of the population
mean.
How do we do that? First, we choose a degree of confidence that we
want to have in our estimate. Although we can choose any confidence range
up from just above 0 to just below 100, social scientists traditionally rely
on the 95% confidence level. If we follow this tradition - and because our
sampling distribution is normal - we would merely start at our mean (3.47)
and move two standard errors of the mean in each direction to produce
the interval that we are approximately 95% confident that the population
mean lies within. Why two standard errors? Because just over 95% of
the area under a normal curve lies within two standard errors of the mean.
Again, to be precisely 95% confident, we would move 1.96, not 2, standard
errors in each direction. But the rule of thumb of two is commonly used in
practice. In other words,

\[ \bar{Y} \pm 2 \times \sigma_{\bar{Y}} = 3.47 \pm (2 \times 0.07) = 3.47 \pm 0.14. \]

That means, from our sample, we are 95% confident that the population
mean for our rolls of the die lies somewhere on the interval between 3.33
and 3.61. There is a 2.5% chance that the population mean is less than 3.33, and a 2.5% chance
that the population mean is greater than 3.61, for a total of a 5% chance
that the population mean is not in the interval from 3.33 to 3.61. For a
variety of reasons, we might like to have more confidence in our estimate.
Say that, instead of being 95% confident, we would be more comfortable
with a 99% level of confidence. In that case, we would simply move three
(instead of two) standard errors in each direction from our sample mean of
3.47, yielding an interval of 3.26-3.68.
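The two intervals just described follow from the same arithmetic, which can be sketched in a few lines. The function name and its arguments are our own illustrative choices; the inputs (3.47 and 0.07) and the multipliers of two and three come from the example above.

```python
def confidence_interval(sample_mean, standard_error, n_standard_errors):
    """Return (low, high) for an interval of +/- n_standard_errors around the mean."""
    half_width = n_standard_errors * standard_error
    return sample_mean - half_width, sample_mean + half_width


# Rule-of-thumb intervals for the die-rolling example.
low95, high95 = confidence_interval(3.47, 0.07, 2)
low99, high99 = confidence_interval(3.47, 0.07, 3)

print(f"95%: {low95:.2f} to {high95:.2f}")
print(f"99%: {low99:.2f} to {high99:.2f}")
```

Passing 1.96 instead of 2 as the third argument gives the slightly narrower interval that corresponds to precisely 95% confidence.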
Throughout this example we have been helped along by the fact that
we knew the underlying characteristics of the data-generating process (a fair
die). In the real world, social scientists almost never have this advantage.
In the next section we consider such a case.
8 One might imagine that our best guess should be 3.50 because, in theory, a fair die ought
to produce such a result.


7.3 EXAMPLE: PRESIDENTIAL APPROVAL RATINGS

On October 5 and 6, 2006, Newsweek magazine sponsored a survey in
which 1004 randomly selected Americans were interviewed about their
political beliefs. Among the questions they were asked was the following
item intended to tap into a respondent's evaluation of the president's job
performance:

Do you approve or disapprove of the way George W. Bush is handling his
job as president?

This question wording is the industry standard, used for over a half-century
by almost all polling organizations.9 In early October, 2006, 33% of the
sample approved of Bush's job performance, 59% disapproved, and 8%
were unsure.10
Newsweek, of course, is not inherently interested in the opinions of
those 1004 Americans who happened to be in the sample, except insofar as
they tell us something about the adult population as a whole. But we can
use these 1004 responses to do precisely that, using the logic of the central
limit theorem and the tools previously described.
To reiterate, we know the properties of our randomly drawn sample
of 1004 people with absolute certainty. If we consider the 331 approving
responses to be 1's and the remaining 673 responses to be 0's, we
calculate our sample mean, $\bar{Y}$, as follows11:

\[ \bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} = \frac{(331 \times 1) + (673 \times 0)}{1004} = 0.33. \]

We calculate the sample standard deviation, $s_Y$, in the following way:

\[ s_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} = \sqrt{\frac{331(1 - 0.33)^2 + 673(0 - 0.33)^2}{1004 - 1}} = \sqrt{\frac{336.8096}{1003}} = 0.58. \]

But what can we say about the population as a whole? Obviously, unlike the sample mean, the population mean cannot be known with certainty.
But if we imagine that, instead of one sample of 1004 respondents, we had
an infinite number of samples of 1004, then the central limit theorem tells
us that those sample means would be distributed normally. Our best guess
of the population mean, of course, is 0.33, because it is our sample mean.
The standard error of the mean is

\[ \sigma_{\bar{Y}} = \frac{0.58}{\sqrt{1004}} = 0.018, \]

which is our measure of uncertainty about the population mean. If we use
the rule of thumb and calculate the 95% confidence interval by using two
standard errors in either direction from the sample mean, we are left with
the following interval:

\[ \bar{Y} \pm 2 \times \sigma_{\bar{Y}} = 0.33 \pm (2 \times 0.018) = 0.33 \pm 0.036, \]

or between 0.294 and 0.366, which translates into being 95% confident that the population value of Bush approval is between 29.4% and
36.6%.
And this is where the "plus-or-minus" figures that we always see in
public opinion polls come from.12 The best guess for the population mean
value is the sample mean value, plus or minus two standard errors. So the
plus-or-minus figures we are accustomed to seeing are built, typically, on
the 95% interval.

9 The only changes, of course, are for the name of the current president.
10 The source for the survey was http://www.pollingreport.com/BushJob1.htm, accessed
October 8, 2006.
11 There are a variety of different ways in which to handle mathematically the 8% of
"uncertain" responses. In this case, because we are interested in calculating the "approval"
rating for this example, it is reasonable to lump the disapproving and unsure answers
together. When we make decisions like this in our statistical work, it is very important
to communicate exactly what we have done so that the scientific audience can make a
reasoned evaluation of our work.
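The approval calculations can be checked directly from the raw responses. What follows is a minimal sketch, assuming only the coding described above (331 responses coded 1 and 673 coded 0); the variable names are ours. One caution: when the sum of squared deviations is evaluated directly, 331(0.67)^2 + 673(0.33)^2 comes to roughly 221.88 rather than the printed 336.8096, so the code yields a sample standard deviation near 0.47 and a standard error near 0.015, not the printed 0.58 and 0.018. This suggests an arithmetic slip in the printed worked example, although the passages that follow continue to use 0.58.

```python
import math
import statistics

# 1 = approve; 0 = disapprove or unsure (the coding described in the text).
responses = [1] * 331 + [0] * 673

n = len(responses)
y_bar = statistics.fmean(responses)   # sample mean
s_y = statistics.stdev(responses)     # sample standard deviation (divides by n - 1)
se = s_y / math.sqrt(n)               # standard error of the mean

# Rule-of-thumb 95% interval: two standard errors in either direction.
low, high = y_bar - 2 * se, y_bar + 2 * se

print(f"sample mean:        {y_bar:.3f}")
print(f"standard deviation: {s_y:.3f}")
print(f"standard error:     {se:.4f}")
print(f"95% interval:       {low:.3f} to {high:.3f}")
```

The logic of the interval is unchanged by this discrepancy; only its width differs.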

7.3.1 What Kind of Sample Was That?

If you read the preceding example carefully, you will have noted that the
Newsweek poll we described used a random sample of 1004 individuals.
That means that they used some mechanism (like random-digit telephone
dialing) to ensure that all members of the population had an equal probability of being selected for the survey. We want to reiterate the importance
of using random samples. The central limit theorem applies only to samples
that are selected randomly. With a sample of convenience, by contrast, we
cannot invoke the central limit theorem to construct a sampling distribution
and create a confidence interval.
This lesson is critical: A nonrandomly selected sample of convenience
does very little to help us build bridges between the sample and the
population about which we want to learn. This has all sorts of implications about "polls" that news organizations conduct on their web sites.
What do such "surveys" say about the population as a whole? Because
their samples are clearly not random samples of the underlying population,
the answer is "nothing."

12 In practice, most polling firms have their own additional adjustments that they make to
these calculations, but they start with this basic logic.
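The difference between a random sample and a sample of convenience can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of 100,000 people, the 33% approval rate built into it, and especially the assumption that approvers are three times as likely as others to answer a web "poll." It is meant only to show the mechanism, not to describe any actual survey.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed so that the run is repeatable

# Hypothetical population: 33% approve (1) and 67% do not (0).
population = [1] * 33_000 + [0] * 67_000

# Random sample: every member has an equal probability of selection.
random_sample = rng.sample(population, 1004)

# Hypothetical convenience sample: approvers are assumed to be three times
# as likely to respond as everyone else.
weights = [3 if person == 1 else 1 for person in population]
convenience_sample = rng.choices(population, weights=weights, k=1004)

random_mean = statistics.fmean(random_sample)
convenience_mean = statistics.fmean(convenience_sample)

print(f"population approval:     {statistics.fmean(population):.3f}")
print(f"random sample mean:      {random_mean:.3f}")
print(f"convenience sample mean: {convenience_mean:.3f}")
```

The random sample lands near the population value, whereas the convenience sample does not, and no confidence interval built around the convenience sample's mean would repair that.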
There is a related lesson involved here. The preceding example represents an entirely straightforward connection between a sample (the 1004
people in the survey) and the population (all adults in the United States). Often the link between the sample and the population is less straightforward.
Consider, for example, an examination of votes in a country's legislature
during a given year. Assuming that it's easy enough to get all of the roll-call
voting information for each member of the legislature (which is our sample), we are left with a slightly perplexing question: What is the population
of interest? The answer is not obvious, and not all social scientists would
agree on the answer. Some might claim that the data don't represent a
sample, but a population, because the data set contains the votes of every
member of the legislature. Others might claim that the sample is a sample of
one year's worth of the legislature since its inception. Others still might say
that the sample is one realization of the infinite number of legislatures that
could have happened in that particular year. Suffice it to say that there
is no clear scientific consensus, in this example, of what would constitute
the "sample" and what would constitute the "population."

7.3.2 A Note on the Effects of Sample Size

As the formula for the confidence interval indicates, the smaller the standard errors, the "tighter" our resulting confidence intervals will be; larger
standard errors will produce "wider" confidence intervals. If we are interested in estimating population values, based on our samples, with as much
precision as possible, then it is desirable to have tighter instead of wider
confidence intervals.
How can we achieve this? From the formula for the standard error
of the mean, it is clear through simple algebra that we can get a smaller
quotient either by having a smaller numerator or a larger denominator.
Because obtaining a smaller numerator - the sample standard deviation - is
not something we can do in practice, we can consider whether it is possible
to have a larger denominator - a larger sample size.
Larger sample sizes will reduce the size of the standard errors, and
smaller sample sizes will increase the size of the standard errors. This, we
hope, makes intuitive sense. If we have a large sample, then it should be
easier to make inferences about the population of interest; smaller samples
should produce less confidence about the population estimate.
In the preceding example, if instead of having our sample of 1004, we
had a much larger sample - say, 2500 - our standard errors would have
been

\[ \sigma_{\bar{Y}} = \frac{0.58}{\sqrt{2500}} = 0.0116, \]

which is less than two-thirds the size of our actual standard errors of 0.018.
You can do the math to see that going two standard errors of 0.0116 in either
direction produces a narrower interval than going two standard errors of
0.018. But note that the cost of reducing our error by about 1.3% in either
direction is the addition of nearly another 1500 respondents, and in many
cases that reduction in error will not be worth the financial and time costs
involved in obtaining all of those extra interviews.
Consider the opposite case. If, instead of interviewing 1004 individuals,
we interviewed only 400, then our standard errors would have been

\[ \sigma_{\bar{Y}} = \frac{0.58}{\sqrt{400}} = 0.029, \]

which, when doubled to get our 95% confidence interval, would leave a
plus-or-minus 0.058 (or nearly 6%) in each direction.
We could be downright silly and obtain a random sample of only 64
people if we liked. (We're sure that you notice that all of our hypothetical
sample sizes are perfect squares.) That would generate some rather wide
confidence intervals. The standard error would be

\[ \sigma_{\bar{Y}} = \frac{0.58}{\sqrt{64}} = 0.0725, \]

which, when doubled to get the 95% confidence interval, would leave
a rather hefty plus-or-minus 0.145 (or 14.5%) in each direction. In this
circumstance, we would guess that Bush approval in the population was
33%, but we would be 95% confident that it was between 18.5% and
47.5% - and that alarmingly wide interval would be just too wide to be
particularly informative.
In short, the answer to the question, "How big does my sample need to
be?" is another question: "How tight do you want your confidence intervals
to be?"
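The trade-off between sample size and precision can be tabulated in a few lines. This is a minimal sketch that takes the sample standard deviation of 0.58 used in the examples above as given; the function name is our own.

```python
import math


def standard_error(sample_sd, n):
    """Standard error of the mean: sample standard deviation over root n."""
    return sample_sd / math.sqrt(n)


sample_sd = 0.58  # value used in the examples above

for n in (64, 400, 1004, 2500):
    se = standard_error(sample_sd, n)
    # The rule-of-thumb 95% margin is two standard errors in either direction.
    print(f"n = {n:>4}: standard error = {se:.4f}, plus-or-minus = {2 * se:.3f}")
```

Because the sample size enters through its square root, quadrupling the number of respondents only halves the standard error, which is why extra precision becomes expensive so quickly.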
7.4 A LOOK AHEAD: EXAMINING RELATIONSHIPS
BETWEEN VARIABLES

Let's take stock for a moment. In this book, we have emphasized that
political science research involves evaluating causal explanations, which
entails examining the relationships between two or more variables. Yet,
in this chapter, all we have done is talk about the process of statistical
inference with a single variable. This was a necessary tangent, because we
had to teach you the logic of statistical inference - that is, how we use
samples to learn something about an underlying population.
In Chapter 8, you will learn three different ways to move into the world
of bivariate hypothesis testing. We will examine relationships between two
variables, typically in a sample, and then make probabilistic assessments
of the likelihood that those relationships exist in the population. The logic
is identical to what you have just learned; we merely extend it to cover
relationships between two variables. After that, in Chapter 9, you will
learn one other way to conduct hypothesis tests involving two variables - the bivariate regression model.

CONCEPTS INTRODUCED IN THIS CHAPTER

68-95-99 rule
census
central limit theorem
confidence interval
frequency distribution
normal distribution
population
random sample
sample
sampling distribution
standard error
standard error of the mean
statistical inference

EXERCISES

1. Go to http://www.pollingreport.com and find a polling statistic that interests
you most. Be sure to click on the "full details" option, where available, to get
the sample size for the survey item that most interests you. Then calculate the
95% and 99% confidence intervals for the population value of the statistic you
have in mind, showing all of your work.
2. For the same survey item, what would happen to the confidence interval if the
sample size were cut in half? What would happen instead if it were doubled?
Show your work.
3. Are more data always better than less data? Explain your answer.
4. Refer back to Table 6.2, which shows the incumbent vote percentage in U.S.
presidential elections. Calculate the standard error of the mean for that distribution, and then construct the 95% confidence interval for the population mean. What does the 95% confidence interval tell us in this particular
case?
5. If we take a representative draw of 1000 respondents from the population of
the United States for a particular survey question and obtain a 95% confidence
margin, how many respondents would you need to draw from the population of
Maine to obtain the same interval, assuming that the distribution of responses
is the same for both populations?

Bivariate Hypothesis Testing

OVERVIEW

Once we have set up a hypothesis test and collected data, how do we
evaluate what we have found? In this chapter we provide hands-on discussions of the basic building blocks used to make statistical inferences
about the relationship between two variables. We deal with the often-misunderstood topic of "statistical significance" - focusing both on what it
is and what it is not - as well as the nature of statistical uncertainty. We introduce three ways to examine relationships between two variables: tabular
analysis, difference of means tests, and correlation coefficients. (We will introduce a fourth
technique, bivariate regression analysis, in Chapter 9.)

8.1 BIVARIATE HYPOTHESIS TESTS AND ESTABLISHING
CAUSAL RELATIONSHIPS

In the preceding chapters we introduced the core concepts of hypothesis
testing. In this chapter we discuss the basic mechanics of hypothesis testing
with three different examples of bivariate hypothesis testing. It is worth
noting that, although this type of analysis was the main form of hypothesis
testing in the professional journals up through the 1970s, it is seldom used as
the primary means of hypothesis testing in the professional journals today.1
This is the case because these techniques are good at helping us with only
the first principle for establishing causal relationships. Namely, bivariate
hypothesis tests help us to answer the question, "Are X and Y related?" By
definition - "bivariate" means "two variables" - these tests cannot help us
with the important question, "Is there some confounding variable Z that
is related to both X and Y and makes the observed association between X
and Y spurious?"
Despite their limitations, the techniques covered in this chapter are important starting points for understanding the underlying logic of statistical
hypothesis testing. In the sections that follow we discuss how one chooses
which bivariate test to conduct and then provide detailed discussions of
three such tests. Throughout this chapter, try to keep in mind the main
purpose of this exercise: We are attempting to apply the lessons of the previous chapters with real-world data. We will eventually do this with more
appropriate and more sophisticated tools, but the lessons that we learn in
this chapter will be crucial to our understanding of these more advanced
methods. Put simply, we are trying to get up and walk in the complicated world of hypothesis testing with real-world data. Once we have mastered walking, we will then begin to work on running with more advanced
techniques.

1 By definition, researchers conducting bivariate hypothesis tests are making one of two
assumptions about the state of the world. They are assuming either that there are no
other variables that are causally related to the dependent variable in question, or that,
if there are such omitted variables, they are unrelated to the independent variable in the
model. We will have much more to say about omitting independent variables from causal
models in Chapter 10. For now, bear in mind that, as we have discussed in previous
chapters, these assumptions rarely hold when we are describing the political world.

8.2 CHOOSING THE RIGHT BIVARIATE HYPOTHESIS TEST

As we discussed in previous chapters, and especially in Chapters 6 and
7, researchers make a number of critical decisions before they test their
hypotheses. Once they have collected their data and want to conduct a
bivariate hypothesis test, they need to consider the nature of their dependent
and independent variables. As we discussed in Chapter 6, we can classify
variables in terms of the types of values that cases take on. Table 8.1 shows
four different scenarios for testing a bivariate hypothesis; which one is
most appropriate depends on the variable type of the independent variable
and the dependent variable. For each case, we have listed one or more
appropriate type of bivariate hypothesis tests. In cases in which we can

Table 8.1. Variable types and appropriate bivariate hypothesis tests

                            Independent variable type
Dependent variable type     Categorical             Continuous
Categorical                 Tabular analysis        Probit/logit
Continuous                  Difference of means     Correlation coefficient;
                                                    bivariate regression model

Note: Tests in italics are discussed in this chapter.

8.3 ALL ROADS LEAD TO P

One common element across a wide range of statistical hypothesis tests is
the p-value (the p stands for "probability"). This value, ranging between 0
and 1, is the closest thing that we have to a bottom line in statistics. But
it is often misunderstood and misused. In this section we discuss the basic
logic of the p-value and relate it back to our discussion in Chapter 7 of
using sample data to make inferences about an underlying population.

8.3.1 The Logic of p-Values

If we think back to the four principles for establishing causal relationships
that were discussed in Chapter 3, the third hurdle is the question "Is there
covariation between X and Y?" To answer this question, we need to apply
standards to real-world data for determining whether there appears to be
a relationship between our two variables, the independent variable X and
the dependent variable Y. The tests listed in the cells in Table 8.1 are
commonly accepted tests for each possible combination of data type. In
each of these tests, we follow a common logic: We compare the actual
relationship between X and Y in sample data with what we would expect
to find if X and Y were not related in the underlying population. The
more different the empirically observed relationship is from what we would
expect to find if there were not a relationship, the more confidence we have
that X and Y are related in the population. The logic of this inference from
sample to population is the same as what we used in Chapter 7 to make
inferences about the population mean from sample data.
The statistic that is most commonly associated with this type of
exercise is the p-value. The p-value, which ranges between 0 and 1, is the
probability that we would see the relationship that we are finding because
of random chance. Put another way, the p-value tells us the probability
that we would see the observed relationship between the two variables in
our sample data if there were truly no relationship between them in the
unobserved population. Thus, the lower the p-value, the greater confidence
we have that there is a systematic relationship between the two variables
for which we estimated the particular p-value.
One common characteristic across statistical techniques is that, for a
particular measured relationship, the more data on which the measurement
is based, the lower our p-value will be. This is consistent with one of the
lessons of Chapter 7 about sample size: The larger the sample size, the
more confident we can be that our sample will more accurately represent
the population.2 (See Subsection 7.3.2 for a reminder.)

2 Also, the smaller the sample size, the more likely it is that we will get a result that is not
very representative of the population.

8.3.2 The Limitations of p-Values

Although p-values are powerful indicators of whether or not two variables
are related, they are limited. In this subsection we review the limitations of
p-values. It is important that we also understand what a p-value is not. The
logic of a p-value is not reversible. In other words, p = .001 does not mean
that there is a .999 chance that something systematic is going on. It is also
important to realize that, although a p-value tells us something about our
confidence that there is a relationship between two variables, it does not
tell us whether that relationship is causal.
In addition, it might be tempting to assume that, when a p-value is
very close to zero, this indicates that the relationship between X and Y is
very strong. This is not necessarily true (though it might be true). As we
previously noted, p-values represent our degree of confidence that there is
a relationship in the underlying population. So we should naturally expect
smaller p-values as our sample sizes increase. But a larger sample size does
not magically make a relationship stronger; it does increase our confidence
that the observed relationship in our sample accurately represents the underlying population. We saw a similar type of relationship in Chapter 7
when we calculated standard errors. Because the number of cases is in the
denominator of the standard error formula, an increased number of cases
leads to a smaller standard error and a more narrow confidence interval
for our inferences about the population.
Another limitation of p-values is that they do not directly reflect the
quality of the measurement procedure for our variables. Thus, if we are
more confident in our measurement, we should be more confident in a
particular p-value. The flip side of this is that, if we are not very confident
in our measurement of one or both of our variables, we should be less
confident in a particular p-value.
Finally, we should keep in mind that p-values are always based on
the assumption that you are drawing a perfectly random sample from the
underlying population. Mathematically, this is expressed as

\[ p_i = P \quad \forall i. \]

This translates into "the probability of an individual case from our population ending up in our sample, $p_i$, is assumed to equal P for all of the
individual cases i." If this assumption were valid, we would have a truly
random sample. Because this is a standard that is almost never met, we
should use this in our assessment of a particular p-value. The further we
are from a truly random sample, the less confidence we should have in our
p-value.

8.3.3 From p-Values to Statistical Significance

As we outlined in the preceding subsection, lower p-values increase our
confidence that there is indeed a relationship between the two variables in
question. A common way of referring to such a situation is to state that the
relationship between the two variables is statistically significant. Although
this type of statement has a ring of authoritative finality, it is always a
qualified statement. In other words, an assertion of statistical significance
depends on a number of other factors. One of these factors is the set of
assumptions from the previous section. "Statistical significance" is achieved
only to the extent that the assumptions underlying the calculation of the
p-value hold. In addition, there are a variety of different standards for what
is a statistically significant p-value. Most social scientists use the standard
of a p-value of .05. If p is less than .05, they consider a relationship to be
statistically significant. Others use a more stringent standard of .01, or a
more loose standard of .1.3
We cannot emphasize strongly enough that finding that X and Y have
a statistically significant relationship does not necessarily mean that the
relationship between X and Y is strong or especially that the relationship is
causal. To evaluate the case for a causal relationship, we need to evaluate
how well our theory has performed in terms of all four of the causal hurdles
from Chapter 3.

8.3.4 The Null Hypothesis and p-Values

In Chapter 1 we introduced the concept of the null hypothesis. Our definition was "A null hypothesis is also a theory-based statement but it is
about what we would expect to observe if our theory was incorrect." Thus,
following the logic that we previously outlined, if our theory-driven hypothesis is that there is covariation between X and Y, then the corresponding
null hypothesis is that there is no covariation between X and Y. In this
context, another interpretation of the p-value is that it conveys the level of
confidence with which we can reject the null hypothesis.

8.4 THREE BIVARIATE HYPOTHESIS TESTS

We now turn to three specific bivariate hypothesis tests. In each case, we are
testing for whether there is a relationship between X and Y. We are doing
this with sample data, and then, based on what we find, making inferences
about the underlying population.
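The logic of the p-value, asking how often we would see a relationship at least this large if there were truly no relationship in the population, can be made concrete with a small simulation. The sketch below uses a permutation approach, which is our own illustrative choice and not one of the tests named in Table 8.1: it repeatedly shuffles the vote labels across respondents, which breaks any link between gender and vote, and records how often the shuffled data show a difference at least as large as the observed one. The counts are the observed cell counts reported for gender and vote in Table 8.8; the function name, seed, and number of shuffles are assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Share of random reshuffles whose absolute difference in group means
    is at least as large as the difference actually observed."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign votes at random: "no relationship"
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# 1 = voted for Kerry, 0 = voted for Bush (observed counts from Table 8.8).
men = [1] * 170 + [0] * 204
women = [1] * 229 + [0] * 208

p_value = permutation_p_value(men, women)
print(f"simulated p-value: {p_value:.3f}")
```

A small result means that a difference this large rarely arises when the labels are assigned at random. As emphasized above, it says nothing about whether the relationship is strong or causal.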

8.4.1 Example 1: Tabular Analysis

Tabular presentations of data on two variables are still used quite widely.
In the more recent political science literature, scholars use them as stepping
stones on the way to multivariate analyses. It is worth noting at this point
in the process that, in tables and graphs, most of the time the dependent
variable is displayed in the rows along the vertical dimension whereas the
independent variable is displayed in the columns across the horizontal dimension.
Any time that you see a table, it is very important to take some time
to make sure that you understand what is being conveyed in the table. We
can break this into the following three-step process:

1. Figure out what the variables are that define the rows and columns of
the table.
2. Figure out what the individual cell values represent. Sometimes they
will be the number of cases that take on the particular row and column
values; other times the cell values will be proportions (ranging from
0 to 1.0) or percentages (ranging from 0 to 100). If this is the case,
it is critical that you figure out whether the researcher calculated the
percentages or proportions for the entire table or for each column or
row.
3. Figure out what, if any, general patterns you see in the table.
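The second step turns on how percentages are computed. The following minimal sketch shows how column percentages are obtained from raw counts, with the independent variable defining the columns. The counts are the observed cell counts reported for gender and vote in Table 8.8; the function and the dictionary layout are our own illustrative choices.

```python
def column_percentages(counts):
    """counts maps each column label to {row label: number of cases}.
    Returns the same layout with each column rescaled to sum to 100."""
    result = {}
    for column, cells in counts.items():
        column_total = sum(cells.values())
        result[column] = {row: 100 * n / column_total for row, n in cells.items()}
    return result


# Columns hold the independent variable, rows the dependent variable.
counts = {
    "Male": {"Kerry": 170, "Bush": 204},
    "Female": {"Kerry": 229, "Bush": 208},
}

pct = column_percentages(counts)
for column, cells in pct.items():
    for row, value in cells.items():
        print(f"{column:<6} {row:<5} {value:6.2f}")
```

Because each column sums to 100, reading across a row compares the distribution of the dependent variable across the values of the independent variable.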
Let's go through these steps with Table 8.2. In this table we are testing
the theory that affiliation with trade unions makes people more likely to
support left-leaning candidates.4 We can tell from the title and the column
and row headings that this table is comparing the votes of people from
union households with those not from union households in the 2004 U.S.
presidential election. We can use the information in this table to test the
hypothesis that voters from union households were more likely to support Democratic Party presidential candidate John Kerry.5 As the first step
in reading this table, we determine that the columns indicate values for
the independent variable (whether or not the individual was from a union
household) and that the rows indicate values for the dependent variable
(presidential vote). The second step is fairly straightforward; the table contains a footnote that tells us that the "cell entries are column percentages."
This is the easiest format for pursuing step 3, because the column percentages correspond to the comparison that we want to make. We want to
compare the presidential votes of people from union households with the
presidential votes of people not from union households. The pattern is fairly
clear: People from the union households overwhelmingly supported Kerry
(64.24 for Kerry and 35.76 for Bush), whereas people from the nonunion
households strongly favored Bush (45.73 for Kerry and 54.27 for Bush).
If we think in terms of independent (X) and dependent (Y) variables, the
comparison that we have made is between the distribution of the dependent
variable (Y = Presidential Vote) across values of the independent variable
(X = Type of Household).

Table 8.2. Union households and vote in the 2004
U.S. presidential election

Candidate    From a union household    Not from a union household    Total
Kerry        64.24                     45.73                         49.19
Bush         35.76                     54.27                         50.81
Total        100.00                    100.00                        100.00

Note: Cell entries are column percentages.

In Table 8.2, we follow the simple convention of placing the values of
the independent variable in the columns and the values of the dependent
variable in the rows. Calculating column percentages for the cell
values then makes comparing across the columns straightforward. It is
wise to adhere to these norms, because it is the easiest way to make the
comparison that we want, and because it is the way many readers will
expect to see the information.

3 More recently, there has been a trend toward reporting the estimated p-value and letting
readers make their own assessments of statistical significance.
4 Take a moment to assess this theory in terms of the first two of the four hurdles that
we discussed in Chapter 3. The causal mechanism is that left-leaning candidates tend to
support policies favored by trade unions. Is this credible? What about hurdle 2? Could
support for left-leaning candidates make one more likely to be affiliated with a trade
union?
5 What do you think about the operationalization of these two variables? How well does
it stand up to what we discussed in Chapter 4?

Table 8.3. Gender and vote in the 2004 U.S.
presidential election: Hypothetical scenario

Candidate       Male      Female    Row total
Kerry           ?         ?         49.20
Bush            ?         ?         50.80
Column total    100.00    100.00    100.00

Note: Cell entries are column percentages.
In our next example we are going to go step by step through a bivariate
test of the hypothesis that gender (X) is related to vote (Y) in U.S. presidential elections. To test this hypothesis about gender and presidential vote, we
are going to use data from the 2004 NES. This is an appropriate set of data
for testing this hypothesis because these data are from a randomly selected
sample of cases from the underlying population of interest (U.S. adults).
Before we look at results obtained by using actual data, think briefly about
the measurement of the variables of interest and what we would expect to
find if there was no relationship between the two variables.
Table 8.3 shows partial information from a hypothetical example in
which we know that 49.2% of our sample respondents report having voted
for John Kerry and 50.8% of our sample respondents report having voted
for George W. Bush. But, as the question marks inside this table indicate,
we do not know how voting breaks down in terms of gender. If there was
no relationship between gender and presidential voting in 2004, consider
what we would expect to see given what we know from Table 8.3. In other
words, what values should replace the question marks in Table 8.3 if there
were no relationship between our independent variable (X) and dependent
variable (Y)?
If there is not a relationship between gender and presidential vote, then
we should expect to see no major differences between males and females in
terms of how they voted for John Kerry and George W. Bush. Because we
know that 49.2% of our cases voted for Kerry and 50.8% for Bush, what
should we expect to see for males and for females? We should expect to
see the same proportions of males and females voting for each candidate.
In other words, we should expect to see the question marks replaced with
the values in Table 8.4.
Table 8.3. Gender and vote in the 2004 U.S. presidential election

Candidate        Male     Female    Row total
Kerry              ?         ?        49.20
Bush               ?         ?        50.80
Column total    100.00    100.00     100.00

Note: Cell entries are column percentages.

Table 8.4. Gender and vote in the 2004 U.S. presidential election:
Expectations for hypothetical scenario if there were no relationship

Candidate        Male     Female
Kerry            49.20     49.20
Bush             50.80     50.80
Column total    100.00    100.00

Note: Cell entries are column percentages.

Table 8.5 shows the total numbers of respondents who fit into each
column and row from the 2004 NES. If we do the calculations, we can see
that the numbers in the rightmost column of Table 8.5 correspond with the
percentages from Table 8.3. We can now combine the information from
Table 8.5 with our expectations from Table 8.4 to calculate the number
of respondents that we would expect to see in each cell if gender and
presidential vote were unrelated. We display these calculations in Table
8.6. In Table 8.7, we see the actual numbers of respondents that fell into
each of the four cells.

Table 8.5. Gender and vote in the 2004 U.S. presidential election

Candidate        Male     Female    Row total
Kerry              ?         ?         399
Bush               ?         ?         412
Column total      374       437        811

Note: Cell entries are number of respondents.

Table 8.6. Gender and vote in the 2004 U.S. presidential election:
Calculating the expected cell values if gender and presidential vote are unrelated

Candidate    Male                                  Female
Kerry        (49.2% of 374) = 0.492 x 374 = 184    (49.2% of 437) = 0.492 x 437 = 215
Bush         (50.8% of 374) = 0.508 x 374 = 190    (50.8% of 437) = 0.508 x 437 = 222

Note: Cell entries are expectation calculations if these two variables are unrelated.

Table 8.7. Gender and vote in the 2004 U.S. presidential election

Candidate        Male     Female    Row total
Kerry             170       229       0.492
Bush              204       208       0.508
Column total    0.4612    0.5388      1.0

Note: Cell entries are number of respondents.

Finally, in Table 8.8, we compare the observed number of cases in each
cell (O) with the number of cases that we would expect to see if there was
no relationship between our independent and dependent variables (E).

Table 8.8. Gender and vote in the 2004 U.S. presidential election

Candidate    Male                 Female
Kerry        O = 170; E = 184     O = 229; E = 215
Bush         O = 204; E = 190     O = 208; E = 222

Note: Cell entries are the number observed (O); the number expected if
there were no relationship (E).

We can see a pattern. Among males, the proportion observed voting
for Kerry is lower than what we would expect if there were no relationship
between the two variables. Also, among men, the proportion voting for
Bush is higher than what we would expect if there were no relationship.
For females this pattern is reversed: the proportion voting for Kerry
(Bush) is higher (lower) than we would expect if there were no relationship
between gender and vote for U.S. president. The pattern of these differences
is in line with the theory that women support Democratic Party candidates
more than men do. Although these differences are present, we have not yet
determined that they are of such a magnitude that we should now have
increased confidence in our theory. In other words, we want to know whether
these differences are statistically significant.
To answer this question we turn to the chi-squared ($\chi^2$) test for tabular
association. Karl Pearson originally developed this test when he was testing
theories about the influence of nature versus nurture at the beginning of the
20th century. His formula for the $\chi^2$ statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}.$$

The summation sign in this formula signifies that we sum over each
cell in the table; so a 2 x 2 table would have four cells to add up. If we
think about an individual cell's contribution to this formula, we can see the
underlying logic of the $\chi^2$ test. If the value observed, O, is exactly equal to
the expected value if there were no relationship between the two variables,
E, then we would get a contribution of zero from that cell to the overall
formula (because O - E would be zero). Thus, if all observed values were
exactly equal to the values that we expect if there were no relationship
between the two variables, then $\chi^2 = 0$. The more the O values differ from
the E values, the greater the value will be for $\chi^2$. Because the numerator
on the right-hand side of the $\chi^2$ formula (O - E) is squared, any difference
between O and E will contribute positively to the overall $\chi^2$ value.
Here are the calculations for $\chi^2$ made with the values in Table 8.8:

$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(170 - 184)^2}{184} + \frac{(204 - 190)^2}{190} + \frac{(229 - 215)^2}{215} + \frac{(208 - 222)^2}{222} \\
&= \frac{196}{184} + \frac{196}{190} + \frac{196}{215} + \frac{196}{222} \\
&= 1.065 + 1.032 + 0.912 + 0.883 = 3.892.
\end{aligned}
$$

we have cleared the third hurdle, by demonstrating that X (gender) and
Y (vote) covary. From what we know about politics, we can easily cross
hurdle 1, "Is there a credible causal mechanism that links X to Y?" Women
might be more likely to vote for Kerry because women depend on the social
safety net of the welfare state more than men do. If we turn to hurdle 2,
"Does Y cause X?," we can pretty easily see that we have met this standard
through basic logic. We know with confidence that changing one's vote does
not lead to a change in one's gender. We hit the most serious bump in the
road to establishing causality for this relationship when we encounter
hurdle 4, "Is there some Z associated with X and Y that makes the
relationship between X and Y spurious?" Unfortunately, our answer here is
that we do not yet know. In fact, with a bivariate analysis, we cannot know
whether some other variable Z is relevant because, by definition, there are
only two variables in such an analysis.
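The steps in this example (expected counts under no relationship, then the sum of squared differences) can be sketched in Python. This is a minimal illustration rather than the authors' own procedure; the function names are assumptions, and the expected counts are rounded to whole respondents only to mirror Table 8.6.

```python
def expected_counts(observed):
    """Expected cell counts if the row and column variables were unrelated.

    `observed` is a list of rows, each a list of cell counts.
    E for a cell = (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over every cell in the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Kerry, Bush. Columns: Male, Female (observed counts, Table 8.8).
observed = [[170, 229], [204, 208]]

# Rounding to whole respondents mirrors Table 8.6; it is not required.
expected = [[round(e) for e in row] for row in expected_counts(observed)]

print(expected)
print(chi_squared(observed, expected))
```

Summing the unrounded terms gives a value slightly below the 3.892 reported in the text, because the text rounds each of the four terms to three decimals before adding them.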

8.4.2 Example 2: Difference of Means

In our second example we examine a situation in which we have a
continuous dependent variable and a limited independent variable. In this
type of bivariate hypothesis test, we are looking to determine whether the
distribution of the dependent variable is different across the values of the
independent variable. We follow the basic logic of hypothesis testing:
comparing our real-world data with what we would expect to find if there were
no relationship between our independent and dependent variables.
Our theory in this section will come from the study of parliamentary
governments. When political scientists study phenomena across different
forms of government, one of the fundamental distinctions that they draw
between different types of democracies is whether the regime is
parliamentary or not. A democratic regime is labeled "parliamentary" when the
lower house of the legislature is the most powerful branch of government
and directly selects the head of the government.7 One of the interesting
features of most parliamentary regimes is that a vote in the lower house
of the legislature can remove the government from power. As a result,
political scientists have been very interested in the determinants of how long
parliamentary governments last when the possibility of such a vote exists.
One factor that is an important difference across parliamentary
democracies is whether the party or parties that are in government occupy a
majority of the seats in the legislature.8 By definition, the opposition can
vote out of office a minority government, because a minority government does
not control a majority of the seats in the legislature. Thus a pretty
reasonable theory about government duration is that majority governments will
last longer than minority governments.
We can move from this theory to a hypothesis test by using a data set
produced by Michael D. McDonald and Silvia M. Mendes titled "Governments,
1950-1992." Their data set covers governments from 21 Western
countries. For the sake of comparability, we will limit our sample to those
governments that were formed after an election.9 Our independent variable,
"Government Type," takes on one of two values: "majority government"
or "minority government." Our dependent variable, "Government
Duration," is a continuous variable measuring the number of days that each
government lasted in office. Although this variable has a hypothetical range
from 1 day to 1461 days, the actual data vary from an Italian government
that lasted for 31 days in 1953 to a Dutch government that lasted for 1749
days in the late 1980s and early 1990s.
To get a better idea of the data that we are comparing, we can turn to
two graphs that we introduced in Chapter 6 for viewing the distribution of
continuous variables. Figure 8.1 presents a box-whisker plot of government
duration for minority and majority governments, and Figure 8.2 presents a
kernel density plot of government duration for minority and majority
governments. From both of these plots, it appears that majority governments
last longer than minority governments.

6 We define degrees of freedom in the next section.
7 An important part of research design is determining which cases are and are not covered
by our theory. In this case, our theory, which we will introduce shortly, is going to apply
to only parliamentary democracies. As an example, consider whether or not the United
States and the United Kingdom fit this description at the beginning of 2007. In the United
States in 2007, the head of government was President George W. Bush. Because Bush
was selected by a presidential election and not by the lower house of the legislature, we
can already see that the United States at the beginning of 2007 is not covered by our
theory. In the UK, we might be tempted at first to say that the head of government at
the beginning of 2007 was Queen Elizabeth. But, if we consider that British queens and
kings have been mostly ceremonial in UK politics for some time now, we then realize that
the head of government was the prime minister, Tony Blair, who was selected from the
lower house of the legislature, the House of Commons. If we further consider the relative
power of the House of Commons compared with the other branches of government at
the beginning of 2007, we can see that the United Kingdom met our criteria for being
classified as parliamentary.
8 Researchers usually define a party as being in government if its members occupy one or
more cabinet posts, whereas parties not in government are in opposition.
9 We have also limited the analyses to cases in which the governments had a legal maximum
of four years before they must call for new elections. These limitations mean that, strictly
speaking, we are only able to make inferences about the population of cases that also fit
these criteria.

Figure 8.1. Box-whisker plot of Government Duration for majority and minority governments.

Figure 8.2. Kernel density plot of Government Duration for majority and minority
governments. (Horizontal axis: Number of Days in Government.)

To determine whether the differences in these figures are statistically
significant, we compare what we have seen with what we would expect
if there were no relationship between Government Type and Government
Duration. If there were no relationship between these two variables, then
the world would be such that the duration of governments of both types
were drawn from the same underlying distribution. If this were the case,
the mean or average value of Government Duration would be the same for
minority and majority governments.
To test the hypothesis that these means are drawn from the same
underlying distribution, we use another test developed by Karl Pearson for
these purposes. The test statistic for this is known as a t-test because it
follows the t-distribution. The formula for this particular t-test is

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{se(\bar{Y}_1 - \bar{Y}_2)},$$

where $\bar{Y}_1$ is the mean of the dependent variable for the first value of the
independent variable and $\bar{Y}_2$ is the mean of the dependent variable for the
second value of the independent variable. We can see from this formula
that the greater the difference between the mean value of the dependent
variable across the two values of the independent variable, the greater the
value of t.
In Chapter 7 we introduced the notion of a standard error, which
is a measure of uncertainty about a statistical estimate. The basic logic
of a standard error is that the larger it is, the more uncertainty (or less
confidence) we have in our ability to make precise statements. Similarly,
the smaller the standard error, the greater our confidence about our ability
to make precise statements. The standard error of the difference between
$\bar{Y}_1$ and $\bar{Y}_2$, $se(\bar{Y}_1 - \bar{Y}_2)$, is calculated from the following formula:

$$se(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $s_1$ is the standard deviation of $Y_1$, $s_2$ is the standard deviation of $Y_2$,
$n_1$ is the number of cases in the first category of the independent variable,
and $n_2$ is the number of cases in the second category of the independent
variable. We can see from this formula that the standard error of the
difference between $\bar{Y}_1$ and $\bar{Y}_2$, $se(\bar{Y}_1 - \bar{Y}_2)$, combines the standard deviations
for both $Y_1$ and $Y_2$. The larger these standard deviations, the larger the
standard error.
To better understand the contribution of the top and bottom parts of
the t-calculation for a difference of means, look again at Figure 8.2. The
further apart the two means are and the less dispersed the distributions (as
measured by the standard deviations $s_1$ and $s_2$), the greater confidence we
have that $\bar{Y}_1$ and $\bar{Y}_2$ are different from each other.

Table 8.9. Government type and government duration

Government    Number of       Mean        Standard
type          observations    duration    deviation
Majority          124           930.5       466.1
Minority           53           674.4       421.4
Combined          177           853.8       467.1

Table 8.9 presents the descriptive statistics for government duration by
government type. From the values displayed in this table we can calculate
the t-test statistic for our hypothesis test. Start with the standard error for
the difference:

$$
\begin{aligned}
se(\bar{Y}_1 - \bar{Y}_2) &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(466.1)^2}{124} + \frac{(421.4)^2}{53}} \\
&= \sqrt{\frac{217249.21}{124} + \frac{177577.96}{53}} \\
&= \sqrt{1752 + 3350.5} = \sqrt{5102.5} = 71.43.
\end{aligned}
$$

Now that we have the standard error, we can calculate the t-statistic:

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{se(\bar{Y}_1 - \bar{Y}_2)} = \frac{930.5 - 674.4}{71.43} = \frac{256.1}{71.43} = 3.59.$$
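The two calculations above can be checked with a short Python sketch. The function names are assumptions made for illustration; the inputs are the summary statistics from Table 8.9, with majority governments treated as group 1 and minority governments as group 2.

```python
import math


def se_difference(s1, n1, s2, n2):
    """Standard error of the difference between two group means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def t_statistic(mean1, mean2, se):
    """Difference of means divided by its standard error."""
    return (mean1 - mean2) / se


# Summary statistics from Table 8.9 (group 1 = majority, group 2 = minority).
se = se_difference(s1=466.1, n1=124, s2=421.4, n2=53)
t = t_statistic(930.5, 674.4, se)

print(round(se, 2), round(t, 2))
```

Keeping the standard error in its own function makes the two ingredients of t visible: a larger gap between the means raises t, and larger standard deviations or smaller groups lower it.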

Now that we have calculated this t-statistic, we need one more piece
of information before we can get to our p-value. This is called the degrees
of freedom (df). Degrees of freedom reflect the basic idea that we will
gain confidence in an observed pattern as the amount of data on which that
pattern is based increases. In other words, as our sample size increases, we
become more confident about our ability to say things about the underlying
population. If we turn to Appendix B, which is a table of critical values
for t, we can see that it reflects this logic. This table also follows the same
basic logic as the $\chi^2$ table. The way to read such a table is that the columns
are defined by targeted p-values, and, to achieve a particular target p-value,
you need to obtain a particular value of t. The rows in the t-table indicate
the number of degrees of freedom. As the number of degrees of freedom
goes up, the t-statistic we need to obtain a particular p-value goes down.
We calculate the degrees of freedom for a difference of means t-statistic
based on the smaller of the two samples in terms of number of cases minus
one. In this case, because we have 53 minority governments and 124
majority governments, our degrees of freedom equal 53 - 1, or 52. To find the
p-value, we can look across the row for which df = 50 and see the minimum
t-value needed to achieve each targeted value of p.10 In the second column
of the t-table, we can see that, to have a p-value of .10 (meaning that there
is a 10%, or 1 in 10, chance that we would see this relationship randomly
in our sample if there was no relationship between X and Y in the
underlying population), we must have a t-statistic greater than or equal to 1.299.
Because 3.59 > 1.299, we can proceed to the next column for p = .05 and
see that 3.59 is also greater than 1.676. In fact, if we go all the way to
the end of the row for df = 50, we can see that our t-statistic is greater
than 3.261, which is the t-value needed to achieve p = .001 (meaning that
there is a 0.1%, or 1 in 1000, chance that we would see this relationship
randomly in our sample if there were no relationship between X and Y in
the underlying population).
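Instead of reading a printed table, the p-value can be approximated directly. The sketch below is one possible way to do this with only the Python standard library: it integrates the density of the t-distribution numerically with Simpson's rule. The function names and the choice of 2000 integration steps are assumptions, and the p-value is treated as one-tailed, which matches the critical values quoted above (for example, 1.676 at p = .05 for df = 50).

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on the interval [0, t]."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    if t == 0:
        return 0.5
    h = t / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t


df = min(124, 53) - 1  # smaller group minus one, as described in the text
print(df, one_tailed_p(3.59, df))
```

Calling `one_tailed_p(1.676, 50)` is a quick sanity check: the result should be very close to .05, the column heading under which 1.676 appears in the t-table.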

Figure 8.3. Scatter plot of change in GDP and incumbent-party vote share.
(Horizontal axis: Percentage Change in Real GDP Per Capita.)

10 Although our degrees of freedom equal 52, we are using the row for df = 50. With a
computer program, we can calculate an exact p-value.

8.4.3 Example 3: Correlation Coefficient

In our final example, both the independent and dependent variables are
continuous. We test the hypothesis that there is a positive relationship
between economic growth and incumbent-party fortunes in U.S. presidential
elections.
In Chapter 6 we discussed the variation (or variance) of a single
variable, and in Chapter 1 we introduced the concept of covariation. In the
examples that we have looked at so far, we have examined covariation between
being from a union household and presidential vote, gender and presidential
vote, and government type and government duration. All of these examples
used at least one limited variable. When we have an independent variable
and a dependent variable that are both continuous, we can visually detect
covariation pretty easily in graphs. Consider the graph in Figure 8.3, which
shows a scatter plot of incumbent vote and economic growth. When we
look at this graph, we generally see a pattern that runs from lower-left to
upper-right. This indicates that, as expected by our hypothesis, when the
economy is doing better (more rightward values on the horizontal axis),
we also tend to see higher vote percentages for the incumbent party in U.S.
presidential elections (higher values on the vertical axis).
Covariance is a statistical way of summarizing the general pattern of
association (or the lack thereof) between two variables. The formula for
covariance between two variables X and Y is

$$cov_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}.$$

To better understand the intuition behind the covariance formula, it is
helpful to think of individual cases in terms of their values relative to the
mean of X ($\bar{X}$) and the mean of Y ($\bar{Y}$). If an individual case has a value for the
independent variable that is greater than the mean of X ($X_i - \bar{X} > 0$) and its
value for the dependent variable is greater than the mean of Y ($Y_i - \bar{Y} > 0$),
that case's contribution to the numerator in the covariance equation will
be positive. If an individual case has a value for the independent variable
that is less than the mean of X ($X_i - \bar{X} < 0$) and a value of the dependent
variable that is less than the mean of Y ($Y_i - \bar{Y} < 0$), that case's contribution
to the numerator in the covariance equation will also be positive, because
multiplying two negative numbers yields a positive product. If a case has
a combination of one value greater than the mean and one value less than
the mean, its contribution to the numerator in the covariance equation will
be negative because multiplying a positive number by a negative number
yields a negative product. Figure 8.4 illustrates this; we see the same plot of
growth versus incumbent vote, but with the addition of lines showing the
mean value of each variable. In each of these mean-delimited quadrants we
can see the contribution of the cases to the numerator. If a plot contains
cases in mostly the upper-right and lower-left quadrants, the covariance
will tend to be positive. On the other hand, if a plot contains cases in
mostly the lower-right and upper-left quadrants, the covariance will tend
to be negative. If a plot contains a balance of cases in all four quadrants,
the covariance calculation will be close to zero because the positive and
negative values will cancel out each other. When the covariance between
two variables is positive, we describe this situation as a positive relationship
between the variables, and when the covariance between two variables is
negative, we describe this situation as a negative relationship.

Figure 8.4. Scatter plot of change in GDP and incumbent-party vote share with
mean-delimited quadrants. (Quadrant labels: upper-left (- +) = -; upper-right
(+ +) = +; lower-left (- -) = +; lower-right (+ -) = -. Horizontal axis:
Percentage Change in Real GDP Per Capita.)

Table 8.10 presents the calculations for each year in the covariance
formula for the data that we presented in Figure 8.4. For each year, we
have started out by calculating the difference between each X and $\bar{X}$ and
the difference between each Y and $\bar{Y}$. If we begin with the year 1880, we
can see that the value for growth ($X_{1880}$) was 3.879 and the value for vote
($Y_{1880}$) was 50.22. The value for growth is greater than the mean and the
value for vote is less than the mean: $X_{1880} - \bar{X} = 3.879 - 0.628 = 3.251$
and $Y_{1880} - \bar{Y} = 50.22 - 52.27022 = -2.050217$. In Figure 8.4, the dot
for 1880 is in the lower-right quadrant. When we multiply these two mean
deviations together, we get $(X_{1880} - \bar{X})(Y_{1880} - \bar{Y}) = -6.665254$.
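The quadrant logic and the 1880 calculation can be sketched in Python as follows. The function names are assumptions, the four-case data set is made up purely to show positive and negative contributions side by side, and the division by n follows the covariance formula used in this section.

```python
def mean(values):
    return sum(values) / len(values)


def contributions(xs, ys):
    """Each case's contribution (X_i - mean X)(Y_i - mean Y) to the numerator."""
    x_bar, y_bar = mean(xs), mean(ys)
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]


def covariance(xs, ys):
    """Sum of the contributions divided by n, as in this section's formula."""
    return sum(contributions(xs, ys)) / len(xs)


# The 1880 case, using the means reported in the text.
x_bar, y_bar = 0.628, 52.27022
contribution_1880 = (3.879 - x_bar) * (50.22 - y_bar)
print(contribution_1880)  # negative: growth above its mean, vote below its mean

# A made-up four-case data set with both positive and negative contributions.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]
print(contributions(xs, ys), covariance(xs, ys))
```

In the made-up data the two cases that sit above or below both means contribute positively, the two mixed cases contribute negatively, and the positive contributions outweigh the negative ones, so the covariance is positive.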

Table 8.10. Contributions of individual election years to the covariance calculation

Year    Growth (X_i)   Vote (Y_i)   X_i - X-bar   Y_i - Y-bar   (X_i - X-bar)(Y_i - Y-bar)
1880        3.879        50.22         3.251       -2.050217        -6.665254
...
1900       -1.425        53.171       -2.053        0.9007835       -1.849309
1904       -2.421        60.006       -3.049        7.735783       -23.5864
1908       -6.281        54.483       -6.909        2.212784       -15.28812
1912        4.164        54.708        3.536        2.437782         8.619998
1916        2.229        51.682        1.601       -0.5882187       -0.9417382
1920      -11.463        36.119      -12.091      -16.15122        195.2844
1924       -3.872        58.244       -4.5          5.973782       -26.88202
1928        4.623        58.82         3.995        6.549782        26.16638
1932      -14.557        40.841      -15.185      -11.42922        173.5527
1936       11.677        62.458       11.049       10.18778        112.5648
1940        3.611        54.999        2.983        2.728783         8.139958
1944        4.433        53.774        3.805        1.50378          5.721884
1948        2.858        52.37         2.23         0.099781         0.2225117
1952        0.84         44.595        0.212       -7.675217        -1.627146
1956       -1.394        57.764       -2.022        5.493782       -11.10843
1960        0.417        49.913       -0.211       -2.35722          0.4973734
1964        5.109        61.344        4.481        9.073784        40.65963
1968        5.07         49.596        4.442       -2.674217       -11.87887
1972        6.125        61.789        5.497        9.518784        52.32475
1976        4.026        48.948        3.398       -3.322216       -11.28889
1980       -3.594        44.697       -4.222       -7.573219        31.97413
1984        5.568        59.17         4.94         6.89978         34.08491
1988        2.261        53.902        1.633        1.631783         2.664701
1992        2.223        46.545        1.595       -5.72522         -9.131725
1996        2.712        54.736        2.084        2.465782         5.13869
2000        1.603        50.265        0.975       -2.005219        -1.955088
2004        2.9          51.2          2.272       -1.070217        -2.431533

X-bar = 0.628     Y-bar = 52.27022     Sum of (X_i - X-bar)(Y_i - Y-bar) = 641.6768

We repeat this same calculation for every case (presidential election
year). Each negative calculation like this contributes evidence that the
overall relationship between X and Y is negative, whereas each positive
calculation contributes evidence that the overall relationship between X
and Y is positive. The sum across all 32 years of data in Table 8.10 is
641.6768, indicating that the positive values have outweighed the negative
values. When we divide this by 32, we have the sample covariance,
which equals 20.05. This tells us that we have a positive relationship, but
it does not tell us how confident we can be that this relationship is
different from what we would see if our independent and dependent variables
were not related in our underlying population of interest. To see this,
we turn to a third test developed by Karl Pearson, Pearson's correlation
coefficient. This is also known as Pearson's r, the formula for which is

$$r = \frac{cov_{XY}}{\sqrt{var_X \, var_Y}}.$$

Table 8.11. Covariance table for economic growth and
incumbent-party presidential vote, 1880-2004

            Vote     Growth
Vote        36.86
Growth      20.05    30.68

In a covariance table such as Table 8.11, the cells along the main diagonal
are those for which the column and the row reference the same variable. In
this case the cell entry is the variance for the referenced variable. All of
the cells off of the main diagonal represent the covariance for a pair of
variables. In covariance tables, the cells above the main diagonal are often
left blank, because the values in these cells are a mirror image of the
values in the corresponding cells below the main diagonal. For instance, in
Table 8.11 the covariance between growth and vote is the same as the
covariance between vote and growth, so the upper-right cell in this table is
left blank.
Using the entries in Table 8.11, we can calculate the correlation
coefficient:

$$
\begin{aligned}
r &= \frac{cov_{XY}}{\sqrt{var_X \, var_Y}} = \frac{20.05}{\sqrt{36.86 \times 30.68}} \\
&= \frac{20.05}{\sqrt{1130.8648}} = \frac{20.05}{33.6283} = 0.596.
\end{aligned}
$$

Unlike the covariance, the correlation coefficient always falls between -1
and +1; this standardization of values is a particularly useful improvement
over the covariance calculation. Additionally, we can calculate a t-statistic
for a correlation coefficient as

$$t_r = r \sqrt{\frac{n - 2}{1 - r^2}},$$

with n - 2 degrees of freedom, where n is the number of cases. In this case,
our degrees of freedom equal 32 - 2 = 30.
For the current example,

$$
\begin{aligned}
t_r &= 0.596 \times \sqrt{\frac{32 - 2}{1 - (0.596)^2}} \\
&= 0.596 \times \sqrt{46.5272} \\
&= 0.596 \times 6.8211 = 4.0654.
\end{aligned}
$$

With the degrees of freedom equal to 32 (n = 32) minus two, or 30,
we can now turn to the t-table in Appendix B. Looking across the row
for df = 30, we can see that our calculated t of 4.0654 is greater even
than the critical t at the p-value of .001. Thus we are quite confident that
a relationship exists between economic growth and incumbent-party vote
share and that our theory has successfully cleared our third causal hurdle.11
11 The first causal hurdle is pretty well cleared if we refer back to the discussion of the theory
of economic voting in earlier chapters. The second causal hurdle also can be pretty well
cleared logically by the timing of the measurement of each variable. Because economic
growth is measured prior to incumbent vote, it is difficult to imagine that Y caused X.

8.5 WRAPPING UP

We have introduced three methods to conduct bivariate hypothesis
tests: tabular analysis, difference of means tests, and correlation coefficients.
Which test is most appropriate in any given situation depends on the
measurement metric of your independent and dependent variables.
Table 8.1 should serve as a helpful reference for you on this front.
We have yet to introduce the final method for conducting bivariate
hypothesis tests covered in this book, namely bivariate regression analysis.
That is the topic of our next chapter, and it serves as the initial building
block for multiple regression (which we will cover in Chapter 10).

CONCEPTS INTRODUCED IN THIS CHAPTER

chi-squared ($\chi^2$)
correlation coefficient
covariance
critical value
degrees of freedom
difference of means
difference of means test
Pearson's r
p-value
statistically significant
tabular analysis
truly random sample

Figure 8.5. What is wrong with this table?

Table 8.12. Incumbent reelection rates in U.S. congressional elections,
1964-2006

Year    House    Senate
1964      --       85
1966      88       88
1968      97       71
1970      85       77
1972      94       74
1974      88       85
1976      96       64
1978      94       60
1980      91       55
1982      90       93
1984      95       90
1986      98       75
1988      98       85
1990      96       96
1992      88       83
1994      90       92
1996      94       91
1998      98       90
2000      98       79
2002      96       86
2004      98       96
2006      94       79

EXERCISES

1. Take a look at Figure 8.5. What is the dependent variable? What are the
independent variables? What does this table tell us about politics?
2. What makes the table in Figure 8.5 so confusing?
3. Build a crosstab from the information presented in the following hypothetical
discussion of polling results: "We did a survey of 800 respondents who were
likely Democratic primary voters in the state. Among these respondents, 45%
favored Obama whereas 55% favored Clinton. When we split the respondents
in half at the median age of 40, we found some stark differences: Among the
younger half of the sample respondents, we found that 72.2% favored Obama
to be the nominee and among the older sample respondents, we found that
68.2% favored Clinton."
4. For the example in Exercise 3, test the theory that age is related to preference
for a Democratic nominee.


5. A lot of people in the United States think that the Watergate scandal in
1972 caused a sea change in terms of U.S. citizens' views toward incumbent
politicians. Use the data in Table 8.12 to produce a difference of means test of
the null hypothesis that average reelection rates were the same before and after
the Watergate scandal.
6. Using the data set "BES2005 Subset," produce a table that shows the
combination values for the variables "LabourVote" (Y) and "IraqWarApprovalDich"
(X). Read the descriptions of these two variables and write about what this
table tells you about politics in the UK in 2005. Compute a $\chi^2$ hypothesis test
for these two variables. Write about what this tells you about politics in the
UK in 2005.
7. Using the data set "BES2005 Subset," test the hypothesis that values for
"BlairFeelings" (Y) are different across different values of "IraqWarApprovalDich"
(X). Read the descriptions of these two variables and write about what this
table tells you about politics in the UK in 2005.
8. Using the data set "BES2005 Subset," produce a scatter plot of the values for
"BlairFeelings" (Y) and "SelfLR" (X). Calculate a correlation coefficient and
p-value for the hypothesis that these two variables are related to each other.
Read the descriptions of these two variables and write about what this table
tells you about politics in the UK in 2005.



Bivariate Regression Models

OVERVIEW

Regression models are the workhorses of data analysts in a wide range of
fields in the social sciences. We begin this chapter with a discussion of fitting
a line to a scatter plot of data, and then we discuss the additional inferences
that can be made when we move from a correlation coefficient to a
two-variable regression model. We include discussions of measures of
goodness-of-fit and of the nature of hypothesis testing and statistical
significance in regression models. Throughout this chapter, we present
important concepts in text, mathematical formulae, and graphical illustrations.
This chapter concludes with a discussion of the assumptions of the regression
model and minimal mathematical requirements for estimation.

TWO-VARIABLE REGRESSION

In Chapter 8 we introduced three different bivariate hypothesis tests. In
this chapter we add a fourth, two-variable regression. This is an important
first step toward the multiple regression model - which is the topic of
Chapter 10 - in which we are able to "control for" another variable (Z) as
we measure the relationship between our independent variable of interest (X)
and our dependent variable (Y). It is crucial to develop an in-depth
understanding of two-variable regression before moving to multiple regression.
In the sections that follow, we begin with an overview of the two-variable
regression model, in which a line is fit to a scatter plot of data. We
then discuss the uncertainty associated with the line and how we use various
measures of this uncertainty to make inferences about the underlying
population. This chapter concludes with a discussion of the assumptions
of the regression model and the minimal mathematical requirements for
model estimation.

FITTING A LINE: POPULATION ⇔ SAMPLE
The basic idea of two-variable regression is that we are fitting the "best" line
through a scatter plot of data. This line, which is defined by its slope and
y-intercept, serves as a statistical model of reality. In this sense, two-variable
regression is very different from the three hypothesis-testing techniques that
we introduced in Chapter 8; although those techniques allow hypothesis
testing, they do not produce a statistical model. You may remember from
a geometry course the formula for a line expressed as

$$Y = mX + b,$$

where b is the y-intercept and m is the slope - often explained as the
"rise-over-run" component of the line formula. For a one-unit increase
(run) in X, m is the corresponding amount of rise in Y (or fall in Y, if
m is negative). Together these two elements (m and b) are described as
the line's parameters.1 You may remember exercises from junior high or
high school math classes in which you were given the values of m and b
and then asked to draw the resulting line on graph paper. Once we know
these two parameters for a line, we can draw that line across any range of
X values.2
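As a small illustration of how the two parameters pin down the whole line, the sketch below evaluates Y = mX + b over a range of X values. The parameter values (m = 2, b = 1) and the function name `line_value` are hypothetical, chosen only for this example.

```python
def line_value(x, m, b):
    """Return Y = m*X + b for a single X value."""
    return m * x + b

# Hypothetical parameters, chosen only for illustration.
m, b = 2.0, 1.0

# Once m and b are known, the line can be traced across any range of X.
points = [(x, line_value(x, m, b)) for x in range(-2, 3)]

# The "rise" in Y for a one-unit "run" in X is the slope m.
rise_per_unit_run = line_value(1, m, b) - line_value(0, m, b)

for x, y in points:
    print(f"X = {x:>2}  Y = {y:>5.1f}")
print("rise per one-unit run:", rise_per_unit_run)
```

Changing m tilts the line, whereas changing b shifts it up or down without altering its steepness.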
In a two-variable regression model, we represent the y-intercept parameter
by the Greek letter alpha ($\alpha$) and the slope parameter by the Greek
letter beta ($\beta$).3 As foreshadowed by all of our other discussions of
variables, Y is the dependent variable and X is the independent variable. Our
theory about the underlying population in which we are interested is
expressed in the population regression model:

$$Y_i = \alpha + \beta X_i + u_i.$$

Note that in this model there is one additional component, $u_i$, that does
not correspond with what we are used to seeing in line formulae from
geometry classes. This term is the stochastic or "random" component of our
dependent variable. We have this term because we do not expect all of our
data points to line up perfectly on a straight line. This corresponds directly
with our discussion in earlier chapters about the probabilistic (as opposed to
deterministic) nature of causal theories about political phenomena. We are,
after all, trying to explain processes that involve human behavior. Because
human beings are complex, there is bound to be a fair amount of random
noise in our measures of their behavior. Thus we think about the values
of our dependent variable $Y_i$ as having a systematic component,
$\alpha + \beta X_i$, and a stochastic component, $u_i$.
As we have discussed, we rarely work with population data. Instead,
we use sample data to make inferences about the underlying population of
interest. In two-variable regression, we use information from the sample
regression model to make inferences about the unseen population regression
model. To distinguish between these two, we place hats (^) over terms in
the sample regression model that are estimates of terms from the unseen
population regression model. Because they have hats, we can describe $\hat{\alpha}$
and $\hat{\beta}$ as being parameter estimates. These terms are our best guesses of the
unseen population parameters $\alpha$ and $\beta$:

sample regression model: $$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i.$$

Note that, in the sample regression model, $\alpha$, $\beta$, and $u_i$ get hats, but
$Y_i$ and $X_i$ do not. This is because $Y_i$ and $X_i$ are values for cases in the
population that ended up in the sample. As such, $Y_i$ and $X_i$ are values that are
measured rather than estimated. We use them to estimate $\alpha$, $\beta$, and the $u_i$
values. The values that define the line are the estimated systematic components
of Y. For each $X_i$ value we use $\hat{\alpha}$ and $\hat{\beta}$ to calculate the predicted
value of $Y_i$, which we call $\hat{Y}_i$, where

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i.$$

This can also be written in terms of expectations,

$$E(Y|X_i) = \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i,$$

which means that the expected value of Y given $X_i$ (or $\hat{Y}_i$) is equal to our
formula for the two-variable regression line. So we can now talk about
each $Y_i$ as having an estimated systematic component, $\hat{Y}_i$, and an estimated
stochastic component, $\hat{u}_i$. We can thus write our model as

$$Y_i = \hat{Y}_i + \hat{u}_i,$$

and we can rewrite this in terms of $\hat{u}_i$ to get a better understanding of the
estimated stochastic component:

$$\hat{u}_i = Y_i - \hat{Y}_i.$$

From this formula, we can see that the estimated stochastic component
($\hat{u}_i$) is equal to the difference between the actual value of the dependent
variable ($Y_i$) and the predicted value of the dependent variable from our
two-variable regression model. Another name for the estimated stochastic
component is the residual. "Residual" is another word for "leftover," and
this is appropriate, because $\hat{u}_i$ is the leftover part of $Y_i$ after we have drawn
the line defined by $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$. Another way to refer to $\hat{u}_i$, which follows
from the formula $\hat{u}_i = Y_i - \hat{Y}_i$, is to call it the sample error term. Because
$\hat{u}_i$ is an estimate of $u_i$, a corresponding way of referring to $u_i$ is to call it
the population error term.

1 The term "parameter" is a synonym for "boundary" with a more mathematical
connotation. In the description of a line, the parameters (m and b in this case)
are fixed whereas the variables (X and Y in this case) vary.
2 If this is not familiar to you, or if you merely want to refresh your memory, you may
want to complete Exercise 1 at the end of this chapter before you continue reading.
3 Different textbooks on regression use slightly different notation for these parameters, so
it is important not to assume that all textbooks use the same notation when comparing
across them.

Figure 9.1. Scatter plot of change in GDP and incumbent-party vote share. (Horizontal axis: Percentage Change in Real GDP Per Capita.)
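The decomposition of each observed value into a predicted part and a residual can be sketched numerically. The five data points and the parameter estimates below are hypothetical (they are not the election data used in this chapter); the point is only that each Y_i splits exactly into its predicted value plus its residual.

```python
# Hypothetical sample of (X_i, Y_i) pairs and hypothetical parameter estimates;
# these are NOT the election data analyzed in the chapter.
x = [-2.0, 0.0, 1.0, 3.0, 4.0]
y = [49.0, 53.5, 51.0, 55.0, 54.0]
alpha_hat, beta_hat = 51.0, 1.0

# Estimated systematic component: Y-hat_i = alpha-hat + beta-hat * X_i
y_hat = [alpha_hat + beta_hat * xi for xi in x]

# Estimated stochastic component (residual): u-hat_i = Y_i - Y-hat_i
u_hat = [yi - yhi for yi, yhi in zip(y, y_hat)]

# Adding the two components back together recovers each observed Y_i.
reconstructed = [yhi + ui for yhi, ui in zip(y_hat, u_hat)]

for xi, yi, yhi, ui in zip(x, y, y_hat, u_hat):
    print(f"X={xi:5.1f}  Y={yi:5.1f}  Y-hat={yhi:5.1f}  residual={ui:5.1f}")
```

Positive residuals correspond to points above the line and negative residuals to points below it.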

WHICH LINE FITS BEST? ESTIMATING THE REGRESSION LINE

Consider the scatter plot of data in Figure 9.1. Our task is to draw a straight
line4 that describes the relationship between our independent variable
and our dependent variable. How do we draw our line? We clearly want
to draw a line that comes as close as possible to the cases in our scatter
plot of data. Because the data have a general pattern from lower-left to
upper-right, we know that our slope will be positive.

Figure 9.2. Three possible lines. (Horizontal axis: Percentage Change in Real GDP Per Capita.)

In Figure 9.2 we have drawn three lines with positive slopes - labeled
A, B, and C - through the scatter plot of growth and vote and written the
corresponding parametric formula above each line on the right-hand side
of the figure. So, how do we decide which line "best" fits the data that we
see in our scatter plot of $X_i$ and $Y_i$ values? Because we are interested in
explaining our dependent variable, we want our residual values, $\hat{u}_i$, which
are vertical distances between each $Y_i$ and the corresponding $\hat{Y}_i$, to be as
small as possible. But, because these vertical distances come in both positive
and negative values, we cannot just add them up for each line and have a
good summary of the "fit" between each line and our data.5
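The cancellation problem can be seen directly in a short sketch. Using hypothetical data (not the chapter's election data), the code below draws several lines with very different slopes, each forced through the point of sample means, and adds up the raw residuals for each. The helper name `residuals` is an assumption made for this example.

```python
# Hypothetical data for illustration only.
x = [-3.0, -1.0, 0.0, 2.0, 4.0, 4.0]
y = [48.0, 50.0, 53.0, 52.0, 56.0, 53.0]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

def residuals(slope):
    """Residuals from a line with the given slope that passes through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Sum of raw residuals for a flat line, two moderate lines, and a very steep line.
sums = {slope: sum(residuals(slope)) for slope in (0.0, 0.5, 1.0, 5.0)}

for slope, total in sums.items():
    print(f"slope = {slope:3.1f}  sum of residuals = {total:.10f}")
```

Every one of these lines produces a residual sum of zero (up to floating-point rounding), so the raw sum cannot tell a good line from a poor one.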
4 By "straight line," we mean a line with a single slope that does not change as we move
from left to right in our figure.
5 Initially, we might think that we would want to minimize the sum of our residuals. But
the line that minimizes the sum of the residuals is actually a flat line parallel to the x-axis.
Such a line does not help us to explain the relationship between X and Y.

So we need a method of assessing the fit of each line in which the
positive and negative residuals do not cancel each other out. One possibility
is to add together the absolute value of the residuals for each line:

$$\sum_{i=1}^{n} |\hat{u}_i|.$$

Another possibility is to add together the squared value of each of the residuals
for each line:

$$\sum_{i=1}^{n} \hat{u}_i^2.$$

With either choice we want to choose the line that has the smallest total
value. Table 9.1 presents these calculations for the three lines in Figure 9.2.
From both calculations, we can see that line B does a better job of fitting
the data than lines A and C. Although the absolute-value calculation is
just as valid as the squared residual calculation, statisticians have tended to
prefer the latter (in this example, both methods identify the same line as being "best"). Thus
we draw a line that minimizes the sum of the squared residuals $\sum_{i=1}^{n} \hat{u}_i^2$. This
technique for estimating the parameters of a regression model is known as
ordinary least-squares (OLS) regression. For a two-variable OLS regression,
the formulae for the parameter estimates of the line that meet this criterion
are6

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}.$$

Table 9.1. Measures of total residuals for three different lines

Line   Parametric formula                 $\sum |\hat{u}_i|$   $\sum \hat{u}_i^2$
A      $\hat{Y}_i = 50.21 + 1.15X_i$      [illegible]          1028.11
B      $\hat{Y}_i = 51.86 + 0.65X_i$      128.61               736.46
C      $\hat{Y}_i = 52.01 + 0.25X_i$      [illegible]          891.70

Figure 9.3. OLS regression line through scatter plot with mean-delimited quadrants. (Horizontal axis: Percentage Change in Real GDP Per Capita.)
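The two fit criteria and the OLS formulae can be sketched in a few lines of code. The data below are hypothetical (the chapter's own data are in Table 8.11 and are not reproduced here), and the function names `ols`, `sum_abs_resid`, and `sum_sq_resid` are assumptions made for this illustration.

```python
def ols(x, y):
    """Two-variable OLS estimates (alpha_hat, beta_hat) from the textbook formulae."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def sum_abs_resid(x, y, a, b):
    """Sum of absolute residuals for the line Y-hat = a + b*X."""
    return sum(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))

def sum_sq_resid(x, y, a, b):
    """Sum of squared residuals for the line Y-hat = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration only.
x = [-3.0, -1.0, 0.0, 2.0, 4.0, 4.0]
y = [48.0, 50.0, 53.0, 52.0, 56.0, 53.0]

alpha_hat, beta_hat = ols(x, y)

# Compare the OLS line with two arbitrary candidate lines.
candidates = {"OLS": (alpha_hat, beta_hat), "steep": (50.0, 1.5), "flat": (52.0, 0.2)}
for name, (a, b) in candidates.items():
    print(f"{name:5s}  a={a:6.2f}  b={b:5.2f}  "
          f"sum|u|={sum_abs_resid(x, y, a, b):7.3f}  "
          f"sum u^2={sum_sq_resid(x, y, a, b):7.3f}")
```

By construction, no other line can have a smaller sum of squared residuals than the OLS line; the sum of absolute residuals is shown alongside only for comparison.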

If we examine the formula for $\hat{\beta}$, we can see that the numerator is the
same as the numerator for calculating the covariance between X and Y.
Thus the logic of how each case contributes to this formula, as displayed
in Figure 9.2, is the same. The denominator in the formula for $\hat{\beta}$ is the sum
of squared deviations of the $X_i$ values from the mean value of X ($\bar{X}$). Thus,
for a given covariance between X and Y, the more (less) spread out X is,
the less (more) steep the estimated slope of the regression line.
One of the mathematical properties of OLS regression is that the line
produced by the parameter estimates goes through the sample mean values
of X and Y. This makes the estimation of $\hat{\alpha}$ fairly simple. If we start out at
the point defined by the mean value of X and the mean value of Y and then
use the estimated slope ($\hat{\beta}$) to draw a line, the value of Y where X equals
zero is $\hat{\alpha}$. Figure 9.3 shows the OLS regression line through the scatter plot
of data. We can see from this figure that the OLS regression line passes
through the point where the line depicting the mean value of X meets the
line depicting the mean value of Y.
Using the data presented in Table 8.11 in the preceding formulae, we
have calculated $\hat{\alpha}$ = 51.86 and $\hat{\beta}$ = 0.65, making our sample regression
line formula $\hat{Y}$ = 51.86 + 0.65X. If we think about what this tells us about
politics, we first need to remember that Y is the incumbent party's share
of the major-party vote and X is the real per capita growth in GDP. So, if

6 The formulae for OLS parameter estimates come from taking the derivatives of the sum
of squared residuals with respect to $\hat{\alpha}$ and $\hat{\beta}$, setting those derivatives equal to zero, and
using differential calculus to solve for the values of $\hat{\beta}$ and $\hat{\alpha}$.
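For readers who want to see where the OLS formulae come from, the following is a brief sketch of the standard calculus argument. The symbol $S$ for the sum of squared residuals is introduced here only for convenience and is not notation used elsewhere in the chapter.

```latex
\begin{align*}
S(\hat{\alpha}, \hat{\beta}) &= \sum_{i=1}^{n} \hat{u}_i^2
   = \sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right)^2 \\[4pt]
\frac{\partial S}{\partial \hat{\alpha}} &= -2 \sum_{i=1}^{n} \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right) = 0
   \quad\Longrightarrow\quad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} \\[4pt]
\frac{\partial S}{\partial \hat{\beta}} &= -2 \sum_{i=1}^{n} X_i \left(Y_i - \hat{\alpha} - \hat{\beta} X_i\right) = 0 \\[4pt]
\text{Substituting } \hat{\alpha}: \quad
   & \sum_{i=1}^{n} X_i \left[(Y_i - \bar{Y}) - \hat{\beta}(X_i - \bar{X})\right] = 0 \\[4pt]
\hat{\beta} &= \frac{\sum_{i=1}^{n} X_i (Y_i - \bar{Y})}{\sum_{i=1}^{n} X_i (X_i - \bar{X})}
   = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\end{align*}
```

The last equality holds because $\sum_{i=1}^{n} \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^{n} \bar{X}(X_i - \bar{X}) = 0$, so subtracting $\bar{X}$ inside each sum changes nothing. The first condition also shows why the OLS residuals always sum to zero and why the line passes through the point of sample means.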

MEASURING OUR UNCERTAINTY ABOUT THE OLS REGRESSION LINE

As we have seen in Chapters 7 and 8, inferences about the underlying
population of interest from sample data are made with varying degrees of
uncertainty. In Chapter 8 we discussed the role of p-values in expressing this
uncertainty. With an OLS regression model, we have several different ways
in which to measure our uncertainty. We discuss these measures in terms of
the overall fit between X and Y first and then discuss the uncertainty about
individual parameters. Our uncertainty about individual parameters is used
in the testing of our hypotheses. Throughout this discussion, we refer to
our example of fitting a regression line to our data on U.S. presidential
elections in order to test the theory of economic voting. Numerical results
from Stata for this model are displayed in Figure 9.4. These numerical
results can be partitioned into three separate areas. The table in the
upper-left corner of Figure 9.4 gives us measures of the variation in our model.
The set of statistics listed in the upper-right corner of Figure 9.4 gives
us a set of summary statistics about the entire model. Across the bottom
of Figure 9.4 we get a table of statistics on the model's parameter estimates.
The name of the dependent variable, "VOTE," is displayed at the top
of this table. Underneath we see the name of our independent variable,
"GROWTH," and "_cons," which is short for "constant" (another name
for the y-intercept term), which we also know as $\hat{\alpha}$. Moving to the right in
the table at the bottom of Figure 9.4, we see that the next column heading
here is "Coef.," which is short for "coefficient," which is another name for
parameter estimate. In this column we see the values of $\hat{\beta}$ and $\hat{\alpha}$, which are
0.65 and 51.86 when we round these results to the second decimal place.7

. reg VOTE GROWTH

      Source |         SS    df          MS        Number of obs =      32
-------------+--------------------------------    F(1, 30)      =   16.55
       Model | 406.285326     1  406.285326       Prob > F      =  0.0003
    Residual | 736.446506    30  24.5482169       R-squared     =  0.3555
-------------+--------------------------------    Adj R-squared =  0.3341
       Total | 1142.73183    31  36.8623172       Root MSE      =  4.9546

        VOTE |     Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      GROWTH | 0.6535869   .1606563     4.07   0.000     .325483    .9816909
       _cons |  51.85977   .8816524    58.82   0.000    50.05919    53.66034

Figure 9.4. Stata results for two-variable regression model of VOTE = $\alpha + \beta \times$ GROWTH.

7 The choice of how many decimal places to report should be decided based on the value of
the dependent variable. In this case, because our dependent variable is a vote percentage,
we have chosen the second decimal place. Political scientists usually do not report election
results beyond the first two decimal places.

9.4.1 Goodness-of-Fit: Root Mean-Squared Error

The root mean-squared error (root MSE) is calculated as

$$\text{root MSE} = \sqrt{\frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - 2}}.$$

The squaring and then taking the square root of the quantities in this
formula are done to adjust for the fact that some of our residuals will be
positive (points for which $Y_i$ is above the regression line) and some will be
negative (points for which $Y_i$ is below the regression line). Once we realize
this, we can see that this statistic is basically the average distance between
the data points and the regression line.
From the numeric results depicted in Figure 9.4, we can see that the root
MSE for our two-variable model of incumbent-party vote is 4.95. This value
is found on the sixth line of the column of results on the right-hand side of
Figure 9.4. It indicates that, on average, our model is off by 4.95 points in
predicting the percentage of the incumbent party's share of the major-party
vote. It is worth emphasizing that the root MSE is always expressed in
terms of the metric in which the dependent variable is measured. The only
reason why this particular value corresponds to a percentage is because the
metric of the dependent variable is vote percentage.
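The reported root MSE can be checked by hand from the numbers printed in Figure 9.4. The sketch below assumes only the residual sum of squares and the residual degrees of freedom shown in that figure; the variable names are illustrative.

```python
import math

# Values read from the upper-left table of Figure 9.4.
rss = 736.446506   # residual sum of squares ("SS" in the Residual row)
df_resid = 30      # residual degrees of freedom: n - k = 32 - 2

# Mean squared residual, then its square root.
mean_sq_resid = rss / df_resid
root_mse = math.sqrt(mean_sq_resid)

print(f"mean squared residual = {mean_sq_resid:.7f}")
print(f"root MSE              = {root_mse:.4f}")
```

The mean squared residual matches the "MS" entry in the Residual row of the figure, and its square root matches the reported Root MSE, which shows that Stata divides by the residual degrees of freedom rather than by the number of observations.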
9.4.2 Goodness-of-Fit: R-Squared Statistic

Another popular indicator of the model's goodness-of-fit is the R-squared
statistic (typically written as $R^2$). The $R^2$ statistic ranges between zero and
one, indicating the proportion of the variation in the dependent variable
that is accounted for by the model. The basic idea of the $R^2$ statistic is
shown in Figure 9.5, which is a Venn diagram depiction of variation in
X and Y as well as covariation between X and Y. The idea behind this
diagram is that we are depicting variation in each variable with a circle.
The larger the circle, the larger the variation. In this figure, the variation
in Y consists of two areas, a and b, and variation in X consists of areas b
and c. Area a represents variation in Y that is not related to variation in
X, and area b represents covariation between X and Y. In a two-variable
regression model, area a is the residual or stochastic variation in Y. The
$R^2$ statistic is equal to area b over the total variation in Y, which is equal to
the sum of areas a and b. Thus smaller values of area a and larger values
of area b lead to a larger $R^2$ statistic. The formula for total variation in
Y (areas a and b in Figure 9.5), also known as the total sum of squares
(TSS), is

$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

Figure 9.5. Venn diagram of variance and covariance for X and Y.

The formula for the residual variation in Y, area a that is not accounted for
by X, called the residual sum of squares (RSS), is

$$RSS = \sum_{i=1}^{n} \hat{u}_i^2.$$

Once we have these two quantities, we can calculate the $R^2$ statistic as

$$R^2 = 1 - \frac{RSS}{TSS}.$$

The formula for the other part of TSS that is not the RSS, called the model
sum of squares (MSS), is

$$MSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2.$$

This can also be used to calculate $R^2$ as

$$R^2 = \frac{MSS}{TSS}.$$

From the numeric results depicted in Figure 9.4, we can see that the $R^2$
statistic for our two-variable model of incumbent-party vote is .355. This
number appears on the fourth line of the column of results on the
right-hand side of Figure 9.4. It indicates that our model accounts for about 36%
of the variation in the dependent variable. We can also see in Figure 9.4
the values for the MSS, RSS, and TSS that are presented under the column
labeled "SS" in the table in the upper-left-hand corner of Figure 9.4.

9.4.3 Is That a "Good" Goodness-of-Fit?

A logical question to ask when we see a measure of a model's goodness-of-fit
is "What is a good or bad value for the root MSE and/or $R^2$?" This is not
an easy question to answer. In part, the answer depends on what you are
trying to do with the model. If you are trying to predict election outcomes,
saying that you can predict the outcome with a typical error of 4.95 may
not seem very good. After all, most presidential elections are fairly close
and, in the scheme of things, 4.95% is a lot of votes. In fact, we can see that,
in 12 of the 32 elections that we are looking at, the winning margin was
less than 4.95%, making over one-third of our sample of elections too close
to call with this model.8 On the other hand, looking at this another way,
we can say that we are able to come this close and, in terms of $R^2$, explain
almost 36% of the variation in incumbent vote from 1880 to 2004 with just
one measure of the economy. When we start to think of all of the different
campaign strategies, personalities, scandals, wars, and everything else that
is not in this simple model, this level of accuracy is rather impressive. In
fact, we would suggest that this tells us something pretty remarkable about
politics in the United States - the economy is massively important.

8 We can get this by calculating VOTE - (100 - VOTE) for the values in Table 8.11.

9.4.4 Uncertainty about Individual Components of the Sample Regression Model

Before we go through this subsection, we want to warn you that there are
a lot of formulae in it. To use a familiar metaphor, as you go through the
formulae in this subsection it is important to focus on the contours of the
forest and not to get caught up in the details of the many trees that we will
see along the way. Instead of memorizing each formula, concentrate on
what makes the overall values generated by these equations larger or
smaller.
A crucial part of the uncertainty in OLS regression models is the degree
of uncertainty about individual estimates of population parameter values
from the sample regression model. We can use the same logic that we
discussed in Chapter 7 for making inferences from sample values about
population values for each of the individual parameters in a sample regression
model.
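One way to build intuition for this kind of uncertainty is a small simulation. The sketch below assumes a purely hypothetical population regression model (the values of `ALPHA`, `BETA`, and `SIGMA` are invented for illustration and are not estimates from the election data), draws many samples of 32 cases from it, and records how much the estimated slope varies from sample to sample.

```python
import random

def ols_slope(x, y):
    """Two-variable OLS slope estimate from the textbook formula."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical population parameters, chosen only for illustration.
ALPHA, BETA, SIGMA = 50.0, 0.7, 5.0
N_CASES, N_SAMPLES = 32, 2000

random.seed(0)  # fixed seed so the sketch is repeatable
slopes = []
for _ in range(N_SAMPLES):
    x = [random.uniform(-10, 10) for _ in range(N_CASES)]
    # Each Y_i has a systematic part and a stochastic part u_i.
    y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / N_SAMPLES
sd_slope = (sum((s - mean_slope) ** 2 for s in slopes) / (N_SAMPLES - 1)) ** 0.5

print(f"population slope           = {BETA}")
print(f"mean of estimated slopes   = {mean_slope:.3f}")
print(f"spread of estimated slopes = {sd_slope:.3f}")
```

Individual samples give slope estimates that scatter around the population value; the standard errors discussed in this subsection are estimates of that scatter made from a single sample.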
One estimate that factors into the calculations of our uncertainty about
each of the population parameters is the estimated variance of the
population stochastic component, $u_i$. This unseen variance, $\sigma^2$, is estimated from
the residuals ($\hat{u}_i$) after the parameters for the sample regression model have
been estimated by the following formula:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - 2}.$$

Looking at this formula, we can see two components that play a role in
determining the magnitude of this estimate. The first component comes
from the individual residual values ($\hat{u}_i$). Remember that these values
(calculated as $\hat{u}_i = Y_i - \hat{Y}_i$) are the vertical distance between each observed $Y_i$
value and the regression line. The larger these values are, the further the
individual cases are from the regression line. The second component of this
formula comes from n, the sample size. By now, you should be familiar
with the idea that the larger the sample size, the smaller the variance of the
estimate. This is the case with our formula for $\hat{\sigma}^2$.
Once we have estimated $\hat{\sigma}^2$, the variance and standard errors for
the slope parameter estimate ($\hat{\beta}$) are then estimated from the following
formulae:

$$\text{var}(\hat{\beta}) = \frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2},$$

$$\text{se}(\hat{\beta}) = \sqrt{\text{var}(\hat{\beta})} = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}.$$

Both of these formulae can be broken into two components that determine
their magnitude. In the numerator, we find $\hat{\sigma}$ values. So the larger these
values are, the larger will be the variance and standard error of the slope
parameter estimate. This makes sense, because the farther the points
representing our data are from the regression line, the less confidence we will
have in the value of the slope. If we look at the denominator in this
equation, we see the term $\sum_{i=1}^{n} (X_i - \bar{X})^2$, which is a measure of the variation of
the $X_i$ values around their mean ($\bar{X}$). The greater this variation, the smaller
will be the variance and standard error of the slope parameter estimate.
This is an important property; in real-world terms it means that the more
variation we have in X, the more precisely we will be able to estimate the
relationship between X and Y.
The variance and standard errors for the intercept parameter estimate
($\hat{\alpha}$) are then estimated from the following formulae:

$$\text{var}(\hat{\alpha}) = \frac{\hat{\sigma}^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2},$$

$$\text{se}(\hat{\alpha}) = \sqrt{\text{var}(\hat{\alpha})} = \sqrt{\frac{\hat{\sigma}^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}}.$$

The logic for taking apart the components of these formulae is slightly
more complicated because we can see that the sum of the $X_i$ values squared
appears in the numerator. We can see, however, that the denominator
contains the measure of the variation of the $X_i$ values around their mean
($\bar{X}$) multiplied by n, the number of cases. Thus the same basic logic holds
for these terms: The larger the $\hat{u}_i$ values are, the larger will be the variance
and standard error of the intercept parameter estimate; and the larger the
variation of the $X_i$ values around their mean, the smaller will be the variance
and standard error of the intercept parameter estimate.
Less obvious - but nevertheless true - from the preceding formulae is
the fact that larger sample sizes will also produce smaller standard errors.9
So, just as we learned about the effects of sample size when calculating the
standard error of the mean in Chapter 7, there is an identical effect here.
Larger sample sizes will, other things being equal, produce smaller standard
errors of our estimated regression coefficients.

9 It is true because the numerator of the expression contains $\hat{\sigma}$, which, as seen
previously, has the sample size n in its denominator.

9.4.5 Confidence Intervals about Parameter Estimates

In Chapter 7 we discussed how we use the normal distribution (supported
by the central limit theorem) to estimate confidence intervals for the unseen
population mean from sample data. We go through the same logical steps to
estimate confidence intervals for the unseen parameters from the population
regression model by using the results from the sample regression model. The
formulae for estimating confidence intervals are

$$\hat{\beta} \pm [t \times \text{se}(\hat{\beta})],$$
$$\hat{\alpha} \pm [t \times \text{se}(\hat{\alpha})],$$

where the value for t is determined from the t-table in Appendix B. So, for
instance, if we want to calculate a 95% confidence interval,10 this means
that we are looking down the column for 0.025. Once we have determined
the appropriate column, we select our row based on the number of degrees
of freedom. The number of degrees of freedom for this t-test is equal to
the number of observations (n) minus the number of parameters estimated
(k). In the case of the regression model presented in Figure 9.4, n = 32 and
k = 2, so our degrees of freedom equal 30. Looking down the column for
0.025 and across the row for 30, we can see that t = 2.042. Thus our 95%
confidence intervals are

$$\hat{\beta} \pm [t \times \text{se}(\hat{\beta})] = 0.654 \pm (2.042 \times 0.161) = 0.325 \text{ to } 0.982,$$
$$\hat{\alpha} \pm [t \times \text{se}(\hat{\alpha})] = 51.86 \pm (2.042 \times 0.88) = 50.06 \text{ to } 53.66.$$
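The confidence-interval arithmetic can be reproduced in a few lines. The sketch below takes the coefficients and standard errors printed in Figure 9.4 and the critical value t = 2.042 quoted in the text; the function name `conf_interval` is an assumption made for this example.

```python
T_CRIT = 2.042  # critical t-value for 30 degrees of freedom, as quoted in the text

def conf_interval(estimate, std_err, t_crit=T_CRIT):
    """Return (lower, upper) bounds: estimate +/- t * se(estimate)."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width

# Coefficients and standard errors as printed in Figure 9.4.
slope_ci = conf_interval(0.6535869, 0.1606563)      # GROWTH row
intercept_ci = conf_interval(51.85977, 0.8816524)   # _cons row

print(f"slope:     {slope_ci[0]:.3f} to {slope_ci[1]:.3f}")
print(f"intercept: {intercept_ci[0]:.2f} to {intercept_ci[1]:.2f}")
```

Small differences from the interval Stata prints are to be expected, because the software uses a critical value with more decimal places than the 2.042 found in a printed t-table.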

These values are displayed in the lower right-hand corner of the table at
the bottom of Figure 9.4.

10 To understand this, think back to Chapter 7, where we introduced confidence intervals. A
95% confidence interval would mean that we would leave a total of 5% in the tails. Because
there are two tails, we are going to use the 0.025 column.

9.4.6 Hypothesis Testing: Overview

The traditional approach to hypothesis testing with OLS regression is that
we specify a null hypothesis and an alternative hypothesis and then
compare the two. Although we can test hypotheses about either the slope or the
intercept parameter, we are usually more concerned with tests about the
slope parameter. In particular, we are usually concerned with testing the
hypothesis that the population slope parameter is equal to zero. The logic
of this hypothesis test corresponds closely with the logic of the
bivariate hypothesis tests introduced in Chapter 8. We observe a sample slope
parameter, which is an estimate of the population slope. Then, from the
value of this parameter estimate, the confidence interval around it, and the
size of our sample, we evaluate how likely it is that we observe this
sample slope if the true but unobserved population slope is equal to zero. If the
answer is "very likely," then we cannot reject the hypothesis that the
population slope is equal to zero.
To understand why we so often focus on a slope value of zero, think
about what this corresponds to in the formula for a line. Remember that
the slope is the change in Y from a one-unit increase in X. If that change is
equal to zero, then there is no covariation between X and Y, and we have
failed to clear our third causal hurdle.
These types of tests are either one or two tailed. Most statistical
computer programs report the results from two-tailed hypothesis tests that the
parameter in question is not equal to zero. Despite this, many political
science theories are more appropriately translated into one-tailed hypothesis
tests, which are sometimes referred to as "directional" hypothesis tests.
We review both types of hypothesis tests with the example regression from
Figure 9.4.

9.4.7 Two-Tailed Hypothesis Tests

The most common form of statistical hypothesis tests about the parameters
from an OLS regression model is a two-tailed hypothesis test that the slope
parameter is equal to zero. It is expressed as

$$H_0: \beta = 0,$$
$$H_1: \beta \neq 0,$$

where $H_0$ is the null hypothesis and $H_1$ is the alternative hypothesis. Note
that these two rival hypotheses are expressed in terms of the slope
parameter from the population regression model. To test which of these two
hypotheses is supported, we calculate a t-ratio in which $\beta$ is set equal to the
value specified in the null hypothesis (in this case zero because $H_0: \beta = 0$),
which we represent as $\beta^*$:

$$t_{n-k} = \frac{\hat{\beta} - \beta^*}{\text{se}(\hat{\beta})}.$$

For the slope parameter in the two-variable regression model presented
in Figure 9.4, we can calculate this as

$$t_{30} = \frac{\hat{\beta} - \beta^*}{\text{se}(\hat{\beta})} = \frac{0.654 - 0}{0.161} = 4.06.$$

(Figure 9.4 reports 4.07 because Stata works with the unrounded values.)
From what we have seen in previous chapters, we can tell that this t-ratio
is quite large. Remember that a typical standard for statistical significance
in the social sciences is when the p-value is less than .05. If we look across
the row for degrees of freedom equal to 30 in Appendix B, we can see
that, to have a p-value of less than .05, we would need a t-ratio of 2.042
or larger. We clearly have exceeded this standard.11 In fact, if we look at
the far-right-hand column in Appendix B for 30 degrees of freedom, we
can see that this t-ratio exceeds the value for t needed for p to be less than
.002. This means that it is extremely unlikely that $H_0$ is the case, which
in turn greatly increases our confidence in $H_1$. If we look at the table at
the bottom of Figure 9.4, we can see that the t-ratio and resulting p-value
for this hypothesis test are presented in the fourth and fifth columns of the
GROWTH row. It is worth noting that although the reported p-value is
.000, this does not mean that the probability of the null hypothesis being
the case is actually equal to zero. Instead, this means that it is a very
small number that gets rounded to zero when we report it to three decimal
places.

11 Because this is a two-tailed hypothesis test, for the standard of p < .05 we need to look
down the column labeled ".025."
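The t-ratio and its two-tailed p-value can also be computed directly instead of being read from a printed table. The sketch below uses the GROWTH coefficient and standard error from Figure 9.4 and obtains the p-value by numerically integrating the density of the t-distribution with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` are assumptions for this example, and a statistics library would normally be used in practice.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t_ratio, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t_ratio)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3       # P(0 < T < |t|)
    return 1 - 2 * area_zero_to_t        # probability in both tails

# Slope estimate and standard error from Figure 9.4; null value beta* = 0.
beta_hat, se_beta, beta_null, df = 0.6535869, 0.1606563, 0.0, 30

t_ratio = (beta_hat - beta_null) / se_beta
p_value = two_tailed_p(t_ratio, df)

print(f"t-ratio = {t_ratio:.2f}")
print(f"two-tailed p-value = {p_value:.6f}  (rounds to {p_value:.3f})")
```

The p-value is small but not literally zero, which is exactly the point made in the text about the ".000" that Stata prints.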
The exact same logic is used to test hypotheses about the y-intercept
parameter. The formula for this t-ratio is

$$t_{n-k} = \frac{\hat{\alpha} - \alpha^*}{\text{se}(\hat{\alpha})}.$$

In Figure 9.4 we see the calculation for the following null hypothesis and
alternative:

$$H_0: \alpha = 0,$$
$$H_1: \alpha \neq 0.$$

The resulting t-ratio is a whopping 58.82! This makes sense when we think
about this quantity in real-world terms. Remember that the y-intercept
is the expected value of the dependent variable Y when the independent
variable X is equal to zero. In our model, this means we want to know
the expected value of incumbent-party vote when growth equals zero. Even
when the economy is shrinking, there are always going to be some diehard
partisans who will vote for the incumbent party. Thus it makes sense that
the null hypothesis $H_0: \alpha = 0$ would be pretty easy to reject.
Perhaps a more interesting null hypothesis is that the incumbents
would still obtain 50% of the vote if growth were equal to zero. In this
case,

$$H_0: \alpha = 50,$$
$$H_1: \alpha \neq 50.$$

The corresponding t-ratio is calculated as

$$t_{30} = \frac{\hat{\alpha} - \alpha^*}{\text{se}(\hat{\alpha})} = \frac{51.86 - 50}{0.88} = 2.11.$$

Looking at the row for degrees of freedom equal to 30, we can see that
this t-ratio is just larger than 2.042, which is the value for p < .05, but not
as large as the 2.457 value for p < .02. With a more detailed t-table or a
computer, we could calculate the exact p-value for this hypothesis test. But,
from this table, we can reject the null hypothesis ($H_0: \alpha = 50$) with a fair
amount of confidence.

9.4.8 The Relationship between Confidence Intervals and Two-Tailed
Hypothesis Tests

In the previous three subsections, we introduced confidence intervals and
hypothesis tests as two of the ways for making inferences about the
parameters of the population regression model from our sample regression
model. These two methods for making inferences are mathematically
related to each other. We can tell this because they each rely on the t-table.
The relationship between the two is such that, if the 95% confidence
interval does not include a particular value, then the null hypothesis that
the population parameter equals that value (a two-tailed hypothesis test)
will have a p-value smaller than .05. We can see this for each of the three
hypothesis tests that we discussed in the section on two-tailed hypothesis
tests:

• Because the 95% confidence interval for our slope parameter does not
include 0, the p-value for the hypothesis test that $\beta = 0$ is less than
.05.
• Because the 95% confidence interval for our intercept parameter does
not include 0, the p-value for the hypothesis test that $\alpha = 0$ is less than
.05.
• Because the 95% confidence interval for our intercept parameter does
not include 50, the p-value for the hypothesis test that $\alpha = 50$ is less
than .05.

t9;'Ú'¡
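
The link between the 95% confidence interval and the two-tailed test can be checked numerically. The sketch below is only an illustration: it reuses the estimates, standard errors, and the 2.042 critical value quoted in this chapter, while the function names are our own and hypothetical.

```python
# Point estimates and standard errors as reported in the chapter's example.
ALPHA_HAT, SE_ALPHA = 51.86, 0.88
BETA_HAT, SE_BETA = 0.654, 0.161
T_CRIT_95 = 2.042  # t-table value for 30 degrees of freedom, two-tailed p < .05


def confidence_interval(estimate, std_error, t_crit=T_CRIT_95):
    """Return (lower, upper) bounds: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


def rejects_two_tailed(estimate, std_error, null_value, t_crit=T_CRIT_95):
    """True when the absolute t-ratio exceeds the two-tailed critical value."""
    return abs((estimate - null_value) / std_error) > t_crit


def interval_excludes(estimate, std_error, null_value, t_crit=T_CRIT_95):
    """True when null_value lies outside the confidence interval."""
    lower, upper = confidence_interval(estimate, std_error, t_crit)
    return null_value < lower or null_value > upper


checks = [
    ("beta = 0", BETA_HAT, SE_BETA, 0.0),
    ("alpha = 0", ALPHA_HAT, SE_ALPHA, 0.0),
    ("alpha = 50", ALPHA_HAT, SE_ALPHA, 50.0),
]
for label, est, se, null in checks:
    # The two answers agree for every hypothesis listed above.
    print(label, rejects_two_tailed(est, se, null), interval_excludes(est, se, null))
```

Both functions rest on the same comparison of a distance measured in standard errors against the same critical value, which is why they give the same answer for each hypothesis.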

One-Tailed Hypothesis Tests

As we pointed out in previous sections, the most common form of statistical
hypothesis tests about the parameters from an OLS regression model is a
two-tailed hypothesis test that the slope parameter is equal to zero. That this
is the most common test is somewhat of a fluke. By default, most statistical
computer programs report the results of this hypothesis test. In reality,
though, most political science hypotheses are that a parameter is either
positive or negative and not just that the parameter is different from zero.
This is what we call a directional hypothesis. Consider, for instance, the
theory of economic voting and how we would translate it into a hypothesis
about the slope parameter in our current example. Our theory is that the
better the economy is performing, the higher will be the vote percentage

Bivariate Regression Models

for the incumbent-party candidate. In other words, we expect to see a
positive relationship between economic growth and the incumbent-party
vote percentage, meaning that we expect $\beta$ to be greater than zero.
When our theory leads to such a directional hypothesis, it is expressed
as
$$H_0\colon \beta \leq 0,$$
$$H_1\colon \beta > 0,$$
and the corresponding t-ratio is
$$t_{n-k} = \frac{\hat{\beta} - \beta^*}{\mathrm{se}(\hat{\beta})}.$$

For the slope parameter in the two-variable regression model presented in
Figure 9.4, we can calculate this as
$$t_{30} = \frac{\hat{\beta} - \beta^*}{\mathrm{se}(\hat{\beta})} = \frac{0.654 - 0}{0.161} = 4.07.$$

Do these calculations look familiar to you? They should, because this
t-ratio is calculated exactly the same way that the t-ratio for the two-sided
hypothesis about this parameter was calculated. The difference comes in
how we use the t-table in Appendix B to arrive at the appropriate p-value
for this hypothesis test. Because this is a one-tailed hypothesis test, we use
the column labeled ".05" instead of the column labeled ".025" to assess
whether we have achieved a t-ratio such that p < .05. In other words, we
would need a t-ratio of only 1.697 to achieve this level of significance for
a one-tailed hypothesis test. For a two-tailed hypothesis test, we needed a
t-ratio of 2.042.
We can see from this example and from the t-table that, when we
have a directional hypothesis, we can more easily reject a null hypothesis.
One of the quirks of political science research is that, even when they have
directional hypotheses, researchers often report the results of two-tailed
hypothesis tests.

12 We choose 0 when the null hypothesis is $H_0\colon \beta \leq 0$ because this is the critical value for
the null hypothesis. Under this null hypothesis, zero is the threshold, and evidence that $\beta$
is equal to any value less than or equal to zero is supportive of this null hypothesis.
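
The difference between the one-tailed and two-tailed readings of the same t-ratio can be sketched as follows. The critical values are the ones quoted in the text for 30 degrees of freedom; the function names and the "borderline" t-ratio of 1.8 are hypothetical and chosen only for illustration. Note that dividing the rounded figures 0.654 and 0.161 gives roughly 4.06, slightly below the 4.07 reported in the text, presumably because the text works from unrounded estimates.

```python
# Critical values for 30 degrees of freedom, as read from the t-table in the text.
CRIT_TWO_TAILED_05 = 2.042  # column ".025" (p < .05 split across two tails)
CRIT_ONE_TAILED_05 = 1.697  # column ".05"  (p < .05 in a single tail)


def t_ratio(estimate, null_value, std_error):
    """t = (estimate - null value) / standard error."""
    return (estimate - null_value) / std_error


def significant_two_tailed(t):
    """H1: parameter != null. Reject when |t| exceeds the two-tailed cutoff."""
    return abs(t) > CRIT_TWO_TAILED_05


def significant_one_tailed_positive(t):
    """H1: parameter > null. Reject only for sufficiently large positive t."""
    return t > CRIT_ONE_TAILED_05


t_slope = t_ratio(0.654, 0.0, 0.161)  # slope example from the text
t_borderline = 1.8                    # hypothetical t-ratio, illustration only

for name, t in [("slope", t_slope), ("borderline", t_borderline)]:
    print(name, round(t, 2),
          significant_one_tailed_positive(t), significant_two_tailed(t))
```

The hypothetical borderline case is the interesting one: it clears the one-tailed threshold but not the two-tailed one, which is the sense in which a directional hypothesis makes the null easier to reject.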

9.5 OLS Assumptions

9.5 ASSUMPTIONS, MORE ASSUMPTIONS, AND MINIMAL
MATHEMATICAL REQUIREMENTS

If assumptions were water, you'd need an umbrella right now. Any time
that you estimate a regression model, you are implicitly making a large
set of assumptions about the unseen population model. In this section, we
break these assumptions into assumptions about the population stochastic
component and assumptions about our model specification. In addition,
there are some minimal mathematical requirements that must be met before
a regression model can be estimated. In this final section we list these
assumptions and requirements and briefly discuss them as they apply to
our working example of a two-variable regression model of the impact of
economic growth on incumbent-party vote. We will provide more elaborate
discussions in later chapters.
9.5.1 Assumptions about the Population Stochastic Component

The most important assumptions about the population stochastic component $u_i$ are about its distribution. These can be summarized as
$$u_i \sim N(0, \sigma^2),$$
which means that we assume that $u_i$ is distributed normally ($\sim N$) with
the mean equal to zero and the variance$^{13}$ equal to $\sigma^2$. This compact
mathematical statement contains three of the five assumptions that we
make about the population stochastic component any time we estimate a
regression model. We now go over each one separately.

$u_i$ Is Normally Distributed

The assumption that $u_i$ is normally distributed allows us to use the t-table to
make probabilistic inferences about the population regression model from
the sample regression model. The main justification for this assumption is
the central limit theorem that we discussed in Chapter 7.

$E(u_i) = 0$: No Bias

The assumption that $u_i$ has a mean or expected value equal to zero is also
known as the assumption of zero bias. Consider what it would mean if
there was a case for which $E(u_i) \neq 0$. In other words, this would be a case
for which we would expect our regression model to be off. If we have cases
like this, we would essentially be ignoring some theoretical insight that we
have about the underlying causes of Y. Remember, this term is supposed to
be random. If $E(u_i) \neq 0$, then there must be some nonrandom component
to this term. It is important to note here that we do not expect all of our
$u_i$ values to equal zero because we know that some of our cases will fall
above and below the regression line. But this assumption means that our
best guess or expected value for each individual $u_i$ value is zero.

13 Strictly speaking, we do not need to make all of these assumptions to estimate the parameters of an OLS model. But we do need to make all of these assumptions to interpret the
results from an OLS model in the standard fashion.
If we think about the example in this chapter, this assumption means
that we do not have any particular cases for which we expect our model,
with economic growth as the independent variable, to overpredict or underpredict the value of the incumbent-party vote percentage in any particular
election. If, on the other hand, we had some expectation along these lines,
we would not be able to make this assumption. Say, for instance, that
we expected that during times of war the incumbent party would fare
better than we would expect them to fare based on the economy. Under these circumstances, we would not be able to make this assumption.
The solution to this problem would be to include another independent
variable in our model that measured whether or not the nation was at
war at the time of each election. Once we control for all such potential
sources of bias, we can feel comfortable making this assumption. The inclusion of additional independent variables is the main subject covered in
Chapter 10.

$u_i$ Has Variance $\sigma^2$: Homoscedasticity

The assumption that $u_i$ has variance equal to $\sigma^2$ seems pretty straightforward. But, because this notation for variance does not contain an i subscript, it means that the variance for every case in the underlying population
is assumed to be the same. The word for describing this situation is "homoscedasticity," which means "uniform error variance." If this assumption
does not hold, we have a situation in which the variance of $u_i$ is $\sigma_i^2$, known
as "heteroscedasticity," which means "unequal error variance." When we
have heteroscedasticity, our regression model fits some of the cases in the
population better than others. This can potentially cause us problems when
we are estimating confidence intervals and testing hypotheses.
In our example for this chapter, this assumption would be violated
if, for some reason, some elections were harder than others for our model
to predict. In this case, our model would be heteroscedastic. It could, for
instance, be the case that elections that were held after political debates
became televised are harder to predict with our model in which the only
independent variable is economic performance. Under these circumstances,
the assumption of homoscedasticity would not be reasonable.

No Autocorrelation

We also assume that there is no autocorrelation. Autocorrelation occurs
when the stochastic terms for any two or more cases are systematically
related to each other. This clearly cuts against the grain of the idea that
these terms are stochastic or random. Formally, we express this assumption
as
$$\mathrm{cov}(u_i, u_j) = 0 \quad \forall\, i \neq j;$$
in words, this means that the covariance between the population error terms
$u_i$ and $u_j$ is equal to zero for all i not equal to j (for any two unique cases).
The most common form of autocorrelation occurs in regression models
of time-series data. As we discussed in Chapter 4, time-series data involve
measurement of the relevant variables across time for a single spatial unit.
In our example for this chapter, we are using measures of economic growth
and incumbent-party vote percentage measured every 4 years for the United
States. If, for some reason, the error terms for adjacent pairs of elections
were correlated, we would have autocorrelation.

X Values Are Measured Without Error

At first, the assumption that X values are measured without error may
seem to be out of place in a listing of assumptions about the population
stochastic component. But this assumption is made to greatly simplify inferences that we make about our population regression model from our
sample regression model. By assuming that X is measured without error,
we are assuming that any variability from our regression line is due to the
stochastic component $u_i$ and not to measurement problems in X. To put
it another way, if X also had a stochastic component, we would need to
model X before we could model Y, and that would substantially complicate
matters.
With just about any regression model that we estimate with real-world
data, we will likely be pretty uncomfortable with this assumption. In the
example for this chapter, we are assuming that we have exactly correct
measures of the percentage change in real GDP per capita from 1880 to
2004. If we think a little more about this measure, we can think of all kinds
of potential errors in measurement. What about illegal economic activities
that are hard for the government to measure? Because this is per capita,

how confident are we that the denominator in this calculation, population,
is measured exactly correctly?
Despite the obvious problems with this assumption, we make it every
time that we estimate an OLS model. Unless we move to considerably more
complicated statistical techniques, this is an assumption that we have to
live with and keep in the back of our minds as we evaluate our
confidence in what our models tell us.
Recall from Chapter 5, when we discussed measuring our concepts
of interest, that we argued that measurement is important because, if we
mismeasure our variables, we may make incorrect causal inferences about
the real world. This assumption should make the important lessons of that
chapter crystal clear.

9.5.2 Assumptions about Our Model Specification

The assumptions about our model specification can be summarized as a
single assumption that we have the correct model specification. We break
this into two separate assumptions to highlight the range of ways in which
this assumption might be violated.

No Causal Variables Left Out; No Noncausal Variables Included

This assumption means that if we specify our two-variable regression
model of the relationship between X and Y, there cannot be some other variable
Z that also causes$^{14}$ Y. It also means that X must cause Y. In other words,
this is just another way of saying that the sample regression model that we
have specified is the true underlying population regression model.
As we have gone through the example in this chapter, we have already
begun to come up with additional variables that we theorize to be causally
related to our dependent variable. To comfortably make this assumption,
we will need to include all such variables in our model. Adding additional
independent variables to our model is the subject of Chapter 10.

14 One exception to this is the very special case in which there is a Z variable that is causally
related to Y but Z is uncorrelated with X and $u_i$. In this case, we would still be able to
get a reasonable estimate of the relationship between X and Y despite leaving Z out of
our model. More on this in Chapter 10.

Parametric Linearity

The assumption of parametric linearity is a fancy way of saying that our
population parameter $\beta$ for the relationship between X and Y does not
vary. In other words, the relationship between X and Y is the same across
all values of X.
In the context of our current example, this means that we are assuming
that the impact of a one-unit increase in change in real GDP per capita is
the same. So moving from a value of -10 to -9 has the same effect as
moving from a value of 1 to 2. In Chapter 11 we discuss some techniques
for relaxing this assumption.

9.5.3 Minimal Mathematical Requirements

For a two-variable regression model, we have two minimal requirements
that must be met by our sample data before we can estimate our parameters.
We will add to these requirements when we expand to multiple regression
models.

X Must Vary

Think about what the scatter plot of our sample data would look like if X
did not vary. Basically, we would have a stack of Y values at the same point
on the x-axis. The only reasonable line that we could draw through this set
of points would be a straight line parallel to the y-axis. Remember that our
goal is to explain our dependent variable Y. Under these circumstances we
would have failed miserably because any Y value would be just as good as
any other given our single value of X. Thus we need some variation in X
in order to estimate an OLS regression model.

n > k

To estimate a regression model, the number of cases (n) must exceed the
number of parameters to be estimated (k). Thus, as a minimum, when we
estimate a two-variable regression model with two parameters ($\alpha$ and $\beta$)
we must have at least three cases.

9.5.4 How Can We Make All of These Assumptions?

The mathematical requirements to estimate a regression model aren't too
severe, but a sensible question to ask at this point is, "How can we reasonably make all of the assumptions just listed every time that we run a
regression model?" To answer this question, we refer back to the discussion in Chapter 1 of the analogy between models and maps. We know that
all of our assumptions cannot possibly be met. We also know that we are
trying to simplify complex realities. The only way that we can do this is to
make a large set of unrealistic assumptions about the world. It is crucial, though, that we never lose sight of the fact that we are making these
assumptions. In the next chapter we relax one of these most unrealistic
assumptions made in the two-variable regression model by controlling for
a second variable, Z.
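
The two minimal mathematical requirements from this section (X must vary, and the number of cases must exceed the number of parameters) can be seen directly in the arithmetic of the two-variable OLS formulas. The sketch below is a hypothetical helper, not code from the text, and the data are made up for illustration.

```python
def ols_two_variable(xs, ys):
    """Return (alpha_hat, beta_hat) for a two-variable OLS regression.

    Raises ValueError when either minimal requirement is not met.
    """
    n, k = len(xs), 2  # k = 2 parameters: the intercept and the slope
    if n != len(ys):
        raise ValueError("X and Y must have the same number of cases")
    if n <= k:
        raise ValueError("n must exceed k: need at least three cases")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Sum of squared deviations of X: the denominator of the slope formula.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary: the slope denominator is zero")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Made-up data sets: one usable, one with no variation in X, one with n = k.
examples = {
    "usable": ([1, 2, 3, 4], [3, 5, 7, 9]),
    "constant X": ([3, 3, 3, 3], [1, 2, 3, 4]),
    "too few cases": ([1, 2], [3, 5]),
}
for label, (xs, ys) in examples.items():
    try:
        print(label, ols_two_variable(xs, ys))
    except ValueError as err:
        print(label, "cannot be estimated:", err)
```

When X does not vary, every deviation from the mean of X is zero, so the slope formula would require dividing by zero; the check on n and k reflects the need for degrees of freedom to remain after estimating both parameters.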

CONCEPTS INTRODUCED IN THIS CHAPTER

alternative hypothesis
directional hypothesis
ordinary least-squares
parameters
parameter estimates
population error term
population regression model
residual
root mean-squared error
R-squared statistic
sample error term
sample regression model
statistical model
stochastic
t-ratio

EXERCISES

1. Draw an X-Y axis through the middle of a 10 x 10 grid. The point where the
X and Y lines intersect is known as the "origin" and is defined as the point at
which both X and Y are equal to zero. Draw each of the following lines across
the values of X from -5 to 5 and write the corresponding regression equation:
(a) y-intercept = 2, slope = 2;
(b) y-intercept = -2, slope = 2;
(c) y-intercept = 0, slope = -1;
(d) y-intercept = 2, slope = -2.
2. Estimate and interpret the results from a two-variable regression model by
using your own data. Try to check the calculations made by the computer by
using the formulae that we have presented in this chapter.
3. Think through the assumptions that you made when you carried out Exercise 2.
Which do you feel least and most comfortable making? Explain your answers.
4. In Exercise 8 for Chapter 8, you calculated a correlation coefficient for the
relationship between two continuous variables. Now, estimate a two-variable
regression model for these same two variables. Produce a table of the results
and write about what this table tells you about politics in the UK in 2005.

10 Multiple Regression Models I: The Basics

OVERVIEW

Despite what we have learned in the preceding chapters on hypothesis
testing and statistical significance, we have not yet crossed all four of our
hurdles for establishing causal relationships. Recall that all of the techniques
we have learned in Chapters 8 and 9 are simply bivariate, X-and-Y-type
analyses. But, to fully assess whether X causes Y, we need to control for
other possible causes of Y, which we have not yet done. In this chapter, we
show how multiple regression - which is an extension of the two-variable
model we covered in Chapter 9 - does exactly that. We explicitly connect
the formulae that we include to the key issues of research design that tie
the entire book together. We also discuss some of the problems in multiple
regression models when key causes of the dependent variable are omitted,
which ties this chapter to the fundamental principles presented in Chapters 3 and 4.

10.1 MODELING MULTIVARIATE REALITY
From the very outset of this book, we have emphasized that almost all
interesting phenomena in social reality have more than one cause. And yet
most of our theories are simply bivariate in nature.
We have shown you (in Chapter 4) that there are distinct methods for
dealing with the nature of reality in our designs for social research. If we are
fortunate enough to be able to conduct an experiment, then the process of
randomly assigning our participants to treatment groups will automatically
"control for" those other possible causes that are not a part of our theory.
But in observational research - which represents the vast majority of political science research - there is no automatic control for the
other possible causes of our dependent variable; we have to control for
them statistically. The main way that social scientists accomplish this is
through multiple regression. The math in this model is an extension of
the math involved in the two-variable regression model you just learned in
Chapter 9.

10.2 THE POPULATION REGRESSION FUNCTION
We can generalize the population regression model that we learned in
Chapter 9,
bivariate population regression model: $Y_i = \alpha + \beta X_i + u_i$,
to include more than one systematic cause of Y, which we have been calling
Z throughout this book:
multiple population regression model: $Y_i = \alpha + \beta_1 X_i + \beta_2 Z_i + u_i$.
The interpretation of the slope coefficients in the three-variable model is
similar to interpreting bivariate coefficients, with one very important difference. In both, the coefficient in front of the variable X ($\beta$ in the two-variable
model, $\beta_1$ in the multiple regression model) represents the "rise-over-run"
effect of X on Y. In the multiple regression case, though, $\beta_1$ actually represents the effect of X on Y while holding constant the effects of Z. If this
distinction sounds important, it is. We show how these differences arise in
the next section.

10.3 FROM TWO-VARIABLE TO MULTIPLE REGRESSION

Recall from Chapter 9 that the formula for a two-variable regression line
(in a sample) is
$$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{u}_i.$$
And recall that, to understand the nature of the effect that X has on Y, the
estimated coefficient $\hat{\beta}$ tells us, on average, how many units of change in Y
we should expect given a one-unit increase in X. The formula for $\hat{\beta}$ in the
two-variable model, as we learned in Chapter 9, is
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$

10.3 From Two-Variable to Multiple Regression

a third dimension leaves the formula for a plane. And the formula for that
plane is
$$Y_i = \hat{\alpha} + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i + \hat{u}_i.$$
That might seem deceptively simple. A formula representing a plane simply
adds the additional $\hat{\beta}_2 Z_i$ term to the formula for a line.$^{1}$
Pay attention to how the notation has changed. In the two-variable formula for a line, there were no numeric subscripts below the $\beta$ coefficient because, well, there was only one of them. But now we have two independent variables, X and Z, that help to explain the variation in Y, and
therefore we have two different coefficients $\beta$, and so we subscript them
$\beta_1$ and $\beta_2$ to be clear that the values of these two effects are different from
one another.$^{2}$
The key message from this chapter is that, in the preceding formula,
the coefficient $\hat{\beta}_1$ represents more than the effect of X on Y: in the multiple
regression formula, it represents the effect of X on Y while controlling for
the effect of Z. Simultaneously, the coefficient $\hat{\beta}_2$ represents the effect of Z
on Y while controlling for the effect of X. And in observational research,
this is the key to crossing our fourth causal hurdle that we introduced all
the way back in Chapter 3.
How is it the case that the coefficient for $\hat{\beta}_1$ actually controls for Z?
After all, $\hat{\beta}_1$ is not connected to Z in the formula; it is, quite obviously,
connected to X. The first thing to realize here is that the multiple
regression formula for $\hat{\beta}_1$ is different from the two-variable formula for $\hat{\beta}$
from Chapter 9. (We'll get to the formula shortly.) The key consequence
of this is that the value of $\hat{\beta}$ derived from the two-variable formula, representing the effect of X on Y, will almost always be different - perhaps
only trivially different, or perhaps wildly different - from the value of $\hat{\beta}_1$
derived from the multiple regression formula, representing the effect of X
on Y while controlling for the effects of Z.
1 All of the subsequent math about adding one more independent variable, Z, generalizes
quite easily to adding still more independent variables. We use the three-variable case for
ease of illustration.
2 In many other textbooks on regression analysis, just as we distinguish between $\beta_1$ and
$\beta_2$, the authors choose to label their independent variables $X_1$, $X_2$, and so forth. We have
consistently used the notation of X, Y, and Z to emphasize the concept of controlling for
other variables while examining the relationship between an independent and a dependent
variable of theoretical interest. Therefore we will stick with this notation throughout this
chapter.

But how does $\hat{\beta}_1$ control for the effects of Z? Let's assume that X and
Z are correlated. They need not be related in a causal sense, and they need
not be related strongly. They simply have to be related to one another - that is, for this example, their covariance is not exactly equal to zero. Now,
assuming that they are related somehow, we can write their relationship
just like that of a two-variable regression model:
$$X_i = \hat{\alpha}' + \hat{\beta}' Z_i + e_i.$$

Note some notational differences here. Instead of the parameters $\hat{\alpha}$ and $\hat{\beta}$,
we are calling the estimated parameters $\hat{\alpha}'$ and $\hat{\beta}'$ just so you are aware
that their values will be different from the $\hat{\alpha}$ and $\hat{\beta}$ estimates in previous
equations. And note also that the residuals, which we labeled $\hat{u}_i$ in previous
equations, are now labeled $e_i$ here.
If we use Z to predict X, then the predicted value of X (or $\hat{X}$) based
on Z is simply
$$\hat{X}_i = \hat{\alpha}' + \hat{\beta}' Z_i,$$
which is just the preceding equation, but without the error term, because
it is expected (on average) to be zero. Now we can substitute the
left-hand side of the preceding equation into the previous equation and get
$$X_i = \hat{X}_i + e_i$$
or, equivalently,
$$e_i = X_i - \hat{X}_i.$$

These $e_i$, then, are the exact equivalents of the residuals from the two-variable regression of Y on X that you learned from Chapter 9. So their
interpretation is identical, too. That being the case, the $e_i$ are the portion
of the variation in X that Z cannot explain. (The portion of X that Z can
explain is the predicted portion - the $\hat{X}_i$.)
So what have we done here? We have just documented the relationship
between Z and X and partitioned the variation in X into two parts - the
portion that Z can explain (the $\hat{X}_i$) and the portion that Z cannot explain
(the $e_i$). Hold this thought.
We can do the exact same thing for the relationship between Z and Y
that we just did for the relationship between Z and X. The process will look
quite similar, with a bit of different notation to distinguish the processes.
So we can model Y as a function of Z in the following way:
$$Y_i = \hat{\alpha}^* + \hat{\beta}^* Z_i + v_i.$$
Here, the estimated slope is $\hat{\beta}^*$ and the error term is represented by $v_i$.

Just as we did with Z and X, if we use Z to predict Y, then the predicted
value of Y (or $\hat{Y}$) (which we will label $\hat{Y}^*$) based on Z is simply
$$\hat{Y}_i^* = \hat{\alpha}^* + \hat{\beta}^* Z_i,$$
which, as before, is identical to the preceding equation, but without the
error term, because the residuals are expected (on average) to be zero.
And again, as before, we can substitute the left-hand side of the preceding
equation into the previous equation, and get
$$Y_i = \hat{Y}_i^* + v_i$$
or, equivalently,
$$v_i = Y_i - \hat{Y}_i^*.$$
These $v_i$, then, are interpreted in an identical way to that of the preceding
$e_i$. They represent the portion of the variation in Y that Z cannot explain.
(The portion of Y that Z can explain is the predicted portion - the $\hat{Y}_i^*$.)
Now what has this accomplished? We have just documented the relationship between Z and Y and partitioned the variation in Y into two parts - the portion that Z can explain and the portion that Z cannot explain.
So we have now let Z try to explain X and found the residuals (the $e_i$
values); similarly, we have also now let Z try to explain Y, and found those
residuals as well (the $v_i$ values). Now back to our three-variable regression
model that we have seen before, with Y as the dependent variable, and X
and Z as the independent variables:
$$Y_i = \hat{\alpha} + \hat{\beta}_1 X_i + \hat{\beta}_2 Z_i + \hat{u}_i.$$
The formula for $\hat{\beta}_1$, representing the effect of X on Y while controlling for
Z, is
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} e_i v_i}{\sum_{i=1}^{n} e_i^2}.$$
Now, we know what $e_i$ and $v_i$ are from the previous equations. So, substituting, we get
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \hat{X}_i)(Y_i - \hat{Y}_i^*)}{\sum_{i=1}^{n} (X_i - \hat{X}_i)^2}.$$

Pay careful attention to this formula. The "hatted" components in
these expressions are from the two-variable regressions involving Z that
we previously learned about. The key components of the formula for the
effect of X on Y, while controlling for Z, are the $e_i$ and $v_i$, which, as we just
learned, are the portions of X and Y (respectively) that Z cannot account
for. And that is how, in the multiple regression model, the parameter $\beta_1$,
which represents the effects of X on Y, controls for the effects of Z. How?
Because the only components of X and Y that it uses are components that
Z cannot account for - that is, the $e_i$ and $v_i$.
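
The residual-based formula for the slope on X can be checked numerically. In the sketch below, the data are made up purely for illustration, the function names are hypothetical, and the "direct" calculation uses the standard closed-form OLS solution for two regressors, which is not derived in the text. Under those assumptions, both routes should return the same slope.

```python
import math


def mean(values):
    return sum(values) / len(values)


def simple_ols(xs, ys):
    """Return (intercept, slope) from a two-variable regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Residuals of ys after removing what xs can explain."""
    a, b = simple_ols(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def beta1_partialling_out(x, y, z):
    """Slope on X built from the parts of X and Y that Z cannot explain."""
    e = residuals(z, x)  # e_i = X_i - X_hat_i
    v = residuals(z, y)  # v_i = Y_i - Y_hat_i*
    return sum(ei * vi for ei, vi in zip(e, v)) / sum(ei ** 2 for ei in e)


def beta1_direct(x, y, z):
    """Slope on X from the closed-form three-variable OLS solution."""
    xb, yb, zb = mean(x), mean(y), mean(z)
    sxx = sum((xi - xb) ** 2 for xi in x)
    szz = sum((zi - zb) ** 2 for zi in z)
    sxz = sum((xi - xb) * (zi - zb) for xi, zi in zip(x, z))
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    szy = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


# Made-up data in which X and Z are correlated and both are related to Y.
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [4.2, 5.4, 9.0, 10.6, 13.7, 15.6]

b1_partial = beta1_partialling_out(x, y, z)
b1_direct = beta1_direct(x, y, z)
b_bivariate = simple_ols(x, y)[1]  # slope when Z is ignored
print(b1_partial, b1_direct, math.isclose(b1_partial, b1_direct))
print(b_bivariate)
```

In this made-up example the bivariate slope, which ignores Z, differs noticeably from the slope that controls for Z.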
Comparing this formula for $\hat{\beta}_1$ with the two-variable formula for $\hat{\beta}$
is very revealing. Instead of using the factors $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$ in the
numerator, which were the components of the two-variable regression of Y
on X from Chapter 9, in the multiple regression formula that controls for
Z the factors in the numerator are $(X_i - \hat{X}_i)$ and $(Y_i - \hat{Y}_i^*)$, where, again,
the hatted portions represent X as predicted by Z and Y as predicted by Z.
Note something else in the comparison of the two-variable formula
for $\hat{\beta}$ and the multiple regression formula for $\hat{\beta}_1$. The result of $\hat{\beta}$ in the
two-variable regression of Y and X and $\hat{\beta}_1$ in the three-variable regression
of Y on X while controlling for Z will be different almost all the time. In
fact, it is quite rare - though mathematically possible in theory - that those
two values will be identical.$^{3}$
And the formula for $\hat{\beta}_2$, likewise, represents the effects of Z on Y
while controlling for the effects of X. These processes, in fact, happen
simultaneously.
It's been a good number of chapters - six of them, to be precise - between the first moment when we discussed the importance of controlling
for Z and the point, just above, when we showed you precisely how to do
it. The fourth causal hurdle has never been too far from front-and-center
since Chapter 3, and now you know the process of crossing it.
Don't get too optimistic too quickly, though. As we noted, the three-variable setup we just mentioned can easily be generalized to more than
three variables. But the formula for $\hat{\beta}_1$ controls only for the effects of the
Z variables that are included in the regression equation. It does not control
for other variables that are not measured and not included in the model.
And what happens when we fail to include a relevant cause of Y in our
regression model? Bad things. And those bad things are the focus of the
next section.

10.4 WHAT HAPPENS WHEN WE FAIL TO CONTROL FOR Z?
Controlling for the effects of other possible causes of our dependent variable
Y, we have maintained, is critical to making the correct causal inferences.
Some of you might be wondering something like the following: "How does
omitting Z from a regression model affect my inference of whether X causes
Y? Z isn't X, and Z isn't Y, so why should omitting Z matter?"

3 In the next section, you will see that there are two situations in which the two-variable
and multiple regression parameter estimates of $\beta$ will be the same.


10.4 What Happens When We Fail to Control for Z?

Consider the following three-variable regression model involving our
now-familiar trio of X, Y, and Z:
$$Y_i = \alpha + \beta_1 X_i + \beta_2 Z_i + u_i.$$
And assume, for the moment, that this is the correct model of reality.
That is, the only systematic causes of Y are X and Z; and, to some degree,
Y is also influenced by some random error component, u.
Now let's assume that, instead of estimating this correct model, we fail
to estimate the effects of Z. That is, we estimate
$$Y_i = \alpha^* + \beta_1^* X_i + u_i^*.$$

As we previously hinted, the value of $\beta_1$ in the correct, three-variable
equation and the value of $\beta_1^*$ will not be identical under most circumstances. (We'll see the exceptions in a moment.) And that, right there, should
be enough to raise red flags of warning. For, if we know that the three-variable model is the correct model - and what that means, of course, is that
the estimated value of $\beta_1$ that we obtain from the data will be equal to the
true population value - and if we know that $\beta_1$ will not be equal to $\beta_1^*$, then
there is a problem with the estimated value of $\beta_1^*$. That problem is a statistical problem called bias, which means that the expected value of the parameter estimate that we obtain from a sample will not be equal to the true
population parameter. The specific type of bias that results from the failure to include a variable that belongs in our regression model is called
omitted-variables bias.
Let's get specific about the nature of omitted-variables bias. If, instead
of estimating the true three-variable model, we estimate the incorrect two-variable model, the formula for the slope $\hat{\beta}_1^*$ will be
$$\hat{\beta}_1^* = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$
Notice that this is simply the bivariate formula for the effect of X on Y. (Of
course, the model we just estimated is a bivariate model, in spite of the fact
that we know that Z, as well as X, affects Y.) But because we know that Z
should be in the model, and we know from Chapter 9 that regression lines
travel through the mean values of each variable, we can figure out that the
following is true:
$$(Y_i - \bar{Y}) = \beta_1 (X_i - \bar{X}) + \beta_2 (Z_i - \bar{Z}) + (u_i - \bar{u}).$$
We can do this because we know that the plane will travel through each
mean.
Now notice that the left-hand side of the preceding equation - the
$(Y_i - \bar{Y})$ - is identical to one portion of the numerator of the slope for
$\hat{\beta}_1^*$. Therefore we can substitute the right-hand side of the preceding equation - yes, that entire mess - into the numerator of the formula for $\hat{\beta}_1^*$.
The resulting math isn't anything that is beyond your skills in algebra,
but it is cumbersome, so we won't derive it here. After a few lines of
multiplying and reducing, though, the formula for $\hat{\beta}_1^*$ will reduce to
$$E(\hat{\beta}_1^*) = \beta_1 + \beta_2 \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$

This might seem like a mouthful - a fact that's rather hard to deny - but there is a very important message in it: What the equation says is that the estimated effect of X on Y, $\hat{\beta}_1^*$, in which we do not include the effects of Z on Y (but should have), will be equal to the true $\beta_1$ - that is, the effect with Z taken into account - plus a bundle of other stuff. That other stuff, strictly speaking, is bias. And because this bias came about as a result of omitting a variable (Z) that should have been in the model, this type of bias is known as omitted-variables bias.
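The bias formula can be checked numerically. The following is a minimal sketch, assuming simulated data and hypothetical coefficient values (1.0, 2.0, 3.0, and 0.6 are illustrative choices, not values from this chapter). It estimates the three-variable model and the two-variable model on the same sample and compares the two-variable slope with the three-variable slope plus the bias term.

```python
import numpy as np

def ols(design, y):
    """Return OLS coefficients for y regressed on the columns of design."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Simulated data; every number below is a hypothetical, illustrative choice.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = 0.6 * x + rng.normal(size=n)                   # Z is related to X
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)   # Z also affects Y

ones = np.ones(n)
_, b1_hat, b2_hat = ols(np.column_stack([ones, x, z]), y)  # three-variable model
_, b1_star = ols(np.column_stack([ones, x]), y)            # two-variable model, Z omitted

# The large quotient in the bias term: the bivariate slope of Z regressed on X.
quotient = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean()) ** 2)
bias = b2_hat * quotient

print("slope with Z omitted:        ", b1_star)
print("slope with Z + bias term:    ", b1_hat + bias)
```

Within a single sample the two printed numbers agree up to rounding, because the residuals of the three-variable model are uncorrelated with X by construction. The statement in the text concerns expected values across samples, which this single draw only illustrates.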
Obviously, we'd like the expected value of our $\hat{\beta}_1^*$ (estimated without Z) to equal the true $\beta_1$ (as if we had estimated the equation with Z). And if the things on the right-hand side of the "+" sign in the preceding equation equal zero, it will. When will that happen? In two circumstances, neither of which is particularly likely. First, $\hat{\beta}_1^* = \beta_1$ if $\beta_2 = 0$. Second, $\hat{\beta}_1^* = \beta_1$ if the large quotient at the end of the equation is equal to zero. What is that quotient? It should look familiar; in fact, it is the bivariate slope parameter of a regression of Z on X.
In the first of these two special circumstances, the bias term will equal zero if and only if the effect of Z on Y - that is, the parameter $\beta_2$ - is zero. Okay, so it's safe to omit an independent variable from a regression equation if it has no effect on the dependent variable. (If that seems obvious to you, good.) The second circumstance is a bit more interesting: It's safe to omit an independent variable Z from an equation if it is entirely unrelated to the other independent variable X. Of course, if we omit Z in such circumstances, we'll still be deprived of understanding how Z affects Y; but at least, so long as Z and X are absolutely unrelated, omitting Z will not adversely affect our estimate of the effect of X on Y.
We emphasize that this second condition is unlikely to occur in practice. Therefore, if Z affects Y, and Z and X are related, then if we omit Z from our model, our bias term will not equal zero. In the end, omitting Z will cause us to misestimate the effect of X on Y.4
That might seem unfair, but it's true. If we estimate a regression model that omits an independent variable (Z) that belongs in the model, then the effects of that Z will somehow work their way into the parameter estimates for the independent variable that we do estimate (X) and pollute our estimate of the effect of X on Y.

4 Omitting Z from our regression model also drives down the R2 statistic.

Figure 10.1. Venn diagram in which X, Y, and Z are correlated.
The preceding equation also suggests when the magnitude of the bias is likely to be large and when it is likely to be small. If either or both of the components of the bias term [$\beta_2$ and $\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$] are close to zero, then the bias is likely to be small (because the bias term is the product of both components); but if both are likely to be large, then the bias is likely to be quite large.
Moreover, the equation also suggests the likely direction of the bias. All we have said thus far is that the coefficient $\hat{\beta}_1^*$ will be biased - that is, it will not equal its true value. But will it be too large or too small? If we have good guesses about the values of $\beta_2$ and the correlation between X and Z, then we can suspect the direction of the bias. For example, suppose that $\beta_1$, $\beta_2$, and the correlation between X and Z are all positive. That means that our estimated coefficient $\hat{\beta}_1^*$ will be larger than it is supposed to be, because a positive number plus the product of two positive numbers will be a still-larger positive number. And so on.5
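The reasoning about direction can be written down as a small helper. This is only a sketch: the function name and the labels it returns are hypothetical, and it encodes nothing more than the sign of the product of the two components of the bias term.

```python
def bias_direction(beta2_sign: int, xz_sign: int) -> str:
    """Direction of the bias in the two-variable slope when Z is omitted.

    beta2_sign: sign of the effect of Z on Y (-1, 0, or +1).
    xz_sign:    sign of the relationship between X and Z (-1, 0, or +1).
    """
    product = beta2_sign * xz_sign
    if product > 0:
        return "upward"     # the estimate is pushed above the true beta_1
    if product < 0:
        return "downward"   # the estimate is pushed below the true beta_1
    return "none"           # one component is zero, so the bias term vanishes

# The example from the text: beta_2 and the X-Z relationship are both positive.
print(bias_direction(+1, +1))
# Z has no effect on Y, so omitting it is harmless.
print(bias_direction(0, -1))
```

Note that "upward" here is meant algebraically. If the true $\beta_1$ is negative, an upward bias moves the estimate toward zero rather than away from it.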

To better understand the importance of controlling for other possible causes of the dependent variable and the importance of the relationship (or the lack of one) between X and Z, consider the following graphical illustrations. In Figure 10.1, we represent the total variation of Y, X, and Z each with a circle. The covariation between any two of these variables - or among all three - is represented by the places where the circles overlap. Thus, in the figure, the total variation in Y is represented as the sum of the area a + b + d + f. The covariation between Y and X is represented by the area b + d.
Note in the figure, though, that the variable Z is related to both Y and X (because the circle for Z overlaps with both Y and X). In particular, the relationship between Y and Z is accounted for by the area f + d, and the relationship between Z and X is accounted for by the area d + e. As we have already seen, d is also a portion of the relationship between Y and X. If, hypothetically, we erased the circle for Z from the figure, we would (incorrectly) attribute all of the area b + d to X, when in fact the d portion of the variation in Y is shared by both X and Z. This is why, when Z is related to both X and Y, if we fail to control for Z, we will end up with a biased estimate of X's effect on Y.

5 With more than two independent variables, it becomes more complex to figure out the direction of the bias.
Consider the alternative scenario, in which both X and Z affect Y, but X and Z are completely unrelated to one another. That scenario is portrayed graphically in Figure 10.2. There, the circles for both X and Z overlap with the circle for Y, but they do not overlap at all with one another. In that case - which, we have noted, is unlikely in applied research - we can safely omit consideration of Z when considering the effects of X on Y. In that figure, the relationship between X and Y - the area b - is unaffected by the presence (or absence) of Z in the model.6

Figure 10.2. Venn diagram in which X and Z are correlated with Y, but not with each other.

6 For identical reasons, we could safely estimate the effect of Z on Y - the area f - without considering the effect of X.

Table 10.1. Three regression models of U.S. presidential elections

              A         B         C
Growth        0.65*               0.57*
              (0.16)              (…)
Good News               0.96*     0.72*
                        (0.34)    (0.30)
Constant      51.86*    47.20*    48.12*
              (0.88)    (2.07)    (1.75)
R2            .36       .20       .46
n             32        32        32

Notes: Standard errors are in parentheses.
* = p < .05

10.4.1 An Additional Minimal Mathematical Requirement in Multiple Regression

We outlined a set of assumptions and minimal mathematical requirements for the two-variable regression model in Chapter 9. In multiple regression, all of these assumptions are made and all of the same minimal mathematical requirements remain in place. In addition to those, however, we need to add one more minimal mathematical requirement to be able to estimate our multiple regression models: It must be the case that there is no exact linear relationship between any two or more of our independent variables (which we have called X and Z). This is also called the assumption of no perfect multicollinearity (as in X and Z cannot be perfectly collinear).
What does it mean to say that X and Z cannot exist in an exact linear relationship? Refer back to Figure 10.1. If X and Z had an exact linear relationship, instead of having some degree of overlap - that is, some imperfect degree of correlation - the circles would be exactly on top of one another. In such cases, it is literally impossible to estimate the regression model, as separating out the effects of X on Y from the effects of Z on Y is impossible.
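One way to see why estimation breaks down is to look at the rank of the matrix of independent variables. The sketch below assumes simulated data and a hypothetical exact relationship Z = 3 + 2X. With an exact linear relationship the matrix loses a dimension, so there is no unique set of coefficients; adding even a little independent variation to Z restores it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)

z_exact = 3.0 + 2.0 * x                                  # exact linear function of X (hypothetical)
z_noisy = z_exact + rng.normal(scale=0.5, size=n)        # strongly, but not perfectly, related to X

def design_rank(x, z):
    """Rank of the matrix whose columns are the constant, X, and Z."""
    design = np.column_stack([np.ones(len(x)), x, z])
    return np.linalg.matrix_rank(design)

rank_exact = design_rank(x, z_exact)
rank_noisy = design_rank(x, z_noisy)

# Three parameters (constant, X, Z) need a matrix of rank 3 to be estimated.
print("rank with perfectly collinear Z:  ", rank_exact)
print("rank with imperfectly collinear Z:", rank_noisy)
```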


This is not to say that we must assume that X and Z are entirely uncorrelated with one another (as in Figure 10.2). In fact, in almost all applications, X and Z will have some degree of correlation between them. Things become complicated only as that correlation approaches 1.0; and when it hits 1.0, the regression model will fail to be estimable with both X and Z as independent variables. In Chapter 11 we will discuss these issues further.

10.5 INTERPRETING MULTIPLE REGRESSION

For an illustration of how to interpret multiple regression coefficients, let's return to our example from Chapter 9, in which we showed you the results of a regression of U.S. presidential election results on the previous year's growth rate in the U.S. economy (see Figure 9.4). The model we estimated, you will recall, was $\text{Vote} = \alpha + (\beta \times \text{Growth})$, and the estimated coefficients there were $\hat{\alpha} = 51.86$ and $\hat{\beta} = 0.65$. Those results are in column A of Table 10.1.
In column A, you see the parameter estimate for the annual growth rate in the U.S. economy (in the row labeled "Growth") and the standard error of that estimated slope, 0.16. In the row labeled "Constant," you see the estimated y-intercept for that regression, 51.86, and its associated standard error, 0.88. Both parameter estimates are statistically significant.
Recall that the interpretation of the slope coefficient in a two-variable regression indicates that, for every one-unit increase in the independent variable, we expect to see $\beta$ units of change in the dependent variable. In the current context, $\hat{\beta} = 0.65$ means that for every extra one percentage point in growth rate in the U.S. economy, we expect to see, on average, an extra 0.65% in vote percentage for the incumbent party in presidential elections.7
But recall our admonition, throughout this book, about being too quick to interpret any bivariate analysis as evidence of a causal relationship. We have not shown, in column A of Table 10.1, that higher growth rates in the economy cause incumbent-party vote totals to be higher. To be sure, the evidence in column A is consistent with a causal connection, but it does not prove it. Why not? Because we have not controlled for other possible causes of election outcomes. Surely there are other causes, in addition to how the economy has (or has not) grown in the last year, of how well the incumbent party will fare in a presidential election. Indeed, we can even imagine other economic causes that might bolster our statistical explanation of presidential elections.8
Consider the fact that the growth variable accounts for economic growth over the past year. But perhaps the public rewards or punishes the incumbent party for sustained economic growth over the long run. In particular, it does not necessarily make sense for the public to reelect a party that has presided over three years of subpar growth in the economy but a fourth year with solid growth. And yet, with our single measure of growth, we are assuming - rather unrealistically - that the public would pay attention to the growth rate only in the past year. Surely the public does pay attention to recent growth, but the public might also pay heed to growth over the long run.
In column B of Table 10.1, we estimate another two-variable regression model, this time using the number of consecutive quarters of strong economic growth leading up to the presidential election - the variable labeled "Good News" - as our independent variable.9 (Incumbent-Party Vote Share remains our dependent variable.) In the row labeled "Good News," we see that the parameter estimate is 0.96, which means that, on average, for every additional consecutive quarter of good economic news, we expect to see a 0.96% increase in incumbent-party vote share. The effect is statistically significant.

7 Be sure not to invert the independent and dependent variables in describing results. It is not correct to interpret column A to say "for every 0.65-point change in growth rate in the U.S. economy, we should expect to see, on average, an extra 1% in vote percentage for the incumbent party in presidential elections." Be sure that you can see the difference in those descriptions.
8 And, of course, we can imagine variables relating to success or failure in foreign policy, for example, as other, noneconomic causes of election outcomes.
9 Fair's operationalization of this variable is "the number of quarters in the first 15 quarters of the administration in which the growth rate of real per capita GDP is greater than 3.2 percent."

Our separate two-variable regressions each show a relationship between the independent variable in the particular model and incumbent-party vote shares. But none of the parameter estimates in columns A or B controls for the other independent variable. We rectify that situation in column C, in which we estimate the effects of both the Growth and Good News variables on vote shares simultaneously.
Compare column C with columns A and B. In the row labeled "Good News," we see that the estimated parameter of $\hat{\beta} = 0.72$ indicates that, for every extra quarter of a year with strong growth rates, the incumbent party should expect to see an additional 0.72% of the national vote share, while controlling for the effects of Growth. Note the additional clause in the interpretation as well as the emphasis that we place on it. Multiple regression coefficients always represent the effects of a one-point increase in that particular independent variable on the dependent variable, while controlling for the effects of all other independent variables in the model. The higher the number of quarters of continuous strong growth in the economy, the higher the incumbent-party vote share should be in the next election, controlling for the previous year's growth rate.
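The "while controlling for" clause can be made concrete with the column C estimates from Table 10.1. In the sketch below, the function name and the starting values of Growth and Good News are hypothetical; only the three coefficients come from the table. Changing one independent variable by one unit while holding the other fixed moves the predicted vote share by exactly that variable's coefficient.

```python
# Column C estimates from Table 10.1.
CONSTANT = 48.12
B_GROWTH = 0.57
B_GOOD_NEWS = 0.72

def predicted_vote(growth: float, good_news: float) -> float:
    """Predicted incumbent-party vote share from the column C estimates."""
    return CONSTANT + B_GROWTH * growth + B_GOOD_NEWS * good_news

# Hypothetical starting point: 2.0 points of growth and 4 quarters of good news.
base = predicted_vote(growth=2.0, good_news=4)

# One more quarter of good news, Growth held constant.
more_news = predicted_vote(growth=2.0, good_news=5)

# One more point of growth, Good News held constant.
more_growth = predicted_vote(growth=3.0, good_news=4)

print("effect of one extra quarter of good news:", round(more_news - base, 2))
print("effect of one extra point of growth:     ", round(more_growth - base, 2))
```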
But, critical to this chapter's focus on multiple regression, notice in column C how including the "Good News" variable changes the estimated effect of the "Growth" variable from an estimated 0.65 in column A to 0.57 in column C. The effect in column C is different because it controls for the effects of Good News. That is, when the effects of long-running economic expansions are controlled for, the effect of short-term growth falls a bit. The effect is still quite strong and is still statistically significant, but it is more modest once the effects of long-term growth are taken into account.10 Note also that the R2 statistic rises from .36 in column A to .46 in column C, which means that adding the "Good News" variable increased the proportion of the variance of our dependent variable that we have explained11 by 10 percentage points.
10 And we can likewise compare the bivariate effects of Good News on Vote shares in column B with the multivariate results in column C, noting that the effect of Good News, in the multivariate context, appears to have fallen by approximately one-fourth.
11 It is important to be cautious when reporting contributions to R2 statistics by individual independent variables, and this table provides a good example of why this is the case. If we were to estimate Model A first and C second, we might be tempted to conclude that Growth explains 36% of Vote and Good News explains 10%. But if we estimated Model B and then C, we might be tempted to conclude that Growth explains 26% of Vote and Good News explains 20%. Actually, both of these sets of conclusions are faulty. The R2 is always a measure of the overall fit of the model to the dependent variable. So, all that we can say about the R2 for Model C is that Growth, Good News, and the constant term together explain 46% of the variation in Vote. So, although we can talk about how the addition or subtraction of a particular variable to a model increases or decreases the model's R2, we should not be tempted to attribute particular values of R2 to specific independent variables. If we return to Figure 10.1, we can get some intuition on why this is the case. The R2 statistic for the model represented in this figure is $\frac{b + d + f}{a + b + d + f}$. It is the presence of area d that confounds our ability to make definitive statements about the contribution of individual variables to R2.

In this particular example, the whole emphasis on controlling for other causes might seem like much ado about nothing. After all, comparing the two columns in Table 10.1 did not change our interpretation of whether short-term growth rates affect incumbent-party fortunes at the polls. But we didn't know this until we tested for the effects of long-term growth. And in Chapter 12, we will see examples in which controlling for new causes of the dependent variable substantially changes our interpretations about causal relationships. We should be clear about one other thing regarding Table 10.1: Despite controlling for another variable, we still have a ways to go before we can say that we've controlled for all other possible causes of the dependent variable. As a result, we should be cautious about interpreting those results as proof of causality. However, as we continue to add independent variables to our regression model, we inch closer and closer to saying that we've controlled for every other possible cause that comes to mind. Recall that, all the way back in Chapter 1, we noted that one of the "rules of the road" of the scientific enterprise is to always be willing to consider new evidence. New evidence - in the form of controlling for other independent variables - can change our inferences about whether any particular independent variable is causally related to the dependent variable.

10.6 WHICH EFFECT IS "BIGGEST"?

In the preceding analysis, we might be tempted to look at the coefficients in column C of Table 10.1 for Growth (0.57) and for Good News (0.72) and conclude that the effect for Good News is roughly one-third larger than the effect for Growth. As tempting as such a conclusion might be, it must be avoided for one critical reason: The two independent variables are measured in different metrics, which makes that comparison misleading. Short-run growth rates are measured in a different metric - ranging from negative numbers for times during which the economy shrunk all the way through stronger periods during which growth exceeded 5% per year - than are the number of quarters of consecutive strong growth, which ranges from 0 in the data set through 10. That makes comparing the coefficients misleading.
Because the coefficients in Table 10.1 each exist in the native metric of each variable, they are referred to as unstandardized coefficients. Although they are normally not comparable, there is a rather simple method to remove the metric of each variable to make them comparable with one another. As you might imagine, such coefficients, because they are on a standardized metric, are referred to as standardized coefficients. We compute them, quite simply, by taking the unstandardized coefficients and taking out the metrics - in the forms of the standard deviations - of both the independent and dependent variables:

$$\hat{\beta}_{Std} = \hat{\beta} \, \frac{s_X}{s_Y},$$

where $\hat{\beta}_{Std}$ is the standardized regression coefficient, $\hat{\beta}$ is the unstandardized coefficient (as in Table 10.1), and $s_X$ and $s_Y$ are the standard deviations of X and Y, respectively. The interpretation of the standardized coefficients changes, not surprisingly. Whereas the unstandardized coefficients represent the expected change in Y given a one-unit increase in X, the standardized coefficients represent the expected standard deviation change in Y given a one-standard-deviation increase in X. Now, because all parameter estimates are in the same units - that is, the standard deviations - they become comparable.
Implementing this formula for the unstandardized coefficients in column C of Table 10.1 produces the following results. First, for Growth,

$$\hat{\beta}_{Std} = 0.57 \left( \frac{s_{Growth}}{s_{Vote}} \right) = 0.52.$$

Next, for Good News,

$$\hat{\beta}_{Std} = 0.72 \left( \frac{s_{Good\,News}}{s_{Vote}} \right) = 0.34.$$

These coefficients would be interpreted as follows: For a one-standard-deviation increase in Growth, on average, we expect a 0.52-standard-deviation increase in the incumbent-party vote share, controlling for the effects of Good News. And for a one-standard-deviation increase in Good News, we expect to see, on average, a 0.34-standard-deviation increase in the incumbent-party vote shares, controlling for the effects of Growth. Note how, when looking at the unstandardized coefficients, we might have mistakenly thought that the effect of Good News was larger than the effect of Growth. But the standardized coefficients (correctly) tell the opposite story: The estimated effect of Growth is about 150% of the size of the effect of Good News.12
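The standardization formula can be tried out on simulated data. The sketch below is illustrative only: the data-generating values are hypothetical and are not the election data behind Table 10.1. It computes standardized coefficients by rescaling the unstandardized ones with the standard deviations, and then confirms the result by re-estimating the model after converting every variable to standard-deviation units.

```python
import numpy as np

def ols(design, y):
    """Return OLS coefficients for y regressed on the columns of design."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

def zscore(v):
    """Rescale a variable to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

# Simulated data with two independent variables on very different metrics.
rng = np.random.default_rng(2)
n = 200
x = rng.normal(loc=2.0, scale=5.0, size=n)
z = rng.normal(loc=4.0, scale=3.0, size=n) + 0.2 * x
y = 48.0 + 0.6 * x + 0.7 * z + rng.normal(scale=4.0, size=n)

ones = np.ones(n)
_, b_x, b_z = ols(np.column_stack([ones, x, z]), y)   # unstandardized coefficients

# Standardized coefficient = unstandardized coefficient * (s_X / s_Y).
s_x, s_z, s_y = (np.std(v, ddof=1) for v in (x, z, y))
std_x = b_x * s_x / s_y
std_z = b_z * s_z / s_y

# Cross-check: regress standardized Y on standardized X and Z.
_, check_x, check_z = ols(np.column_stack([ones, zscore(x), zscore(z)]), zscore(y))

print("standardized effect of X:", std_x, check_x)
print("standardized effect of Z:", std_z, check_z)
```

The two routes give the same numbers, which is why the rescaling formula is usually described as putting all variables on a common metric.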

12 Some objections have been raised about the use of standardized coefficients (King 1986). From a technical perspective, because standard deviations can differ across samples, this makes the results of standardized coefficients particularly sample specific. Additionally, and from a broader perspective, one-unit or one-standard-deviation shifts in different independent variables have different substantive meanings regardless of the metrics in which the variables are measured. We might therefore logically conclude that there isn't much use in trying to figure out which effect is biggest.

10.7 STATISTICAL AND SUBSTANTIVE SIGNIFICANCE

Related to the admonition about which effect is "biggest," consider the following, seemingly simpler, question: Are the effects found in column C of Table 10.1 "big"? A tempting answer to that question is "Well of course they're big. Both coefficients are statistically significant. Therefore, they're big."
That logic, although perhaps appealing, is faulty. Recall the discussion from Chapter 7 (specifically, Subsection 7.3.2) on the effects of sample size on the magnitude of the standard error of the mean. And we noted the same effects of sample size on the magnitude of the standard error of our regression coefficients (specifically, Section 9.4). What this means is that, even if the strength of the relationship (as measured by our coefficient estimates) remains constant, by merely increasing our sample size we can affect the statistical significance of those coefficients. Why? Because statistical significance is determined by a t-test (see Subsection 9.4.7) in which the standard error is in the denominator of that quotient. What you can remember is that larger sample sizes will shrink standard errors and therefore make finding statistically significant relationships more likely.13 It is also apparent from Appendix B that, when the number of degrees of freedom is greater, it is easier to achieve statistical significance.
We hope that you can see that arbitrarily increasing the size of a sample, and therefore finding statistically significant relationships, does not in any way make an effect "bigger" or even "big." Recall, such changes to the standard errors have no bearing on the rise-over-run nature of the slope coefficients themselves.
How should you judge whether an effect of one variable on another is "big"? One way is to use the method just described - using standardized coefficients. By placing the variables on the same metric, it is possible to come to a judgment about how big an effect is. This is particularly helpful when the independent variables X and Z, or the dependent variable Y, or both, are measured in metrics that are unfamiliar or artificial.
When the metrics of the variables in a regression analysis are intuitive and well known, however, rendering a judgment about whether an effect is large or small becomes something of a matter of interpretation. For example, in Chapter 12, we will see an example relating the effects of changes in the unemployment rate (X) on a president's approval rating (Y). It is very simple to interpret that a slope coefficient of, say, -1.51, means that, for every additional point of unemployment, we expect approval to go down by 1.51 points, controlling for other factors in the model. Is that effect large, small, or moderate? There is something of a judgment call to be made here, but at least in this case, the metrics of both X and Y are quite familiar; no one needs to explain what unemployment rates mean or what approval polls mean. Independent of the statistical significance of that estimate - which, you should note, we have not mentioned here - discussions of this sort represent attempts to judge the substantive significance of a coefficient estimate. Substantive significance is more difficult to judge than statistical significance because there are no numeric formulae for making such judgments. Instead, substantive significance is a judgment call about whether or not statistically significant relationships are "large" or "small" in terms of their real-world impact.

13 To be certain, it's not always possible to increase sample sizes, and, even when possible, it is nearly always costly to do so. The research situations in which increasing sample size is most likely, albeit still expensive, is in mass-based survey research.
From time to time we will see a "large" parameter estimate that is not statistically significant. Although it is tempting to describe such a result as substantively significant, it is not. We can understand this by thinking about what it means for a particular result to be statistically significant. As we discussed in Chapter 9, in most cases we are testing the null hypothesis that the population parameter is equal to zero. In such cases, even when we have a large parameter estimate, if it is statistically insignificant this means that it is not statistically distinguishable from zero. Therefore a parameter estimate can be substantively significant only when it is also statistically significant.
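The point made earlier in this section, that sample size alone can change statistical significance while the slope stays put, can be sketched with a few lines of arithmetic. The sketch assumes the standard formula for the standard error of a bivariate slope, the residual standard deviation divided by the square root of the sum of squared deviations of X, which we take to match the presentation in Chapter 9. The slope and standard deviations used are hypothetical.

```python
import math

def slope_t_ratio(slope: float, residual_sd: float, x_sd: float, n: int) -> float:
    """t-ratio for a bivariate slope: the slope divided by its standard error.

    The sum of squared deviations of X equals (n - 1) * x_sd**2, so the
    standard error shrinks as n grows even when everything else is fixed.
    """
    standard_error = residual_sd / (x_sd * math.sqrt(n - 1))
    return slope / standard_error

# Same hypothetical slope and spread, three different sample sizes.
t_small = slope_t_ratio(slope=0.5, residual_sd=6.0, x_sd=2.0, n=20)
t_medium = slope_t_ratio(slope=0.5, residual_sd=6.0, x_sd=2.0, n=200)
t_large = slope_t_ratio(slope=0.5, residual_sd=6.0, x_sd=2.0, n=2000)

for n, t in ((20, t_small), (200, t_medium), (2000, t_large)):
    print(f"n = {n:5d}   slope = 0.5   t = {t:.2f}")
```

The slope, and therefore the substantive size of the effect, is identical in all three rows; only the t-ratio changes.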

10.8 IMPLICATIONS
What are the implications of this chapter? The key take-home point of this chapter - that failing to control for all relevant independent variables will often lead to mistaken causal inferences for the variables that do make it into our models - applies in several contexts. If you are reading a research article in one of your other classes, and it shows a regression analysis between two variables, but fails to control for the effects of some other possible cause of the dependent variable, then you have some reason to be skeptical about the reported findings. In particular, if you can think of another independent variable that is likely to be related to both the independent variable and the dependent variable, then the relationship that the article does show that fails to control for that variable is likely to be plagued with bias. And if that's the case, then there is substantial reason to doubt the findings. The findings might be right, but you can't know that from the evidence presented in the article; in particular, you'd need to control for the omitted variable to know for sure.
But this critical issue isn't just encountered in research articles. When you read a news article from your favorite media web site that reports a relationship between some presumed cause and some presumed effect - news articles don't usually talk about "independent variables" or "dependent variables" - but fails to account for some other cause that you can imagine might be related to both the independent and dependent variables, then you have reason to doubt the conclusions.
It might be tempting to react to omitted-variables bias by saying, "Omitted-variables bias is such a potentially serious problem that I don't want to use regression analysis." That would be a mistake. In fact, the logic of omitted-variables bias applies to any type of research, no matter what type of statistical technique is used - in fact, no matter whether the research is qualitative or quantitative.
Sometimes, as we have seen, controlling for other causes of the dependent variable changes the discovered effects only at the margins. That happens on occasion in applied research. At other times, however, failure to control for a relevant cause of the dependent variable can have serious consequences for our causal inferences about the real world. In Chapter 12, you will see several such examples. But first, in Chapter 11, we present you with some crucial extensions of the multiple regression model that you are likely to encounter when consuming or conducting research.

CONCEPTS INTRODUCED IN THIS CHAPTER

bias
omitted-variables bias
perfect multicollinearity
standardized coefficients
substantive significance
unstandardized coefficients

EXERCISES

1. Identify an article from a prominent web site that reports a causal relationship between two variables. Can you think of another variable that is related to both the independent variable and the dependent variable?
2. In Exercise 1, estimate the direction of the bias resulting from omitting the third variable.
3. Fill in the values in the third column of Table 10.2.
4. In your own research you have found evidence from a bivariate regression model that supports your theory that your independent variable $X_i$ is positively related to your dependent variable $Y_i$ (the slope parameter for $X_i$ was statistically significant and positive when you estimated a bivariate regression model). You go to a research presentation in which other researchers present a theory that their independent variable $Z_i$ is negatively related to their dependent variable $Y_i$. They report the results from a bivariate regression model in which the slope parameter for $Z_i$ was statistically significant and negative. Your $Y_i$ and their $Y_i$ are the same variable. What would be your reaction to these findings under each of the following circumstances?
(a) You are confident that the correlation between $Z_i$ and $X_i$ is equal to zero.
(b) You think that the correlation between $Z_i$ and $X_i$ is positive.
(c) You think that the correlation between $Z_i$ and $X_i$ is negative.

Table 10.2. Bias in $\hat{\beta}_1$ when the true population model is $Y_i = \alpha + \beta_1 X_i + \beta_2 Z_i + u_i$ but we leave out Z

$\beta_2$    $\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Z_i - \bar{Z})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$    Resulting bias in $\hat{\beta}_1$
0            +            ?
0            -            ?
+            0            ?
-            0            ?
+            +            ?
-            -            ?
+            -            ?
-            +            ?

203

11.2 Dummy Independent Variables

."1

BEING SMART WITH DUMMY INDEPENDENT VARIABLES IN OLS

In Chapter 6 we discussed how an important part of knowing your data
involves knowing the metric in which each of your variables is measured.
Throughout the examples that we have examined thus far, almost aH of
the variables, both the independent and dependent variables, have ~en
continuous. This is not by accidento We chose examples with continuous
variables beca use they are, in mani cases, easier to interpret than modd.§.in
which the variables are noncontinúous. In this section, though, we consider
a series of scenarios involving independent variables that are not contp.u~e begin with a relatively simple case in which we have a categorical
independent variable that takes on one of two possible values for allJ:ases.
Categorical variables like this are commonly referred to as dummy vari~Although any two values will do, the most common form oí dummy
variable is one that takes on values of one or zero. We then consider more
complicated examples in which we have an independent variable that is
categorical with more than two categories.

m

Multiple Regression Models 11:
Crucial Extensions
..

"

~OVERVIEw'

In this chapter we provide introductory disc:ussions ol and advice tor commonly encountered researeh seenarios involving multiple regression models. Issues eovered include dummy independent variables, interactive speeifieatioIl§.. dummy dependent variables, influential cases, multicollineari~,
and models of time-series data.

11.1 EXTENSIONS OF OLS

In the previous two chapters we discussed in detail various aspects of the
estimation and interpretation of OLS regression models. In this chapter
we go through a series of research scenarios commonly encountered by
political science researchers as they attempt to test their hypotheses within
the OLS framework. The purpose of this chapter is twofold - first, to help
you to identify when you have hit these issues and, second, to help you to
figure out what to do to continue on your way.
We begin with a discussion of "dummy" independent variables and
how to properly use them to make inferences. We then discuss how to test
interactive hypotheses with dummy variables. Our third topic with dummy
variables involves the interpretation of models in which our dependent
variable is a dummy variable. We next turn our attention to two frequently
encountered problems in OLS - outliers and multicollinearity. With both
of these topics, at least half of the battle is identifying that you have the
problem. Finally, we conclude with a discussion of a series of problems
specific to the analysis of time-series data.

11.2.1 Using Dummy Variables to Test Hypotheses about a Categorical
Independent Variable with Only Two Values

During the 1996 U.S. presidential election between incumbent Democrat
Bill Clinton and Republican challenger Robert Dole, Clinton's wife Hillary
was a prominent and polarizing figure. Throughout the next couple of examples, we will use her "thermometer ratings" by individual respondents
to the NES survey as our dependent variable. A thermometer rating is a
survey respondent's answer to a question about how they feel (as opposed
to how they think) toward particular individuals or groups on a scale that
typically runs from 0 to 100. Scores of 50 indicate that the individual feels
neither warm nor cold about the individual or group in question. Scores
from 50 to 100 represent increasingly warm (or favorable) feelings, and
scores from 50 to 0 represent increasingly cold (or unfavorable)
feelings.
During the 1996 campaign, Ms. Clinton was identified as being a
left-wing feminist. Given this, we theorize that there may have been a causal
relationship between respondents' family incomes and their thermometer
rating of Ms. Clinton - with wealthier individuals, holding all else constant,
liking her less - as well as a relationship between respondents' gender and
their thermometer rating of Ms. Clinton - with women, holding all else
constant, liking her more. For the sake of this example, we are going to
assume that both our dependent variable and our income independent
variable are continuous.¹ Each respondent's gender was coded as equaling
either 1 for "male" or 2 for "female." Although we could leave this gender
variable as it is and run our analyses, we chose to use this variable to create
two new dummy variables, "male" equaling 1 for "yes" and 0 for "no,"
and "female" equaling 1 for "yes" and 0 for "no."
Our first inclination is to estimate an OLS model in which the specification is the following:

$$\text{Hillary Thermometer}_i = \alpha + \beta_1 \text{Income}_i + \beta_2 \text{Male}_i + \beta_3 \text{Female}_i + u_i.$$

But if we try to estimate this model, our statistical computer program
will revolt and give us an error message.² Figure 11.1 shows a screen shot
of what this output looks like in Stata. We can see that Stata has reported
the results from the following model instead of what we asked for:

$$\text{Hillary Thermometer}_i = \alpha + \beta_1 \text{Income}_i + \beta_3 \text{Female}_i + u_i.$$

¹ In this survey, respondents' family income was measured on a scale ranging from 1 to 24
according to which category of income ranges they chose as best describing their family's
income during 1995.
² Most programs will throw one of the two variables out of the model and report the results
from the resulting model along with an error message.

    . reg hillary_thermo income male female

          Source |       SS       df       MS              Number of obs =    1542
    -------------+------------------------------           F(  2,  1539) =   49.17
           Model |   80916.663     2  40458.3315           Prob > F      =  0.0000
        Residual |  1266234.71  1539  822.764595           R-squared     =  0.0601
    -------------+------------------------------           Adj R-squared =  0.0588
           Total |  1347151.37  1541  874.205954           Root MSE      =  28.684

    ------------------------------------------------------------------------------
    hillary_th~o |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |  -.8407732    .117856    -7.13   0.000    -1.071948   -.6095978
            male |  (dropped)
          female |   8.081448   1.495216     5.40   0.000     5.148572    11.01432
           _cons |    61.1804   2.220402    27.55   0.000     56.82507    65.53573
    ------------------------------------------------------------------------------

Figure 11.1. Stata output when we include both gender dummy variables in our model.

Table 11.1. Two models of the effects of gender and
income on Hillary Clinton Thermometer scores

    Independent variable      Model 1      Model 2
    Male                                   -8.08***
                                           (1.50)
    Female                    8.08***
                              (1.50)
    Income                    -0.84***     -0.84***
                              (0.12)       (0.12)
    Intercept                 61.18***     69.26***
                              (2.22)       (1.92)
    n                         1542         1542
    R^2                       .06          .06

Notes: The dependent variable in both models is the respondent's
thermometer score for Hillary Clinton. Standard errors in
parentheses. Two-sided t-tests: ***indicates p < .01;
**indicates p < .05; *indicates p < .10.

Instead of the estimates for $\beta_2$ on the second row of parameter estimates, we get a note that this variable was "dropped." This is the case
because we have failed to meet the additional minimal mathematical criteria
that we introduced when we moved from two-variable OLS to multiple OLS
in Chapter 10 - "no perfect multicollinearity." The reason that we have
failed to meet this is that, for two of the independent variables in our model,
Male_i and Female_i, it is the case that

$$\text{Male}_i + \text{Female}_i = 1 \quad \forall\, i.$$

In other words, our variables "Male" and "Female" are perfectly correlated: If we know a respondent's value on the "Male" variable, then we
know their value on the "Female" variable with perfect certainty.
When this happens with dummy variables, we call this situation the
dummy-variable trap. To avoid the dummy-variable trap, we have to omit
one of our dummy variables. But we want to be able to compare the
effects of being male with the effects of being female to test our hypothesis.
How can we do this if we have to omit one of our two variables that
measure gender? Before we answer this question, let's look at the results
in Table 11.1 from the two different models in which we omit one of
these two variables. We can learn a lot by looking at what is and what
is not the same across these two models. In both models, the parameter
estimate and standard error for income are identical. The $R^2$ statistic is also
identical. The parameter estimate and the standard error for the intercept
are different across the two models. The parameter estimate for male is
-8.08, whereas that for female is 8.08, although the standard error for
each of these parameter estimates is 1.50. If you're starting to think that all
of these similarities cannot have happened by coincidence, you are correct.
In fact, these two models are, mathematically speaking, the same model.
All of the $\hat{Y}$ values and residuals for the individual cases are exactly the
same. With income held constant, the estimated difference between being
male and being female is 8.08. The sign on this parameter estimate switches
from positive to negative when we go from Model 1 to Model 2 because
we are phrasing the question differently across the two models:

• For Model 1: "What is the estimated difference for a female compared
with a male?"
• For Model 2: "What is the estimated difference for a male compared
with a female?"
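The dummy-variable trap can be illustrated with a small sketch. The six respondents below are hypothetical toy data (not the NES sample), and `design_rows` and `matrix_rank` are illustrative helper names, not functions from any statistics package. The sketch only checks the condition described above: when both dummies are included, Male + Female reproduces the intercept column, so the design matrix has fewer independent columns than parameters.

```python
# Hypothetical toy respondents: (income category, gender code),
# where gender is coded 1 = male and 2 = female, as in the text.
RESPONDENTS = [(3, 1), (10, 2), (7, 2), (15, 1), (21, 2), (12, 1)]


def design_rows(include_male, include_female):
    """Return design-matrix rows: intercept, income, then the chosen dummies."""
    rows = []
    for income, gender in RESPONDENTS:
        row = [1.0, float(income)]
        if include_male:
            row.append(1.0 if gender == 1 else 0.0)
        if include_female:
            row.append(1.0 if gender == 2 else 0.0)
        rows.append(row)
    return rows


def matrix_rank(rows, tol=1e-9):
    """Compute the rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column is a linear combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


both = design_rows(include_male=True, include_female=True)
female_only = design_rows(include_male=False, include_female=True)

print(len(both[0]), matrix_rank(both))                # 4 columns, rank 3
print(len(female_only[0]), matrix_rank(female_only))  # 3 columns, rank 3
```

With both dummies the matrix has four columns but rank three, so OLS cannot produce a unique estimate for every parameter; dropping either dummy restores full column rank.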


So why are the intercepts different? Think back to our discussions in
Chapters 9 and 10 about the interpretation of the intercept - it is the estimated value of the dependent variable when the independent variables are
all equal to zero. In Model 1 this means the estimated value of the dependent
variable for a low-income man. In Model 2 this means the estimated value
of the dependent variable for a low-income woman. And the difference
between these two values - you guessed it - is 61.18 - 69.26 = -8.08!
What does the regression line from Model 1 or Model 2 look like? The
answer is that it depends on the gender of the individual for which we are
plotting the line, but that it does not depend on which of these two models
we use. For men, where Female_i = 0 and Male_i = 1, the predicted values
are calculated as follows:

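$$
\begin{aligned}
\text{Model 1 for Men:}\quad \hat{Y}_i &= 61.18 + (8.08 \times \text{Female}_i) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 61.18 + (8.08 \times 0) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 61.18 - (0.84 \times \text{Income}_i); \\
\text{Model 2 for Men:}\quad \hat{Y}_i &= 69.26 - (8.08 \times \text{Male}_i) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 69.26 - (8.08 \times 1) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 61.18 - (0.84 \times \text{Income}_i).
\end{aligned}
$$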

So we can see that for men, regardless of whether we use the results
from Model 1 or Model 2, the formula for predicted values is the same.
For women, where Female_i = 1 and Male_i = 0, the predicted values are
calculated as follows:

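$$
\begin{aligned}
\text{Model 1 for Women:}\quad \hat{Y}_i &= 61.18 + (8.08 \times \text{Female}_i) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 61.18 + (8.08 \times 1) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 69.26 - (0.84 \times \text{Income}_i); \\
\text{Model 2 for Women:}\quad \hat{Y}_i &= 69.26 - (8.08 \times \text{Male}_i) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 69.26 - (8.08 \times 0) - (0.84 \times \text{Income}_i), \\
\hat{Y}_i &= 69.26 - (0.84 \times \text{Income}_i).
\end{aligned}
$$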

[Figure 11.2 plots predicted values against Income (horizontal axis, 0 to 25) as two parallel lines, one for women and one for men; the line for men is labeled $\hat{Y} = 61.18 - (0.84 \times \text{Income})$.]

Figure 11.2. Regression lines from the interactive model.

Again, the formula from Model 1 is the same as the formula from Model
2 for women. To illustrate these two sets of predictions, we have plotted
them in Figure 11.2. Given that the two predictive formulae have the same
slope, it is not surprising to see that the two lines in this figure are parallel
to each other with the intercept difference determining the space between
the two lines.
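The equivalence of the two specifications can also be checked numerically. The following is a minimal sketch that plugs the rounded estimates from Table 11.1 into the two predictive formulae; the function names `predict_model1` and `predict_model2` are illustrative only.

```python
def predict_model1(female, income):
    """Model 1 in Table 11.1: the Female dummy is included, Male is omitted."""
    return 61.18 + 8.08 * female - 0.84 * income


def predict_model2(male, income):
    """Model 2 in Table 11.1: the Male dummy is included, Female is omitted."""
    return 69.26 - 8.08 * male - 0.84 * income


# Income was measured in categories running from 1 to 24.
for income in (1, 12, 24):
    man_1 = predict_model1(female=0, income=income)
    man_2 = predict_model2(male=1, income=income)
    woman_1 = predict_model1(female=1, income=income)
    woman_2 = predict_model2(male=0, income=income)
    print(f"income={income:2d}  men: {man_1:.2f} / {man_2:.2f}  "
          f"women: {woman_1:.2f} / {woman_2:.2f}  gap: {woman_1 - man_1:.2f}")
```

At every income level the two models give the same prediction for men and the same prediction for women, and the gap between the two lines stays at 8.08, which is why the lines are parallel.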
11.2.2 Using Dummy Variables to Test Hypotheses about a Categorical
Independent Variable with More Than Two Values

As you might imagine, when we have a categorical variable with more than
two categories and we want to include it in an OLS model, things get more
complicated. We'll keep with our running example of modeling Hillary
Clinton Thermometer scores as a function of individuals' characteristics
and opinions. In this section we work with respondents' religious affiliation
as an independent variable. The frequency of different responses to this item
in the 1996 NES is displayed in Table 11.2.

Table 11.2. Religious Identification in the
1996 NES

    Value   Category      Frequency   Percent
    0       Protestant                39.85
    1       Catholic                  20.19
    2       Jewish                    1.28
    3       Other         153         8.93
    4       None          510         29.75

Could we use the Religious Identification variable as it is in our regression models? That would be a bad idea. Remember, this is a categorical
variable, in which the values of the variable are not ordered from lowest to highest. Indeed, there is no such thing as "lowest" or "highest" on
this variable. So running a regression model with the data as is would be
meaningless. But beware: Your statistics package does not know that this
is a categorical variable. It will be more than happy to run the regression
and report parameter estimates to you, even though these estimates will be
nonsensical.
In the previous subsection, in which we had a categorical variable (Gender) with only two possible values, we saw that, when we switched which
value was represented by "1" and "0," the estimated parameter switched
signs. This was the case because we were asking a different question. With
a categorical independent variable that has more than two values, we have
more than two possible questions that we can ask. Because using the variable as is is not an option, the best strategy for modeling the effects of
such an independent variable is to include a dummy variable for all values of that independent variable except one.³ The value of the independent
variable for which we do not include a dummy variable is known as the reference category. This is the case because the parameter estimates for all of
the dummy variables representing the other values of the independent variable are estimated in reference to that value of the independent variable. So
let's say that we choose to estimate the following model:

$$\text{Hillary Thermometer}_i = \alpha + \beta_1 \text{Income}_i + \beta_2 \text{Protestant}_i + \beta_3 \text{Catholic}_i + \beta_4 \text{Jewish}_i + \beta_5 \text{Other}_i + u_i.$$

For this model we would be using "None" as our reference category
for religious identification. This would mean that $\hat{\beta}_2$ would be the estimated
effect of being Protestant relative to being nonreligious, and we could use
this value along with its standard error to test the hypothesis that this effect
was statistically significant, controlling for the effects of income. The remaining parameter estimates ($\hat{\beta}_3$, $\hat{\beta}_4$, and $\hat{\beta}_5$) would all also be interpreted
as the estimated effect of being in each of the remaining categories relative to "None." The value that we choose to use as our reference category
does not matter, as long as we interpret our results appropriately. But we
can use the choice of the reference category to focus on the relationships
in which we are particularly interested. For each possible pair of categories
of the independent variable, we can conduct a separate hypothesis test.
The easiest way to get all of the p-values in which we are interested is
to estimate the model multiple times with different reference categories.
Table 11.3 displays a model of Hillary Clinton Thermometer scores with
the five different choices of reference categories. It is worth emphasizing
that this is not a table with five different models, but that this is a table
with the same model displayed five different ways. From this table we can
see that, when we control for the effects of income, some of the categories
of religious affiliation are statistically different from each other in their
evaluations of Hillary Clinton whereas others are not. This raises an interesting question: Can we say the effect of religious affiliation, controlling for
income, is statistically significant? The answer is that it depends on which
categories of religious affiliation we want to compare.

³ If our theory was that only one category, such as Catholics, was different from all of
the others, then we would collapse the remaining categories of the variable in question
together and we would have a two-category independent variable. We should do this only
if we have a theoretical justification for doing so.

Table 11.3. The same model of religion and income on Hillary Clinton
Thermometer scores with different reference categories

    Independent
    variable       Model 1     Model 2     Model 3     Model 4     Model 5
    Income         -0.97***    -0.97***    -0.97***    -0.97***    -0.97***
                   (0.12)      (0.12)      (0.12)      (0.12)      (0.12)
    Protestant     -4.24*      -6.66*      -24.82***   -6.30**
                   (1.77)      (2.68)      (6.70)      (2.02)
    Catholic       2.07        -0.35       -18.51**                6.30**
                   (2.12)      (2.93)      (6.80)                  (2.02)
    Jewish         20.58**     18.16**                 18.51**     24.82***
                   (6.73)      (7.02)                  (6.80)      (6.70)
    Other          2.42                    -18.16**    0.35        6.66*
                   (2.75)                  (7.02)      (2.93)      (2.68)
    None                       -2.42       -20.58**    -2.07       4.24*
                               (2.75)      (6.73)      (2.12)      (1.77)
    Intercept      68.40***    70.83***    88.98***    70.47***    64.17***
                   (2.19)      (2.88)      (6.83)      (2.53)      (2.10)
    n              1542        1542        1542        1542        1542
    R^2            .06         .06         .06         .06         .06

Notes: The dependent variable in all models is the respondent's thermometer
score for Hillary Clinton. Standard errors in parentheses.
Two-sided t-tests: ***indicates p < .01; **indicates p < .05; *indicates p < .10.
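Because the five columns of Table 11.3 are the same model, the point estimates under one reference category can be recovered from those under another by simple arithmetic. The sketch below assumes the rounded Model 1 estimates (reference category "None") as read from Table 11.3 and converts them to a "Protestant" reference; `change_reference` is an illustrative helper, not part of any package. Standard errors cannot be obtained this way, because they depend on the covariances between the estimates, so re-estimating the model remains the practical route to the p-values.

```python
# Rounded Model 1 estimates from Table 11.3 (reference category: "None").
MODEL1 = {
    "Intercept": 68.40,
    "Protestant": -4.24,
    "Catholic": 2.07,
    "Jewish": 20.58,
    "Other": 2.42,
}


def change_reference(estimates, old_ref, new_ref):
    """Re-express dummy coefficients relative to a new reference category."""
    shift = estimates[new_ref]
    # The new intercept is the baseline for the new reference category.
    result = {"Intercept": estimates["Intercept"] + shift}
    for name, coef in estimates.items():
        if name in ("Intercept", new_ref):
            continue
        result[name] = coef - shift
    # The old reference category now gets its own dummy coefficient.
    result[old_ref] = -shift
    return result


protestant_ref = change_reference(MODEL1, old_ref="None", new_ref="Protestant")
for name, value in protestant_ref.items():
    print(f"{name:<10} {value:7.2f}")
```

The results agree with the Model 5 column of Table 11.3 up to rounding error in the published estimates; the income coefficient is unaffected by the choice of reference category and so is left out of the conversion.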

11.3 TESTING INTERACTIVE HYPOTHESES WITH DUMMY
VARIABLES

All of the OLS models that we have examined so far have been additive
models. To calculate the $\hat{Y}$ value for a particular case from an additive
model, we simply multiply each independent variable value for that case by
the appropriate parameter estimate and add these values together. In this
section we explore some interactive models. Interactive models contain at
least one independent variable that we create by multiplying together two
or more independent variables. When we specify interactive models, we are
testing theories about how the effects of one independent variable on our
dependent variable may be contingent on the value of another independent
variable. We will continue with our running example of modeling respondents' thermometer scores for Hillary Clinton. We begin with an additive
model with the following specification:

$$\text{Hillary Thermometer}_i = \alpha + \beta_1 \text{Women's Movement Thermometer}_i + \beta_2 \text{Female}_i + u_i.$$

Table 11.4. The effects of gender and feelings toward the
women's movement on Hillary Clinton Thermometer scores

    Independent variable              Additive model    Interactive model
    Women's Movement Thermometer      0.68***           0.76***
                                      (0.03)            (0.05)
    Female                            7.13***           15.21***
                                      (1.37)            (4.19)
    Women's Movement                                    -0.13*
    Thermometer x Female                                (0.06)
    Intercept                         5.98**            1.56
                                      (2.13)            (3.04)
    n                                 1466              1466
    R^2                               .27               .27

Notes: The dependent variable in both models is the respondent's
thermometer score for Hillary Clinton. Standard errors in parentheses.
Two-sided t-tests: ***indicates p < .01; **indicates p < .05; *indicates p < .10.

In this model we are testing the theory that respondents' feelings toward Hillary Clinton are a function of their feelings toward the women's
movement and their own gender. This specification seems pretty reasonable, but we also want to test an additional theory that feelings
toward the women's movement have a stronger effect on feelings toward
Hillary Clinton among women than they do among men. Notice the difference in phrasing there. In essence, we want to test the hypothesis that the
slope of the line representing the relationship between Women's Movement
Thermometer and Hillary Clinton Thermometer is steeper for women than
it is for men. To test this hypothesis, we need to create a new variable that
is the product of the two independent variables in our model and include
this new variable in our model:

$$\text{Hillary Thermometer}_i = \alpha + \beta_1 \text{Women's Movement Thermometer}_i + \beta_2 \text{Female}_i + \beta_3 (\text{Women's Movement Thermometer}_i \times \text{Female}_i) + u_i.$$

By specifying our model as such, we have created two different models
for women and men. So we can rewrite our model as

$$
\begin{aligned}
\text{for Men (Female} = 0\text{):}\quad \text{Hillary Thermometer}_i &= \alpha + \beta_1 \text{Women's Movement Thermometer}_i + u_i; \\
\text{for Women (Female} = 1\text{):}\quad \text{Hillary Thermometer}_i &= \alpha + \beta_1 \text{Women's Movement Thermometer}_i \\
&\quad + \beta_2 + \beta_3 (\text{Women's Movement Thermometer}_i) + u_i.
\end{aligned}
$$

And we can rewrite the formula for women as

$$
\text{for Women (Female} = 1\text{):}\quad \text{Hillary Thermometer}_i = (\alpha + \beta_2) + (\beta_1 + \beta_3)(\text{Women's Movement Thermometer}_i) + u_i.
$$
What this all boils down to is that we are allowing our regression line to
be different for men and women. For men, the intercept is $\alpha$ and the slope is
$\beta_1$. For women, the intercept is $\alpha + \beta_2$ and the slope is $\beta_1 + \beta_3$. However, if
$\beta_2 = 0$ and $\beta_3 = 0$, then the regression lines for men and women will be the
same. Table 11.4 shows the results for our additive and interactive models
of the effects of gender and feelings toward the women's movement on
Hillary Clinton Thermometer scores. We can see from the interactive model
that we can reject the null hypothesis that $\beta_2 = 0$ and the null hypothesis
that $\beta_3 = 0$, so our regression lines for men and women are different. We
can also see that the intercept for the line for women ($\alpha + \beta_2$) is higher than
the intercept for men ($\alpha$). But, perhaps contrary to our expectations, the
estimated effect of the Women's Movement Thermometer for men is greater
than the effect of the Women's Movement Thermometer for women.
The best way to see the combined effect of all of the results from the
interactive model in Table 11.4 is to look at them graphically in a figure
such as Figure 11.3. From this figure we can see the regression lines for
men and for women across the range of the independent variable. It is clear
from this figure that, although women are generally more favorably inclined
toward Hillary Clinton, this gender gap narrows when we compare those
individuals who feel more positively toward the feminist movement.
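The separate lines for men and women can be worked out directly from the interactive model. The sketch below assumes the rounded interactive-model estimates from Table 11.4; the constant and function names are illustrative only.

```python
# Rounded interactive-model estimates from Table 11.4.
ALPHA = 1.56           # intercept
B_WMT = 0.76           # Women's Movement Thermometer
B_FEMALE = 15.21       # Female dummy
B_INTERACTION = -0.13  # Women's Movement Thermometer x Female


def predict(wmt, female):
    """Predicted Hillary Clinton Thermometer score from the interactive model."""
    return ALPHA + B_WMT * wmt + B_FEMALE * female + B_INTERACTION * wmt * female


def line_for(female):
    """Return (intercept, slope) of the regression line for one gender."""
    return ALPHA + B_FEMALE * female, B_WMT + B_INTERACTION * female


print("men:   intercept %.2f, slope %.2f" % line_for(0))
print("women: intercept %.2f, slope %.2f" % line_for(1))

# The gender gap at the two ends of the 0-100 thermometer scale.
for wmt in (0, 100):
    gap = predict(wmt, female=1) - predict(wmt, female=0)
    print(f"thermometer={wmt:3d}  women minus men = {gap:.2f}")
```

Under these estimates the line for women starts higher (16.77 versus 1.56) but is flatter (0.63 versus 0.76), so the predicted gap shrinks from 15.21 points at a thermometer score of 0 to 2.21 points at a score of 100.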

[Figure 11.3 plots the regression lines for men and for women from the interactive model against the Feminist Thermometer (horizontal axis, 0 to 100).]

Figure 11.3. Regression lines from the interactive model.

Table 11.5. The effects of partisanship and performance
evaluations on votes for Bush in 2004

    Independent variable                  Parameter estimate
    Party Identification                  0.09***
                                          (0.01)
    Evaluation: War on Terror             0.08***
                                          (0.01)
    Evaluation: Health of the Economy     0.08***
                                          (0.01)
    Intercept                             0.60***
                                          (0.01)
    n                                     780
    R^2                                   .73

Notes: The dependent variable is equal to one if the respondent voted for
Bush and equal to zero if they voted for Kerry. Standard errors in parentheses.
Two-sided t-tests: ***indicates p < .01; **indicates p < .05; *indicates p < .10.

11.4 DUMMY DEPENDENT VARIABLES

Thus far, our discussion of dummy variables has been limited to situations
in which the variable in question is one of the independent variables in our
model. The obstacles in those models are relatively straightforward. Things
get a bit more complicated, however, when our dependent variable is a
dummy variable.
Certainly, many of the dependent variables of theoretical interest to
political scientists are not continuous. Very often, this means that we need
to move to a statistical model other than OLS if we want to get reasonable
estimates for our hypothesis testing. One exception to this is the linear probability model (LPM). The LPM is an OLS model in which the dependent
variable is a dummy variable. It is called a "probability" model because
we can interpret the $\hat{Y}$ values as "predicted probabilities." But, as we will
see, it is not without problems. Because of these problems, most political
scientists do not use the LPM. We provide a brief discussion of the popular
alternatives to the LPM and then conclude this section with a discussion of
goodness-of-fit measures when the dependent variable is a dummy variable.

11.4.1 The Linear Probability Model

As an example of a dummy dependent variable, we use the choice that
most U.S. voters in the 2004 presidential election made between voting
for the incumbent George W. Bush and his Democratic challenger John
Kerry.⁴ Our dependent variable, which we will call "Bush," is equal to
one for respondents who reported voting for Bush and equal to zero for
respondents who reported voting for Kerry. For our model we theorize
that the decision to vote for Bush or Kerry is a function of an individual's
partisan identification (ranging from -3 for strong Democrats to 0 for
independents, to +3 for strong Republican identifiers) and their evaluation
of the job that Bush did in handling the war on terror and the health of
the economy (both of these evaluations range from +2 for "approve strongly"
to -2 for "disapprove strongly"). The formula for this model is:

$$\text{Bush}_i = \alpha + \beta_1 \text{Party ID}_i + \beta_2 \text{War Evaluation}_i + \beta_3 \text{Economic Evaluation}_i + u_i.$$

Table 11.5 presents the OLS results from this model. We can see from
the table that all of the parameter estimates are statistically significant
in the expected (positive) direction. Not surprisingly, we see that people
who identified with the Republican Party and who had more approving
evaluations of the president's handling of the war and the economy were
more likely to vote for him. This model performs pretty well overall, with
an $R^2$ statistic equal to .73.

⁴ There was only a handful of respondents to the NES who refused to reveal their vote
to the interviewers or voted for a different candidate. But there were a large number of
respondents who reported that they did not vote. By excluding all of these categories,
we are defining the population about which we want to make inferences as those who
voted for Kerry or Bush. Including respondents who voted for other candidates, refused
to report their vote, or those who did not vote would amount to changing from a dichotomous categorical dependent variable to a multichotomous categorical dependent
variable. The types of models used for this type of dependent variable are substantially
more complicated.
To examine how the interpretation of this model is different from that
of a regular OLS model, let's calculate some individual $\hat{Y}$ values. We know
from Table 11.5 that the formula for $\hat{Y}$ is

$$\hat{Y}_i = 0.6 + 0.09 \times \text{Party ID}_i + 0.08 \times \text{War Evaluation}_i + 0.08 \times \text{Economic Evaluation}_i.$$

For a respondent who reported being a pure independent (Party ID = 0)
with a somewhat approving evaluation of Bush's handling of the war on
terror (War Evaluation = 1) and a somewhat disapproving evaluation of
Bush's handling of the health of the economy (Economic Evaluation = -1),
we would calculate $\hat{Y}_i$ as follows:

$$\hat{Y}_i = 0.6 + (0.09 \times 0) + (0.08 \times 1) + (0.08 \times -1) = 0.6.$$
One logical way to interpret this predicted value is to think of it as a
predicted probability that the dummy dependent variable is equal to one.
Using the example for which we just calculated $\hat{Y}_i$, we would predict that
such an individual would have a 0.6 probability (or 60% chance) of voting
for Bush in 2004. As you can imagine, if we change the values of our three
independent variables around, the predicted probability of the individual
voting for Bush changes correspondingly. This means that the LPM is a
special case of OLS for which we can think of the predicted values of the
dependent variable as predicted probabilities. From here on, we represent
predicted probabilities for a particular case as "$\hat{P}_i$" or "$\hat{P}(Y_i = 1)$" and we
can summarize this special property of the LPM as $\hat{P}_i = \hat{P}(Y_i = 1) = \hat{Y}_i$.
One of the problems with the LPM comes when we arrive at extreme
values of the predicted probabilities. Consider, for instance, a respondent
who reported being a strong Republican (Party ID = 3) with a strongly
approving evaluation of Bush's handling of the war on terror (War Evaluation = 2) and a strongly approving evaluation of Bush's handling of the health of the economy (Economic Evaluation = 2). For this
individual, we would calculate $\hat{P}_i$ as follows:

$$\hat{P}_i = \hat{Y}_i = 0.6 + (0.09 \times 3) + (0.08 \times 2) + (0.08 \times 2) = 1.19.$$

This means that we would predict that such an individual would have a
119% chance of voting for Bush in 2004. Such a predicted probability is,
of course, nonsensical because probabilities cannot be smaller than zero
or greater than one. So, one of the problems with the LPM is that it can
produce such values. In the greater scheme of things, though, this problem
is not so severe, as we can make sensible interpretations of predicted values
higher than one or lower than zero - these are cases for which we are very
confident that the probability is close to one (for $\hat{P}_i > 1$) or close to zero (for
$\hat{P}_i < 0$).
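The two worked predictions above can be reproduced with a few lines of code. This is a minimal sketch that hard-codes the rounded LPM estimates from Table 11.5; `lpm_predict` is an illustrative name rather than a function from any statistics package.

```python
def lpm_predict(party_id, war_eval, econ_eval):
    """Predicted probability of a Bush vote from the LPM in Table 11.5."""
    return 0.60 + 0.09 * party_id + 0.08 * war_eval + 0.08 * econ_eval


# Pure independent, somewhat approving on the war, somewhat disapproving on the economy.
independent = lpm_predict(party_id=0, war_eval=1, econ_eval=-1)

# Strong Republican who strongly approves on both evaluations.
strong_republican = lpm_predict(party_id=3, war_eval=2, econ_eval=2)

print(f"independent:       {independent:.2f}")
print(f"strong Republican: {strong_republican:.2f}")
print("outside the 0-1 range:", not 0.0 <= strong_republican <= 1.0)
```

Nothing in the linear formula constrains its output, which is why the second respondent's predicted "probability" exceeds one.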
To the extent that the LPM has potentially more serious problems,
they come in two forms - heteroscedasticity and functional form. We discussed heteroscedasticity in Chapter 9 when we noted that any time that
we estimate an OLS model we assume that there is homoscedasticity (or
equal error variance). We can see that this assumption is particularly problematic with the LPM because the values of the dependent variable are all
equal to zero or one, but the $\hat{Y}$ or predicted values range anywhere between
zero and one (or even beyond these values). This means that the errors (or
residual values) will tend to be largest for cases for which the predicted
value is close to .5. Any nonuniform pattern of model error variance such
as this is called heteroscedasticity, which means that the estimated standard
errors may be too high or too low. We care about this because standard
errors that are too high or too low will have bad effects on our hypothesis
testing and thus ultimately on our conclusions about causal relationships.
The problem of functional form is related to the assumption of parametric linearity that we also discussed in Chapter 9. In the context of the
LPM, this assumption amounts to saying that the impact of a one-unit
change in an independent variable X is equal to the corresponding parameter estimate $\hat{\beta}$ regardless of the value of X or any other independent
variable. This assumption may be particularly problematic for LPMs because the effect of a change in an independent variable may be greater for
cases that would otherwise be at 0.5 than for those cases for which the
predicted probability would otherwise be close to zero or one. Obviously
the extent of both of these problems will vary across different models.
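The heteroscedasticity point can be made concrete with a short sketch. It rests on one assumption that is not spelled out in the text: that the probability p of observing a one is the true probability for the case. Under that assumption the error equals 1 - p with probability p and -p with probability 1 - p, so its variance is p(1 - p). The function name is illustrative only.

```python
def lpm_error_variance(p):
    """Variance of the error for a 0/1 outcome whose probability of a one is p."""
    return p * (1.0 - p)


for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"p = {p:.2f}  error variance = {lpm_error_variance(p):.4f}")
```

The variance peaks at p = 0.5 and shrinks toward zero as p approaches zero or one, so the error variance cannot be equal across cases whose predicted probabilities differ.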
For these reasons, the typical political science solution to having a
dummy dependent variable is to avoid using the LPM. Most applications
that you will come across in political science research will use a binomial
logit (BNL) or binomial probit (BNP) model instead of the LPM for models in which the dependent variable is a dummy variable. BNL and BNP
models are similar to regression models in many ways, but they involve an
additional step in interpreting them. In the next subsection we provide a
brief overview of these types of models.

11.4.2 Binomial Logit and Binomial Probit
In cases in which their dependent variable is dichotomous, most political
scientists use a BNL or a BNP model instead of an LPM. In this subsection we


provide a brief introduction to these two models, using the same example
that we used for our LPM in the previous subsection. To understand these
models, let's first rewrite our LPM from our preceding example in terms of
a probability statement:

$$P_i = P(Y_i = 1) = \alpha + \beta_1 \times \text{Party ID}_i + \beta_2 \times \text{War Evaluation}_i + \beta_3 \times \text{Economic Evaluation}_i + u_i.$$

This is just a way of expressing the probability part of the LPM in a
formula in which "$P(Y_i = 1)$" translates to "the probability that $Y_i$ is equal
to one," which in the case of our running example is the probability that
the individual cast a vote for Bush. We then further collapse this to

$$P_i = P(Y_i = 1) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i,$$

and yet further to

$$P_i = P(Y_i = 1) = X_i\beta + u_i,$$

where we define $X_i\beta$ as the systematic component of Y such that $X_i\beta = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}$.⁵ The term $u_i$ continues to represent the stochastic
or random component of Y. So if we think about our predicted probability
for a given case, we can write this as

$$\hat{Y}_i = \hat{P}_i = \hat{P}(Y_i = 1) = X_i\hat{\beta} = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}.$$

Table 11.6. The effects of partisanship and performance evaluations on
votes for Bush in 2004: Three different types of models

    Independent variable                 LPM         BNL         BNP
    Party Identification                 0.09***     0.82***     0.45***
                                         (0.01)      (0.09)      (0.04)
    Evaluation: War on Terror            0.08***     0.60***     0.32***
                                         (0.01)      (0.09)      (0.05)
    Evaluation: Health of the Economy    0.08***     0.59***     0.32***
                                         (0.01)      (0.10)      (0.06)
    Intercept                            0.60***     1.11***     0.58***
                                         (0.01)      (0.20)      (0.10)

Notes: The dependent variable is equal to one if the respondent voted for Bush and
equal to zero if they voted for Kerry. Standard errors in parentheses.
Two-sided significance tests: ***indicates p < .01; **indicates p < .05;
*indicates p < .10.

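A BNL model with the same variables would be written as

$$P_i = P(Y_i = 1) = \Lambda(\alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i) = \Lambda(X_i\beta + u_i).$$

The predicted probabilities from this model would be written as

$$\hat{P}_i = \hat{P}(Y_i = 1) = \Lambda(\hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}) = \Lambda(X_i\hat{\beta}).$$

A BNP with the same variables would be written as

$$P_i = P(Y_i = 1) = \Phi(\alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i) = \Phi(X_i\beta + u_i).$$

The predicted probabilities from this model would be written as

$$\hat{P}_i = \hat{P}(Y_i = 1) = \Phi(\hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}) = \Phi(X_i\hat{\beta}).$$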

The difference between the BNL model and the LPM is the $\Lambda$, and the difference between the BNP model and the LPM is the $\Phi$. $\Lambda$ and $\Phi$ are known as link functions. A link function links the linear component of a logit or probit model, $X_i\hat{\beta}$, to the quantity in which we are interested, the predicted probability that the dummy dependent variable equals one, $\hat{P}(Y_i = 1)$ or $\hat{P}_i$. A major result of using these link functions is that the relationship between our independent and dependent variables is no longer assumed to be linear. In the case of a logit model, the link function, abbreviated as $\Lambda$, uses the cumulative logistic distribution function (and thus the name "logit") to link the linear component to the probability that $Y_i = 1$. In the case of the probit model, the link function, abbreviated as $\Phi$, uses the cumulative normal distribution function to link the linear component to the predicted probability that $Y_i = 1$. Appendices C (for the BNL) and D (for the BNP) provide tables for converting $X_i\hat{\beta}$ values into predicted probabilities.

5 This shorthand comes from matrix algebra. Although matrix algebra is a very useful tool in statistics, it is not needed to master the material in this text.
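As a rough illustration of what the two link functions do, the following Python sketch converts a value of the linear component into a predicted probability under each model. The function names and the example values of the linear component are assumptions made for illustration; they are not taken from Table 11.6 or from the appendix tables.

```python
import math


def logistic_cdf(z):
    """Cumulative logistic distribution function (the logit link, Lambda)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Cumulative standard normal distribution function (the probit link, Phi)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values of the linear component X_i * beta-hat.
for xb in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{xb:5.1f}  logit: {logistic_cdf(xb):.3f}  probit: {normal_cdf(xb):.3f}")
```

Both functions map a linear component of zero to a probability of 0.5 and keep every prediction between zero and one. The same value of the linear component gives a different probability under each link, which is one reason the parameter estimates from the two models differ in magnitude.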
The best way to understand how the LPM, BNL, and BNP work similarly to and differently from each other is to look at them all with the same model and data. An example of this is presented in Table 11.6. From this table it is apparent that across the three models the parameter estimate for each independent variable has the same sign and significance level. But it is also apparent that the magnitude of these parameter estimates is different across the three models. This is mainly due to the difference of link functions. To better illustrate the differences between the three models presented in Table 11.6, we plotted the predicted probabilities from them in Figure 11.4. These predicted probabilities are for an individual who strongly approved of the Bush administration's handling of the war on terror but who strongly disapproved of the Bush administration's handling of the economy.⁶ The horizontal axis in this figure is this individual's party identification, ranging from strong Democratic Party identifiers on the left end to strong Republican Party identifiers on the right end. The vertical axis is the predicted probability of voting for Bush. We can see from this figure that the three models make very similar predictions. The main differences come as we move away from a predicted probability of 0.5.

[Figure 11.4. Three different models of Bush vote. Horizontal axis: Party Identification, from -3 (StrDem) through 0 (IndInd) to 3 (StrRep). Vertical axis: predicted probability of voting for Bush.]

The LPM line has, by definition, a constant slope across the entire range of X. The BNL and BNP lines of predicted probabilities change their slope such that they slope more and more gently as we move farther from predicted probabilities of 0.5. The differences between the BNL and BNP lines are trivial. This means that the effect of a movement in Party Identification on the predicted probability is constant for the LPM. But for the BNL and BNP, the effect of a movement in Party Identification depends on the value of the other variables in the model. It is important to realize that the differences between the LPM and the other two types of models are by construction instead of some novel finding. In other words, our choice of model determines the shape of our predicted probability line.

6 These were the modal answers to the two evaluative questions that were included in the model presented in Table 11.6. It is fairly common practice to illustrate the estimated impact of a variable of interest from this type of model by holding all other variables constant at their mean or modal values and then varying that one variable to see how the predicted probabilities change.

Table 11.7. Classification table from LPM of the effects of partisanship and performance evaluations on votes for Bush

                              Actual vote
Model-based expectations      Bush      Kerry
Bush                          361       28
Kerry                         36        355

Notes: Cell entries are the number of cases. Predictions are based on a cutoff of Ŷ > 0.5.

11.4.3 Goodness-of-Fit with Dummy Dependent Variables
Although we can calculate an $R^2$ statistic when we estimate a linear probability model, $R^2$ doesn't quite capture what we are doing when we want to assess the fit of such a model. What we are trying to assess is the ability of our model to separate our cases into those in which Y = 1 and those in which Y = 0. So it is helpful to think about this in terms of a 2 x 2 table of model-based expectations and actual values. To figure out the model's expected values, we need to choose a cutoff point at which we interpret the model as predicting that Y = 1. An obvious value to use for this cutoff point is $\hat{Y} > 0.5$. Table 11.7 shows the results of this in what we call a classification table. Classification tables compare model-based expectations with actual values of the dependent variable.
In this table, we can see the differences between the LPM's predictions and the actual votes reported by survey respondents to the 2004 NES. One fairly straightforward measure of the fit of this model is to look at the percentage of cases that were correctly classified through use of the model. So if we add up the cases correctly classified and divide by the total number of cases we get

$$\text{correctly classified LPM}_{0.5} = \frac{361 + 355}{780} = \frac{716}{780} = 0.918.$$

So our LPM managed to correctly classify 0.918 or 91.8% of the respondents and to erroneously classify the remaining 0.082 or 8.2%.
Although this seems like a pretty high classification rate, we don't really know what we should be comparing it with. One option is to compare our model's classification rate with the classification rate for a naive model (NM) that predicts that all cases will be in the modal category. In this case, the NM would predict that all respondents voted for Bush. So, if we calculate the correctly classified for the NM,

$$\text{correctly classified NM} = \frac{361 + 36}{780} = \frac{397}{780} = 0.509.$$

This means that the NM correctly classified 0.509 or 50.9% of the respondents and erroneously classified the remaining 0.491 or 49.1%.
Turning now to the business of comparing the performance of our model with that of the NM, we can calculate the proportionate reduction of error when we move from the NM to our LPM with party identification and two performance evaluations as independent variables. The percentage erroneously classified in the naive model was 49.1 and the percentage erroneously classified in our LPM was 8.2. So we have reduced the error percentage by 49.1 - 8.2 = 40.9. If we now divide this by the total error percentage of the naive model, we get $\frac{40.9}{49.1} = 0.833$. This means that we have a proportionate reduction of error equal to 0.833. Another way of saying this is that when we moved from the NM to our LPM we reduced the classification errors by 83.3%.
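The arithmetic in this subsection can be sketched in a few lines of Python. The cell counts below are the ones implied by the calculations in the text (361 + 355 correctly classified, 361 + 36 actual Bush voters, 780 respondents in total); the function names and the dictionary layout are assumptions chosen for illustration.

```python
def correctly_classified(table):
    """Share of cases on the diagonal of a classification table.

    table[predicted][actual] holds the number of cases.
    """
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[category][category] for category in table)
    return correct / total


def naive_rate(table):
    """Share correctly classified if every case is predicted to be
    in the modal category of the actual values."""
    actual_totals = {}
    for row in table.values():
        for actual, count in row.items():
            actual_totals[actual] = actual_totals.get(actual, 0) + count
    return max(actual_totals.values()) / sum(actual_totals.values())


def proportionate_reduction_of_error(model_rate, naive):
    """Reduction in classification error relative to the naive model's error."""
    naive_error = 1.0 - naive
    model_error = 1.0 - model_rate
    return (naive_error - model_error) / naive_error


# Outer key: model-based expectation; inner key: actual vote.
lpm_table = {
    "Bush": {"Bush": 361, "Kerry": 28},
    "Kerry": {"Bush": 36, "Kerry": 355},
}

lpm = correctly_classified(lpm_table)
nm = naive_rate(lpm_table)
pre = proportionate_reduction_of_error(lpm, nm)
print(round(lpm, 3), round(nm, 3), round(pre, 3))  # 0.918 0.509 0.833
```

The same three functions can be reused with a different cutoff or a different model by swapping in that model's classification table.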

11.5 OUTLIERS AND INFLUENTIAL CASES IN OLS

In Section 6.4 we advocated using descriptive statistics to identify outlier values for each continuous variable. In the context of a single variable, an outlier is an extreme value relative to the other values for that variable. But in the context of an OLS model, when we say that a single case is an outlier, we could mean several different things.
We should always strive to know our data well. This means looking at individual variables and identifying univariate outliers. But just because a case is an outlier in the univariate sense does not necessarily imply that it will be an outlier in all senses of this concept in the multivariate world. Nonetheless, we should look for outliers in the single-variable sense before we run our models and make sure that when we identify such cases they are actual values and not values created by some type of data management mistake.
In the regression setting, individual cases can be outliers in several different ways:
1. They can have unusual independent variable values. This is known as a case having large leverage. This can be the result of a single case having an unusual value for a single variable. A single case can also have large leverage because it has an unusual combination of values across two or more variables. There are a variety of different measures of leverage, but they all make calculations across the values of independent variables in order to identify individual cases that are particularly different.
2. They can have large residual values (usually we look at squared residuals to identify outliers of this variety).
3. They can have both large leverage and large residual values.
The relationship among these different concepts of outliers for a single case in OLS is often summarized as separate contributions to "influence" in the following formula:

$$\text{influence}_i = \text{leverage}_i \times \text{residual}_i.$$

As this formula indicates, the influence of a particular case is determined by the combination of its leverage and residual values. There are a variety of different ways to measure these different factors. We explore a couple of them in the following subsections with a controversial real-world example.
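To make the two ingredients of influence concrete, here is a minimal Python sketch for a bivariate regression. It uses one common measure of leverage (the diagonal of the hat matrix) and one common normalization of the squared residuals (dividing each by their sum); both choices, the function names, and the data are assumptions for illustration rather than the specific measures used by any particular statistical package.

```python
def ols_simple(x, y):
    """Intercept and slope of a bivariate OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverage(x):
    """Hat-matrix diagonal for a bivariate regression with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def normalized_squared_residuals(x, y):
    """Squared residuals divided by their sum, so that they add up to one."""
    intercept, slope = ols_simple(x, y)
    squared = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    total = sum(squared)
    return [s / total for s in squared]


# Hypothetical data: the last case has an unusual value of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 40.0]

h = leverage(x)
r2 = normalized_squared_residuals(x, y)
for i, (h_i, r2_i) in enumerate(zip(h, r2)):
    print(i, round(h_i, 3), round(r2_i, 3))
```

In a regression with an intercept, these leverage values always sum to the number of estimated parameters (two here), so a case whose leverage is far above the average of 2/n stands out. Plotting `h` against `r2` gives the same kind of picture as a leverage-versus-residual-squared plot.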

11.5.1 Identifying Influential Cases
One of the most famous cases of outliers/influential cases in political data comes from the 2000 U.S. presidential election in Florida. In an attempt to measure the extent to which ballot irregularities may have influenced election results, a variety of models were estimated in which the raw vote numbers for candidates across different counties were the dependent variables of interest. These models were fairly unusual because the parameter estimates and other quantities that are most often the focus of our model interpretations were of little interest. Instead, these were models for which the most interesting quantities were the diagnostics of outliers. As an example of such a model, we will work with the following:

$$\text{Buchanan}_i = \alpha + \beta\,\text{Gore}_i + u_i.$$

In this model the cases are individual counties in Florida, the dependent variable ($\text{Buchanan}_i$) is the number of votes in each Florida county for the independent candidate Patrick Buchanan, and the independent variable is the number of votes in each Florida county for the Democratic Party's nominee Al Gore ($\text{Gore}_i$). Such models are unusual in the sense that there is no claim of an underlying causal relationship between the independent and dependent variables. Instead, the theory behind this type of model is that there should be a strong systematic relationship between the number of votes cast for Gore and those cast for Buchanan across the Florida counties.⁷ There was a suspicion that the ballot structure used in some counties, especially the infamous "butterfly ballot," was such that it confused some voters who intended to vote for Gore into voting for Buchanan. If this was the case, we should see these counties appearing as outliers after we estimate our model.

7 Most of the models of this sort make adjustments to the variables (for example, logging the values of both the independent and dependent variables) to account for possibilities of nonlinear relationships. In the present example we avoided doing this for the sake of simplicity.

Table 11.8. Votes for Gore and Buchanan in Florida counties in the 2000 U.S. presidential election

Independent variable      Parameter estimate
Votes for Gore            0.004***
                          (0.0005)
Intercept                 80.63*
                          (46.4)
n                         67
R²                        .48

Notes: The dependent variable is the number of votes for Patrick Buchanan. Standard errors in parentheses.
Two-sided t-tests: *** indicates p < .01; ** indicates p < .05; * indicates p < .10.
We can see from Table 11.8 that there was indeed a statistically significant positive relationship between Gore and Buchanan votes, and that this simple model accounts for 48% of the variation in Buchanan votes across the Florida counties. But, as we said before, the more interesting inferences from this particular OLS model are in the outlier/influence diagnostics for particular cases. Figure 11.5 presents a Stata lvr2plot (short for "leverage-versus-residual-squared plot") that displays Stata's measure of leverage on the vertical dimension and a normalized measure of the squared residuals on the horizontal dimension. The logic of this figure is that, as we move to the right of the vertical line through this figure, we are seeing cases with unusually large residual values, and that, as we move above the horizontal line through this figure, we are seeing cases with unusually large leverage values. Cases with both unusually large residual and leverage values are highly influential. From this figure it is apparent that Pinellas, Hillsborough, and Orange counties had large leverage values but not particularly large squared residual values, whereas Dade, Broward, and Palm Beach counties were highly influential with both large leverage values and large squared residual values.

[Figure 11.5. Stata lvr2plot for the model presented in Table 11.8. Vertical axis: leverage. Horizontal axis: normalized residual squared. Labeled points: Palm Beach, Broward, Dade, Pinellas, Hillsborough, Orange.]

[Figure 11.6. OLS line with scatter plot for Florida 2000. Horizontal axis: Gore vote. Vertical axis: Buchanan vote. Labeled points: Palm Beach, Broward, Dade.]
We can get a better idea of the correspondence between Figure 11.5 and Table 11.8 from Figure 11.6, in which we plot the OLS regression line through a scatter plot of the data. From this figure it is clear that Palm Beach was well above the regression line whereas Broward and Dade counties were well below the regression line. By any measure, these three cases were substantial outliers and thus quite influential in our model.
A more specific method for detecting the influence of an individual case involves estimating our model with and without particular cases to see how much this changes specific parameter estimates. The resulting calculation is known as the DFBETA score (Belsley, Kuh, and Welsch 1980). DFBETA scores are calculated as the difference in the parameter estimate without each case divided by the standard error of the original parameter estimate. Table 11.9 displays the five largest absolute values of DFBETA for the slope parameter ($\beta$) from the model presented in Table 11.8. Not surprisingly, we see that omitting Palm Beach, Broward, or Dade has the largest impact on our estimate of the slope parameter. By any measure, these cases exerted considerable influence on our model.

Table 11.9. The five largest (absolute-value) DFBETA scores for $\hat{\beta}$ from the model presented in Table 11.8

County        DFBETA
Palm Beach    6.993
Broward       -2.514
Dade          -1.772
Orange        -0.109
Pinellas      0.085
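The with-and-without logic behind DFBETA scores can be sketched as follows. This is a minimal illustration that follows the verbal definition given here (the change in the slope from leaving a case out, divided by the standard error of the original slope estimate); statistical packages may scale the score somewhat differently. The function names and the data are hypothetical and are not the Florida county data.

```python
import math


def ols_with_se(x, y):
    """Return (intercept, slope, standard error of the slope) for bivariate OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return intercept, slope, se_slope


def dfbeta_slope(x, y):
    """For each case: (slope with all cases - slope without the case),
    divided by the standard error of the slope from the full sample."""
    _, b_full, se_full = ols_with_se(x, y)
    scores = []
    for i in range(len(x)):
        x_without = x[:i] + x[i + 1:]
        y_without = y[:i] + y[i + 1:]
        _, b_without, _ = ols_with_se(x_without, y_without)
        scores.append((b_full - b_without) / se_full)
    return scores


# Hypothetical data: the last case lies far above the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 12.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 30.0]

for i, score in enumerate(dfbeta_slope(x, y)):
    print(i, round(score, 3))
```

A positive score means that including the case pulls the slope estimate upward, which is the pattern reported for Palm Beach in Table 11.9; a negative score means the case pulls it downward.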

11.5.2 Dealing With Influential Cases
Now that we have discussed the identification of particularly influential/outlier cases on our models, we turn to the subject of what to do once we have identified such cases. The first thing to do when we identify a case with substantial influence is to double-check the values of all variables for such a case. We want to be certain that we have not "created" an influential case through some error in our data management procedures. Once we have corrected for any errors of data management and determined that we still have some particularly influential case(s), it is important that we report our findings about such cases along with our other findings. There are a variety of strategies for doing so. Table 11.10 shows five different models that reflect various approaches to reporting results with highly influential cases. In Model 1 we have the original results as reported in Table 11.8. In Model 2 we have added a dummy variable that identifies and isolates the effect of Palm Beach County. This approach is sometimes referred to as dummying out influential cases. We can see why this is called dummying out from the results in Model 3, which is the original model with the observation for Palm Beach County dropped from the analysis. The parameter estimates and standard errors for the intercept and slope parameters are identical from Models 2 and 3. The only differences are the model $R^2$ statistic, the number of cases, and the additional parameter estimate reported in Model 2 for the Palm Beach County dummy variable.⁸ In Model 4 and Model 5, we see the results from dummying out the three most influential cases and then from dropping them out of the analysis.

Table 11.10. Votes for Gore and Buchanan in Florida counties in the 2000 U.S. presidential election

Independent variable   Model 1     Model 2     Model 3     Model 4      Model 5
Gore                   0.004***    0.003***    0.003***    0.005***     0.005***
                       (0.0005)    (0.0002)    (0.0002)    (0.0003)     (0.0003)
Palm Beach dummy                   2606.3***               2095.5***
                                   (150.4)                 (110.6)
Broward dummy                                              -1066.0***
                                                           (131.5)
Dade dummy                                                 -1025.6***
                                                           (120.6)
Intercept              80.6*       110.8***    110.8***    59.0***      59.0***
                       (46.4)      (19.7)      (19.7)      (13.8)       (13.8)
n                      67          67          66          67           64
R²                     .48         .91         .63         .96          .82

Notes: The dependent variable is the number of votes for Patrick Buchanan. Standard errors in parentheses.
Two-sided t-tests: *** indicates p < .01; ** indicates p < .05; * indicates p < .10.

Across all five of the models shown in Table 11.10, the slope parameter estimate remains positive and statistically significant. In most models, this would be the quantity in which we are most interested (testing hypotheses about the relationship between X and Y). Thus the relative robustness of this parameter across model specifications would be comforting. Regardless of the effects of highly influential cases, it is important first to know that they exist and, second, to report accurately what their influence is and what we have done about them.

8 This parameter estimate was viewed by some as an estimate of how many votes the ballot irregularities cost Al Gore in Palm Beach County. But if we look at Model 4, where we include dummy variables for Broward and Dade Counties, we can see the basis for an argument that in these two counties there is evidence of bias in the opposite direction.
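The equivalence between dummying out a case and dropping it can be checked numerically. The sketch below uses closed-form OLS formulas for one and two independent variables; the data are hypothetical (six made-up "counties," the last of which sits far above the trend) and the function names are assumptions for illustration.

```python
def fit_one(x, y):
    """Intercept and slope from an OLS regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_two(x, d, y):
    """Intercept and both slopes from an OLS regression of y on x and d."""
    n = len(y)
    mx, md, my = sum(x) / n, sum(d) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sdd = sum((a - md) ** 2 for a in d)
    sxd = sum((a - mx) * (b - md) for a, b in zip(x, d))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sdy = sum((a - md) * (b - my) for a, b in zip(d, y))
    det = sxx * sdd - sxd ** 2
    bx = (sdd * sxy - sxd * sdy) / det
    bd = (sxx * sdy - sxd * sxy) / det
    return my - bx * mx - bd * md, bx, bd


# Hypothetical data: the last case is far above the trend of the others.
x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y = [12.0, 19.0, 33.0, 41.0, 48.0, 200.0]
dummy = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

a_dummy, b_dummy, b_case = fit_two(x, dummy, y)  # dummying out the last case
a_drop, b_drop = fit_one(x[:-1], y[:-1])         # dropping the last case

print(round(b_dummy, 6), round(b_drop, 6))  # slopes from the two approaches
print(round(a_dummy, 6), round(a_drop, 6))  # intercepts from the two approaches
print(round(b_case, 6))                     # coefficient on the dummy variable
```

The slope and intercept agree across the two approaches, as in Models 2 and 3 of Table 11.10. The coefficient on the dummy variable equals the gap between the isolated case's actual value and the value predicted for it by the line fitted to the remaining cases.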

11.6 MULTICOLLINEARITY

When we specify and estimate a multiple OLS model, what is the interpretation of each individual parameter estimate? It is our best guess of the causal impact of a one-unit increase in the relevant independent variable on the dependent variable, controlling for all of the other variables in the model. Another way of saying this is that we are looking at the impact of a one-unit increase in one independent variable on the dependent variable when we "hold all other variables constant." We know from Chapter 10 that a minimal mathematical property for estimating a multiple OLS model is that there is no perfect multicollinearity. Perfect multicollinearity, you will recall, occurs when one independent variable is an exact linear function of one or more other independent variables in a model.
In practice, perfect multicollinearity is usually the result of a small number of cases relative to the number of parameters we are estimating, limited independent variable values, or model misspecification. As we have noted, if there exists perfect multicollinearity, OLS parameters cannot be estimated. A much more common and vexing issue is less-than-perfect multicollinearity. As a result, when people refer to multicollinearity, they almost always mean "less-than-perfect multicollinearity." From here on, when we refer to "multicollinearity," we will mean "high, but less-than-perfect, multicollinearity." This means that two or more of the independent variables in the model are extremely highly correlated with one another.

[Figure 11.7. Venn diagram with multicollinearity. Circles: Y, X, and Z.]

11.6.1 How Does Multicollinearity Happen?

Multicollinearity is induced by a small number of degrees of freedom and/or high correlation between independent variables. Figure 11.7 provides a Venn diagram illustration that is useful for thinking about the effects of multicollinearity in the context of an OLS regression model. As you can see from this figure, X and Z are fairly highly correlated. Our regression model is

$$Y_i = \alpha + \beta_1 X_i + \beta_2 Z_i + u_i.$$

Looking at the figure, we can see that the $R^2$ from our regression model will be fairly high ($R^2 = \frac{f + d + b}{a + f + d + b}$). But we can see from this figure that the areas for the estimation of our two slope parameters (area f for $\beta_1$ and area b for $\beta_2$) are pretty small. Because of this, our standard errors for our slope parameters will tend to be fairly large, which makes discovering statistically significant relationships more difficult, and we will have difficulty making precise inferences about the impacts of both X and Z on Y. It is possible that because of this problem we would conclude neither X nor Z has much of an impact on Y. But clearly this is not the case. As we can see from the diagram, both X and Z are related to Y. The problem is that much of the covariation between X and Y and between Z and Y is also covariation between X and Z. In other words, it is the size of area d that is causing us problems. We have precious little area in which to examine the effect of X on Y while holding Z constant, and likewise, there is little leverage to understand the effect of Z on Y while controlling for X.
It is worth emphasizing at this point that multicollinearity is not a statistical problem (examples of statistical problems include autocorrelation, bias, and heteroscedasticity). Rather, multicollinearity is a data problem. It is possible to have multicollinearity even when all of the assumptions of OLS from Chapter 9 are valid and all of the minimal mathematical requirements for OLS from Chapters 9 and 10 have been met. So, you might ask, what's the big deal about multicollinearity? To underscore the notion of multicollinearity as a data problem instead of a statistical problem, Christopher Achen (1982) has suggested that the word "multicollinearity" should be used interchangeably with "micronumerosity." Imagine what would happen if we could double or triple the size of the diagram in Figure 11.7 without changing the relative sizes of any of the areas. As we expanded all of the areas, areas f and b would eventually become large enough for us to estimate accurate standard errors.
11.6.2 Detecting Multicollinearity

It is very important to know when you have multicollinearity. In particular, it is important to distinguish situations in which estimates are statistically insignificant because the relationships just aren't there from situations in which estimates are statistically insignificant because of multicollinearity. The diagram in Figure 11.7 shows us one way in which we might be able to detect multicollinearity: If we have a high $R^2$ statistic, but none (or very few) of our parameter estimates is statistically significant, we should be suspicious of multicollinearity. We should also be suspicious of multicollinearity if we see that, when we add and remove independent variables from our model, the parameter estimates for other independent variables (and especially their standard errors) change substantially. If we estimated the model represented in Figure 11.7 with just one of the two independent variables, we would get a statistically significant relationship. But, as we know from the discussions in Chapter 10, this would be problematic. Presumably we have a theory about the relationship between each of these independent variables (X and Z) and our dependent variable (Y). So, although the estimates from a model with just X or just Z as the independent variable would help us to detect multicollinearity, they would suffer from bias. And, as we argued in Chapter 10, omitted-variables bias is a severe problem.

A more formal way to diagnose multicollinearity is to calculate the variance inflation factor (VIF) for each of our independent variables. This calculation is based on an auxiliary regression model in which one independent variable, which we will call $X_i$, is the dependent variable and all of the other independent variables are independent variables.⁹ The $R^2$ statistic from this auxiliary model, $R_i^2$, is then used to calculate the VIF for variable i as follows:

$$\text{VIF}_i = \frac{1}{1 - R_i^2}.$$

Many statistical programs report the VIF and its inverse ($\frac{1}{\text{VIF}}$) by default. The inverse of the VIF is sometimes referred to as the tolerance index measure. The higher the $\text{VIF}_i$ value, or the lower the tolerance index, the higher will be the estimated variance of the parameter estimate for $X_i$ in our theoretically specified model. Another useful statistic to examine is the square root of the VIF. Why? Because the VIF is measured in terms of variance, but most of our hypothesis-testing inferences are made with standard errors. Thus the square root of the VIF provides a useful indicator of the impact the multicollinearity is going to have on hypothesis-testing inferences.

9 Students facing OLS diagnostic procedures are often surprised that the first thing that we do after we estimate our theoretically specified model of interest is to estimate a large set of atheoretical auxiliary models to test the properties of our main model. We will see that, although these auxiliary models lead to the same types of output that we get from our main model, we are often interested in only one particular part of the results from the auxiliary model. With our "main" model of interest, we have learned that we should include every variable that our theories tell us should be included and exclude all other variables. In auxiliary models, we do not follow this rule. Instead, we are running these models to test whether certain properties have or have not been met in our original model.

11.6.3 Multicollinearity: A Simulated Example

Thus far we have made a few scattered references to simulation. In this subsection we make use of simulation to better understand multicollinearity. Almost every statistical computer program has a set of tools for simulating data. When we use these tools, we have an advantage that we do not ever have with real-world data: We can know the underlying "population" characteristics (because we create them). When we know the population parameters for a regression model and draw sample data from this population, we gain insights into the ways in which statistical models work.
So, to simulate multicollinearity, we are going to create a population with the following characteristics:
1. Two variables $X_{1i}$ and $X_{2i}$ such that the correlation $r_{X_{1i},X_{2i}} = 0.9$.
2. A variable $u_i$ randomly drawn from a normal distribution, centered around 0 with variance equal to 1 [$u_i \sim N(0, 1)$].
3. A variable $Y_i$ such that $Y_i = 0.5 + 1X_{1i} + 1X_{2i} + u_i$.
We can see from the description of our simulated population that we have met all of the OLS assumptions, but that we have a high correlation between our two independent variables. Now we will conduct a series of random draws (samples) from this population and look at the results from the following regression models:

Model 1: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$,
Model 2: $Y_i = \alpha + \beta_1 X_{1i} + u_i$,
Model 3: $Y_i = \alpha + \beta_2 X_{2i} + u_i$.

In each of these random draws, we increase the size of our sample, starting with 5, then 10, and finally 25 cases. Results from models estimated with each sample of data are displayed in Table 11.11. In the first column of results (n = 5), we can see that both slope parameters are positive, as would be expected, but that the parameter estimate for $X_1$ is statistically insignificant and the parameter estimate for $X_2$ is on the borderline of statistical significance. The VIF statistics for both variables are equal to 5.26, indicating that the variance for each parameter estimate is substantially inflated by multicollinearity. The model's intercept is statistically significant and positive, but pretty far from what we know to be the true population value for this parameter. In Models 2 and 3 we get statistically significant positive parameter estimates for each variable, but both of these estimated slopes are almost twice as high as what we know to be the true population parameters. The 95% confidence interval for $\hat{\beta}_2$ does not include the true population parameter. This is a clear case of omitted-variables bias. When we draw a sample of 10 cases, we get closer to the true population parameters with $\hat{\beta}_1$ and $\hat{\alpha}$ in Model 1. The VIF statistics remain the same because we have not changed the underlying relationship between $X_1$ and $X_2$. This increase in sample size does not help us with the omitted-variables bias in Models 2 and 3. In fact, we can now reject the true population slope parameter for both models with substantial confidence. In our third sample with a sample of 25 cases, Model 1 is now very close to our true population model, in the sense of both the parameter values and that all of these parameter estimates are statistically significant. In Models 2 and 3, the omitted-variables bias is even more pronounced.

Table 11.11. Random draws of increasing size from a population with substantial multicollinearity

Estimate        Sample: n = 5    Sample: n = 10    Sample: n = 25
Model 1:
  β̂1            0.546            0.882             1.012**
                (0.375)          (0.557)           (0.394)
  β̂2            1.422*           1.450**           1.324***
                (0.375)          (0.557)           (0.394)
  α̂             1.160**          0.912***          0.579***
                (0.146)          (0.230)           (0.168)
  R²            .99              .93               .89
  VIF1          5.26             5.26              5.26
  VIF2          5.26             5.26              5.26
Model 2:
  β̂1            1.827**          2.187***          2.204***
                (0.382)          (0.319)           (0.207)
  α̂             1.160**          0.912**           0.579***
                (0.342)          (0.302)           (0.202)
  R²            .88              .85               .83
Model 3:
  β̂2            1.914***         2.244***          2.235***
                (0.192)          (0.264)           (0.192)
  α̂             1.160***         0.912***          0.579***
                (0.171)          (0.251)           (0.188)
  R²            .97              .90               .86

Notes: The dependent variable is $Y_i = 0.5 + 1X_{1i} + 1X_{2i} + u_i$. Standard errors in parentheses.
Two-sided t-tests: *** indicates p < .01; ** indicates p < .05; * indicates p < .10.

The findings in this simulation exercise mirror more general findings in the theoretical literature on OLS models. Adding more data will alleviate multicollinearity, but not omitted-variables bias. We now turn to an example of multicollinearity with real-world data.
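A simulation along these lines can be sketched with Python's standard library. This is an illustrative reconstruction under stated assumptions, not the procedure that produced Table 11.11: the independent variables are assumed to be standard normal, the seeds are arbitrary, and the function names are hypothetical. Because each sample is drawn at random from a population with a correlation of 0.9, the sample VIF will vary around the population value of 1/(1 - 0.81), roughly 5.26, rather than equal it exactly.

```python
import math
import random


def simulate(n, rho=0.9, seed=0):
    """Draw n cases with corr(X1, X2) = rho in the population and
    Y = 0.5 + 1*X1 + 1*X2 + u, where u ~ N(0, 1)."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x1.append(a)
        x2.append(b)
        y.append(0.5 + a + b + rng.gauss(0.0, 1.0))
    return x1, x2, y


def sums(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_one(x, y):
    """Intercept and slope with one independent variable (Models 2 and 3)."""
    slope = sums(x, y) / sums(x, x)
    return sum(y) / len(y) - slope * sum(x) / len(x), slope


def fit_two(x1, x2, y):
    """Intercept and both slopes with two independent variables (Model 1)."""
    s11, s22, s12 = sums(x1, x1), sums(x2, x2), sums(x1, x2)
    s1y, s2y = sums(x1, y), sums(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    n = len(y)
    return sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n, b1, b2


def vif(x1, x2):
    """VIF from the auxiliary regression of one independent variable on the other."""
    r2 = sums(x1, x2) ** 2 / (sums(x1, x1) * sums(x2, x2))
    return 1.0 / (1.0 - r2)


for n in (5, 10, 25, 10000):
    x1, x2, y = simulate(n, seed=n)
    _, b1, b2 = fit_two(x1, x2, y)
    _, b1_only = fit_one(x1, y)
    _, b2_only = fit_one(x2, y)
    print(n, round(b1, 2), round(b2, 2), round(b1_only, 2),
          round(b2_only, 2), round(vif(x1, x2), 2))
```

In large samples, the two slopes from the full model settle near their true value of 1, whereas the slope from a model that omits one variable settles near 1 + 0.9 = 1.9 under these assumptions. More data sharpens the estimates from the correctly specified model but does nothing to remove the omitted-variables bias.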

11.6.4 Multicollinearity: A Real-World Example

Table 11.12. Pairwise correlations between independent variables

              Bush Therm.   Income     Ideology   Education   Party ID
Bush Therm.   1.00
Income        0.09***       1.00
Ideology      0.56***       0.13***    1.00
Education     -0.07***      0.44***    -0.06*     1.00
Party ID      0.69***       0.15***    0.60***    0.06**      1.00

Notes: Cell entries are correlation coefficients. Two-sided t-tests: *** indicates p < .01; ** indicates p < .05; * indicates p < .10.

In this subsection, we estimate a model of the thermometer scores
for U.S. voters for George W. Bush in 2004. Our model specification
is the following:

$$\text{Bush Thermometer}_i = \alpha + \beta_1 \text{Income}_i + \beta_2 \text{Ideology}_i + \beta_3 \text{Education}_i + \beta_4 \text{Party ID}_i + u_i.$$

Although we have distinct theories about the causal impact of each independent
variable on people's feelings toward Bush, Table 11.12 indicates
that some of these independent variables are substantially correlated with
each other.
In Table 11.13, we present estimates of our model using three different
samples from the NES 2004 data. In Model 1, estimated with data from
20 randomly chosen respondents, we see that none of our independent
variables are statistically significant despite the rather high $R^2$ statistic.
The VIF statistics for Ideology and Party ID indicate that multicollinearity
might be a problem. In Model 2, estimated with data from 74 randomly
chosen respondents, Party ID is highly significant in the expected (positive)
direction whereas Ideology is near the threshold of statistical significance.
None of the VIF statistics for this model are stunningly high, though they
are greater than 1.5 for Ideology, Education, and Party ID.10 Finally, in
Model 3, estimated with all 820 respondents for whom data on all of the
variables were available, we see that Ideology, Party ID, and Education
are all significant predictors of people's feelings toward Bush. The sample
size is more than sufficient to overcome the VIF statistics for Party ID and
Ideology. Of our independent variables, only Income remains statistically
insignificant. Is this due to multicollinearity? After all, when we look at
Table 11.12, we see that Income has a highly significant positive correlation
with Bush Thermometer scores. For the answer to this question, we need
to go back to the lessons that we learned in Chapter 10: Once we control

10 When we work with real-world data, there tend to be many more changes as we move
from sample to sample.


Table 11.13. Model results from random draws of increasing
size from the 2004 NES

Independent variable    Model 1     Model 2     Model 3
Income                    0.72        0.77        0.11
                         (0.90)      (0.51)      (0.15)
                         {1.63}      {1.16}      {1.2?}
Ideology                  7.02        4.57*       4.2?***
                         (5.53)      (2.22)      (0.67)
                         {3.50}      {1.78}      {1.58}
Education                -6.29       -2.50       -1.88***
                         (3.32)      (1.83)      (0.55)
                         {1.42}      {1.23}      {1.22}
Party ID                  8.44        6.83***    10.00***
                         (3.98)      (1.58)      (0.46)
                         {3.05}      {1.70}      {1.56}
Intercept                21.92       12.03       13.73***
                        (23.45)     (13.03)      (3.56)
n                         20          74          821
R2                        .71         .56         .57

Notes: The dependent variable is the respondent's thermometer
score for George W. Bush. Standard errors in parentheses;
VIF statistics in braces.
Two-sided t-tests: *** indicates p < .01; ** indicates p < .05;
* indicates p < .10.

for the effects of Ideology, Party ID, and Education, the effect of Income
on people's feelings toward George W. Bush goes away.

11.6.5 Multicollinearity: What Should I Do?

In the introduction to this section on multicollinearity, we described it as a
"common and vexing issue." The reason why multicollinearity is "vexing"
is that there is no magical statistical cure for it. What is the best thing to do
when you have multicollinearity? Easy (in theory): Collect more data. But
data are expensive to collect. If we had more data, we would use them and
we wouldn't have hit this problem in the first place. So, if you do not have
an easy way to increase your sample size, then multicollinearity ends up being
something that you just have to live with. It is important to know that you
have multicollinearity and to present your multicollinearity by reporting
the results of VIF statistics or what happens to your model when you add
and drop the "guilty" variables.

11.7 BEING CAREFUL WITH TIME SERIES

In recent years there has been a massive proliferation of valuable time-series
data in political science. Although this growth has led to exciting
new research opportunities, it has also been the source of a fair amount
of controversy. Swirling at the center of this controversy is the danger of
spurious regressions that are due to trends in time-series data. As we will
see, a failure to recognize this problem can lead to mistakes about inferring
causality. In the remainder of this section we first introduce time-series
notation, discuss the problems of spurious regressions, and then discuss
the trade-offs involved with two possible solutions: the lagged dependent
variable and the differenced dependent variable.

11.7.1 Time-Series Notation

In Chapter 4 we introduced the concept of a time-series observational study.
Although we have seen some time-series data (such as the Ray Fair data set
used in Chapters 8-10), we have not been using the mathematical notation
specific to time-series data. Instead, we have been using a generic notation in
which the subscript i represents an individual case. In time-series notation,
individual cases are represented with the subscript t, and the numeric value
of t represents the temporal order in which the cases occurred, and this
ordering is very likely to matter.11 Consider the following OLS population
model written in the notation that we have worked with thus far:

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i.$$

If the data of interest were time-series data, we would rewrite this model as

$$Y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t.$$

In most political science applications, time-series data occur at regular
intervals. Common intervals for political science data are weeks, months,
quarters, and years. In fact, these time intervals are important enough
that they are usually front-and-center in the description of a data set. For
instance, the data presented in Figure 2.1 would be described as a "monthly
time series of presidential popularity."

11 In cross-sectional data sets, it is almost always the case that the ordering of the cases is
irrelevant to the analyses being conducted.

Using this notation, we talk about the observations in the order in
which they came. As such, it is often useful to talk about values of variables
relative to their lagged values or lead values. Both lagged and lead values
are expressions of values relative to a current time, which we call time t. A
lagged value of a variable is the value of the variable from a previous time
period. For instance, a lagged value from one period previous to the current
time is referenced as being from time t-1. A lead value of a variable is the
value of the variable from a future time period. For instance, a lead value
from one period into the future from the current time is referenced as being
from time t+1. Note that we would not want to specify a model with a
leading value for an independent variable because this would amount to a
theory that the future value of the independent variable exerted a causal
influence on the past.
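To make the lag and lead notation concrete, here is a minimal Python sketch. The function name `shift` and the monthly approval numbers are hypothetical, chosen only for illustration; they are not from the text or from any data set it uses. The sketch simply lines up each value at time t with its value at t-1 and t+1.

```python
from typing import List, Optional


def shift(series: List[float], periods: int) -> List[Optional[float]]:
    """Shift a time-ordered series by `periods` steps.

    periods > 0 gives lagged values: position t holds the value from t - periods.
    periods < 0 gives lead values: position t holds the value from t + |periods|.
    Positions with no available value are filled with None.
    """
    n = len(series)
    shifted: List[Optional[float]] = [None] * n
    for t in range(n):
        source = t - periods
        if 0 <= source < n:
            shifted[t] = series[source]
    return shifted


# Hypothetical monthly approval values, in temporal order.
approval = [52.0, 55.0, 53.0, 58.0, 60.0]
lag1 = shift(approval, 1)    # Y_{t-1}
lead1 = shift(approval, -1)  # Y_{t+1}

for t, (y, y_lag, y_lead) in enumerate(zip(approval, lag1, lead1), start=1):
    print(f"t={t}  Y_t={y}  Y_t-1={y_lag}  Y_t+1={y_lead}")
```

Note that the first period has no lagged value and the last period has no lead value, so each lag or lead used in a model costs one observation.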

11.7.2 Memory and Lags in Time-Series Analysis

You might be wondering what, aside from changing a subscript from an i
to a t, is so different about time-series modeling. We would like to bring
special attention to one particular feature of time-series analysis that sets it
apart from modeling cross-sectional data.
Consider the following simple model of presidential popularity, and
assume that the data are in monthly form:

$$\text{Popularity}_t = \alpha + \beta_1 \text{Economy}_t + \beta_2 \text{Peace}_t + u_t,$$

where Economy and Peace refer to some measures of the health of the
national economy and international peace, respectively. Now look at what
the model assumes, quite explicitly. A president's popularity in any given
month t is a function of that month's economy and that month's level of
international peace (plus some random error term), and nothing else, at any
points in time. What about last month's economic shocks, or the war that
ended three months ago? They are nowhere to be found in this equation,
which means quite literally that they can have no effect on a president's
popularity ratings in this month. Every month - according to this model -
the public starts from scratch evaluating the president, as if to say, on
the first of the month: "Okay, let's just forget about last month. Instead,
let's check this month's economic data, and also this month's international
conflicts, and render a verdict on whether the president is doing a good
job or not." There is no memory from month to month whatsoever. Every
independent variable has an immediate impact, and that impact lasts exactly
one month, after which the effect immediately dies out entirely.
This is preposterous, of course. The public does not erase its collective
memory every month. Shifts in independent variables from many months in
the past can have lingering effects into current evaluations of the president.
In most cases, we imagine that the effects of shifts in independent variables
eventually die out over a period of time, as new events become more salient
in the minds of the public, and, indeed, some collective "forgetting" occurs.
But surely this does not happen in a single month.
And let's be clear what the problems are with a model like the preceding
simple model of approval. If we are convinced that at least some past values
of the economy still have effects today, and if at least some past values of
international peace still have effects today, but we instead estimate only
the contemporary effects (from period t), then we have committed
omitted-variables bias - which, as we have emphasized over the last two chapters,
is one of the most serious mistakes a social scientist can make. Failing
to account for how past values of our independent variables might affect
current values of our dependent variable is a serious issue in time-series
observational studies, and nothing quite like this issue exists in the
cross-sectional world. In time-series analysis, even if we know that Y is caused
by X and Z, we still have to worry about how many past lags of X and Z
might affect Y.
The clever reader might have a ready response to such a situation:
Specify additional lags of our independent variables in our regression
models:

$$\text{Popularity}_t = \alpha + \beta_1 \text{Economy}_t + \beta_2 \text{Economy}_{t-1} + \beta_3 \text{Economy}_{t-2} + \beta_4 \text{Economy}_{t-3}
+ \beta_5 \text{Peace}_t + \beta_6 \text{Peace}_{t-1} + \beta_7 \text{Peace}_{t-2} + \beta_8 \text{Peace}_{t-3} + u_t.$$

This is, indeed, one possible solution to the question of how to incorporate
the lingering effects of the past on the present. But the model is getting a
bit unwieldy, with lots of parameters to estimate. More important, though,
it leaves several questions unanswered:

1. How many lags of the independent variables should we include in
our model? We have included lags from period t through t - 3 in the
preceding specification, but how do we know that this is the correct
choice? From the outset of the book, we have emphasized that you
should have theoretical reasons for including variables in your statistical
models. But what theory tells us with any specificity that we should
include 3, 4, or 6 periods' worth of lags of our independent variables
in our models?
2. If we do include several lags of all of our independent variables in
our models, we will almost surely induce multicollinearity into them.
That is, $X_t$, $X_{t-1}$, and $X_{t-2}$ are likely to be highly correlated with
one another. (Such is the nature of time series.) Those models, then,
would have all of the problems associated with high multicollinearity
just identified - in particular, large standard errors and the adverse
consequences on hypothesis testing.

Before showing two alternatives to saturating our models with lots of
lags of variables, we need to confront a different problem in time-series
analysis.
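The second point in the list above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the series is artificial, generated so that each value equals 0.9 times the previous value plus random noise (the persistence value 0.9, the seed, and the helper names `simulate_persistent_series` and `pearson` are all hypothetical choices, not anything from the text). It shows how strongly a persistent series correlates with its own lags, and what that implies for the VIF when both $X_t$ and $X_{t-1}$ are included as regressors.

```python
import random


def simulate_persistent_series(n, persistence, seed):
    """Artificial series: each value is persistence * previous value + noise."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(persistence * values[-1] + rng.gauss(0, 1))
    return values


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


x = simulate_persistent_series(600, persistence=0.9, seed=1)

# Align X_t, X_{t-1}, and X_{t-2} over the periods where all three exist.
x_t, x_lag1, x_lag2 = x[2:], x[1:-1], x[:-2]

r_lag1 = pearson(x_t, x_lag1)
r_lag2 = pearson(x_t, x_lag2)

# With only X_t and X_{t-1} as regressors, the auxiliary R-squared for either
# one is the squared correlation between them, so VIF = 1 / (1 - r^2).
vif_two_regressors = 1.0 / (1.0 - r_lag1 ** 2)

print(f"corr(X_t, X_t-1) = {r_lag1:.2f}")
print(f"corr(X_t, X_t-2) = {r_lag2:.2f}")
print(f"VIF with X_t and X_t-1 as the only regressors = {vif_two_regressors:.1f}")
```

The more persistent the series, the closer these correlations get to 1 and the larger the VIF becomes, which is why stacking many lags of the same variable inflates standard errors.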


11.7.3 Trends and the Spurious Regression Problem
When discussing presidential popularity data, it's easy to see how a time
series might have a "memory" - by which we mean that the current values
of a series seem to be highly dependent on its past values.12 Some series
have memories of their pasts that are sufficiently long to induce statistical
problems. In particular, we mention one, called the spurious regression
problem.13
By way of example, consider the following facts: In post-World War
II America, golf became an increasingly popular sport. As its popularity
grew, perhaps predictably the number of golf courses in America grew
to accommodate the demand for places to play. That growth continued
steadily into the early 21st century. We can think of the number of golf
courses in America as a time series, of course, presumably one on an annual
metric. Over the same period of time, divorce rates in America grew and
grew. Whereas divorce was formerly an uncommon practice, today it is
commonplace in American society. We can think of family structure as
a time series, too - in this case, the percentage of households in which a
married couple is present.14
And both of these time series - likely for different reasons - have long
memories. In the case of golf courses, the number of courses in year t
obviously depends heavily on the number of courses in the previous year.
In the case of divorce rates, the dependence on the past presumably stems
from the lingering, multiperiod influence of the social forces that lead to
divorce in the first place. Both the number of golf facilities in America and
the percentage of families in which a married couple is present are shown
in Figure 11.8. And it's clear that, consistent with our description, both
variables have trends. In the case of golf facilities, that trend is upward; for
marriage, the trend is down.15

12 In any time series representing some form of public opinion, the word "memory" is a
particularly apt term, though its use applies to all other time series as well.
13 The problem of spurious regressions was something that economists like John Maynard
Keynes worried about long before it had been mathematically demonstrated by Granger
and Newbold (1974) in the 1970s. Their main source of concern was the existence of
general trends in a variable over time. To be clear, the word "trend" obviously has
several popular meanings. In time-series analysis, though, we generally use the word
trend to refer to a long-lasting movement in the history of a variable, not a temporary
drift in one direction or another.
14 For the purposes of this illustration, we are obscuring the difference between divorce and
unmarried cohabitation.

Figure 11.8. The growth of golf and the decline of the American family, 1947-2002.
[Figure: Golf Facilities and Married as Percentage of Households]

What's the problem here? Any time one time series with a long memory
is placed in a regression model with another series that also has a
long memory, it can lead to falsely finding evidence of a causal connection
between the two variables. This is known as the "spurious regression
problem." If we take the demise of marriage as our dependent variable and use
golf facilities as our independent variable, we would surely see that these
two variables are related, statistically. In substantive terms, we might be
tempted to jump to the conclusion that the growth of golf in America has
led to the breakdown of the nuclear family. We show the results of that
regression in Table 11.14. The dependent variable there is the percentage
of households with a married couple, and the independent variable is the
number of golf courses (in thousands). The results are exactly as feared.
For every thousand golf facilities built in the United States, there are 2.53%
fewer families with a married couple present. The $R^2$ statistic is quite high,
suggesting that roughly 93% of the variance in divorce rates is explained
by the growth of the golf industry.

15 The National Golf Foundation kindly gave us the data on golf facilities. Data on family
structure are from the Current Population Reports.

Table 11.14. Golf and the decline of the family, 1947-2002

Variable           Coefficient (Std. Err.)
Golf Facilities    -2.53* (0.09)
Constant           91.36* (1.00)
n                  56
R2                 .93

Note: * indicates p < .05.

We're quite sure that some of you - presumably nongolfers - are
nodding your heads and thinking, "But maybe golf does cause divorce
rates to rise! Does the phrase 'golf widow' ring a bell?" But here's the
problem with trending variables, and why it's such a potentially nasty
problem in the social sciences. We could substitute any variable with a trend
in it and come to the same "conclusion." To prove the point, let's take
another example. Instead of examining the growth of golf, let's look at a
different kind of growth - economic growth. In post-war America, GDP
has grown steadily, with few interruptions in its upward trajectory. Figure
11.9 shows GDP, in annual terms, along with the now-familiar time series
of the decline in marriage. Obviously, GDP is a long-memoried series,
with a sharp upward trend, in which current values of the series depend
extremely heavily on past values.
The spurious regression problem has some bite here, as well. Using
Divorce as our dependent variable and GDP as our independent variable,
the regression results in Table 11.15 show a strong, negative, and
statistically significant relationship between the two. This is not occurring
because higher rates of economic output have led to the destruction of the
American family. It is occurring because both variables have trends in
them, and a regression involving two variables with trends - even if they
are not truly associated - will produce spurious evidence of a relationship.

Table 11.15. GDP and the decline of the family, 1947-2002

Variable              Coefficient (Std. Err.)
GDP (in trillions)    -2.71* (0.16)
Constant              74.00* (0.69)
n                     56
R2                    .84

Note: * indicates p < .05.

The two issues just mentioned - how to deal with lagged effects in a
time series and whether or not the spurious regression problem is relevant -
are tractable ones. Moreover, new solutions to these issues arise as the
study of time-series analysis becomes more sophisticated. We subsequently
present two potential solutions to both problems.
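The spurious regression problem is easy to reproduce with artificial data. The sketch below is an illustration under stated assumptions, not a re-analysis of the golf, GDP, or marriage series: it generates two series that are independent of each other by construction, each a random walk with a built-in trend (the drift values, seeds, and function names are hypothetical), and then runs a bivariate OLS regression of one on the other.

```python
import random


def random_walk_with_drift(n, drift, seed):
    """Artificial trending series: each period adds `drift` plus random noise."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, 1)
        path.append(level)
    return path


def ols_simple(x, y):
    """Bivariate OLS of y on x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Two series generated separately, with different seeds: neither causes the other.
x = random_walk_with_drift(56, drift=2.0, seed=11)   # trends upward
y = random_walk_with_drift(56, drift=-2.0, seed=22)  # trends downward

intercept, slope, r_squared = ols_simple(x, y)
print(f"slope = {slope:.2f}, R-squared = {r_squared:.2f}")
```

Because both series trend, the regression reports a strong negative slope and a high R-squared even though the data-generating process contains no connection between them. That is the same pattern seen in Tables 11.14 and 11.15.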

Figure 11.9. The growth of the U.S. economy and the decline of the family, 1947-2002.
[Figure: GDP and Married as Percentage of Households, plotted by Year]

11.7.4 The Differenced Dependent Variable

One way to avoid the problems of spurious regressions is to use a
differenced dependent variable. We calculate a differenced (or, equivalently,
"first differenced") variable by subtracting the first lag of the variable ($Y_{t-1}$)
from the current value $Y_t$. The resulting time series is typically represented
as $\Delta Y_t = Y_t - Y_{t-1}$.
In fact, when time series have long memories, taking first differences of
both independent and dependent variables can be done. In effect, instead of
$Y_t$ representing the levels of a variable, $\Delta Y_t$ represents the period-to-period
changes in the level of the variable. For many (but not all) variables with
such long memories, taking first differences will eliminate the visual pattern
of a variable that just seems to keep going up.
Figure 11.10 presents the first differences of the number of golf courses
in the United States, as well as the first differences of the U.S. annual divorce
rates. You will notice, of course, that the time series in these figures look
drastically different from their counterparts in levels from Figure 11.8. In
fact, the visual "evidence" of an association between the two variables that
appeared in Figure 11.8 has now vanished. The misleading culprit? Trends
in both time series.
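As a rough illustration of what first differencing does, the sketch below computes $\Delta Y_t = Y_t - Y_{t-1}$ for two artificial trending series and compares the R-squared from a regression in levels with the R-squared from a regression in first differences. Everything here is an assumption made for illustration: the series are simulated to be unrelated by construction, and the function names, drift values, and seeds are hypothetical.

```python
import random


def first_difference(series):
    """Return Y_t - Y_{t-1} for each period after the first (one observation is lost)."""
    return [current - previous for previous, current in zip(series, series[1:])]


def r_squared(x, y):
    """R-squared from a bivariate OLS regression of y on x (squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def random_walk_with_drift(n, drift, seed):
    """Artificial trending series: each period adds `drift` plus random noise."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, 1)
        path.append(level)
    return path


# Two unrelated trending series (generated independently of one another).
x = random_walk_with_drift(56, drift=2.0, seed=11)
y = random_walk_with_drift(56, drift=-2.0, seed=22)

r2_levels = r_squared(x, y)
r2_differences = r_squared(first_difference(x), first_difference(y))

print(f"R-squared in levels:            {r2_levels:.2f}")
print(f"R-squared in first differences: {r2_differences:.2f}")
```

In levels, the shared trending behavior produces a high R-squared; once both series are differenced, only the period-to-period changes remain, and those are unrelated in this simulation.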


Figure 11.10. First differences of the number of golf courses and percentage of married
families, 1947-2002.
[Figure: First Difference of Golf Facilities and First Difference of Percentage Married, plotted by Year]

Because, in these cases, taking first differences of the series removes
the long memories from the series, these transformed time series will not be
subject to the spurious regression problem. But we caution against thoughtless
differencing of time series. In particular, taking first differences of time
series can eliminate some (true) evidence of an association between time
series in certain circumstances.
We recommend that, wherever possible, you use theoretical reasons
to either difference a time series or to analyze it in levels. In effect, you
should ask yourself if your theory about a causal connection between X
and Y makes more sense in levels or in first differences. For example, if
you are analyzing budgetary data from a government agency, does your
theory specify particular things about the sheer amount of agency spending
(in which case, you would analyze the data in levels), or does it specify
particular things about what causes budgets to shift from year to year (in
which case, you would analyze the data in first differences)?
It is also worth noting that taking first differences of your time series
does not directly address the issue of the number of lags of independent
variables to include in your models. For that, we turn to the
lagged-dependent-variable specification.

11.7.5 The Lagged Dependent Variable

Consider for a moment a simple two-variable system with our familiar
variables Y and X, except where, to allow for the possibility that previous
lags of X might affect current levels of Y, we include a large number of lags
of X in our model:

$$Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \cdots + \beta_k X_{t-k} + u_t.$$

This model is known as a distributed lag model. Notice the slight shift in
notation here, in which we are subscripting our $\beta$ coefficients by the number
of periods that that variable is lagged from the current value; hence, the $\beta$
for $X_t$ is $\beta_0$ (because t - 0 = t). Under such a setup, the cumulative impact
$\beta$ of X on Y is

$$\beta = \beta_0 + \beta_1 + \beta_2 + \cdots + \beta_k = \sum_{i=0}^{k} \beta_i.$$

It is worth emphasizing that we are interested in that cumulative impact of
X on Y, not merely the instantaneous effect of $X_t$ on $Y_t$ represented by the
coefficient $\beta_0$.
But how can we capture the effects of X on Y without estimating such
a cumbersome model like the preceding one? We have noted that a model
like this would surely suffer from multicollinearity.
If we are willing to assume that the effect of X on Y is greatest initially
and decays geometrically each period (eventually, after enough periods,
becoming effectively 0), then a few steps of algebra would yield the following
model that is mathematically identical to the preceding one.16 That model
looks like

$$Y_t = \lambda Y_{t-1} + \alpha + \beta_0 X_t + v_t.$$

This is known as the Koyck transformation, and is commonly referred
to as the lagged-dependent-variable model, for reasons we hope are obvious.
Compare the Koyck transformation with the preceding equivalent
distributed lag model. Both have the same dependent variable, $Y_t$. Both have
a variable representing the immediate impact of $X_t$ on $Y_t$. But whereas the
distributed lag model also has a slew of coefficients for variables representing
all of the lags of 1 through k of X on $Y_t$, the lagged-dependent-variable
model instead contains a single variable and coefficient, $\lambda Y_{t-1}$. Because,
as we said, the two setups are equivalent, then this means that the lagged
dependent variable does not represent how $Y_{t-1}$ somehow causes $Y_t$, but
instead $Y_{t-1}$ is a stand-in for the cumulative effects of all past lags of X
(that is, lags 1 through k) on $Y_t$. We achieve all of that through estimating
a single coefficient instead of a very large number of them.

16 We realize that the model does not look mathematically identical, but it is. For ease of
presentation, we skip the algebra necessary to demonstrate the equivalence.

The coefficient $\lambda$, then, represents the ways in which past values of X
affect current values of Y, which nicely solves the problem outlined at the
start of this section. Normally, the values of $\lambda$ will range between 0 and
1.17 You can readily see that if $\lambda = 0$ then there is literally no effect of past
values of X on $Y_t$. Such values are uncommon in practice. As $\lambda$ gets larger,
that indicates that the effects of past lags of X on $Y_t$ persist longer and
longer into the future.
In these models, the cumulative effect of X on Y is conveniently described as

$$\beta = \frac{\beta_0}{1 - \lambda}.$$
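For readers who want to see the algebra that the text skips, here is one standard way to sketch it. The sketch rests on assumptions that go slightly beyond what is stated above: the lags are taken to extend indefinitely into the past, the lag coefficients are written as $\beta_j = \beta_0 \lambda^j$ with $0 \le \lambda < 1$, and the symbol $\alpha^{*}$ (not used in the text) denotes the intercept of the distributed lag model so that it can be told apart from the intercept $\alpha$ of the transformed model.

```latex
% Distributed lag model with geometrically decaying coefficients:
Y_t = \alpha^{*} + \beta_0 \sum_{j=0}^{\infty} \lambda^{j} X_{t-j} + u_t .

% Write the same model for period t-1 and multiply both sides by \lambda:
\lambda Y_{t-1} = \lambda \alpha^{*} + \beta_0 \sum_{j=1}^{\infty} \lambda^{j} X_{t-j} + \lambda u_{t-1} .

% Subtract the second equation from the first; every lagged X term cancels:
Y_t - \lambda Y_{t-1} = (1 - \lambda)\alpha^{*} + \beta_0 X_t + (u_t - \lambda u_{t-1}) .

% Rearranging gives the lagged-dependent-variable form,
% with \alpha = (1-\lambda)\alpha^{*} and v_t = u_t - \lambda u_{t-1}:
Y_t = \lambda Y_{t-1} + \alpha + \beta_0 X_t + v_t .

% The cumulative impact is the sum of a geometric series:
\beta = \sum_{j=0}^{\infty} \beta_0 \lambda^{j} = \frac{\beta_0}{1 - \lambda} .
```

Under these assumptions, the single coefficient $\lambda$ carries all of the information about how quickly the effect of a shift in X fades.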

Examining the formula, we easily see that, when $\lambda = 0$, the denominator is
equal to 1 and the cumulative impact is exactly equal to the instantaneous
impact. There is no lagged effect at all. When $\lambda = 1$, however, we run into
problems; the denominator equals 0, so the quotient is undefined. But as
$\lambda$ approaches 1, you can see that the cumulative effect grows. Thus, as
the values of the coefficient on the lagged dependent variable move from 0
toward 1, the cumulative impact of changes in X on Y grows.
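The behavior of the formula can be checked numerically. The following Python sketch is only an illustration: the function names and the example coefficient values are hypothetical. It computes the cumulative effect $\beta_0 / (1 - \lambda)$ and compares it with the sum of the period-by-period effects $\beta_0 \lambda^j$, which is what the formula summarizes when the decay is geometric.

```python
def cumulative_effect(beta_0: float, lam: float) -> float:
    """Cumulative impact of X on Y: beta_0 / (1 - lambda).

    Only defined here for 0 <= lambda < 1; at lambda = 1 the denominator is 0.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lambda must satisfy 0 <= lambda < 1")
    return beta_0 / (1.0 - lam)


def summed_effects(beta_0: float, lam: float, periods: int) -> float:
    """Add up the effect in each of the first `periods` periods: beta_0 * lambda**j."""
    return sum(beta_0 * lam ** j for j in range(periods))


# Hypothetical instantaneous effect of 2.0, with increasing persistence.
for lam in (0.0, 0.5, 0.9, 0.99):
    total = cumulative_effect(2.0, lam)
    print(f"lambda = {lam:<4}  cumulative effect = {total:.1f}")

# The closed-form value matches the long sum of period-by-period effects.
print(summed_effects(2.0, 0.5, periods=60), cumulative_effect(2.0, 0.5))
```

With $\lambda = 0$ the cumulative effect equals the instantaneous effect, and it grows without bound as $\lambda$ approaches 1, which matches the description above.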
This brief foray into time-series analysis obviously just scratches the
surface. When reading research that uses time-series techniques, or
especially when embarking on your own time-series analysis, it is important
to be aware of both the issues of how the effects of shifts in independent
variables can persist over several time periods, and also of the potential
pitfalls of long-memoried trends.


17 In fact, values close to 1, and especially those greater than 1, indicate that there are
problems with the model, most likely related to trends in the data.

11.8 WRAPPING UP

Even in its simplest varieties, OLS regression - and especially multiple
OLS regression - can be complicated enough. What we've encountered
in this chapter shows that there are additional (but not insurmountable!)
obstacles to overcome when we consider that some of our theories involve
noncontinuous variables and that there are some unique obstacles involving
time-series data.
Some of these techniques might seem intimidating at first, but we
encourage you to press onward. One way to do this is to see how these
techniques work in actual examples of political science research. In our
final chapter, we examine three pieces of research that attempt to answer
compelling theoretical questions.

CONCEPTS INTRODUCED IN THIS CHAPTER

additive models
auxiliary regression model
binomial logit
binomial probit
classification table
cumulative impact
DFBETA score
differenced (or "first differenced") dependent variable
distributed lag model
dummying out
dummy variables
dummy-variable trap
instantaneous effect
interactive models
Koyck transformation
lagged dependent variable
lagged values
lead values
leverage
linear probability model
link functions
micronumerosity
multicollinearity
predicted probability
proportionate reduction of error
reference category
spurious regression problem
variance inflation factor

12 Multiple Regression Models III: Applications

OVERVIEW

In this chapter, we show how political scientists across a variety of
subfields have used multiple regression to test theories about causal processes
involving politics. In particular, we show how clever research design and
accompanying statistical analyses uncover interesting patterns of causal
dynamics in the domain of American presidential popularity, in economic
voting in European democracies, and in patterns of international conflict
and cooperation. One of the core principles of this chapter - and of the book -
is that solid research design is a prerequisite for insightful data analysis.

12.1 WHY CONTROLLING FOR Z MATTERS

Thus far, the discussion of the effects of multiple regression has been mostly
abstract, referring to X, Y, and Z instead of to real social phenomena that
most political scientists - and most political science students - care about.
Some of this abstraction is necessary, of course, but we want you to see
examples from actual research on how multiple regression is used, and how
including new variables can change our theoretical conclusions. That is the
goal of this chapter. What you will see, in these varying examples, is how
introducing new controls for variables can change our inferences about the
causal structure of the political world, which is what we have emphasized
throughout this book.
To highlight the changes that result to parameter estimates when we
change our model by adding a new variable, we show only a small portion
of the results from the different models that we feature in this chapter. It
is worth pointing out that these results come from models that contained
(and thus controlled for) other variables.


Figure 12.1. A simple causal model of the relationship between the economy and
presidential popularity.
[Figure: Economic Reality -> Presidential Approval]

12.2 EXAMPLE 1: THE ECONOMY AND PRESIDENTIAL POPULARITY

All of you, we suspect, are familiar with presidential popularity (or
presidential approval) polls. Presidential popularity, in fact, is one of the great
resources that presidents have at their disposal; they use approval as
leverage in bargaining situations. It is not easy, after all, to say "no" to a popular
president. In contrast, unpopular presidents are often not influential
presidents. Hence all presidents care about their approval ratings.
But why do approval ratings fluctuate, both in the short term and the
long term? What systematic forces cause presidents to be popular or
unpopular over time? Since the early 1970s, the reigning conventional wisdom
held that economic reality - usually measured by inflation and unemployment
rates - drove approval ratings up and down. When the economy was
doing well - that is, when inflation and unemployment were both low -
the president enjoyed high approval ratings; and when the economy was
performing poorly, the opposite was true.1 That conventional wisdom is
represented graphically in Figure 12.1. Considerable amounts of research
over many years supported that wisdom.
In the early 1990s, however, a group of three political scientists
questioned the traditional understanding of approval dynamics, suggesting that
it was not actual economic reality that influenced approval ratings, but the
public's perceptions of the economy - which we usually call consumer
confidence (see MacKuen, Erikson, and Stimson 1992). Their logic was that it
doesn't matter for a president's approval ratings if inflation and unemployment
are low if people don't perceive the economy to be doing well. Their
revised causal model is presented in Figure 12.2.
What these researchers needed to do, then, was to test the conventional
wisdom to see how ir held up to a control for a new variable. By use of
quarterly survey data from 1954:2 through 1988:2, this is what they did.
Table 12.1 re-creates a portion of MacKuen, Erikson, and Stimson's Table
2. In c?lumn A, we see a confirmation of the conventional wisdom. (Can
you thlOk why the authors might inelude a column in their tables like this?)
You should think of this column of results as testing the causal model in
h~s al~ays been the ~ase that scholars have recognized other systematic causes oE
presldentlal approval ratmgs, including scandals, international crises and batde E t l't
W ~
. h·
I
'
a a I ICS.
e OCUS, m t 15 examp e, exclu5ively on the economy {or simplicity oE presentation.

1 It

O

246

Multiple Regression Models III: Applications

Table 12.1. Excerpts from the table of MacKuen,
Erikson, and Stimson on the relationship between the
economy and presidential popularity
Variable ,:'

Inflation'
ChangeinUnemplQ~éñt'

B

0.87*
0.82'
(0,04) ,: "(0.04)
-0.39· , -0.17
(0.13) " : (0.13)
-'-1.51~,;,:·h: 0.62
(0;74)
(0.91)

Consumer C~nftdence

0.21~

. (.05) ,
.93
126 ,',

.94
117

Note: Standard errors are in párentheses; • ~ p < 0.05.
Other variables were estimated es a part of the regression model
but were excluded trom this table for ees,e of presentation.

Figure 12.1. The coefficient for the inflation rate, -0.39, indicates that,
for every 1-point increase in the inflation rate, presidential approval will
immediately fall by 0.39 points, on average, controlling for the effects
of unemployment (and other variables in their model, which we do not
show). According to the table, the ratio of the coefficient to the standard
error places this effect easily past the threshold of statistical significance.
Similarly, column A presents the results for the effects of changes in the
unemployment rate on presidential approval. The slope of -1.51 indicates
that, for every 1-point increase in the unemployment rate, presidential
approval falls by 1.51 points, on average, controlling for the effects of
inflation (and other variables that we do not show). This parameter estimate
is also statistically significant.
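
The significance claims for column A can be checked by hand from the reported coefficients and standard errors. The sketch below is only an illustration: the function name `t_ratio` is hypothetical, and the critical value of 1.984 is an assumption taken from the df = 100 row (.025 column) of the t table in Appendix B, used here as a stand-in for a two-tailed test at the 0.05 level with roughly 120 observations.

```python
# Coefficients and standard errors as reported in column A of Table 12.1.
column_a = {
    "Approval (t-1)": (0.87, 0.04),
    "Inflation": (-0.39, 0.13),
    "Change in Unemployment": (-1.51, 0.74),
}

# Assumption: two-tailed test at the 0.05 level, with the degrees of
# freedom approximated by the df = 100 row of a t table.
CRITICAL_T = 1.984


def t_ratio(coefficient, standard_error):
    """Return the ratio of a coefficient to its standard error."""
    return coefficient / standard_error


for name, (coef, se) in column_a.items():
    t = t_ratio(coef, se)
    verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
    print(f"{name:24s} t = {t:6.2f}  ({verdict} at p < 0.05)")
```

Under these assumptions the inflation ratio is comfortably past the threshold, whereas the ratio for the change in unemployment clears it only narrowly.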
[Figure 12.2. A revised model of presidential popularity, with nodes Economic Reality, Consumer Confidence, and Presidential Approval.]

Notice also the presence of a lagged dependent variable in the model,
labeled Approval_{t-1}. Recalling our discussion of time-series issues in
Chapter 11, we find that the coefficient of 0.87, which is statistically
significant, indicates that 87% of the effects of a shift in one of the
independent variables persists into the following period. Thus the effects
of shifts in X do not die out instantly; rather, a large portion of those
effects persist

12.2 Economics and Approval Ratings

into the future.2 What this means is that, for example, the coefficient for
Inflation of -0.39 represents only the immediate effects of Inflation, not
the cumulative effects of Inflation. The cumulative effect for Inflation, as
we learned in Chapter 11, is equal to the immediate impact divided by one
minus the coefficient for the lagged dependent variable, or,

$$\beta = \frac{\beta_0}{1 - \lambda} = \frac{-0.39}{1 - 0.87} = -3.0.$$
The immediate impact of -0.39, then, considerably understates the total
impact of a shift in the Inflation rate, which, because of the strong dynamics
in the model - the value of the lagged dependent variable, 0.87, is a lot closer
to 1 than it is to 0 - is considerably more impressive in substantive terms.
A 1-point shift in the Inflation rate eventually costs a president 3 points of
approval.
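
The arithmetic behind the cumulative effect can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function names `cumulative_effect` and `effect_path` are hypothetical. It computes the long-run effect directly and also shows that adding up the period-by-period effects, each 0.87 times the one before, approaches the same figure.

```python
def cumulative_effect(immediate_effect, lag_coefficient):
    """Long-run effect of a one-unit shift in X when a lagged dependent
    variable is in the model: immediate effect / (1 - lag coefficient)."""
    if not -1 < lag_coefficient < 1:
        raise ValueError("the lag coefficient must lie strictly between -1 and 1")
    return immediate_effect / (1 - lag_coefficient)


def effect_path(immediate_effect, lag_coefficient, periods):
    """Effect felt in each period after a one-unit shift in X; the first
    entry is the immediate effect, and each later entry is the previous
    one multiplied by the lag coefficient."""
    return [immediate_effect * lag_coefficient ** k for k in range(periods)]


inflation_immediate = -0.39   # Inflation coefficient, column A of Table 12.1
lag = 0.87                    # coefficient on lagged approval, column A

print("cumulative effect:", round(cumulative_effect(inflation_immediate, lag), 2))
print("first four periods:", [round(e, 3) for e in effect_path(inflation_immediate, lag, 4)])
print("sum over 200 periods:", round(sum(effect_path(inflation_immediate, lag, 200)), 2))
```

The closer the lag coefficient is to 1, the more slowly the period-by-period effects fade and the larger the gap between the immediate and cumulative effects.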
In short, the first column of data in Table 12.1 provides some confirmation for the conventional wisdom. But the results in column A do not
control for the effects of Consumer Confidence. The results from when
MacKuen, Erikson, and Stimson did control for Consumer Confidence are
provided in column B of Table 12.1. Notice first that Consumer Confidence
has a coefficient of 0.21. That is, for every 1-point increase in Consumer
Confidence, we expect to see an immediate increase in presidential approval
of 0.21 points, controlling for the effects of Inflation and Unemployment.
This effect is statistically significant.3
Notice also, however, what happens to the coefficients for Inflation
and Unemployment. Comparing the estimated effects in column A with
those in column B reveals some substantial differences. When there was
no control for Consumer Confidence in column A, it appeared that Inflation and Unemployment had modestly strong and statistically significant
effects. But in column B, the coefficients change because of the control for
Consumer Confidence. The effect of Inflation shrinks from -0.39 to
-0.17, which reflects the control for Consumer Confidence. The effect is
not close to being statistically significant. We can no longer reject the null
hypothesis that there is no relationship between Inflation and presidential
Approval.
The same thing happens to the effect for the Change in the Unemployment rate. In column B, when Consumer Confidence is controlled for,
the effect for the Change in Unemployment changes from -1.51 to 0.62,
2 Indeed, in the second period, $0.87^2$ of the effect of a shift in X at time t remains, and
$0.87^3$ remains in the third period, and so forth.
3 Again, notice that the cumulative effect of a 1-point shift in Consumer Confidence will
be larger, because of the strong dynamics in the model represented by the lagged value of
the dependent variable.


a substantial reduction in magnitude, but also a change in the direction of
the relationship. No matter, because the coefficient is no longer statistically
significant, which means we cannot reject the null hypothesis that it is truly
zero.
The second column of Table 12.1, then, is consistent with Figure 12.2,
which shows no direct connection between Economic Reality and presidential Approval. There is, however, a direct connection between Consumer
Confidence and Approval ratings. In this case, introducing a new variable
(Consumer Confidence) produced very different findings about a concept
(Economic Reality) that scholars had thought for decades exerted a direct
causal influence on Approval.
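
The pattern in Table 12.1 can be reproduced with made-up data. The sketch below is a simulation under stated assumptions, not a reanalysis of the MacKuen, Erikson, and Stimson data: the variable names, the effect sizes (0.8 and 0.5), and the helper functions are all hypothetical. Economic reality is built to affect approval only through consumer confidence, and the slope on reality is then estimated with and without the control.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def slope_bivariate(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_controlling(x, z, y):
    """OLS slope on x from regressing y on both x and z."""
    denom = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / denom


random.seed(12)
n = 5000
# Assumed causal chain: reality -> confidence -> approval, with no
# direct path from reality to approval.
reality = [random.gauss(0, 1) for _ in range(n)]
confidence = [0.8 * r + random.gauss(0, 1) for r in reality]
approval = [0.5 * c + random.gauss(0, 1) for c in confidence]

print("slope on reality, no control:", round(slope_bivariate(reality, approval), 2))
print("slope on reality, controlling for confidence:",
      round(slope_controlling(reality, confidence, approval), 2))
```

In this setup the uncontrolled slope should sit near 0.8 x 0.5 = 0.4, while the controlled slope should sit near zero, which mirrors the move from column A to column B.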

12.3 EXAMPLE 2: POLITICS, ECONOMICS, AND PUBLIC SUPPORT
FOR DEMOCRACY

After the Cold War ended in 1989, the formerly communist states in Eastern Europe began making transitions toward both democratic political systems and, simultaneously, market economies. But as the euphoria that followed the fall of the Berlin Wall and the lifting of the Iron Curtain began to fade into memory, most of these countries experienced growing pains in the early stages of democratization and the shift to a capitalist economic system. But not all of the former Warsaw Pact nations experienced these growing pains equally. In the process, the transition from communism to capitalism, and from autocracy to democracy, appeared quite fragile.
For decades, political scientists, especially those with interests in the cross-national study of politics, have investigated why citizens express support for democracy. Citizens in some fledgling democracies have been very supportive of the democratization process, whereas others, according to survey results, are not at all optimistic about the benefits of democracy. One of the primary explanations for why different societies exhibit more or less enthusiasm for democratic government has been the variation in economic experiences.4 In particular, in so-called transition societies like those in Eastern Europe, for which both democracy and markets are relatively new experiences, Przeworski (1991) argued that politicians motivated by electoral success will promise much, but be able to deliver relatively little. Citizens' sentiments about democracy, then, are often a function of

4 See, for example, Przeworski (1991). Before economic explanations for support for democracy came to prominence, cultural explanations were dominant. In those theories, countries with a history of a democratic culture were most likely to have citizens that supported democratic government, and hence to have thriving democracies. See Almond and Verba (1963).

12.3 Support for Democracy

the public's experiences with the market economy and how these actual experiences differ from their expectations. If the public's expectations are exceeded, they will be supportive of democratization; if not, support will wane. As the economic pie grows, so the thinking goes, then more and more ordinary citizens have a stake in the stability and success of the democratic political arrangement. In essence, the public is linking the benefits (or lack thereof) of democratization to economic success.
On the other hand, some researchers have theorized that citizens' evaluations of the pros and cons of democracy will be based primarily on visible evidence that the new democratic institutions are working to represent citizens' interests. That is, instead of relying on economic evaluations, citizens base their evaluations of democracy on how well they see the polity functioning. If people perceive that government works to benefit everyone, that everyone has a meaningful voice in elections, that government cares what they want and is responsive to their needs, then they will be supportive of democracy. But if, in contrast, people perceive government to be benefiting a narrow range of interests, responsive to the few instead of the many, and uncaring, then people will not view democracy positively at all. In essence, it is political factors, not economic ones, that determine how people feel about democracy.
Geoffrey Evans and Stephen Whitefield investigated these questions in a 1993 survey in nine countries, formerly a part of the Soviet bloc, all of which were new democracies (Evans and Whitefield 1995). In each of the nine countries - Bulgaria, Estonia, Hungary, Lithuania, Poland, Romania, Russia, and Ukraine - Evans and Whitefield asked a representative sample of citizens about their experiences with the economy, about their experiences with their new democratic government, and about their overall commitment to democracy in their country. In all, they surveyed approximately 15,000 respondents. Their dependent variable, commitment to democracy, is measured by the following survey question:
How do you feel about the aim of introducing democracy in [respondent's country], in which parties compete for government? Are you a ... strong supporter, supporter, opponent, strong opponent, neither supporter nor opponent?
Evans and Whitefield performed a multivariate regression analysis to examine whether political or economic factors are the best predictors of support for or opposition to democratic government. A portion of their results is shown in Table 12.2. The results in column A of the table are consistent with the theory that economic evaluations will affect support for democracy, showing that increases in a respondent's perceptions of the performance of the free market lead to increases in his or her support for


Table 12.2. Excerpts from the table of Evans and Whitefield on the
relationship between the economy and support for democracy

Parameter                   A         B
Market evaluation         0.14*     0.05*
Democratic evaluation               0.20*
R2
n

Note: The coefficients in the table are standardized; * = p < 0.05.
Other variables were estimated as a part of the regression model
but were excluded from this table for ease of presentation.

democracy. The coefficients in the table are standardized, so the coefficient of 0.14 means that a 1-standard-deviation increase in a respondent's evaluation of the market is associated with a 0.14-standard-deviation increase in support for democracy, controlling for the other factors in the model (which are not shown to keep the table a manageable size).
Compare the results in column A, however, with those in column B, which includes a control for how the respondent evaluates the workings of democracy. Two things are notable. First, because the parameter estimates are standardized, we can compare their magnitudes to see which effect is stronger. In this case, the effect of a respondent's evaluation of democracy (0.20) is four times as large as the effect of their evaluation of the market (0.05) in shaping their overall support for democratic government. Both effects, it is worth keeping in mind, remain statistically significant predictors of a citizen's support for democracy; their political evaluations, though, are a stronger predictor than are their economic evaluations.
Second, notice how the parameter estimate for the evaluation of the market shrank from column A to column B. The reduction from 0.14 to 0.05 indicates that, once the effect of a person's beliefs about the functioning of democracy is controlled for, the effect of their views of the market falls by roughly two-thirds.
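
Both comparisons in the preceding paragraphs are simple ratios of the standardized coefficients quoted in the text. A brief sketch, with hypothetical variable names, makes the arithmetic explicit:

```python
# Standardized coefficients quoted in the discussion of Table 12.2.
market_a = 0.14      # market evaluation, column A (no control)
market_b = 0.05      # market evaluation, column B (with control)
democracy_b = 0.20   # democratic evaluation, column B

# How many times larger the political effect is than the economic one.
ratio = democracy_b / market_b

# Share of the market effect that disappears once evaluations of
# democracy are controlled for.
relative_drop = (market_a - market_b) / market_a

print(f"democratic vs. market effect: {ratio:.1f} times as large")
print(f"drop in market effect: {relative_drop:.0%}")
```

The drop works out to about 64%, which is the "roughly two-thirds" described above. Comparing magnitudes in this way is only meaningful because both coefficients are standardized.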
Reflecting back on the lessons learned in Chapter 10, we see that the results here clearly indicate that citizens' attitudes about markets and their attitudes about the functioning of democracy are related, and that both of these contribute to support for or opposition to democracy as a system of government (at least in these nine developing democracies). It is equally clear that, given the discrepancy between the results in columns A and B of Table 12.2, the situation more closely resembles that of Figure 10.1, in which the variances of the two independent

12.4 Politics and International Trade

variables overlap, than that of Figure 10.2, in which they do not. As a result,
omitting any one of the variables - as we omitted democratic evaluations
in column A - is likely to produce misleading estimates about the variables
in the model.

12.4 EXAMPLE 3: COMPETING THEORIES OF HOW POLITICS
AFFECTS INTERNATIONAL TRADE

What are the forces that affect international trade? Economists have long noted that there are economic forces that shape the extent to which two nations trade with one another.5 The size of each nation's economy, the physical distance between them, and the overall level of development have all been investigated as economic causes of trade.6 But in addition to economic forces, does politics help to shape international trade?
Morrow, Siverson, and Tabares (1998) investigate three competing (and perhaps complementary) political explanations for the extent to which two nations engage in international trade. The first theory is that states with friendly relations are more likely to trade with one another than are states engaged in conflict. Conflict, in this sense, need not be militarized disputes (though it may be).7 Conflict, they argue, can dampen trade in several ways. First, interstate conflict can sometimes produce embargoes (or prohibitions on trade). Second, conflict can reduce trade by raising the risks for firms that wish to engage in cross-border trading.
The second theory is that trade will be higher when both nations are democracies and lower when one (or both) is an autocracy.8 Because democracies have more open political and judicial systems, trade should be higher between democracies because firms in one country will have greater assurance that any trade disputes will be resolved openly and fairly in courts to which they have access. In contrast, firms in a democratic state may be more reluctant to trade with nondemocratic countries, because it is less certain how any disagreements will be resolved. In addition, firms may be wary of trading with nondemocracies for fear of having their assets seized by the foreign government. In short, trading with an autocratic government should raise the perceived risks of international trade.
The third theory is that states that are in an alliance with one another are more likely to trade with one another than are states that are not in such an alliance.9 For states that are not in an alliance, one nation may be reluctant to trade with another nation if the first thinks that the gains from trade may be used by the other to arm itself for future conflict. In contrast, states in an alliance stand to gain from each other's increased wealth as a result of trade.
To test these theories, Morrow, Siverson, and Tabares look at trade among all of the major powers in the international system - the United States, Britain, France, Germany, Russia, and Italy - during most of the twentieth century. They consider each pair of states - called dyads - separately and examine exports to each country on an annual basis.10 Their dependent variable is the amount of exports in every dyadic relationship in each year.
Table 12.3 shows excerpts from the analysis of Morrow, Siverson, and Tabares.11 In column A, they show that, as the first theory predicts, increases in interstate peace are associated with higher amounts of trade between countries, controlling for economic factors. In addition, the larger the economy in general, the more trade there is. (This finding is consistent across all estimation equations.) The results in column B indicate that pairs of democracies trade at higher rates than do pairs involving at least one nondemocracy. Finally, the results in column C show that trade is higher between alliance partners than between states that are not in an alliance with one another. All of these effects are statistically significant.

Table 12.3. Excerpts from the table of Morrow, Siverson, and Tabares on the
political causes of international trade

Parameter                 A         B         C         D
Peaceful relations      1.12*                         1.45*
                       (0.22)                        (0.37)
Democratic partners               1.18*               1.22*
                                 (0.12)              (0.13)
Alliance partners                           0.29*    -0.50*
                                           (0.03)    (0.16)
GNP of exporter         0.67*     0.57*     0.68*     0.56*
                       (0.07)    (0.07)    (0.07)    (0.08)
R2                       .77       .78       .78       .77
n                       2631      2631      2631      2631

Note: Standard errors are in parentheses; * = p < .05.
Other variables were estimated as a part of the regression
model but were excluded from this table for ease of presentation.

So far, each of the theories received at least some support. But, as you can tell from looking at the table, the results in columns A through C do not control for the other explanations. That is, we have yet to see results of a fully multivariate model, in which the theories can compete for explanatory power. That situation is rectified in column D, in which all three political variables are entered in the same regression model. There, we see that the effect of reduced hostility between states is actually enhanced in the multivariate context - compare the coefficient of 1.12 with the multivariate 1.45. Similarly, the effect of democratic trading partners remains almost unchanged in the fully multivariate framework. However, the effect of alliances changes. Before controlling for conflict and democracy, the effect of alliances was (as expected) positive and statistically significant. However, in column D, in which we control for conflict and democracy, the effect flips signs and is now negative (and statistically significant), which means that, when we control for these factors, states in an alliance are less (not more) likely to trade with one another.
The article by Morrow, Siverson, and Tabares represents a case in which synthesizing several competing explanations for the same phenomenon - international trade - produces surprising findings. By using a data set that allowed them to test all three theories simultaneously, Morrow, Siverson, and Tabares were able to sort out which theories received support and which did not.

5 Theories of trade and, indeed, many theories about other aspects of international trade are usually developed with pairs of nations in mind. Thus all of the relevant variables, like trade, are measured in terms of pairs of nations, which are often referred to as "dyads" by international relations scholars. The resulting dyadic data sets are often quite large because they encompass each relevant pair of nations.
6 Such models are charmingly referred to as "gravity models," because, according to these theories, the forces driving trade resemble the forces that determine gravitational attraction between two physical objects.
7 See Pollins (1989) for an extended discussion of this theory.
8 See Dixon and Moon (1993) for an elaboration of this theory.
9 See Gowa (1989) and Gowa and Mansfield (1993) for an extended discussion, including distinctions between bipolar and multipolar organizations of the international system.
10 This research design is often referred to as a time-series cross-section design, because it contains both variation between units and variation across time. In this sense, it is a hybrid of the two types of quasi-experiments discussed in Chapter 3.
11 Interpreting the precise magnitudes of the parameter estimates is a bit tricky in this case, because the independent variables were all transformed by use of natural logarithms.
12.5 CONCLUSIONS

One of the fascinating features of social science research is that new theoretical developments come along from time to time to replace old theories. That is, a scholar questions the received wisdom, proposes an alternate explanation, and then investigates how his or her new explanation fits in with the old ones. Sometimes, the explanations are complementary; that is, a new explanation does not replace the old one so much as it adds to the old one. Other times, however, the new explanations contradict the old ones, and the evidence ends up favoring the new theory. In this chapter, we have seen some examples of each type, as well as one example - the one involving international trade - in which a new analysis synthesized three theories that had not previously been tested against one another. These are some of the typical uses of multiple regression in political science.

CONCEPTS INTRODUCED IN THIS CHAPTER

consumer confidence
dyadic data
presidential popularity

EXERCISES

1. Find an example in the political literature in which the introduction of a new variable changes the results of a regression model. Explain how this result has had an impact on our understanding of the relevant causal relationships.
2. In footnote 4 we discussed how previous theories about public support for democracy were about culture. Is the research design of Evans and Whitefield (1995) appropriate for testing such a theory? Why or why not?
3. In Table 12.1, column A, calculate the cumulative effect of a 1% increase in the unemployment rate on a president's approval rating.

APPENDIX A

Critical Values of χ²

                  Level of significance
df         .10       .05      .025       .01      .001
1        2.706     3.841     5.024     6.635    10.828
2        4.605     5.991     7.378     9.210    13.816
3        6.251     7.815     9.348    11.345    16.266
4        7.779     9.488    11.143    13.277    18.467
5        9.236    11.070    12.833    15.086    20.515
6       10.645    12.592    14.449    16.812    22.458
7       12.017    14.067    16.013    18.475    24.322
8       13.362    15.507    17.535    20.090    26.125
9       14.684    16.919    19.023    21.666    27.877
10      15.987    18.307    20.483    23.209    29.588
11      17.275    19.675    21.920    24.725    31.264
12      18.549    21.026    23.337    26.217    32.910
13      19.812    22.362    24.736    27.688    34.528
14      21.064    23.685    26.119    29.141    36.123
15      22.307    24.996    27.488    30.578    37.697
20      28.412    31.410    34.170    37.566    45.316
25      34.382    37.652    40.646    44.314    52.620
30      40.256    43.773    46.979    50.892    59.703
35      46.059    49.802    53.203    57.342    66.619
40      51.805    55.758    59.342    63.691    73.402
50      63.167    67.505    71.420    76.154    86.661
60      74.397    79.082    83.298    88.379    99.607
70      85.527    90.531    95.023   100.425   112.317
75      91.061    96.217   100.839   106.393   118.599
80      96.578   101.879   106.629   112.329   124.839
90     107.565   113.145   118.136   124.116   137.208
100    118.498   124.342   129.561   135.807   149.449

APPENDIX B

Critical Values of t

                     Level of significance
df        .10      .05     .025      .01     .005     .001
1       3.078    6.314   12.706   31.821   63.657  318.313
2       1.886    2.920    4.303    6.965    9.925   22.327
3       1.638    2.353    3.182    4.541    5.841   10.215
4       1.533    2.132    2.776    3.747    4.604    7.173
5       1.476    2.015    2.571    3.365    4.032    5.893
6       1.440    1.943    2.447    3.143    3.707    5.208
7       1.415    1.895    2.365    2.998    3.499    4.782
8       1.397    1.860    2.306    2.896    3.355    4.499
9       1.383    1.833    2.262    2.821    3.250    4.296
10      1.372    1.812    2.228    2.764    3.169    4.143
11      1.363    1.796    2.201    2.718    3.106    4.024
12      1.356    1.782    2.179    2.681    3.055    3.929
13      1.350    1.771    2.160    2.650    3.012    3.852
14      1.345    1.761    2.145    2.624    2.977    3.787
15      1.341    1.753    2.131    2.602    2.947    3.733
20      1.325    1.725    2.086    2.528    2.845    3.552
25      1.316    1.708    2.060    2.485    2.787    3.450
30      1.310    1.697    2.042    2.457    2.750    3.385
40      1.303    1.684    2.021    2.423    2.704    3.307
50      1.299    1.676    2.009    2.403    2.678    3.261
60      1.296    1.671    2.000    2.390    2.660    3.232
70      1.294    1.667    1.994    2.381    2.648    3.211
75      1.293    1.665    1.992    2.377    2.643    3.202
80      1.292    1.664    1.990    2.374    2.639    3.195
90      1.291    1.662    1.987    2.368    2.632    3.183
100     1.290    1.660    1.984    2.364    2.626    3.174
∞       1.282    1.645    1.960    2.326    2.576    3.090

APPENDIX C

The Λ Link Function for BNL Models

Translating negative $X_i\hat{\beta}$ values into predicted probabilities

$X_i\hat{\beta}$   -0.00

-4.5
-4.0
-3.5
-3.0
-2.5
-2.0
-1.9
-1.8
-1.7
-1.6
-1.5
-1.4
-1.3
-1.2
-1.1
-1.0
-.9
-.8
-.7
-.6
-.5
-.4
-.3
-.2
-.1
-.0

-0.01

0.0110
0.0180
0.0293
0.0474
0.0759
0.1192
0.1301
0.1419
0.1545
0.1680
0.1824
0.1978
0.2142
0.2315
0.2497
0.2689
0.2891
0.3100
0.3318
0.3543
0.3775
0.4013
0.4256
0.4502
0.4750
0.5000

0.0109
0.0178
0.0290
0.0470
0.0752
0.1182
0.1290
0.1406
0.1532
0.1666
0.1809
0.1962
0.2125
0.2297
0.2479
0.2670
0.2870
0.3079
0.3296
0.3521
0.3752
0.3989
0.4231
0.4477
0.4725
0.4975

-0.02
0.0108
0.0176
0.0287
0.0465
0.0745
0.1171
0.1279
0.1394
0.1519
0.1652
0.1795
0.1947
0.2108
0.2279
0.2460
0.2650
0.2850
0.3058
0.3274
0.3498
0.3729
0.3965
0.4207
0.4452
0.47áo
0.4950

-0.03
0.0107
0.0175
0.0285
0.0461
0.0738
0.1161
0.1268
0.1382
0.1506
0.1638
0.1780
0.1931
0.2092
0.2262
0.2442
0.2631
0.2829
0.3036
0.3252
0.3475
0.3705
0.3941
0.4182
0.4428
0.4675
0.4925

-0.04
0.0106
0.0173
0.0282
0.0457
0.0731
0.1151
0.1256
0.1371
0.1493
0.1625
0.1765
0.1915
0.2075
0.2244
0.2423
0.2611
0.2809
0.3015
0.3230
0.3452
0.3682
0.3917
0.4158
0.4403
0.4651
0.4900

-0.05' -0.06
0.0105
0.0171
0.0279
0.0452
0.0724
0.1141
0.1246
0.1359
0.1480
0.1611
0.1751
0.1900
0.2059
0.2227
0.2405
0.2592
0.2789
0.2994
0.3208
0.3430
0.3659
0.3894
0.4134
0.4378
0.4626
0.4875

0.0104
0.0170
0.0277
0.0448
0.0718
0.1130
0.1235
0.1347
0.1468
0.1598
0.1736
0.1885
0.2042
0.2210
0.2387
0.2573
0.2769
0.2973
0.3186
0.3407
0.3635
0.3870
0.4110
0.4354
0.4601
0.4850

-0.07
0.0103
0.0168
0.0274
0.0444
0.0711
0.1120
0.1224
0.1335
0.1455
0.1584
0.1722
0.1869
0.2026
0.2193
0.2369
0.2554
0.2749
0.2953
0.3165
0.3385
0.3612
0.3846
0.4085
0.4329
0.4576
0.4825

1

A

-0.08
0.0102
0.0166
0.0271
0.0439
0.0704
0.1111
0.1213
0.1324
0.1443
0.1571
0.1708
0.1854
0.2010
0.2176
0.2351
0.2535
0.2729
0.2932
0.3143
0.3363
0.3589
0.3823
0.4061
0.4305
0.4551
0.4800

-0.09
0.0101
0.0165
0.0269
0.0435
0.0698
0.1101
0.1203
0.1312
0.1431
0.1558
0.1694
0.1839
0.1994
0.2159
0.2333
0.2516
0.2709
0.2911
0.3112
0.3340
0.3566
0.3799
0.4037
0.4280
0.4526
0.4775

Appendix C (continued)

Translating positive $X_i\hat{\beta}$ values into predicted probabilities ($\hat{P}_i$)

$X_i\hat{\beta}$   +0.00
).0
).1
).2
).3

+0.05

+0.06

+0.07

+o.oa

APPENDIX D
+0.09

0.5572

0.5744

~::~~~ ~::~~! ~::~~~ ~::~!~ ~::~~: ~::~~~

0.5915

~::~~~ ~::~~~

0.6457
0.6682
0.6900
0.7109
0.7311
0.7503
0.7685
0.7858
0.8022

0.6248
0.6479
0.6704
0.6921
0.7130
0.7330
0.7521
0.7703
0.7875
0.8038

0.6271
0.6502
0.6726
0.6942
0.7150
0.7350
0.7540
0.7721
0.7892
0.8053

0.6295
0.6525
0.6748
0.6964
0.7171
0.7369
0.7558
0.7738
0.7908
0.8069

0.6318
0.6548
0.6770
0.6985
0.7191
0.7389
0.7577
0.7756
0.7925
0.80a5

0.6341
0.6570
0.6792
0.7006
0.7211
0.7408
0.7595
0.7773
0.7941
0.8100

0.6365
0.6593
0.6814
0.7027
0.7231
0.7427
0.7613
0.7790
0.7958
0.8115

0.6615
0.6835
0.7047
0.7251
0.7446
0.7631
0.7807
0.7974
0.8131

0.6411
0.6637
0.6857
0.7068
0.7271
0.7465
0.7649
0.7824
0.7990
0.8146

0.8507
0.8629
0.8744
0.8849
0.9269
0.9543
0.9718
0.9827
0.9894

0.8520
0.8641
0.8754
0.8859
0.9276
0.9548
0.9721
0.9829
0.9895

0.8532
0.8653
0.8765
0.8870
0.9282
0.9552
0.9723
0.9830
0.9'9'

0.8665
0.8776
0.8880
0.9289
0.9556
0.9726
0.9832
O....,

0.8557
0.8676
0.8787
0.8889
0.9296
0.9561
0.9729
0.9834
0.9898

r

0.5646 0.5671

~::~:

0.8468
0.8594
0.8710
0.8818
0.9248
0.9530
0.9710
0.9822
0 ....,

0.8481 0.8494
0.8606 0.8618
0.8721 0.8732
0.8829 0.8839
0.9255 0.9262
0.9535 0.9539
0.9713 0.9715
0.9824' 0.9825
0.9"2 0.9'93

l'

!,

0.6434

~:::~~
0.7089
0.7291
0.7484

Translating negative $X_i\hat{\beta}$ values into predicted probabilities ($\hat{P}_i$)

~:~::~
00·.8 0 0 6
a 161

~::~~~ ~::~~! ~::~~: ~::~~~ ~::~~: ~::~:: ~::!~~ 0.8545
~::!~: ~::!~! ~::!~~
0.8569
0.8455
0.8581
0.8699
0.8808
0.9241
0.9526
0.9707
0.9820
0.9890

The Φ Link Function for BNP Models

~::~:~ ~::~~: ~::!~~
~::!~:
0.6695 0.5720

0.5548

~.O
~.5

.0

+0.04

0.5523

0.5498

!:~

·~.5

+0.03

0.5125
0.5374
0.5597 0.5622

1.5
1.6
1.7

3.0

+0.02

~::~~~ ~::~~: ~::~~~ ~::~~: ~::~~~

~:: ~:~~~~
).6
).7
).8
:>.9
1.0
1.1
1.2
1.3
1.4

+0.01

0.8688
0.8797
0.8899
0.9302
0.9565
0.9731
0.9835
0.9899

-3.4
-3.0
-2.5
-2.0
-1.9
-1.8
-1.7
-1.6
-1.5
-1.4
-1.3
-1.2
-1.1
-1.0
-.9
-.8

-.7
-.6
-.5
-.4
-.3
-.2
-.1
-.0

-.00

-.01

-.02

-.03

-.04

.0003
.0013
.0062
.0228
.0287
.0359
.0446
.0548
.0668
.0808
.0968
.1151
.1357
.1587
.1841
.2119
.2420
.2743
.3085
.3446
.3821
.4207
.4602
.5000

.0003
.0013
.0060
.0222
.0281
.0351
.0436
.0537
.0655
.0793
.0951
.1131
.1335
.1562
.1814
.2090
.2389
.2709
.3050
.3409
.3783
.4168
.4562
.4960

.0003
.0013
.0059
.0217
.0274
.0344
.0427
.0526
.0643
.0778
.0934
.1112
.1314
.1539
.1788
.2061
.2358
.2676
.3015
.3372
.3745
.4129
.4522
.4920

.0003 .0003
.0012 .0012
.0057 .0055
.0212 .0207
.0268 .0262
.0336 .0329
.0418' .0409
.0516.0505
.0630 .0618
.0764 .0749
.0918.0901
.1093 .1075
.1292 .1271
.1515 .1492
.1762 .1736
.2033 .2005
.2327 .2296
.2643 .2611
.2981 .2946
.3336 .3300
.3707 .3669
.4090 .4052
.4483 .4443
.4880 .4840

-.05

-.06

-.07

.d003
.0011
.0054
.0202
.0256
.0322
.0401'
.0495
.0606
",
.0735
.0885
.1056
.1251
.1469
.1711
.1977
.2266
.2578
.2912
.3264
.3632
.4013
.4404
.4801

.0003 .0003
.0011
.0011
.0052 .0051
.0197 .0192
.0250 .0244
.0314 .0307
.0392 . ~0384
.0485 .0475
.0594 .0582
.0721
.0708
.0869 .0853
.1038 .1020
.1230 .1210
.1446 .1423
.1685 .1660
.1949 .1922
.2236 .2206
.2546 .2514
.2877 .2843
.3228 .3192
.3594 .3557
.3974 .3936
.4364 .4325
.4761 .4721

-.08

-.09

.0003
.0010
.0049
.0188
.0239
.0301
.0375
.0465
.0571
.0694
.0838
.1003
.1190
.1401
.1635
.1894
.2177
.2483
.2810
.3156
.3520
.3897
.4286
.4681

.0002
.0010
.0048
.0183
.0233
.0294
.0367
.0455
.0559
.0681
.0823
.0985
.1170
.1379
.1611
.1867
.2148
.2451
.2776
.3121
.3483
.3859
.4247
.4641

Appendix D (continued)

Translating positive $X_i\hat{\beta}$ values into predicted probabilities ($\hat{P}_i$)

$X_i\hat{\beta}$
+.0
+.1
+.2
+.3
+.4
+.5
+.6
+.7
+.8
+.9
+1.0
+1.1
+1.2
+1.3
+1.4
+1.5
+1.6
+1.7
+1.8
+1.9
+2.0
+2.5
+3.0
+3.4

+.00

+.01

+.02

+.03

+.04

+.05

+.06

+.07

+.08

+.09

.5000
.5398
.5793
.6179
.6554
.6915
.7257
.7580
.7881
.8159
.8413
.8643
.8849
.9032
.9192
.9332
.9452
.9554
.9641
.9713
.9772
.9938
.9987
.9997

.5040
.5438
.5832
.6217
.6591
.6950
.7291
.7611
.7910
.8186
.8438
.8665
.8869
.9049
.9207
.9345
.9463
.9564
.9649
.9719
.9778
.9940
.9987
.9997

.5080
.5478
.5871
.6255
.6628
.6985
.7324
.7642
.7939
.8212
.8461
.8686
.8888
.9066
.9222
.9357
.9474
.9573
.9656
.9726
.9783
.9941
.9987
.9997

.5120
.5517
.5910
.6293
.6664
.7019
.7357
.7673
.7967
.8238
.8485
.8708
.8907
.9082
.9236
.9370
.9484
.9582
.9664
.9732
.9788
.9943
.9988
.9997

.5160
.5557
.5948
.6331
.6700
.7054
.7389
.7704
.7995
.8264
.8508
.8729
.8925
.9099
.9251
.9382
.9495
.9591
.9671
.9738
.9793
.9945
.9988
.9997

.5199
.5596
.5987
.6368
.6736
.7088
.7422
.7734
.8023
.8289
.8531
.8749
.8944
.9115
.9265
.9394
.9505
.9599
.9678
.9744
.9798
.9946
.9989
.9997

.5239
.5636
.6026
.6406
.6772
.7123
.7454
.7764
.8051
.8315
.8554
.8770
.8962
.9131
.9279
.9406
.9515
.9608
.9686
.9750
.9803
.9948
.9989
.9997

.5279
.5675
.6064
.6443
.6808
.7157
.7486
.7794
.8078
.8340
.8577
.8790
.8980
.9147
.9292
.9418
.9525
.9616
.9693
.9756
.9808
.9949
.9989
.9997

.5319
.5714
.6103
.6480
.6844
.7190
.7517
.7823
.8106
.8365
.8599
.8810
.8997
.9162
.9306
.9429
.9535
.9625
.9699
.9761
.9812
.9951
.9990
.9997

.5359
.5753
.6141
.6517
.6879
.7224
.7549
.7852
.8133
.8389
.8621
.8830
.9015
.9177
.9319
.9441
.9545
.9633
.9706
.9767
.9817
.9952
.9990
.9998
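
The entries in Appendixes C and D can be regenerated rather than looked up. The sketch below assumes, as is standard for binomial logit and binomial probit models, that Λ is the cumulative logistic function and Φ is the standard normal cumulative distribution function; the function names are hypothetical, and only the Python standard library is used.

```python
import math


def logit_probability(xb):
    """Cumulative logistic function: 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))


def probit_probability(xb):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))


# Print a few rows for comparison with the +/-.00 columns of the tables.
for xb in (-2.0, -1.0, 0.0, 1.0):
    print(f"{xb:+.1f}   logit {logit_probability(xb):.4f}   "
          f"probit {probit_probability(xb):.4f}")
```

Because both functions are symmetric around zero, the probability for a positive value equals one minus the probability for the corresponding negative value, which is why the positive and negative halves of each table mirror one another.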

Bibliography

Achen, Christopher H. 1982. Interpreting and Using Regression. Beverly Hills, CA:
Sage.
AImond, Gabriel A. and Sidney Yerba. 1963. The Civic Culture: Political Attitudes
and Democracy in Five Nations. Princeton, NJ: Princeton University Press.
Arrow, Kenneth. 1951. Social Choice and Individual Values. New York: Wiley•
Belsley, David A., Edwin Kuh, and Roy E. Welsch. 1980. Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity. New York: Wiley•
Brady, Henry E. 2002. "Models oí Causal Inference: Going Beyond the NeymanRubin-HoJland Theory." Paper presented at the annual meeting of the Political
Methodology Society, Seattle, WA .
Brady, Henry E. 2004. "Introduction." Perspectives on Politics 2:295-300 .
CampbeJl, Donald T. and Julian C. Stanley. 1963. Experimental and Qttasiexperimental Designs for Research. Chicago: Rand McNally•
Dahl, Robert A. 1971. Polyarchy: Participation and Opposition. New Haven, CT:
Yale University Press .
Danziger, Sheldon and Peter Gottschalk. 1983. "The Measurement oí Poverty:
Implications for Antipoverty Policy." American Behavioral Scienlisl 26:739756.
Dixon, William and Bruce Moon. 1993. "Political Similarity and American Foreign
Trade Patterns." Political Research Quarterly 46:5-25.
Edmonds, David and John Eidinow. 2003. Wittgenstein's Poker: The Story of a
Ten-Minute Argument Between Two Great Philosophers. New York: Harper
Perennial.
Elkins, Zachary. 2000. "Gradations of Democracy? Empirical Tests of Alternative
Conceptualizations." American Journal of Political Science 44:293-300.
Evans, Geoffrey and Stephen Whitefield. 1995. "The Politics and Economics
of Democratic Commitment: Support for Democracy in Transition Societies."
British Journal of Political Science 25:485-514.
Fenno, Richard F. 1973. Congressmen in Committees. Boston: Little, Brown.
Fiorina, Morris P. 1989. Congress: Keystone of the Washington Establishment,
2nd ed. New Haven, CT: Yale University Press.
Gallagher, Michael, Michael Laver and Peter Mair. 2006. Representative Government in Modern Europe, 4th ed. New York: McGraw-Hill.

Gibson, James L. 1992. "Alternative Measures of Political Tolerance: Must Tolerance Be 'Least-Liked'?" American Journal of Political Science 36:560-577.
Gowa, Joanne. 1989. "Bipolarity, Multipolarity, and Free Trade." American Political Science Review 83:1245-1256.
Gowa, Joanne and Edward D. Mansfield. 1993. "Power Politics and International
Trade." American Political Science Review 87:408-420.
Granger, Clive W.J. and Paul Newbold. 1974. "Spurious Regressions in Econometrics." Journal of Econometrics 26:1045-1066.
Green, Donald P. and Ian Shapiro. 1994. Pathologies of Rational Choice Theory:
A Critique of Applications in Political Science. New Haven, CT: Yale University
Press.
King, Gary. 1986. "How Not to Lie with Statistics: Avoiding Common Mistakes in
Quantitative Political Science." American Journal of Political Science 30:666-687.
Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. Chicago: University
of Chicago Press.
Inglehart, Ronald. 1988. "The Renaissance of Political Culture." American Political Science Review 82:1203-1230.
Iyengar, Shanto and Donald R. Kinder. 1987. News that Matters: Television and
American Opinion. Chicago: University of Chicago Press.
Lazarsfeld, Paul F., Bernard Berelson, and Hazel Gaudet. 1948. The People's
Choice: How the Voter Makes Up His Mind in a Presidential Campaign. New
York: Columbia University Press.
Lewis-Beck, Michael S. 1997. "Who's the Chef? Economic Voting Under a Dual
Executive." European Journal of Political Research 31:315-325.
Luskin, Robert C. 1987. "Measuring Political Sophistication." American Journal
of Political Science 31:856-899.
MacKuen, Michael B., Robert S. Erikson, and James A. Stimson. 1992. "Peasants
or Bankers: The American Electorate and the U.S. Economy." American Political
Science Review 86:597-611.
Mayhew, David R. 1974. Congress: The Electoral Connection. New Haven, CT:
Yale University Press.
Morrow, James D., Randolph M. Siverson, and Tressa E. Tabares. 1998. "The
Political Determinants of International Trade: The Major Powers, 1907-90."
American Political Science Review 92:649-661.
Mueller, John. 1973. War, Presidents, and Public Opinion. New York: Wiley.
Munck, Gerardo L. and Jay Verkuilen. 2002. "Conceptualizing and Measuring
Democracy: Evaluating Alternative Indices." Comparative Political Studies 35:5-34.
Niemi, Richard G. and M. Kent Jennings. 1974. The Political Character of Adolescence: The Influence of Families and Schools. Princeton, NJ: Princeton University
Press.
Pearl, Judea. 2000. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press.
Piazza, Thomas, Paul M. Sniderman, and Philip E. Tetlock. 1990. "Analysis of
the Dynamics of Political Reasoning: A General-Purpose Computer-Assisted
Methodology." Political Analysis 1:99-120.


Pollins, Brian M. 1989. "Does Trade Still Follow the Flag?" American Political
Science Review 83:465-480.
Poole, Keith T. and Howard Rosenthal. 1997. Congress: A Political-Economic
History of Roll Call Voting. New York: Oxford University Press.
Przeworski, Adam. 1991. Democracy and the Market. New York: Cambridge
University Press.
Putnam, Robert D. 2000. Bowling Alone. New York: Simon & Schuster.
Riker, William H. 1982. Liberalism Against Populism: A Confrontation Between
the Theory of Democracy and the Theory of Social Choice. San Francisco:
Freeman.
Riker, William H. and Peter C. Ordeshook. 1968. "A Theory of the Calculus of
Voting." American Political Science Review 62:25-42.
Rogers, James R. 2006. "Judicial Review Standards in Unicameral Legislative Systems: A Positive Theoretic and Historical Analysis." Creighton Law Review
33(1): 65-120.
Salmon, Wesley C. 1993. "Probabilistic Causality." In Causation, edited by Ernest
Sosa and Michael Tooley. Oxford: Oxford University Press, Chapter 8, pp. 137-153.
Sniderman, Paul M. and Thomas Piazza. 1993. The Scar of Race. Cambridge, MA:
Harvard University Press.
Stouffer, Samuel C. 1955. Communism, Conformity, and Civil Liberties. New
York: Doubleday.
Tijms, Henk. 2007. Understanding Probability: Chance Rules in Everyday Life,
2nd ed. Cambridge: Cambridge University Press.
Sullivan, John L., James Piereson, and George E. Marcus. 1979. "An Alternative Conceptualization of Political Tolerance: Illusory Increases 1950s-1970s."
American Political Science Review 73:781-794.
Weatherford, M. Stephen. 1992. "Measuring Political Legitimacy." American Political Science Review 86:149-166.

Index

Achen, Christopher, 227
activism, judicial, 90
additive models, 210
aggregate, 70
alternative hypothesis, 172-3, 176

American Political Science Review, 34
Appropriations Committee, 39
Arrow, Kenneth, 38
Arrow's theorem, 38
assignment, random
sampling v., random, 74
to treatment groups, 74
assumptions
autocorrelation, 179
bias, 177-8
causal variable, 180
heteroscedasticity, 178-9
homoscedasticity, 178-9
measurement, 179-80
model specification, 180-1
noncausal variable, 180
normal distribution, 177
parametric linearity, 180-1, 215
perfect multicollinearity, 192
about population stochastic component,
177-80
unequal error variance, 178-9
uniform error variance, 178-9
variance, 178-9
autocorrelation, 179
auxiliary regression model, 228
ballot irregularities, 221, 224n8
behavioral revolution, 76
bell curve, 122-3
Berelson, Bernard, 6
Berlin Wall, 248
best practice, 48
bias
assumptions, 177-8
direction of, 191


magnitude of, 191
measurement, 93-4
omitted-variables, 189, 200, 227, 229
Bill and Melinda Gates Foundation, 33
binomial logit models (BNL), 135-36,
215-19
link function for, 257-8
binomial probit models (BNP), 135-36,
215-19
link function for, 259-60
bivariate, 46
connection between two variables, 49
bivariate hypothesis tests, 139-55
causal relationships and, 134-5
choosing correct, 135-6
variable types and, 135
Blair, Tony, 145n7
BNL. See binomial logit
BNP. See binomial probit
boundary, 160n1
box-whisker plot
of incumbent-party presidential vote
percentage, 114
of rank statistics, 113
Britain, 249
Broward County, Florida, 223
Buchanan, Patrick, 221
Florida counties and votes for, 222-5
Buffett, Warren, 33
Bulgaria, 249
Bundesrat, 59
Bush, George W., 128, 140-3, 145n7, 212-14
education and, 231
ideology and, 231
partisanship and, 213, 217, 219
performance evaluations and, 213, 217,
219
Thermometer Rating, 230-2
butterfly ballot, 221
Campbell, Donald, 79n4


categorical variables, 105, 106
describing, 109-10
mode, 109
causal claims
credibility of, 52
directional component of, 69
evaluating, 45
example in political campaign, 51-2
explicit, 50
identifying, 50-1
underlying, 53
causal explanations, 3-7, 40
variables and, 7-14
causal relationships, 2
bivariate hypothesis testing and, 134-5
comparison and, 67-8
confidence in, 48
correlation and, 49
deterministic, 47
evaluating, 51
everyday language and, 45-8
how/why questions about, 48
hurdles to, 48-50
probabilistic, 47
reverse, 49
causal theory, 4, 15-16, 40
components of, 9
causal variables, 180
Census, U.S., 120-1
central limit theorem, 122-7, 129
central tendency, 109
Chirac, Jacques, 27
chi-squared test, 143
civil liberties, 99
classification table, 219
Clinton, Bill, 203
Clinton, Hillary
gender and, 204-7, 211
income and, 205, 209
as left-wing feminist, 203
religion and, 209
Thermometer Rating, 203-9
Women's Movement and, 210-11
coding rules, 93
Polity IV measure, 98
coefficients
standardized, 197
unstandardized, 196
Cold War, 248
Communist Party, East German, 60
comparison, 68
causal relationships and, 67-8
decontaminate, 72
results from experiment, 70
results from nonexperiment, 70
complete information, 34
conceptual clarity, 91-2
confidence interval, 127
formulae for, 171-2

regression line and, 171-2
two-tailed hypothesis test and, 175
conflict, 251
confounding variables, 48
congressional roll-call liberalism, 90
Congressmen in Committees (Fenno), 39
consumer confidence, 245, 247
continuous variables, 105, 107-8
describing, 110-17
outliers, 111
control, 46, 53
automatic, 183
failing to, 53-4, 188-93, 199-200
group, 70
for growth, 195
introducing new, 244
controversy, 54
Copernicus, Nicolai, 6
correlation, 16n5
causal relationships and, 49
correlation coefficient, 136, 150-5
calculating, 154
Pearson's, 154
standardization of, 154-5
covariance, 150-1
contributions of individual election years to,
153
formula for, 151
variance and, 168
covariation, 16
covary, 16
critical value, 144
cross-sectional measure, 23
example, 25-6
cumulative effect, 242
cumulative logistic distribution function, 217
cumulative normal distribution function, 217
currency exchange rates, 25
Dade County, Florida, 223
Dahl, Robert, 97
data, 79-81
driven theory, 41
grammatical misuse of, 79
data set, 79-81
dimensions, 80
dyadic, 251n5, 252
datum, 79-81
debt, U.S. government, 42
as percentage of GNP, 80
decimal places, 166n7
degrees of freedom (df's), 144, 149
democracy
components of, 97
contestation and, 97, 98
as continuum, 96
economy and, 250
international trade and, 251
measurement, 96-9


participation and, 97, 98
political tolerance and, 91
pros/cons of, 249
democratic elections
fairness of, 38
goals of, 38
democratic stability
cause of, 54
life satisfaction and, 54-5
Democritus, 45
dependent variables, 7
differenced, 239-40
lagged, 241-2
operationalization of, 10
variation in, 31-6
DFBETA score, 223-4
df's. See degrees of freedom
Diagnostic and Statistical Manual of Mental
Disorders (DSM-IV), 89n4
difference of means test, 136, 145-50
differenced dependent variables, 239-40
directional hypothesis, 175
testing, 173
dispersion, 112
distributed lag model, 241
diversionary use of force, 30
Dole, Robert, 203
Downs, Anthony, 35
DSM-IV. See Diagnostic and Statistical
Manual of Mental Disorders
dummy variables
goodness-of-fit and, 219-20

in ordinary least-squares regression, 203-9
Palm Beach County, Florida, 224-5
to test hypotheses, 203-9
trap, 205
dyadic data set, 251n5, 252
East Germany
Communist Party, 60
West Germany unification with, 59
economic voting, theory of, 9, 10, 166
hypothetical cases of, 12, 13
Edmonds, David, 47n3
Eidinow, John, 47n3
EITM. See "Empirical Implications of
Theoretical Models"
electoral systems
political parties and, 57-61
proportional, 58
single-member district plurality, 58
Weimar Republic, 58-60
West Germany, 59-60
Elizabeth, Queen, 145n7
"Empirical Implications of Theoretical
Models" (EITM), 36n10
equal error variance, 215
equal unit differences, 107
Erie County, Ohio, 6

error measurement, 179-80
error, proportionate reduction of, 220
error term, population, 162
Estonia, 249
Evans, Geoffrey, 249
evidence, empirical, 17
exchange rates, 25
expected utility, 32-4
Experimental and Quasi-Experimental
Designs for Research (Campbell &
Stanley), 79n4
experiments, 67
comparison results from, 70
components of, 69
drug-trial, 75
emulating, 78
ethical dilemmas and, 76
generalizing broader population with, 76
results from non, 70
scientific meaning of, 69
survey, 75n2
external validity, 75
Fenno, Richard, 39
Fifth Republic France, 27-8
Florida, 222-5
force, diversionary use of, 30
formal theory, 31
criticizing, 33
popularity of, 38
usefulness of, 36n10
formula(e)
confidence interval, 171-2
covariance, 151
ordinary least-squares regression, 164
regression line, 160
France, 249
frequency distribution, 124
functional form, 215
game theory, 39-40
Gaudet, Hazel, 6
Gaullist RPR party, 27
GDP. See gross domestic product
gender
Clinton, Hillary, and, 204-7, 211
political participation and, 74
voting and, 141-3
generality, 18, 41
of theory, 41
Germany, 249. See also East Germany; West
Germany
goodness-of-fit, 167-9
dummy variables and, 219-20
good, 169
Gore, Al, 221
Florida counties and votes for, 222-5
government duration, 147
government type and, 149

"Governments, 1950-1995" (McDonald &
Mendes), 146
graphs, descriptive, 104
gravity, law of, 47
gravity models, 251n5
Great Depression, 113n5
Green, Donald, 36n10
Green Party, 59
gross domestic product (GDP), 25
divorce rates and, 238-9
government debt as percentage of, 80
gross U.S. government debt as percentage of,
42
incumbent-party vote share and, 151, 162


heteroscedasticity, 178-9, 215
Hillsborough County, Florida, 223
histogram
of incumbent-party presidential vote
percentage, 116, 117
problem with, 116
homoscedasticity, 178-9
Hungary, 249
hypothesis, 3. See also null hypothesis
alternative, 172-3, 176
directional, 175
empirical tests of, 4
rethinking, 11
to theory, 9, 14
hypothesis testing, 3-4, 5
directional, 173
dummy variables in, 203-9
one-tailed, 175-6
ordinary least-squares regression and,
172-3
hypothesis testing, two-tailed, 173-5
confidence interval and, 175
income
alcohol consumption and, 61-2
Clinton, Hillary, and, 205
of fifty states, 119
measurement, 92
posttransfer definition of, 89
pretransfer definition of, 89
incomplete information, 34
"Incumbent Vote," 112
independent variables, 7
omitting, 190
shift in, 234
inflation, 82, 246, 247
presidential approval and, 81
influence, 221-4
creating, 224
dealing with, 224-5
dummying out, 224-5
identifying, 221-4
Inglehart, Ronald, 54

instantaneous effect, 241
institutional arrangements, 36
instrumentation, 88
interactive models, 210
interest, 88
measurement, 91-6
internal validity, degrees of, 73
international trade, 251-3
democracy and, 251
political causes of, 252
interquartile range (IQR), 112
interval variables, 108n4
intransitive preference orderings, 37
IQR. See interquartile range
Iron Curtain, 248
Italy, 249


Jacobellis v. Ohio, 86
Jennings, Kent, 7
kernel density plot, 117
Kerry, John, 140-3, 212-14
Keynes, John Maynard, 236n13
Koyck transformation, 241
Kuhn, Thomas, 5, 7
lag model, distributed, 241
lagged dependent variable, 241-2
lagged values, 233
Lazarsfeld, Paul, 6
lead values, 233
least-squares property, 115
legislative rules, 36-8
leverage, 220
Lewis-Beck, Michael, 27
Liberalism Against Populism (Riker), 38
liberalism, congressional roll-call, 90
linear probability model (LPM), 212-15
avoiding,215
correctly classified, 219
problems with, 214-15
linearity, parametric, 180-1, 215
link functions, 216-17
for binomial logit models, 257-8
for binomial probit models, 259-60
difference of, 217
Lithuania, 249
LPM. See linear probability model
macro politics, 30
McDonald, Michael D., 146
mean-delimited quadrants, 152
measurement, 86-8
assumptions, 179-80
bias, 93-4
conceptual clarity and, 91-2
cross-sectional, 23, 25-6
democracy, 96-9


error, 179-80
income, 92
interest, 91-6
of interest concepts, 91-6
metric, 105-9
OLS regression uncertainty, 165-76
political tolerance, 99-101
poor, 101
problem of, 87, 88
reliability, 92-3
residual, 164
social science, 88
time-series, 23, 24-5
validity, 92, 94-5
variable, 9
Mendes, Silvia M., 146
metric measurement, 105-9
microeconomics, 39
micronumerosity, 227
Milgram, Stanley, 76-7
military
spending 2005, 25
Mitterrand, François, 27
model(s), 3. See also binomial logit models;
binomial probit models; linear probability
model; multiple regression models;
regression models; sample regression
models; two-variable regression models
additive, 210
auxiliary regression, 228
distributed lag, 241
gravity, 251n5
interactive, 210
maps and, 15
misspecification of, 226
naive, 219
of politics, 14-15
of reality, statistical, 160
as simplifications, 15
spatial, 39
specification assumptions, 180-1
theoretical, 14
model sum of squares (MSS), 168
MSE. See root mean-squared error (MSE)
MSS. See model sum of squares
Mueller, John, 27, 30
multicollinearity, 202, 225-32
dealing with, 232
detecting, 227-8
inducing, 226-7
less-than-perfect, 226
perfect, 192, 204
real-world example, 230-2
simulated example, 228-30
Venn diagram with, 226
multiple regression models, 183-4
first step towards, 159
interpreting, 193-6

mathematical requirements, 192-3
two-variable to, 184-8
multivariate, 46
Musharraf, Pervez, 98
naive model (NM), 219
National Election Study (NES), 106
National Science Foundation, 36n10
Nazi Party, 59
negative relationship, 13
NES. See National Election Study

Newsweek, 128
Niemi, Richard, 7
NM. See naive model
nonuniform pattern of model error variance,
215
normal distribution, 122-7
assumptions, 177
68-95-99 rule and, 123-4, 127
normative statements, 17-18
null hypothesis, 3-4, 5, 172-3, 176
p-values and, 138-9
OLS. See ordinary least-squares regression
(OLS)

omitted-variables bias, 189, 200, 227, 229

On Revolutions of the Heavenly Bodies
(Copernicus), 6
one-tailed hypothesis testing, 175-6
operationalization, 87
of concepts, 10
of dependent variables, 10
of variables, 9
Orange County, Florida, 223
Ordeshook, Peter, 34, 36
ordinal variables, 106-7
ordinary least-squares regression (OLS), 164
dummy variables in, 203-9
extensions of, 202
formulae for, 164n6
hypothesis testing and, 172-3
mathematical properties of, 164
outliers in, 220-5
problems in, 202
uncertainty in, 165-76
outliers, 111, 202
continuous variable, 111
identify, 112
in ordinary least-squares regression, 220-5
univariate, 220
pairwise voting, 38
Pakistan, 98-9
Palm Beach County, Florida, 223
dummy variable, 224-5
paradigm, 5
shift, 6, 7
paradox of voting theory, 34-6


parametric linearity, 180-1, 215
parliamentary governments, 145-50
parsimony, 18,41
"Party Identification," 107-8
Party of Democratic Socialism, 60
Pathologies of Rational Choice Theory (Green
& Shapiro), 36n10
Pearson, Karl, 143, 148, 154
Pearson's correlation coefficient, 154
The People's Choice (Lazarsfeld, Berelson &
Gaudet), 6
Pinellas County, Florida, 223
Poland, 249
political campaign
advertising, 6
causal claims example from, 51-2
education and, formal, 99
The Political Character of Adolescence: The
Influence of Families and Schools (Niemi
& Jennings), 7
political interactions, choices in, 31
political legitimacy, 90-1
political participation, gender and, 74
political parties. See also specific parties
electoral systems and, 57-61
number of, causes of, 57
school children and, 7
political psychology, 90
political sophistication, 91
political tolerance
democracy and, 91
measuring, 99-101
Polity IV measure, 97
coding rules, 98
criticism of, 98
for Pakistan, 98-9
population
error term, 162
regression function, 184
samples and, 120-1
positive relationship, 13
poverty, 88-9
federal government definition of, 88
predicted probabilities, 212, 214
preference orderings, 37
prejudice, 90, 94
presidential approval, 128-31, 234
economy and, 234, 245-8
inflation and, 81
1995-2005, 24, 30
peace and, 234
sample standard deviation of, 128
standard error of mean of, 129
unemployment rate and, 246
p-value, 136-9
limitations of, 137-8
logic of, 136-7
null hypothesis and, 138-9
to statistical significance, 138

proportionate reduction of error, 220
psychology, 76, 87, 89
political, 77, 90
public opinion, 6
economy and, 28, 82, 245-6
international incidents and, 27
political control and, 28
polls, 122
rallies in, 27, 30, 31
randomness, 72
rank statistics, 111-13
box-whisker plot of, 113
ratio variables, 108n4
rational choice, 31
rational utility maximizers, 31
Red Scare, 99
reelection rates, 157
reference category, 208
regression function, population, 184
regression line, 165
confidence intervals and, 171-2
estimating, 162-5
formula, 160
parameters, 160, 161
possible, 163
regression models, 159. See also multiple
regression models; sample regression
models; two-variable regression models
of elections, 193
population, 160
reliability
measurement, 92-3
over-time variability and, 93
validity and, 95-6
religion identification, 106, 208
bar graph of, 110
frequency table for, 109
pie graph of, 110
replication, 76
research design, 54
correlational, 78
experimental, 68-77
experimental, drawbacks to, 74-7
failures of, 54
strategies, 67
research design, observational, 67, 77-83
causal hurdles in, 78-9, 81-2
cross-sectional, 79, 80, 81-2
difficulty with, 83
hybrid design, 79
time-series, 67, 79, 82-3
residual, 161-2
measures of, 164
sum of absolute, 163
values, 220
residual sum of squares (RSS), 163-4, 168
"Retrospective Family Financial Situation,"
106, 107


Riker, William, 34, 36, 38
"rise over run," 160
Rogers, James, 15
Romania, 249
root mean-squared error (MSE), 167
R-squared statistic, 167-9
RSS. See residual sum of squares
Rules Committee, 39
Russell, Bertrand, 120
Russia, 249


Salmon, Wesley, 47
sample
of convenience, 76, 129
error term, 162
population and, 120-1
sample regression models, 161
uncertainty about, 169-71
sample size, effects of, 130-1
sampling distribution, 126
sampling, random, 121, 129
assignment v., random, 74
truly, 138
science, medical, social science v., 74
science, normal, 5
science, physical, 47
social science v., 74
science, social
measurement, 88
physical science v., 74
scientific knowledge, rules of road to, 15-18
selection effect, 56
self-interest, 32
September 11, 2001, 26
Shapiro, Ian, 36n10
single-member district plurality electoral
systems, 58
skepticism, 4
previous research, 28-9
social capital, 91
Social Choice and Individual Values (Arrow),
38
social science, 74
sophistication, political, 91
Soviet communism, 99
spatial dimension, 23
spatial models, 39
spatial units, 79
variation between individual, 81
spurious regression problem, 236-7, 236n13
avoiding, 239-40
standard deviation, 115
of presidential approval, 129
standard error of mean, 126
of presidential approval, 129
standardized tests, 56
Stanley, Julian, 79n4
statistical analysis, 54, 108-9
statistical inference, 121

statistical model of reality, 160
statistical moments, 111
statistical significance, 134, 198-9
from p-value to, 138
statistics, descriptive, 104
Stewart, Porter, 86, 96
stochastic component, 160
assumptions about, 177-80
Stouffer, Samuel, 99
The Structure of Scientific Revolutions (Kuhn), 5
substantive significance, 198-9
Sullivan, John, 100
survey experiments, 75n2
Symbionese Liberation Army, 100
tabular analysis, 136, 139-45
process of, 139
as stepping stone, 139
theoretical models, 14
theory, 3, 87. See also formal theory
aggregation levels of, 30-1
applicability of, 41
causal, 4, 9, 12, 13, 15-16,40
confirmation of, 14
data driven, 16-17,41
developing, 7
evaluating, 8
game, 39-40
generality of, 41
good, 22, 40-2
guiding, 17
hypothesis to, 9, 14
natural selection, 47
newness, 41, 42
nonobviousness of, 42
paradox of voting, 34-6
rethinking, 50
specific event to general, 26-7
strategies, 22
"A Theory of the Calculus of Voting" (Riker
& Ordeshook), 34
Thermometer Ratings
Bush, G., 230-2
Clinton, Hillary, 203-9
Tijms, Henk, 122
time
dimension, 23
intervals, 233
series analysis, 234-6
unit, 79
time-series measure, 23
example, 24-5
time-series notation, 233-4
tolerance, political
democracy and, 91
measuring, 99-101
total sum of squares (TSS), 168
totalitarianism, 96
transitive preference orderings, 37


trap, dummy-variable, 205
t-ratio, 173, 176
treatment group, 70
random assignment to, 74
TSS. See total sum of squares

t-test, 148
Tullock, Gordon, 35
turnout, 34-6
two-tailed hypothesis tests, 173-5
confidence interval and, 175
two-variable regression models, 159
mathematical requirements for, 181
to multivariate, 184-8
Stata results for, 166
Ukraine, 249
uncertainty
OLS regression measurement of, 165-76
sample regression model, 169-71
unemployment, 80
presidential approval and, 246
union households, voting and, 140
United States, 249
univariate outliers, 220
utility, 32-4
expected, 32-4
maximization, 34, 37
unusual, 34
validity, 94-5
construct, 95
content, 95
external, 75
face, 95
measurement, 92
reliability and, 95-6
variables, 3. See also categorical variables;
continuous variables; dependent variables;
dummy variables; independent variables
bias omitted, 189, 200, 227, 229
bivariate connection between two, 49
bivariate hypothesis tests and types of,
135
causal, 180
causal explanations and, 7-14
characteristics of, 104
confounding, 48
covariation between two, 109
describing, 104
expected value of, 115
identifying interesting, 23-4
interval, 108n4
kurtosis of, 116n9
label, 7, 105
mean value for, 114

measuring, 9
median value of, 112
noncausal, 180
operationalization of, 9
ordinal, 106-7
properties of, 104
ratio, 108n4
relationships between two or more, 104,
131-2
skewness of, 116n9
statistical associations between, 86
statistical moments of, 114-17
types, 108-9
value, 7
variance of, 115
variation, 109
variance
assumptions, 178-9
covariance and, 168
equal error, 215
unequal error, 178-9
uniform error, 178-9
variance inflation factor (VIF), 228
VIF. See variance inflation factor
voting
rationality of, 34
for Buchanan, 222-5
costs of, 35
equation, 35
gender and, 141-3
for Gore, 222-5
pairwise, 38
records, 81
strategic, 39
union households and, 140
why people bother with, 36
voting theory, paradox of, 34-6


Warsaw Pact, 248
Ways and Means Committee, 39
Weimar Republic, 58-60
Nazi Party vote and number of parties
winning seats in, 59
West Germany
constitution of, 59
East Germany unification with, 59
electoral system, 59-60
Whitefield, Stephen, 249
"Who's the Chef?" (Lewis-Beck), 27

Wittgenstein's Poker: The Story of a Ten-Minute
Argument Between Two Great Philosophers
(Edmonds & Eidinow), 47n3
zero-sum property, 114
