$$ \newcommand{\bs}{\boldsymbol} \newcommand{\mb}{\mathbf} \newcommand{\E}{\mathbb{E}} \newcommand{\V}{\mathbb{V}} \newcommand{\var}{\text{var}} \newcommand{\cov}{\text{cov}} \newcommand{\N}{\mathcal{N}} \newcommand{\Bern}{\text{Bern}} \newcommand{\Bin}{\text{Bin}} \newcommand{\Pois}{\text{Pois}} \newcommand{\Unif}{\text{Unif}} \newcommand{\se}{\textsf{se}} \newcommand{\au}{\underline{a}} \newcommand{\du}{\underline{d}} \newcommand{\Au}{\underline{A}} \newcommand{\Du}{\underline{D}} \newcommand{\xu}{\underline{x}} \newcommand{\Xu}{\underline{X}} \newcommand{\Yu}{\underline{Y}} \renewcommand{\P}{\mathbb{P}} \newcommand{\U}{\mb{U}} \newcommand{\Xbar}{\overline{X}} \newcommand{\Ybar}{\overline{Y}} \newcommand{\real}{\mathbb{R}} \newcommand{\bbL}{\mathbb{L}} \renewcommand{\u}{\mb{u}} \renewcommand{\v}{\mb{v}} \newcommand{\M}{\mb{M}} \newcommand{\X}{\mb{X}} \newcommand{\Xmat}{\mathbb{X}} \newcommand{\bfx}{\mb{x}} \newcommand{\y}{\mb{y}} \renewcommand{\bfbeta}{\bs{\beta}} \newcommand{\e}{\bs{\epsilon}} \newcommand{\bhat}{\widehat{\bs{\beta}}} \newcommand{\XX}{\Xmat'\Xmat} \newcommand{\XXinv}{\left(\XX\right)^{-1}} \newcommand{\hatsig}{\hat{\sigma}^2} \newcommand{\red}[1]{\textcolor{red!60}{#1}} \newcommand{\indianred}[1]{\textcolor{indianred}{#1}} \newcommand{\blue}[1]{\textcolor{blue!60}{#1}} \newcommand{\dblue}[1]{\textcolor{dodgerblue}{#1}} \newcommand{\indep}{\perp\!\!\!\perp} \newcommand{\inprob}{\overset{p}{\to}} \newcommand{\indist}{\overset{d}{\to}} \newcommand{\eframe}{\end{frame}} \newcommand{\bframe}{\begin{frame}} \newcommand{\R}{\textsf{\textbf{R}}} \newcommand{\Rst}{\textsf{\textbf{RStudio}}} \newcommand{\rfun}[1]{\texttt{\color{magenta}{#1}}} \newcommand{\rpack}[1]{\textbf{#1}} \newcommand{\rexpr}[1]{\texttt{\color{magenta}{#1}}} \newcommand{\filename}[1]{\texttt{\color{blue}{#1}}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} $$

1  Introduction


This book, like so many books before it, will try to teach you statistics. The field of statistics describes how we learn about the world from quantitative data. In the social sciences, the vast majority of empirical studies use statistical methods to provide evidence for their arguments. While it is possible to conduct quantitative research without understanding statistics, I would advise against it. Quantitative research involves a host of choices about what model to use, what variables to include, what tuning parameters to set, what assumptions to make, and so on. Without a deep understanding of statistics, you will find these choices bewildering and will often yield to the default settings of your statistical software. The goal of this book is to give you the foundation to confidently make those choices for your specific application.

We will focus on two key goals in this book.

  1. Understand the basic ways to assess estimators. With quantitative data, we often want to make statistical inferences about some unknown feature of the world. We use estimators (which are just ways of summarizing our data) to estimate these features. One major goal of this book is to show the basics of this task at a general enough level to be applicable to almost any estimator that you are likely to encounter in research. The ideas of bias, sampling variance, consistency, and asymptotic normality are common to such a large swath of (frequentist) inference that you get a tremendous return on your investment of time in these topics. Understand these core ideas and you will have a language to analyze any fancy new estimator that pops up in the next few decades.

  2. Apply these ideas to estimation of regressions. This book will apply these ideas to one particular workhorse task in the social sciences: estimating regression functions. So many methods either use regression estimators like ordinary least squares or extend them in some way. Understanding how these estimators work is vital for conducting research in the social sciences. Regression and regression estimators also provide an entry point for discussing parametric models explicitly as approximations and projections rather than as rigid assumptions about the truth of a given specification.
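
To preview what assessing an estimator can look like in practice, here is a minimal simulation sketch. It is only an illustration, not the book's own code: the function name `simulate_sample_means`, the normal population, and the values of `mu`, `sigma`, and `reps` are all assumptions chosen for this example. It treats the sample mean as the estimator and approximates its bias and sampling variance by repeated sampling. It does not attempt to show asymptotic normality.

```python
import random
import statistics


def simulate_sample_means(n, reps, mu=2.0, sigma=3.0, seed=0):
    """Draw `reps` samples of size `n` from N(mu, sigma^2) and return each sample's mean.

    The population (normal with these defaults) is an assumption made
    purely for illustration.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


mu, sigma = 2.0, 3.0
for n in (10, 100, 1000):
    means = simulate_sample_means(n, reps=1000, mu=mu, sigma=sigma)
    # Bias: average of the estimates across repeated samples, minus the truth.
    bias = statistics.fmean(means) - mu
    # Sampling variance: spread of the estimates across repeated samples.
    sampling_var = statistics.pvariance(means)
    print(
        f"n={n:5d}  bias={bias:+.4f}  "
        f"sampling variance={sampling_var:.4f}  "
        f"sigma^2/n={sigma**2 / n:.4f}"
    )
```

Under these assumptions, the simulated bias should hover near zero at every sample size, and the simulated sampling variance should track `sigma**2 / n`, shrinking as `n` grows. That shrinking spread around the truth is the intuition behind consistency.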

Why write a book on statistics and regression when so many already exist? Aside from hubris, my goal in this book is to find a level of mathematical sophistication that will challenge and push political scientists to develop stronger foundations in the material. While some textbooks at this level exist in statistics and economics, they tend to focus on applications less relevant to political science. This book attempts to fill that gap.