Be Less Wrong
Why Use Quantitative Methods?
References
Footnotes
An analogous process can occur with multilevel data, in which there are often many groups, such as many schools, many neighborhoods, or many countries. Failure to account for the grouping of the data (in schools, neighborhoods, or countries) can sometimes lead to dramatically incorrect results (Gelman et al. 2007; Nieuwenhuis 2015; Lang and Bliese in press).
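The point about grouped data can be illustrated with a small simulation. This is only a sketch under invented assumptions: the data are simulated, the coefficients (3 and -1) are arbitrary, and the function names are hypothetical. Group-mean centering is used here as one simple way of accounting for grouping; it is not the multilevel modeling approach of the works cited above.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def center_within_groups(values, groups):
    """Subtract each group's mean from the values in that group."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate_grouped_data(n_groups=30, n_per_group=50, seed=1):
    """Simulate individuals nested in groups (e.g. students in schools)."""
    rng = random.Random(seed)
    groups, x, y = [], [], []
    for g in range(n_groups):
        group_level = rng.gauss(0, 2)  # hypothetical group-level characteristic
        for _ in range(n_per_group):
            xi = group_level + rng.gauss(0, 1)
            # Assumed data-generating process: the group-level term raises y,
            # while within a group the true slope of y on x is -1.
            yi = 3 * group_level - 1 * xi + rng.gauss(0, 1)
            groups.append(g)
            x.append(xi)
            y.append(yi)
    return groups, x, y


groups, x, y = simulate_grouped_data()

# Ignoring the grouping: one regression line through all observations.
pooled_slope = ols_slope(x, y)

# Accounting for the grouping: compare individuals only with others in
# their own group by centering x and y on their group means.
within_slope = ols_slope(
    center_within_groups(x, groups),
    center_within_groups(y, groups),
)

print(f"pooled slope (grouping ignored): {pooled_slope:.2f}")
print(f"within-group slope:              {within_slope:.2f}")
```

In this simulated setup the pooled slope is positive while the within-group slope is negative, so ignoring the grouping reverses the sign of the estimate.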
Since developing this tutorial, I’ve been reminded in some conversations of additional issues. For example, in this tutorial, I argue for including as many control variables as possible. However, for some social issues, only small samples are available. Such samples may be statistically underpowered and may be too small to support many different control variables.
Additionally, since developing this tutorial, I’ve also been reminded that one must be careful and thoughtful about choosing control variables. As a simple example, consider a hypothetical situation in which \(x\) is a cause of \(y\): \(x \rightarrow y\). If \(m\) is a mediator of the relationship between \(x\) and \(y\), then including \(m\) in one’s statistical model changes the meaning of the coefficient for \(x\): \(\beta_x\) is now an estimate of the direct effect of \(x\) on \(y\), accounting for the presence of \(m\). There may be an indirect effect of \(x\) on \(y\) through \(m\) (\(x \rightarrow m \rightarrow y\)) that needs to be accounted for using special procedures (cf. Westreich and Greenland 2013). Including a control variable \(c\) that is a function of both \(x\) and \(y\) (\(x \rightarrow c\) and \(y \rightarrow c\)), often called a collider, may introduce additional complications (Elwert and Winship 2014).
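Both cautions can be seen in a small simulation. This is a minimal sketch, assuming linear relationships, normally distributed errors, and no other confounding; the coefficients (0.5, 0.3, 0.4) and the helper names are invented for illustration. It only contrasts estimates with and without the extra control variable, and does not implement the special mediation procedures mentioned above.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def slope_simple(x, y):
    """Coefficient on x from a regression of y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, z, y):
    """Coefficient on x from a regression of y on x and z.

    Closed-form solution of the two-predictor normal equations.
    """
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


rng = random.Random(42)
n = 20000

# Mediator: x -> m -> y, plus a direct path x -> y.
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

total_effect = slope_simple(x, y)        # true value: 0.3 + 0.5 * 0.4 = 0.5
direct_effect = slope_adjusted(x, m, y)  # true value: 0.3

# Collider: x2 and y2 are unrelated, but both cause c.
x2 = [rng.gauss(0, 1) for _ in range(n)]
y2 = [rng.gauss(0, 1) for _ in range(n)]
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x2, y2)]

unadjusted = slope_simple(x2, y2)           # true value: 0
collider_adjusted = slope_adjusted(x2, c, y2)  # pulled away from 0

print(f"x on y, mediator omitted:    {total_effect:.2f}")
print(f"x on y, mediator included:   {direct_effect:.2f}")
print(f"x2 on y2, collider omitted:  {unadjusted:.2f}")
print(f"x2 on y2, collider included: {collider_adjusted:.2f}")
```

Under these assumptions, including the mediator shrinks the coefficient for \(x\) from the total effect toward the direct effect, and including the collider produces a nonzero association between two variables that are unrelated by construction.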
Further, one could argue, somewhat convincingly, that an RCT (randomized controlled trial) would solve the major issue inspiring this presentation. By randomly assigning study participants to a treatment and control group, we would avoid the possibility that our results could be statistically confounded by other factors, and thus avoid the possibility that our results would flip or substantially change as we add more variables to the model. However, what is not acknowledged often enough is that RCTs are often based upon small clinical or convenience samples that may not generalize well to other populations or people. Large observational studies with diverse populations, and models with many appropriate control variables, certainly have their role.