I chose the already critiqued article by Hawkins and colleagues (2013) on the effect of government-funded marriage promotion initiatives. Other scholars have previously raised concerns about this article, such as its arbitrary creation of two time periods for comparison and its strategic manipulation of an outlier (District of Columbia) in order to produce statistically significant results.
Here, I restrict my critique to inaccuracies as described in my class readings and share my response to the class assignment.
Substantive vs Statistical Significance
First, the authors used the words “significant” and “non-significant” throughout the results and discussion sections (a total of 26 times). In some of these instances they clarified that they meant statistical significance, as in this statement: “…all of the regression coefficients for the years 2006 – 2010 were statistically significant” (p. 508). However, the distinction between statistical significance and substantive significance described by McCloskey was missing; most of the time the authors were referring to statistical significance, not substantive significance, in their analysis. In at least one instance, it was entirely unclear whether the authors meant statistical or substantive significance:
“Our analyses found that cumulative per capita funding was associated with a small but significant decrease in the percentage of nonmarital births and children living with single parents, an increase in the percentage of children living with two parents, a decrease in the percentage of children who are poor or near poor, and an increase in the percentage of married adults in the population (but only for 2005 – 2010).”
At one point, they sought to address the distinction between statistical significance and substantive significance by stating:
“Nevertheless, our study found statistically significant associations between per capita funding and several other important population-level outcomes. Still, one can ask where these associations are large enough to be substantively important. We address this question with reference to a particular high activity state: Oklahoma” (p. 510).
Comparing Coefficients vs Conducting Tests of Difference
More problematic was the authors' comparison of significant and non-significant coefficients in place of a test of the difference between the coefficients, the error described by Gelman and Stern. Although the two time periods were subjectively constructed to begin with, the authors compounded the problem by contrasting the periods according to whether their coefficients reached statistical significance. This is evident in the following statement regarding Table 2:
“Table 2 shows the results of regression analyses conducted separately for two time periods: 2000 – 2005 and 2006 – 2010. No regression coefficients for the years 2000 – 2005 were statistically significant. In contrast, [with] the exception of percentage divorced, all of the regression coefficients for the years 2006 – 2010 were statistically significant” (p. 508).
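To make the Gelman and Stern point concrete, here is a minimal Python sketch that tests the difference between two coefficients directly instead of comparing their significance labels. The numbers are purely illustrative and are not taken from Hawkins et al.; the function names are hypothetical. The sketch also assumes the two estimates are independent, which may not hold for estimates drawn from the same states in adjacent periods.

```python
import math
from statistics import NormalDist


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: estimate == 0, using a normal approximation."""
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def difference_test(b1, se1, b2, se2):
    """Test H0: b1 == b2 for two estimates ASSUMED to be independent.

    Under independence the variance of the difference is the sum of the
    two variances; correlated estimates would need a covariance term.
    """
    diff = b1 - b2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff, se_diff, two_sided_p(diff, se_diff)


# Illustrative values only (not from the article under review).
b_late, se_late = 25.0, 10.0    # "significant" coefficient
b_early, se_early = 10.0, 10.0  # "non-significant" coefficient

p_late = two_sided_p(b_late, se_late)
p_early = two_sided_p(b_early, se_early)
diff, se_diff, p_diff = difference_test(b_late, se_late, b_early, se_early)

print(f"late period:  p = {p_late:.3f}")
print(f"early period: p = {p_early:.3f}")
print(f"difference:   {diff:.1f} (SE {se_diff:.1f}), p = {p_diff:.3f}")
```

With these illustrative values, one coefficient falls below the conventional 0.05 threshold and the other does not, yet the difference between them does not. A contrast in significance labels is therefore not evidence that the two periods differ.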
Confidence Intervals, Quantitative Precision, and Effect Size
Finally, the authors failed to convey the quantitative information necessary to interpret the results, as recommended by Fidler and colleagues. Confidence intervals were missing entirely from the results, although the authors did report standard errors in the tables and occasionally mentioned them: “The table also reveals that the standard errors were considerably larger in the earlier period” (p. 509). However, these mentions were limited to obvious statements about the statistical analysis rather than adding quantitative precision. Likewise, effect sizes were largely absent and were mentioned only in passing, as in this statement: “Although not significant, the coefficients for the remaining variables were in the same direction and comparable in magnitude to their counterparts in Table 2. The lack of significance can be explained by the larger standard errors.” The exact magnitude of the outcomes remained unstated.
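Because the article's tables report standard errors, a reader can approximate the missing confidence intervals. The following is a minimal Python sketch of that calculation. The coefficient and standard error are hypothetical placeholders, not values from the article, and the function name is my own. The sketch uses a normal approximation; with a small number of state-level observations, a t-based interval would be somewhat wider.

```python
from statistics import NormalDist


def confidence_interval(coefficient, se, level=0.95):
    """Approximate confidence interval from a coefficient and its standard error.

    Uses the normal critical value (about 1.96 for a 95% interval); this is
    an approximation that ignores the degrees of freedom of the regression.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return coefficient - z_crit * se, coefficient + z_crit * se


# Hypothetical coefficient and standard error, for illustration only.
b, se = -0.5, 0.2
lower, upper = confidence_interval(b, se)
print(f"estimate {b:.2f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Reporting the interval rather than only a significance label shows the range of effect sizes compatible with the data, which is the kind of quantitative precision the article omits.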
Fidler, Fiona, Neil Thomason, Geoff Cumming, Sue Finch, and Joanna Leeman. 2004. “Editors Can Lead Researchers to Confidence Intervals, but Can’t Make Them Think: Statistical Reform Lessons from Medicine.” Psychological Science 15(2):119–26.
Gelman, Andrew, and Hal Stern. 2006. “The Difference between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” The American Statistician 60(4):328–31.
Hawkins, Alan J., Paul R. Amato, and Andrea Kinghorn. 2013. “Are Government-Supported Healthy Marriage Initiatives Affecting Family Demographics? A State-Level Analysis.” Family Relations 62(3):501–13.
McCloskey, Deirdre N. 1985. “The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests.” American Economic Review 75(2):201–5.