The NYTimes ran an article this week about attitudes regarding working mothers. This time the conversation was sparked by critiques of Wendy Davis, running for governor of Texas, who was financially supported by her husband to attend Harvard Law School while he parented her two children in Texas. The article cited National Marriage Project director Brad Wilcox, who stated the following:
I took issue with this flip statement and the impression that he suggested women both want to be primary caretakers and will judge other women running for political office because of it. To my surprise, Wilcox responded to my (possibly regrettable) sarcastic tweet:
I followed up on the data he provided, which comes from this Pew Research article, Mothers and work: What's 'Ideal'? The question respondents were asked wasn't about judgments of other moms. They were simply asked: “Considering everything, what would be the ideal situation for you — working full time, working part time, or not working at all outside the home?” In fact, the Pew article goes on to describe these data as reflecting a fluctuating economy and differences in personal economic circumstances, and as indicating common challenges faced by dual-earner couples.
I'd also venture to guess that the differences between married and unmarried women's ideals have to do not just with economic characteristics but also with a selection effect: conservative women are more likely to be married than their more liberal counterparts.
Secondly, a better question for assessing political views of working moms would evoke a judgment response, not an attitude about one's own situation. I like this question from the General Social Survey (GSS): "It is usually better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family." Respondents can strongly disagree, disagree, agree, or strongly agree. I graphed this over time for women with at least one child, by marital status and political views.
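The computation behind a graph like that can be sketched with pandas. The rows below are hypothetical stand-ins, not real GSS responses, and the column names are illustrative (the actual GSS item is, I believe, the one with mnemonic FEFAM, coded 1 = strongly agree through 4 = strongly disagree):

```python
import pandas as pd

# Hypothetical mini-extract; a real analysis would load a full GSS download.
rows = [
    # year, marital, polviews, fefam (1 = strongly agree ... 4 = strongly disagree)
    (2010, "married",   "conservative", 2),
    (2010, "married",   "conservative", 1),
    (2010, "married",   "liberal",      4),
    (2010, "unmarried", "liberal",      3),
    (2010, "unmarried", "conservative", 2),
    (2010, "married",   "liberal",      3),
]
df = pd.DataFrame(rows, columns=["year", "marital", "polviews", "fefam"])

# Code "agree" as strongly agree or agree (responses 1 or 2)
df["agree"] = df["fefam"] <= 2

# Share agreeing with the traditional division of labor,
# by survey year, marital status, and political views
share = df.groupby(["year", "marital", "polviews"])["agree"].mean()
print(share)
```

Plotting `share` by year for each marital-status-by-politics group reproduces the kind of trend graph described above.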
Indeed, conservative married mothers are more likely to agree with this stereotypical division of labor than liberal mothers. Liberal mothers, married or not, are less likely to agree.
I'm no political pundit, but my guess is that married conservative women are not likely to vote for Wendy Davis, regardless of how she chose to balance parenting and career opportunities.
For my Advanced Statistics class this week, I had to choose a peer-reviewed journal article using regression methods and "evaluate the appropriateness of their statistical testing and of their reporting on the same criteria as used by McCloskey (1985), Gelman and Stern (2006), and Fidler and colleagues (2004)."
I chose the already critiqued article by Hawkins and colleagues (2013) on the effect of government-funded marriage promotion initiatives. Other scholars have previously raised concerns about this article, such as arbitrarily creating two time periods for comparison and strategically manipulating an outlier (District of Columbia) in order to produce statistically significant results.
Here, I restrict my critique to inaccuracies as described in my class readings and share my response to the class assignment.
Substantive vs Statistical Significance
First, the authors used the words “significant” and “non-significant” throughout the results and discussion sections (a total of 26 times). In some of these instances they clarified that significance was in reference to statistical significance, as in this statement: “…all of the regression coefficients for the years 2006 – 2010 were statistically significant” (p. 508). However, the differentiation between statistical significance and substantive significance described by McCloskey was missing, as the majority of the time the authors were referencing statistical significance, not substantive significance, in their analysis. In at least one instance, it was entirely unclear whether the authors meant statistical or substantive significance:
“Our analyses found that cumulative per capita funding was associated with a small but significant decrease in the percentage of nonmarital births and children living with single parents, an increase in the percentage of children living with two parents, a decrease in the percentage of children who are poor or near poor, and an increase in the percentage of married adults in the population (but only for 2005 – 2010).”
Also of concern, the authors chose to set the significance level at .10 rather than the conventional .05. They argue that a .10 level is appropriate “because the risk of a type II statistical error (a false negative) is relatively high with a sample of 51 cases, we adopted a .10 alpha for significance testing.” While Fisher’s p < .05 is arbitrary, they go on to conflate the likelihood of making a type II error with meaningful significance. Setting alpha at .10 only means accepting a 10 percent chance of observing a result this extreme when in fact no real effect exists (a type I error); it says nothing about whether an effect is meaningful. Given that their statistically significant findings depend on one outlier case (District of Columbia) and a more lenient threshold than conventional practice, I’m incredibly skeptical of their conclusion of meaningful difference.
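The trade-off they invoke can be made concrete. A quick sketch (the standardized effect size of 0.35 and the two-sided z-test approximation are my illustrative assumptions, not numbers from the article) shows what raising alpha from .05 to .10 actually buys with n = 51:

```python
from scipy.stats import norm

def power_two_sided(effect, n, alpha):
    """Approximate power of a two-sided z test for a standardized effect."""
    se = 1 / n ** 0.5                      # SE of a standardized mean, unit variance
    z_crit = norm.ppf(1 - alpha / 2)       # critical value at the chosen alpha
    z_effect = effect / se
    # Probability of rejecting in either tail when the true effect is `effect`
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

# Hypothetical effect of 0.35 SD with the article's n = 51 states
for alpha in (0.05, 0.10):
    p = power_two_sided(0.35, 51, alpha)
    print(f"alpha={alpha:.2f}  power={p:.2f}  Type II risk={1 - p:.2f}")
```

Under these assumed numbers, power rises from roughly .71 to .80, so Type II risk falls from about .29 to .20 — while Type I risk doubles from .05 to .10. The looser alpha buys some power, but it does nothing to make a detected effect substantively meaningful.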
At one point, they seek to address the gap between statistical significance and substantive significance by stating:
“Nevertheless, our study found statistically significant associations between per capita funding and several other important population-level outcomes. Still, one can ask where these associations are large enough to be substantively important. We address this question with reference to a particular high activity state: Oklahoma” (p. 510).
I find it odd and misleading to cherry-pick one unit of observation (Oklahoma) to make the connection between statistical and meaningful results. Their research question wasn't about changes in Oklahoma relative to other states (or states with more marriage promotion funding versus states without it) but rather a statistical analysis of a national program. Furthermore, the authors reviewed the changes in Oklahoma by describing point increases and decreases in percentages on a number of indicators, but they never provided the actual percentages for reference, nor did they provide a comparison of changes among states without marriage promotion spending. Thus, the reader is left to his or her own interpretation of whether a “3-point increase in the percentage of children living with two parents” in Oklahoma is substantively meaningful.
Comparing Coefficients vs Conducting Tests of Difference
More problematic was the practice of comparing significant and non-significant coefficients rather than conducting tests of difference between coefficients, as described by Gelman and Stern. While the two time periods are subjectively constructed to begin with, the authors compounded the problem by contrasting the periods in terms of their statistical significance. This is evident in this statement regarding Table 2:
“Table 2 shows the results of regression analyses conducted separately for two time periods: 2000 – 2005 and 2006 – 2010. No regression coefficients for the years 2000 – 2005 were statistically significant. In contrast, [with] the exception of percentage divorced, all of the regression coefficients for the years 2006 – 2010 were statistically significant” (p. 508).
As we know from Gelman and Stern, comparing significance levels in this way is inappropriate, and stating “2000 – 2005 is not statistically significant but 2006 – 2010 is significant” is misleading. The authors failed to test the difference between the coefficients for the two time periods across each of their variables of interest.
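The test Gelman and Stern call for is straightforward: compare the coefficients directly rather than their significance labels. A minimal sketch, using placeholder numbers of my own rather than values from the article's Table 2:

```python
from math import sqrt
from scipy.stats import norm

def coef_difference_z(b1, se1, b2, se2):
    """z statistic and two-sided p value for b1 - b2 from independent models."""
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * norm.sf(abs(z))
    return z, p

# Hypothetical period 1 coefficient (non-significant: t = 1.25)
# versus period 2 coefficient (significant: t = 2.50)
z, p = coef_difference_z(b1=0.10, se1=0.08, b2=0.25, se2=0.10)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these placeholder numbers, the second coefficient clears p < .05 on its own while the first does not, yet the difference between them is nowhere near significant (p ≈ .24) — exactly the pattern Gelman and Stern warn can masquerade as a real contrast between periods.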
Confidence Intervals, Quantitative Precision, and Effect Size
Finally, they failed to convey the quantitative information necessary to make meaning of the results, as emphasized by Fidler and colleagues. Confidence intervals were missing entirely from the results, although the authors did report standard errors in the tables and made occasional mention of them: “The table also reveals that the standard errors were considerably larger in the earlier period” (p. 509). However, these mentions were limited to obvious statements about the statistical analysis rather than adding quantitative precision. Likewise, effect sizes were largely absent, mentioned only in passing, as in this statement: “Although not significant, the coefficients for the remaining variables were in the same direction and comparable in magnitude to their counterparts in Table 2. The lack of significance can be explained by the larger standard errors.” The exact magnitude of the outcomes remained unstated.
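The frustrating part is that a coefficient and its standard error already contain everything needed for a confidence interval; the authors simply never report one. A sketch of the standard normal-approximation interval, with hypothetical values rather than numbers from the article:

```python
from scipy.stats import norm

def conf_interval(b, se, level=0.95):
    """Normal-approximation confidence interval for a regression coefficient."""
    z = norm.ppf(1 - (1 - level) / 2)   # 1.96 for a 95% interval
    return b - z * se, b + z * se

# Hypothetical coefficient and standard error
lo, hi = conf_interval(b=0.25, se=0.10)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval — here spanning from just above zero to nearly twice the point estimate — would let readers judge both the precision and the plausible range of effect sizes, which is precisely the reform Fidler and colleagues argue editors should demand.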
Fidler, Fiona, Neil Thomason, Geoff Cumming, Sue Finch, and Joanna Leeman. 2004. “Editors Can Lead Researchers to Confidence Intervals, but Can’t Make Them Think: Statistical Reform Lessons from Medicine.” Psychological Science 15(2):119–26.
Gelman, Andrew, and Hal Stern. 2006. “The Difference between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” The American Statistician 60(4):328–31.
Hawkins, Alan J., Paul R. Amato, and Andrea Kinghorn. 2013. “Are Government-Supported Healthy Marriage Initiatives Affecting Family Demographics? A State-Level Analysis.” Family Relations 62(3):501–13.
McCloskey, Deirdre N. 1985. “The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests.” American Economic Review 75(2):201–5.