Damn lies, and statistics

9/17/2018

In many scientific fields numbers are the gold standard: if you cannot quantify what you're investigating, you're not doing (scientific) research. While I won't dispute the value of quantification, this has always struck me and my qualitative colleagues as a strange obsession. Numbers, after all, have no meaning in and of themselves. How do you know what you should quantify, and what adequate means of quantification are? And once you've quantified something, how should you interpret the findings?

Causal or unifying

In the hard sciences these problems have been largely addressed for centuries. Physics, genetics, astronomy: none of them has trouble interpreting the meaning of numbers or combining qualitative and quantitative findings, because their goal is to produce unifying theories. In the social sciences and humanities, on the other hand, most scholars are busy investigating causal mechanisms: if I put a pen in my mouth, will I be happier; if I put a graphic warning label on cigarette packages, will that cause fear; if there is dissonance between word and picture, will that affect speech production, etc. (Dr Trafimow provides a comprehensible explanation of these two different methods.)

The results of these studies may be very interesting, or they may be entirely useless, but they do not contribute to a deeper understanding of the human condition. At least, not by themselves. Without a proper underlying theory, findings are nothing more than mere curiosities. One could build a digital museum and maybe do something interactive for children with them, but that's about the extent of their use.

There are of course disciplines in the humanities and social sciences that are supported by proper theories. Generative syntax is a prime example. You can disagree with the field based on its findings or its maxims, but you cannot deny that one reason it's been so productive for the past sixty years is that there is an underlying theory that is continuously being refined or revised, and that aims to account for human language. A proper theory makes the field inherently superior to many of its competitors that do nothing more than ad hoc analysis. (Although, to be fair, there is plenty of ad hoc analysis going on in generative syntax as well. But even the hard sciences are guilty of that sometimes.)

Anecdotes

The problem of how to interpret numbers came to the forefront again this week. After the US Open final, in which Serena Williams claimed the umpire treated her unfairly because she is a woman, the question was whether women are indeed treated more unfairly than men on the court. For some prominent feminists this was not up for discussion: Billie Jean King claimed that women who speak out on the court are called 'hysterical' whereas men are just 'outspoken'. She, and others including even the WTA, saw the incident as part of a larger problem of sexism in tennis. And while I do not dispute that women are treated differently than men, I would like at least some form of evidence that this was not a one-off.

The days following the incident were livened up by a bunch of anecdotal "evidence". It did not take long for men to point out that, contrary to claims by Williams' coach, Ramos has been just as strict with top men like Rafael Nadal. He has been one of the few umpires who have repeatedly sanctioned Nadal for taking too long in between points, an infraction that is barely more severe than a coaching violation, for which Williams was punished. But of course, there were also cases where Ramos did not sanction men for behavior similar to that of Williams. Both sides claimed their points were vindicated, which shows just how useless anecdotal evidence is.

The New York Times attempted to resolve the issue definitively by investigating the past twenty years. They counted all the times men were penalized and compared that to all the times women were penalized. After correcting for the fact that men play more tennis (best of 5 instead of best of 3) and that there are more men in the qualifications at all slams except the US Open (128 men vs 96 women), they claimed that men were still penalized more. Interestingly enough, the only exception was coaching, for which women were more often sanctioned.
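To make the idea of "correcting" for how much tennis is played concrete, here is a minimal Python sketch. It turns raw penalty counts into a rate per 1,000 sets played. The function name and every number in it are made up for illustration; they are not the NYT's figures, and this is not necessarily how the NYT did its correction.

```python
def penalties_per_1000_sets(penalties: int, sets_played: int) -> float:
    """Return the penalty rate per 1,000 sets played (exposure-adjusted)."""
    if sets_played <= 0:
        raise ValueError("sets_played must be positive")
    return 1000 * penalties / sets_played


# Hypothetical counts, invented for this example only.
men_penalties, men_sets = 300, 60_000
women_penalties, women_sets = 120, 30_000

men_rate = penalties_per_1000_sets(men_penalties, men_sets)
women_rate = penalties_per_1000_sets(women_penalties, women_sets)

# Raw counts differ by a factor of 2.5, the adjusted rates by only 1.25.
raw_ratio = men_penalties / women_penalties
rate_ratio = men_rate / women_rate

print(f"men: {men_rate:.1f}, women: {women_rate:.1f} penalties per 1,000 sets")
print(f"raw ratio: {raw_ratio:.2f}, adjusted ratio: {rate_ratio:.2f}")
```

With these invented numbers the gap shrinks considerably once exposure is taken into account, but it does not disappear, which is the shape of the claim described above.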

What do numbers mean?

Should we conclude from these numbers that Serena, Billie Jean King, and the WTA were wrong? That women are not sanctioned more severely than men, and thus that Ramos' penalty was not a symptom of an underlying problem of sexism? The answer will not surprise you: it's a resounding no. While it's great that the NYT has gathered all these data, they are meaningless. And no statistical test in the world could give them meaning. The problem is that there may be a lot of underlying factors. Men may be more outspoken than women are on the court. There may be a few men, like Nick Kyrgios and Benoit Paire, both of whom have faced severe financial penalties and even suspensions for their behavior, who completely mess up the numbers. We simply don't know.
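How much a few individuals can "mess up the numbers" is easy to show with a toy example. The sketch below uses invented per-player penalty counts (not real data about any player) in which two players account for almost all penalties, and compares the mean with the median.

```python
import statistics

# Hypothetical penalties per player over some period; the last two
# values stand in for a couple of frequently sanctioned players.
penalties_per_player = [0, 0, 1, 0, 1, 0, 0, 1, 20, 25]

mean_penalties = statistics.mean(penalties_per_player)
median_penalties = statistics.median(penalties_per_player)

# Share of all penalties that comes from the two most-sanctioned players.
top_two = sum(sorted(penalties_per_player)[-2:])
top_two_share = top_two / sum(penalties_per_player)

print(f"mean: {mean_penalties}, median: {median_penalties}")
print(f"share of penalties from top two players: {top_two_share:.0%}")
```

In this made-up group the average suggests a heavily penalized population, while the typical player is hardly penalized at all. An aggregate count cannot tell these situations apart, which is why the individual cases still need to be looked at.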

These kinds of articles reinforce the methodological point that you cannot do quantitative analysis without also doing qualitative analysis (and vice versa). In this case, the numbers provided by the NYT would be a nice starting point to figure out if something is really going on (are men treated more unfairly?) and if so, what. Every case has to be appreciated for its individual relevance: no two sanctions are alike. That sounds like a massive amount of work, and it is, but without such an investigation, all we have are numbers. Just as without those numbers, all we have is a series of anecdotes. Real evidence comes in the form of theories that are both qualitatively and quantitatively supported.
