In recent years some Conversation Analysts have moved from a purely qualitative approach to a mixed-method approach, in which Conversation Analytic findings are combined with quantitative methods, such as experiments. In a 2017 special issue of Research on Language and Social Interaction, a large group of scholars showed how CA can be used in the lab to study phenomena like gaze, blinking, and turn-taking. De Ruiter and Albert go on to argue that the viability of CA as a method relies in essence on scholars connecting to other fields, combining the strengths of different methodologies. Similarly, in 2015 Tanya Stivers argued that coding data is not heretical for CA, and coding is a fundamental first step towards doing many kinds of experimental studies. While I applaud the integration of methods (a point I argue for interactional linguistics in my dissertation), there is one crucial methodological issue that bears pointing out.
Distributional evidence
The first question we ask is whether coding is possible at all. As Stivers herself points out, coding massively reduces the complexity of the phenomenon under investigation, while CA's primary strength lies in explicating a phenomenon in all its complexity. But, she goes on to argue, conversation analysts already code, albeit not in the way we may typically understand coding. We study specific practices in specific sequential environments: "characterizations of practices are necessarily specific with respect to composition and position. And subtypes of practices are further specified." These features could serve as a basis for formal coding.
Additionally, Stivers argues, CA already relies on distributional evidence to support its findings. Relative frequencies are taken to be indicative of various preferences, such as the one for self- over other-correction, for minimization in person reference, and for recognitional reference. In these and other papers the authors rely on characterizations such as "massively", "quite common", and "scarcely ever". The idea is that if a phenomenon is rare, there cannot possibly be a preference for it; so when we find a skewed distribution of self-correction over other-correction, this provides evidence of a preference for the former.
Intuitively this makes sense. If people do A far more often than B, then A is obviously preferred over B. I bike to work far more often than I take the bus, hence there is a preference for taking the bike. The problem is (and this is a persistent problem with frequentist analyses) that the reasoning goes the wrong way. If there is a preference, we expect to see a skewed distribution; a skewed distribution is thus a symptom, indicative of a preference. But if all we see is a skewed distribution, then the preference is only one among a possibly large number of explanations.
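To see why, consider a toy simulation. Below is a minimal sketch (in Python, with entirely made-up numbers and mechanisms) of two hypothetical processes: one in which speakers genuinely prefer self-repair, and one with no preference at all, in which repair is simply done by whoever detects the trouble first. Both produce the same heavily skewed distribution.

```python
# A minimal simulation sketch; illustrative only, all numbers are invented.
import random

random.seed(1)

def preference_model(n):
    """Theory A: speakers actively prefer to repair their own talk,
    choosing self-repair with probability 0.9."""
    return sum(random.random() < 0.9 for _ in range(n))

def opportunity_model(n):
    """Theory B: no preference at all. The repairer is simply whoever
    detects the trouble first, and the current speaker tends to detect
    trouble in their own turn sooner (detection times are hypothetical)."""
    count = 0
    for _ in range(n):
        self_detect = random.gauss(0.2, 0.1)   # seconds, made up
        other_detect = random.gauss(0.5, 0.1)  # seconds, made up
        count += self_detect < other_detect
    return count

n = 1000
print("self-repairs under a genuine preference:", preference_model(n))
print("self-repairs with no preference at all: ", opportunity_model(n))
# Both print roughly 900+ out of 1000: the skew alone cannot tell us
# which process generated it.
```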
Circular reasoning
Note at this point that a skewed distribution can of course be taken as evidence of a preference. If we develop a theory that predicts a skewed distribution, and we then find that distribution, the distribution does indeed function as evidence for our theory, and we simultaneously have evidence against any theory that does not predict such a distribution. But a skewed distribution does not help us distinguish between theories that all predict that distribution.
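Put in likelihood terms: if two theories assign the same probability to the observed counts, the observation cannot favour either of them. A quick numerical sketch (again with invented numbers):

```python
# Two hypothetical theories that both predict a 90% rate of self-repair.
# Observing 90 self-repairs in 100 cases is then exactly as likely under
# either theory, so the observation provides no evidence between them.
from math import comb

def binom_likelihood(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

k, n = 90, 100                       # invented counts
lik_a = binom_likelihood(k, n, 0.9)  # theory A: genuine preference
lik_b = binom_likelihood(k, n, 0.9)  # theory B: opportunity asymmetry
print(lik_a / lik_b)                 # likelihood ratio: 1.0, no evidence either way
```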
In a recent Perspective paper in Nature Human Behaviour, Michael Muthukrishna and Joseph Henrich provide an intuitive analogy. If we walk into a room and find a broken vase, there are various possible explanations: wind coming through the window, a rowdy child, a playful cat, etc. Just having a broken vase does not help us distinguish between these and other explanations, so it cannot in and of itself be understood as evidence for, say, the cat having gone on a rampage rather than a very stormy afternoon.
And here we come to the central issue: CA, as a primarily inductive method, develops its theories from the data. This means that we get the idea for, for example, an interactional preference for self-repair because we find that self-repair is far more frequent in our data than other-repair. But at that point all we have done is make an observation. We cannot then decide to develop a theory that fits those data, and claim that the data are evidence for the theory. That would be the essence of circular reasoning.
Now fortunately, and Stivers points this out as well, when developing the notion of a preference for self-repair, Schegloff et al. try to account for the skewed distribution by exploring the data themselves. As far as I'm aware, this is what Conversation Analysts always do: we provide an argument from the data. And that is exactly what we'd want to do: we observe something strange in the data and we want to see if there is a clear and coherent explanation for that observation. There need not be; our observation may be a coincidence. As Stivers says, "patterns suggest that there may be a preference"; nothing more. And indeed, this is a typical problem with observed frequencies: they may be an accidental result of the situation(s) in which the data were recorded. But the only way to find out is to study the data themselves, not the distributions in those data.
Evidence
It should be clear that in the inductive approach to Conversation Analysis, we cannot use frequency distributions, whether exact or descriptive, as evidence for our analyses; neither definitive evidence, nor partial evidence as Stivers calls it. They are the observations on which we build our analyses; they are not the analyses themselves. Doing inferential statistical analyses on them is simply bad science, and while this is fortunately not (or rarely?) done in CA, it serves as a warning against such a woeful misunderstanding of what statistics and logic can offer. That is not to say distributions can never serve as evidence, but we would need to first develop a theory (possibly based on observed distributions), then collect new data to test that theory, and only then could we use a distribution as evidence for our theory. Since CA is generally limited to the first step in this process, we should not pretend that our distributions in any way evidence our theories.
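For concreteness, here is a minimal sketch (with invented counts) of what that two-step workflow would look like: the exploratory corpus generates the hypothesis, and only a fresh corpus, collected after the hypothesis was fixed, can test it.

```python
# Sketch of an exploratory/confirmatory split; all counts are invented.
from math import comb

# Step 1: exploratory corpus. We notice a skew and form a hypothesis:
# P(self-repair) > 0.5. These counts are now spent; reusing them as a
# test of the very hypothesis they generated would be circular.
explore_self, explore_total = 87, 100

# Step 2: a new corpus, collected after the hypothesis was fixed.
test_self, test_total = 81, 100

# One-sided exact binomial test against chance (p = 0.5) on the new data:
# the probability of at least this many self-repairs if there were no skew.
p_value = sum(comb(test_total, k)
              for k in range(test_self, test_total + 1)) / 2**test_total
print(p_value)  # only this result bears on the hypothesis as evidence
```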