Thursday, June 16, 2016

Naming, not shaming: Criticising a weak result is not the same as launching a personal attack

You are working on a theoretical paper about the proposed relationship between X and Y. A two-experiment study has previously shown that X and Y are correlated, and you are trying to explain the cognitive mechanisms that drive this correlation. This previous study draws its conclusions from partial correlations which control for a moderator that was not postulated a priori; raw correlations are not reported. The p-values for each of the two partial correlations are < 0.05, but > 0.04. In your paper, you stress that although a correlation between these variables makes theoretical sense, we cannot be sure about this link.
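To see why the reported numbers are fragile, it helps to remember what a partial correlation is: the correlation between the residuals of X and Y after regressing each on the moderator. Whether one controls for a post-hoc moderator is therefore a researcher degree of freedom that can move a p-value across the 0.05 line. Below is a minimal sketch with simulated data; the variable names, sample size, and effect sizes are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40  # hypothetical sample size

# Simulated data: a moderator M that contributes to both X and Y
m = rng.normal(size=n)
x = 0.5 * m + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)

def residuals(v, covariate):
    """Residuals of v after a simple linear regression on the covariate."""
    slope, intercept = np.polyfit(covariate, v, 1)
    return v - (slope * covariate + intercept)

# Raw correlation vs. partial correlation controlling for M
r_raw, p_raw = stats.pearsonr(x, y)
r_part, p_part = stats.pearsonr(residuals(x, m), residuals(y, m))
print(f"raw:     r = {r_raw:.2f}, p = {p_raw:.3f}")
print(f"partial: r = {r_part:.2f}, p = {p_part:.3f}")
```

In this simulation the raw correlation is carried by M, so partialling it out weakens the effect; in the scenario above it is the partial correlations that are (barely) significant. Either way, the two statistics can tell different stories, which is why not reporting the raw correlations is a problem.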

In a different paradigm, several studies have found a group difference in a certain task. In most studies, this group difference has a Cohen’s d of around 0.2. However, three studies, all from the same lab, report Cohen’s ds ranging between 0.8 and 1.1. You calculate that it is very unlikely that three effects this large would be obtained by chance alone (probability < 1%).
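The "< 1%" calculation can be reproduced by simulation: if the true effect is around d = 0.2, how often would a single study observe d ≥ 0.8, and how often would three independent studies all do so? A minimal sketch follows; the sample size per group is an assumption, since it is not stated above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 20   # assumed sample size per group (not given above)
true_d = 0.2       # the typical effect size in the literature
n_sims = 100_000

def observed_d(n, d):
    """Cohen's d from one simulated two-group study with true effect d."""
    a = rng.normal(d, 1, size=n)  # experimental group, shifted by d
    b = rng.normal(0, 1, size=n)  # control group
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

ds = np.array([observed_d(n_per_group, true_d) for _ in range(n_sims)])
p_single = (ds >= 0.8).mean()  # chance that one study shows d >= 0.8
print(f"P(d >= 0.8 in one study):   {p_single:.3f}")
print(f"P(all three studies do so): {p_single ** 3:.6f}")
```

Under these assumptions the joint probability comes out far below 1%; larger assumed samples make it smaller still, because the sampling distribution of d tightens around 0.2.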

For a different project, you fail to find an effect which has been reported in a previously published experiment. The authors of this previous study published their raw data a few years after the original paper came out. You take a close look at the raw data and find some discrepancies with the means reported in the paper. When you analyse the raw data, the effect disappears.
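Checking reported means against released raw data is straightforward to script. The sketch below assumes a hypothetical CSV with "condition" and "rt" columns and invented reported means; none of these names come from the scenario above.

```python
import pandas as pd

# Hypothetical file and column names; a real dataset will differ
raw = pd.read_csv("raw_data.csv")

# Condition means as reported in the original paper (invented numbers)
reported = {"condition_A": 512.0, "condition_B": 498.0}

recomputed = raw.groupby("condition")["rt"].mean()
for condition, reported_mean in reported.items():
    diff = recomputed[condition] - reported_mean
    print(f"{condition}: recomputed = {recomputed[condition]:.1f}, "
          f"reported = {reported_mean:.1f}, difference = {diff:+.1f}")
```

Any non-trivial difference then becomes a concrete, checkable observation to raise with the authors, rather than a vague suspicion.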

What would you do in each of the scenarios above? I would be very happy to hear about it in the comments!

From each of these scenarios, I would draw two conclusions: (1) the evidence reported by these studies is not strong, to say the least, and (2) it is likely that the authors used what we now call questionable research practices to obtain significant results. The question is what we can conclude in our hypothetical paper, where the presence or absence of the effect is critical. Throwing around accusations of p-hacking can turn ugly. First, we cannot be absolutely sure that something fishy is going on: even if you calculate that the likelihood of obtaining a certain result is minimal, it is still greater than zero. Second, criticising someone else’s work is always a hairy issue. Feelings may get hurt, the desire for revenge may arise, and careers can be destroyed. Especially as an early-career researcher, one wants to steer clear of close-range combat.

Yet, if your work rests on these results, you need to make something of them. One could just ignore them – not cite these papers, pretend they don’t exist. It is difficult to draw conclusions from studies with questionable research practices, so they may as well not be there. But ignoring relevant published work would be childish and unscientific. Any reader of your paper who is interested in the topic will notice this omission. Therefore, one needs to at least explain why one thinks the results of these studies may not be reliable.

One can’t explain why one doesn’t trust a study without citing it: a general phrase such as “Previous work has shown this effect, but future research is needed to confirm its stability” will not do. We could cite the study but remain vague about our concerns: “Previous work has shown this effect (Lemmon & Matthau, 2000), but future research is needed to confirm its stability”. This, again, does not sound very convincing.

There are therefore two possibilities: either we drop the topic altogether, or we write down exactly why the results of the published studies would need to be replicated before we would trust them, much like what I did in the examples at the top of the page. This, of course, could be misconstrued as a personal attack. Describing such studies in my own papers is an exercise in very careful phrasing, with very nice colleagues proofreading for diplomacy. Unfortunately, this often leads to watered-down arguments and tip-toeing around the real issue, which is the believability of a specific result. And when we think about it, that is what we are criticising: the result, not the original researchers. Knowledge about questionable research practices is spreading only gradually; many researchers are still in the process of realising how much damage these practices can do to a research area. Therefore, judging researchers for what they have done in the past would be neither productive nor wise.

Should we judge a scientist for having used questionable research practices? In general, I don’t think so. I am convinced that the majority of researchers don’t intend to cheat; rather, they believe that they have legitimately maximised their chances of finding a very small and subtle effect. It is, of course, the responsibility of the criticiser to make it clear that the problem lies with the study, not with the researcher who conducted it. But the researchers whose work is being criticised should also consider whether the criticism is fair, and respond accordingly. If they are prepared to correct any mistakes, by publishing file-drawer studies, releasing untrimmed data, conducting a replication, or in more extreme cases publishing a correction or even retracting a paper, it is unlikely that they will be judged negatively by the scientific community; quite the contrary.

But there are a few hypothetical scenarios where my opinion of the researcher would decrease: (1) if the questionable research practice was data fabrication rather than something more benign, such as creative outlier removal, (2) if the researchers use any means possible to suppress studies which criticise or fail to replicate their work, or (3) if the researchers continue to engage in questionable research practices even after they learn that these practices inflate their false-positive rate. This last point bears further consideration, because pleading ignorance is becoming less and less defensible. By now, a researcher would need to have been living under a rock not to have heard about the replication crisis. And a good, curious researcher should follow up on such rumours, to check whether issues of replicability could also apply to their own work.


In summary, criticising existing studies is essential for scientific progress. Identifying potential issues with experiments will save time, as researchers won’t go off on a wild-goose chase for an effect that doesn’t exist; it will also help us to home in on studies which need to be replicated before we consider them to be backed up by evidence. The criticism of a study, however, should not be conflated with criticism of the researcher, either by the criticiser or by the person being criticised. A clear distinction between the two would foster a climate where discussions about the reproducibility of specific studies lead to scientific progress rather than turning into a battlefield.

2 comments:

  1. Excellent post, captures the problems with this sort of criticism nicely. Criticism is essential but I agree people can take it personally (especially if it's their idea or the study that has made them famous). We need to try and be objective (which is obviously difficult). Perhaps before publishing you could try post-publication peer review (if the journal it's published in allows it). Alternatively you could contact the authors politely, outlining your concerns with the paper? Then you can publish your clear but not overly harsh criticisms confident you've engaged in a dialogue with them (or at least tried). Someone (can't remember who, will try and find out) argued we should have a "year zero" rule: all past instances of QRPs are forgiven and we start afresh. I like this idea and it would get more people to be open about their past use of QRPs. I agree that 2 and 3 would also lower my estimation of a researcher. I would also add anyone who refuses to admit they used QRPs, even when there is evidence to the contrary. Of course there will be some who legitimately haven't, but QRPs are so common, and so easy to slip into, that I'd be surprised.

  2. As a very, very early career researcher, I find a good part of the discussion about QRPs intimidating, and I greatly appreciate your point of view! I know it is a lot to ask, but if I were one of the authors in the examples above, I would probably appreciate being contacted as well. The fact that I published something (this has yet to happen in the field of my thesis, so purely hypothetical) doesn't mean that I stop thinking about it. As it is likely that I still work on the same topic, I might have found room for improvement myself. A direct discussion of potential issues could thus be useful for your paper, or even for the research topic in general, while also giving me the chance to save face, if only "behind the scenes".
