SPORTSCIENCE · sportsci.org

Perspectives / Research Resources
Advice on the Use of MBI: a Comment on The Vindication of Magnitude-Based Inference

Will G Hopkins, Alan M Batterham

Sportscience 22, sportsci.org/2018/CommentsOnMBI/wghamb.htm, 2018
Victoria University, Melbourne, Australia; Teesside University, Middlesbrough, UK.
william.hopkins@vu.edu.au, A.Batterham@tees.ac.uk

Summary: The editor of Medicine and Science in Sports and Exercise has ordered rejection of manuscripts containing magnitude-based inference (MBI). We therefore advise authors submitting such manuscripts to that journal to describe their inferences as reference Bayesian with a dispersed uniform prior, which identifies MBI with a formal "objective" Bayesian method. Decisions about the true magnitude of an effect can be justified by referring to the existing MBI threshold probabilities, which are similar to but more conservative than those used by the Intergovernmental Panel on Climate Change. We also counter the latest attack on MBI by a journalist and explain how a return to null-hypothesis significance testing will reduce the generalizability of meta-analyses and impair the career development of some young researchers.

For those who missed the recent news on social media, Bruce Gladden, the editor-in-chief of Medicine and Science in Sports and Exercise (MSSE), has instructed his associate editors not to accept manuscripts in which effects are assessed with magnitude-based inference (MBI). The news that he currently intends to enshrine this decision in journal policy was announced in the latest ill-conceived attack on MBI by a journalist, Christie Aschwanden. We say ill-conceived rather than ill-informed, because Aschwanden was well informed by us prior to publication of her news. More on that at the end of this article. First, though, consider why we may still be able to publish inferences about the magnitudes of effects in MSSE.

MSSE has published a letter ahead of print (Borg et al., 2018), in which the authors call for the use of Bayesian inference as an answer to the apparent problems of MBI that Kristin Sainani (2018) identified in her critique and that spawned attacks on MBI online (see our earlier comment for links). Others have previously recommended use of Bayesian inference in sport science (Mengersen et al., 2016), including Sainani herself. Presumably, therefore, the editor of MSSE will not order "desk" rejection of manuscripts containing Bayesian inference.

What is Bayesian inference? Very simply, you make probabilistic statements about the magnitude of the true or population value of the particular effect statistic you have investigated. Sound familiar? Yes, it's MBI. MBI is Bayesian, so what is the problem? The problem is that Sainani and the authors of a previous critique of MBI (Welsh and Knight, 2015) made the astonishing claim that MBI is not Bayesian, and because Sainani and those previous authors are card-carrying statisticians at top institutions (Stanford and the Australian National University), people who have not looked closely at the literature assume that MBI is not Bayesian. We point you now to earlier references written by or for clinicians, and to a recent comment by the statistician Roderick Little, all of which state unequivocally that probabilities of effect magnitudes estimated with the same method as MBI are valid Bayesian estimates: Burton (1994), Burton et al. (1998), Gurrin et al. (2000), Shakespeare et al. (2001), Shakespeare et al. (2008), and Little (2018). Furthermore, the estimates are obtained with the same straightforward calculations used for p values and confidence limits. This quote from Gurrin et al. (2000) is a compelling summary: "The congruence between a Bayesian analysis using a uniform prior and a conventional analysis provides a non-threatening introduction to Bayesian methods and means that analyses of the type we describe can be carried out on standard software. Our approach is straightforward to implement, offers the potential to describe the results of conventional analyses in a manner that is more easily understood, and leads naturally to rational decisions."
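To make the computational point concrete, here is a minimal sketch in Python, with hypothetical values of our choosing for the observed effect, its standard error, the degrees of freedom, and the smallest substantial change. It shows the MBI-style probabilities and the conventional confidence limits falling out of the same t distribution:

    import scipy.stats as st

    # Hypothetical example: observed effect 1.5 units, standard error 1.0,
    # 18 degrees of freedom, smallest substantial change 1.0 unit.
    effect, se, df, smallest = 1.5, 1.0, 18, 1.0

    # Under a dispersed uniform prior, the posterior for the true effect is
    # the t distribution centred on the observed effect, scaled by the SE.
    posterior = st.t(df, loc=effect, scale=se)

    p_increase = 1 - posterior.cdf(smallest)  # chance of a substantial increase
    p_decrease = posterior.cdf(-smallest)     # chance of a substantial decrease
    p_trivial = 1 - p_increase - p_decrease   # chance of a trivial effect

    # The same distribution yields the conventional 90% confidence limits.
    lower, upper = posterior.ppf(0.05), posterior.ppf(0.95)

The probabilities are simply the areas of that distribution beyond the magnitude thresholds, which is why no machinery beyond the usual p-value and confidence-limit calculations is needed.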

There is, however, a subtle issue we need to address. Bayesian statisticians are true to the spirit of their founder, Thomas Bayes, according to whose eponymous theorem the probability of something (e.g., that a treatment is beneficial) can be derived from the data of your study combined with the probability of the something before you got your data. Modern-day Bayesians have worked out ways to express the latter probability, known as a prior, and to combine it with the usual data from the study of a sample to get the posterior probability. The issue here is that it is difficult to justify the prior objectively, so Bayesians quite rightly regard the prior as a belief, and the posterior probability gets labelled with terms like credibility.

Fine, but that doesn't solve the problem of quantifying the prior belief. To give you an example: at the recent conference of the European College of Sport Science, one of us (WGH) presented, on behalf of a Chinese colleague, FeiFei Li, a crossover study in marathon runners of the effects of high-intensity continuous vs intermittent exercise on markers of cardiac damage. How can FeiFei quantify her prior belief in what the change in biomarkers will be? It turns out she has to provide a probability distribution not only for the mean change but also for the variance of the change. It's unrealistic for her, or for any reasonably skeptical researcher, to do that. For all she knows, the periods of low-intensity exercise between the intervals might more than compensate for the effects of the higher-intensity exercise during the intervals, or maybe there will be the opposite effect. And that's not all. Her analysis includes estimation of the modifying effect of the intensity of exercise on the change in the biomarkers. What is her prior belief about this effect? She has no idea, so she has to go into the analysis without a prior belief. If there were any studies of this effect in print, she might be able to do a meta-analysis of their outcomes to get an objective prior, but the posterior would then be relevant only to her study setting, whereas readers want an outcome generalizable to their setting and indeed to any setting. The meta-analysis, if it hasn't already been done, should therefore be done as a comprehensive random-effect meta-regression after her study.

For researchers who can't or won't provide priors, perhaps owing to the concern that the prior will not be acceptable to a skeptical scientific audience, there is a legitimate form of Bayesian inference known as reference or calibrated Bayes with a dispersed uniform prior. The prior here is minimally informative and is sometimes referred to as flat and objective. As noted above, it gives the same answers as MBI (e.g., Gurrin et al., 2000; see also Roderick Little's comment). So, if we refer to our inferences about magnitudes in this manner, Bruce Gladden and any other editors considering a ban on MBI should have no objection. Well, they may still have two objections, but these can be addressed.

The first potential objection is that, to perform Bayesian analysis with a dispersed uniform prior, we should use the formal mathematics of the Bayesians, otherwise it is not Bayesian. This was the basis of the assertions by Sainani and by Welsh and Knight that MBI was not Bayesian, and we consider it to be an unjustifiable technicality. Those previous authors who recommended straightforward MBI-type analyses referred to their inferences as Bayesian. Were they wrong, too?

The second potential objection takes a little longer to address. It arises from the fact that MBI provides advice for making decisions about the true magnitude of effects. Some Bayesians don't like making decisions: they are understandably concerned with exactitude in estimating probabilities, but when it comes to making decisions, they are often silent. For example, we recently invited a noted critic of null-hypothesis significance testing (NHST), who was one of the reviewers of our article in Sports Medicine (Hopkins and Batterham, 2016), to support MBI. He declined, on the grounds that he prefers not to "dichotomize outcomes". Despite repeated entreaties, he would not be drawn on the responsibility of the risk-savvy statistician to advise clinicians and practitioners about implementation of a treatment. As practitioners of sport and exercise science, we accept responsibility for decisions, and we have devised decision guidelines based on probability thresholds labelled with terms ranging from most unlikely through various levels of possibility to most likely. We have discovered very recently that these thresholds are remarkably similar to those used by the Intergovernmental Panel on Climate Change (IPCC; Mastrandrea et al., 2010), except that ours are a little more conservative. For example, they define about as likely as not with probabilities between 33% and 66%, whereas our possibly spans 25% to 75%. Importantly, the IPCC qualify their scale with this comment: "About as likely as not should not be used to express a lack of knowledge." We take this caveat to mean that possible effects represent useful information.
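For readers who want the whole scale in one place, here is a sketch of the mapping as a simple Python lookup. The threshold values are those of the published MBI scale (Hopkins et al., 2009), of which the text above quotes the 0.5%, 5%, 25%, 75% and 95% anchors:

    def likelihood_label(p):
        """Map a probability (0-1) to the MBI qualitative scale."""
        scale = [(0.005, "most unlikely"),
                 (0.05, "very unlikely"),
                 (0.25, "unlikely"),
                 (0.75, "possibly"),
                 (0.95, "likely"),
                 (0.995, "very likely")]
        for cutoff, label in scale:
            if p < cutoff:
                return label
        return "most likely"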

We have two sets of decision guidelines. The first is for clinically or practically relevant effects: those that could be used to make a decision about implementation of a treatment that could benefit athletes, patients or clients (improve performance or health) or harm them (impair performance or health). If the effect is possibly beneficial (>25% chance) and most unlikely harmful (<0.5% risk), it is deemed clear and potentially implementable; if it is unlikely beneficial (<25% chance), it is deemed clear and not recommended for implementation; and if it is possibly beneficial but has an unacceptable risk of harm (>0.5%), it is deemed unclear, and the researcher is advised to get more data before making a decision. There is a less conservative approach to clinical decision-making based on odds of benefit and harm, but you can read about that elsewhere (e.g., Hopkins et al., 2009; Hopkins and Batterham, 2016).
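Expressed as code, those clinical rules amount to the following sketch; the function and its argument names are our own hypothetical choices, for illustration:

    def clinical_decision(p_benefit, p_harm):
        """Clinical MBI decision: clear and potentially implementable only
        if possibly beneficial (>25%) AND most unlikely harmful (<0.5%)."""
        if p_benefit > 0.25 and p_harm < 0.005:
            return "clear: potentially implementable"
        if p_benefit < 0.25:
            return "clear: implementation not recommended"
        return "unclear: get more data before deciding"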

The second set of guidelines is for effects that are not implementable as treatments or strategies, such as a comparison of males and females. Here the question is simply whether the magnitude of the effect is substantial: a difference in endurance between females and males, for example. For such effects, it is only when a substantial or trivial difference is very likely (>95% chance) or very unlikely (<5% chance) that you declare the effect to be clear. For example, if the effect is very likely to be a substantial increase, it is clearly not trivial or a substantial decrease. Or, if the effect is very unlikely to be a substantial decrease, it is clearly something else: some likelihood of trivial and/or substantially positive, depending on the probabilities of those magnitudes. This description of clear and unclear non-clinical effects is consistent with our previous descriptions but is more concise.
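In the same sketch style (again with hypothetical names of our own), the non-clinical rule reduces to checking the three magnitude probabilities, which sum to 1, against the very likely and very unlikely thresholds:

    def nonclinical_decision(p_positive, p_trivial, p_negative):
        """Non-clinical MBI decision: clear when any magnitude is very
        likely (>95%) or very unlikely (<5%); otherwise unclear."""
        clear = any(p > 0.95 or p < 0.05
                    for p in (p_positive, p_trivial, p_negative))
        return "clear" if clear else "unclear"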

Obviously you can make mistakes with your decisions: for example, the true effect is harmful, but you decide it is implementable; or the true effect is trivial, but you decide it is substantial. We investigated the error rates for all the clinical and non-clinical decisions and found them to be generally lower, and often much lower, than those associated with decisions about magnitude based on null-hypothesis significance testing. And the error rates are acceptable. The high error rates calculated by Sainani are based either on her own unjustifiable re-definitions of our errors or on the bizarre notion that a clear effect deemed possibly substantially positive and possibly trivial (it is clear, because it is very unlikely substantially negative) represents a Type-I error, if the true effect is trivial. These and other assertions aimed at discrediting MBI have been accepted as gospel by Christie Aschwanden, by the authors of the recent letter to MSSE (Borg et al., 2018), and by Bruce Gladden. We absolutely stand by our definitions of error and by the resulting error rates that we calculated in our Sports Medicine article (Hopkins and Batterham, 2016). For a thorough account of Sainani's erroneous errors, see our vindication article (Hopkins and Batterham, 2018).
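Readers can explore such error rates themselves with a simulation. The sketch below is our own minimal illustration, not the published simulations of Hopkins and Batterham (2016); the sample size, the smallest substantial effect, and the null-true scenario are hypothetical choices. It counts how often the non-clinical MBI rule declares a substantial effect, and how often NHST declares significance, when the true effect is exactly zero (i.e., trivial):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_sims, n = 10_000, 20  # simulated trials; hypothetical subjects per group
    smallest = 0.2          # hypothetical smallest substantial effect (SD units)

    mbi_substantial = nhst_significant = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)  # control group
        b = rng.normal(0.0, 1.0, n)  # "treatment" group; true effect is zero
        diff = b.mean() - a.mean()
        se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
        df = 2 * n - 2
        # posterior probabilities of substantial increase and decrease
        p_pos = 1 - stats.t.cdf((smallest - diff) / se, df)
        p_neg = stats.t.cdf((-smallest - diff) / se, df)
        if p_pos > 0.95 or p_neg > 0.95:               # very likely substantial
            mbi_substantial += 1
        if 2 * stats.t.sf(abs(diff / se), df) < 0.05:  # two-sided NHST
            nhst_significant += 1

    print(f"MBI declares a substantial effect: {mbi_substantial / n_sims:.4f}")
    print(f"NHST declares significance:        {nhst_significant / n_sims:.4f}")

With these settings the MBI rate falls far below the nominal 5% of NHST, illustrating the kind of comparison described above; see Hopkins and Batterham (2016) for the full range of scenarios.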

Finally, to the ill-conceived attack by Christie Aschwanden. She contacted us for comment before she submitted her article to her editor. Here is what we sent her, with the plea that "it would be so cool if your editor allowed this message to be published verbatim." The text in red was omitted from her published item, presumably because it represents inconvenient truths:

What you regard as "shoddy statistics," and what has motivated the editor of MSSE to refuse manuscripts using magnitude-based inference (MBI), is probably the fact that MBI allows publication of some statistically non-significant effects. Apparently you and he have the mistaken idea that authors and readers of the publications of such effects will consider that the effects are "real" and that the literature is getting corrupted with fake findings. But such effects are published with probabilistic terms that properly reflect the uncertainty in the true magnitude: not only the confidence intervals but also qualitative terms such as possibly, likely, and so on. These are proper estimates of the uncertainty, because they are legitimate "reference" Bayesian estimates with a uniform dispersed prior, as evidenced in our rebuttal article at sportsci.org and in the associated post-publication comments. Kristin Sainani's assertion that MBI is not Bayesian is absolutely wrong, along with her estimates of error rates and other disgracefully incorrect assertions. Furthermore, effects published according to the rules of MBI do not corrupt the literature. In fact, the reverse is true: there is trivial publication bias with "clear" MBI effects, whereas there is substantial publication bias with "significant" effects. The qualitative probabilistic terms of MBI are based on a scale that is remarkably similar to a scale used by the Intergovernmental Panel on Climate Change to assess likelihood of climatological effects in terms that their readers and the public can understand. Apparently climatologists and sport scientists are so far the only researchers concerned with making decisions about effects informed by an understanding of probability of the true magnitudes rather than a p value based on the null hypothesis, which, incidentally, is always false.

In her news item, Aschwanden claimed that "over the years, statisticians have identified numerous problems with MBI." The first problem she cited was the critique of Welsh and Knight, which we comprehensively dismissed in our Sports Medicine article. The next is an interesting one: a claim that MBI has not been published in a recognized statistics journal and should not be used until it is. The articles promoting the computational approach and Bayesian interpretation that MBI uses were published in Statistics in Medicine (Burton, 1994) and in other journals specializing in the clinical application of statistics: Journal of Epidemiology and Community Health (Burton et al., 1998), Journal of Evaluation in Clinical Practice (Gurrin et al., 2000), and Medical Decision Making (Shakespeare et al., 2008). Our probability decision thresholds were supported by an informal survey of researchers on a mailing list, so it should not be surprising that they agree closely with the thresholds used by the IPCC, which presumably reflect a similar consensus about how people interpret probabilistic terms. In any case, our extensive simulations showed that the thresholds result in acceptable error rates and trivial publication bias. In a message to the MSSE editor, we pointed out that the conversation should now be about the values of the decision thresholds, not error rates based on the impractical and ever-false null hypothesis. He has not responded to that suggestion.

Aschwanden then claimed that "if MBI were really the revolutionary new method that its inventors claim it is, it should be taken up among many fields, but Gladden notes with concern that MBI is only used in sports and exercise science." What is at first surprising is that the articles promoting the MBI equivalent in medical journals have not been cited to anything like the extent that our MBI articles have been. There is a good reason for the disparity. Medical researchers have been encouraged to interpret the traditional confidence interval in a Bayesian fashion, but they have not been provided with the tools for calculating the probabilities and making decisions based on chances of benefit and risk of harm. Instead, they are stuck in the perceived rut of having to get statistical significance before they are prepared or permitted to say anything more. Enlightened medical researchers might then interpret the magnitude of the lower and upper confidence limits, but the statistical packages do not automatically calculate probabilities above or below user-defined magnitude thresholds, and researchers can get their larger studies into print without the probabilities, so what's the point? MBI has become popular in exercise and sport science because we have provided the tools, and the tools give researchers an avenue for publishing previously unpublishable effects from small samples.

Aschwanden next went over the same old ground of Sainani's error rates, making the fatuous claim that a possibly substantial and possibly trivial effect incurs an error, if the true effect is trivial. She then made the patently false claim that "in practice, MBI may deem an intervention 'likely beneficial' even if the error bars show that it could be almost as likely to be useless".

It is dispiriting to note this quote from the MSSE editor near the end of Aschwanden's news: "We need better reproducibility, not less." Reproducibility is purely a question of error rates, and MBI stacks up better than the traditional method for making inferences about magnitudes that matter.

Following publication of Aschwanden's news item, we sent her the following message.

…You and Sainani are doing your best to prevent publication of small-scale studies in sport science and any other disciplines, and that is really, really bad. Why? For two reasons.

First, MBI-published effects from small-scale studies could contribute to a meta-analysis and thereby push the overall sample size for that effect up to something that gives definitive outcomes. The effects from MBI-published studies are NOT biased, so the meta-analysis will not be biased. MBI does not result in shoddy effects, Christie. It does not contaminate the literature. And provided the published small studies are not substantially biased (to repeat: with MBI, they aren't), it's actually better to meta-analyze a large number of small studies than a few large studies, because you get better estimates of the modifying effects of study and subject characteristics and thereby better generalizability to more settings.

Secondly, and equally importantly, you are making it harder for research students to get publications, because they will need larger sample sizes to get significance, often impractically large, if we are talking about studies of competitive athletes. There will be many other disciplines where there is a similar problem with getting enough subjects. You probably have no idea how dispiriting it is to be a young researcher and get your manuscripts rejected… We have a climate of manuscript rejection rather than one of manuscript acceptance, engendered mainly by journals seeking ever higher impact factors. They take pride in a high rejection rate! MBI is a big step in the direction of getting more of the students' stuff into print. And it does not result in publication bias, for the last time.

The incredible irony of all this is that, when they do occasionally get statistical significance and manage to get their studies into print, the result is publication bias! If you don't understand that, you need to, ASAP, and then start setting the record straight by making your next quest the rehabilitation of magnitude-based inference.

We also sent the above message to the editor of MSSE. Neither he nor Aschwanden has replied so far.

To end on a positive note, we received the following message just before publishing the present comment, echoing Martin Buchheit's cri de coeur:

I am a sport physiologist across the pond in Canada. I wanted to quickly reach out and thank you for all your work and resilience with MBIs to support us scientists, practitioners and educators. You have provided a more valid option for statistics in sport science and most importantly its application in the real world. Having recently completed my PhD, MBIs were a cornerstone for my work examining high performance athletes and utilizing its findings to support coaches and the integrative support team.

Realizing the importance of MBIs is much like Neo deciding to swallow the RED pill in the movie The Matrix. When making the decision to learn and understand MBIs, one can never revert to "conventional" methods unless forced to! You simply know better.

As a young and developing scientist, I believe it is our job to support fellow researchers. Please continue to stand tall with confidence. Time will allow the dust to settle, and your work will be left standing for everyone to see.


Borg DN, Minett GM, Stewart IB, Drovandi CC (2018). Bayesian methods might solve the problems with magnitude-based inference. A letter in response to Dr. Sainani. Medicine and Science in Sports and Exercise (in press), https://eprints.qut.edu.au/119403/

Burton PR (1994). Helping doctors to draw appropriate inferences from the analysis of medical studies. Statistics in Medicine 13, 1699-1713

Burton PR, Gurrin LC, Campbell MJ (1998). Clinical significance not statistical significance: A simple Bayesian alternative to p values. Journal of Epidemiology and Community Health 52, 318-323

Gurrin LC, Kurinczuk JJ, Burton PR (2000). Bayesian statistics in medical research: An intuitive alternative to conventional data analysis. Journal of Evaluation in Clinical Practice 6, 193-204

Hopkins WG, Marshall SW, Batterham AM, Hanin J (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise 41, 3-12

Hopkins WG, Batterham AM (2016). Error rates, decisive outcomes and publication bias with several inferential methods. Sports Medicine 46, 1563-1573

Hopkins WG, Batterham AM (2018). The Vindication of Magnitude-Based Inference. Sportscience 22, 19-27

Little R (2018). Calibrated Bayesian inference: a comment on The Vindication of Magnitude-Based Inference. Sportscience 22, sportsci.org/2018/CommentsOnMBI/rjl.htm

Mastrandrea MD, Field CB, Stocker TF, Edenhofer O, Ebi KL, Frame DJ, Held H, Kriegler E, Mach KJ, Matschoss PR, Plattner G-K, Yohe GW, Zwiers FW (2010). Guidance Note for Lead Authors of the IPCC Fifth Assessment Report on Consistent Treatment of Uncertainties. Intergovernmental Panel on Climate Change (IPCC): https://www.ipcc.ch/pdf/supporting-material/uncertainty-guidance-note.pdf

Mengersen KL, Drovandi CC, Robert CP, Pyne DB, Gore CJ (2016). Bayesian estimation of small effects in exercise and sports science. PloS One 11, e0147311

Sainani KL (2018). The problem with "magnitude-based inference". Medicine and Science in Sports and Exercise (in press)

Shakespeare TP, Gebski VJ, Veness MJ, Simes J (2001). Improving interpretation of clinical studies by use of confidence levels, clinical significance curves, and risk-benefit contours. Lancet 357, 1349-1353

Shakespeare TP, Gebski V, Tang J, Lim K, Lu JJ, Zhang X, Jiang G (2008). Influence of the way results are presented on research interpretation and medical decision making: the PRIMER Collaboration Randomized Studies. Medical Decision Making 28, 127-137

Welsh AH, Knight EJ (2015). "Magnitude-based Inference": A statistical review. Medicine and Science in Sports and Exercise 47, 874-884


Published 15 July 2018.

©2018