Metrics News

  • Sharing of participant-level clinical trial data has potential benefits, but concerns about potential harms to research participants have led some pharmaceutical sponsors and investigators to urge caution. Little is known about clinical trial participants’ perceptions of the risks of data sharing.
  • Randomized controlled trials (RCTs) have devoted admirers and critics across diverse disciplines. A long-standing debate exists about their relative strengths and weaknesses. In a very thoughtful paper, Deaton and Cartwright (2018) present the latest sequel to these methodological ambushes. The points raised are not new, but the overview is extremely helpful. It is also timely, because under the “real-world evidence” buzzword (Miksad and Abernethy, 2018), regulators and other stakeholders seek ways to replace the dominance of traditional RCTs. While “real-world evidence” includes RCTs (pragmatic/naturalistic ones), it is mostly fed by spurious non-randomized data (Hemkens et al., 2016). Is this a desirable change in study design priorities or a looming disaster? The current commentary will try to address some key emerging issues.
  • P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often contain the main results and are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 1997 and 2017. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but were reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%), and only rarely (0.7%) did articles rely exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome.
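    The multiplicity corrections mentioned above can be illustrated with a short sketch. This is not code from the surveyed articles; it implements two standard procedures (Bonferroni and the slightly less conservative Holm step-down method) on a made-up list of P values, showing how correcting for multiple tests reduces the number of results declared "statistically significant".

    ```python
    def bonferroni_significant(p_values, alpha=0.05):
        """Flag P values significant after Bonferroni correction.

        Each P value is compared against alpha / m, where m is the
        number of tests, instead of against alpha itself.
        """
        m = len(p_values)
        return [p <= alpha / m for p in p_values]


    def holm_significant(p_values, alpha=0.05):
        """Flag P values significant under the Holm step-down procedure.

        P values are tested in ascending order against successively
        less stringent thresholds alpha/m, alpha/(m-1), ...; testing
        stops at the first failure.
        """
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])
        significant = [False] * m
        for rank, i in enumerate(order):
            if p_values[i] <= alpha / (m - rank):
                significant[i] = True
            else:
                break  # all remaining (larger) P values also fail
        return significant


    # Hypothetical P values for illustration only.
    p_vals = [0.001, 0.012, 0.03, 0.04, 0.20]
    print(bonferroni_significant(p_vals))  # [True, False, False, False, False]
    print(holm_significant(p_vals))        # [True, True, False, False, False]
    ```

    Uncorrected, four of the five hypothetical P values would fall below 0.05; Bonferroni keeps one and Holm keeps two, which is why the survey distinguishes articles that describe any correction from those that do not.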
  • Not all scientific information is created equal. Large differences exist across topics on how much is known, and with what degree of certainty. Some questions are more difficult to answer, and some research tools are more reliable than others. Not all methods can be applied to answer every question. Credibility depends [1] on how large and rigorous studies are, how well researchers have contained conflicts of interest (financial or other), and how successfully the study design and analysis have limited bias, properly accounting for the complexity inherent in each scientific question. Coordinated efforts among scientists instead of furtive competition help improve the odds of success. Transparency with full sharing of data, protocols and computer codes improves trust in research findings. Re-analysis of data by independent teams adds to that trust and replication in new studies further enhances it.
  • The definition of “normal” values for common laboratory tests often governs the diagnosis, treatment, and overall management of tested individuals. Some test results may depend on demographic traits of the tested population, including age, race, and sex. Ideally, laboratory test results should be interpreted in reference to a population of “similar” “healthy” individuals. In many settings, however, it is unclear exactly who these individuals are. How much population stratification and what criteria for healthy individuals are optimal?