Prediction of robust scientific facts from literature


The growth of published science in recent years has made it harder for human and algorithmic agents to reason over prior knowledge when selecting the next experiment. This challenge is compounded by uncertainty about the reproducibility of published findings. The availability of massive digital archives, machine reading, extraction tools and automated high-throughput experiments allows us to evaluate these challenges computationally at scale and to identify novel opportunities to craft policies that accelerate scientific progress. Here we demonstrate a Bayesian calculus that enables positive prediction of robust scientific claims from findings extracted from the published literature, weighted by scientific, social and institutional factors demonstrated to increase replicability. Illustrated with the case of gene regulatory interactions, our approach automatically estimates and counteracts sources of bias, revealing that findings from scientifically focused but socially and institutionally diverse research activity are most likely to replicate. This results in updated certainty about the literature, which accurately predicts robust scientific facts on which new experiments should build. Our findings allow us to identify and evaluate policy recommendations for scientific institutions that may increase robust scientific knowledge, including sponsorship of more diverse and more independent investigations of any particular scientific phenomenon, and of a more diverse range of scientific phenomena.

In Nature Machine Intelligence