Researchers call for big data infrastructure to support future of personalized medicine

Researchers from the George Washington University (GW), the U.S. Food and Drug Administration (FDA), and industry have published an article in PLOS Biology describing BioCompute, a standardized communication method for researchers performing high-throughput sequencing (HTS).

HTS is a catalyst for novel drug development and for advances in personalized medicine. However, the technology has outpaced the development of much-needed infrastructure around its use. Lead author Raja Mazumder, PhD, associate professor of biochemistry and molecular medicine at the GW School of Medicine and Health Sciences, and colleagues call for a big data environment in which genomic findings are robust and reproducible, and in which captured experimental data adhere to the findable, accessible, interoperable, and reusable (FAIR) guiding principles.

"Without standards or infrastructure around this new technology, we are left with a poor foundation for future work," said Mazumder. "Instead of focusing on new discovery, we will be burdened with inefficiencies. Robust and reproducible data analysis is key to the future of personalized medicine, which is why we need to create a standard moving forward."

Mazumder, colleagues at the FDA, and several industry leaders collaborated on the BioCompute Object Specification Project, which enables standardized reporting of genomic sequence data provenance, including a provenance domain, usability domain, execution domain, verification kit, and error domain. The project includes a framework that facilitates communication and promotes interoperability. The standard is freely accessible through a GitHub organization.
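As a rough illustration of what a record organized around those five parts might look like, here is a minimal Python sketch. The field names, the placeholder values, and the `missing_parts` helper are all assumptions made for this example, derived only from the part names listed above; they are not taken from the published BioCompute specification, which should be consulted for the actual schema.

```python
import json
from typing import List

# Hypothetical field names, derived from the five parts named in the article.
# They are NOT taken from the published BioCompute specification.
REQUIRED_PARTS = (
    "provenance_domain",
    "usability_domain",
    "execution_domain",
    "verification_kit",
    "error_domain",
)


def missing_parts(record: dict) -> List[str]:
    """Return the names of required parts that are absent from the record."""
    return [part for part in REQUIRED_PARTS if part not in record]


# Placeholder content only; every value here is invented for illustration.
example_record = {
    "provenance_domain": {"name": "example HTS analysis", "authors": ["placeholder author"]},
    "usability_domain": ["Placeholder statement of what the analysis is for."],
    "execution_domain": {"steps": ["placeholder step 1", "placeholder step 2"]},
    "verification_kit": {"inputs": [], "expected_outputs": []},
    "error_domain": {"notes": "placeholder description of known error bounds"},
}

if __name__ == "__main__":
    # Report any parts the record lacks, then show the record as JSON.
    print(missing_parts(example_record))
    print(json.dumps(example_record, indent=2))
```

A check of this kind only confirms that each part is present; it says nothing about whether the contents are complete or correct, which is the job of the standard itself.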

"Without an infrastructure like the BioCompute Object, we will create silos of unusable data, making building upon this research more difficult," said Mazumder. "We hope creating a standard now will clear this potential bottleneck. The initiatives discussed in our article aim to make data and analyses communicable, repeatable, and reproducible to facilitate collaboration and information sharing from data producers to data users."

"Enabling precision medicine via standard communication of HTS provenance, analysis, and results" was published in PLOS Biology. Collaborators include researchers from Otsuka Pharmaceutical Development & Commercialization, Inc., Merck & Co., Inc., Harvard Medical School, and other institutions.

"The BioCompute standards and related consortium has flourished under Dr. Mazumder's leadership and is now at an inflection point -- with interest in adoption across various industries," said Gil Alterovitz, PhD, assistant professor at Harvard Medical School and the Computational Health Informatics Program at Boston Children's Hospital, and co-author on the publication.

Alterovitz G, Dean D, Goble C, Crusoe MR, Soiland-Reyes S, Bell A, Hayes A, Suresh A, Purkayastha A, King CH, Taylor D, Johanson E, Thompson EE, Donaldson E, Morizono H, Tsang H, Vora JK, Goecks J, Yao J, Almeida JS, Keeney J, Addepalli K, Krampis K, Smith KM, Guo L, Walderhaug M, Schito M, Ezewudo M, Guimera N, Walsh P, Kahsay R, Gottipati S, Rodwell TC, Bloom T, Lai Y, Simonyan V, Mazumder R.
Enabling precision medicine via standard communication of HTS provenance, analysis, and results.
PLoS Biol. 2018 Dec 31;16(12):e3000099. doi: 10.1371/journal.pbio.3000099.