What do the Phanerozoic eon and precision medicine have in common? For one thing: the statistical analyses of Alan Hubbard, UC Berkeley Associate Professor of Biostatistics and head of the Division of Biostatistics at the School of Public Health.
Hubbard began his research career at Virginia Polytechnic Institute and State University, studying the epidemiology of the fossil record. There he applied statistical methods to a large fossil-record data set, providing strong statistical support for “unusual” events causing three mass extinctions in the Phanerozoic eon, the geological eon that began roughly 541 million years ago. Hubbard became interested in the potential applications of big data (massive data sets that can be analyzed with powerful computation to reveal patterns or trends) in other fields and, encouraged by the possibility of “doing some good,” as he modestly put it, in the study of human health, pivoted his academic focus to public health and biostatistics.
Hubbard has since found his research niche: using statistical and machine learning techniques to develop diagnostic tools that enable evolving prediction of outcomes after traumatic injury. This work branches off from the tree of precision medicine, an emerging approach to disease and injury treatment and prevention that tailors medical care to the characteristics of the individual patient.
Precision medicine is driven by medical diagnostic tools, including software that guides treatment decisions by comparing a patient’s characteristics with those of many others who have had similar symptoms or injuries. With a diverse team that includes Professor Mark van der Laan and talented graduate students in the School of Public Health, as well as Chris Kennedy, a doctoral student in the Department of Political Science, Hubbard is working to improve his prognosis software, making it increasingly accurate at functionally mimicking the implicit understanding of an experienced doctor, whose evaluation of an injury and of the physiological variables affecting it changes with the observed condition of the patient. Hubbard has partnered with Berkeley Research Computing (BRC) for research support through the campus-shared High Performance Computing cluster, Savio, and through BRC’s consultation service.
The Making of Traumatic Injury Prognosis Software
The power of this software is most apparent where it will be put to work: in a triage room, with a broken and bleeding human body on the operating table and “a cluster of nurses, an anesthesiologist, a resident, and the attending surgeon [descending]” on the patient, preparing to “[perform] dozens of complicated procedures at once,” as Nicola Twilley describes in her New Yorker article on the University of Maryland’s R Adams Cowley Shock Trauma Center. Trauma is the leading cause of human death from infancy to middle age.
With their software, Hubbard and his team are tackling an essential problem in the treatment of traumatic injury, one that undermines the efficacy of that descending cluster of practitioners. Each patient’s body harbors a massive clinical data set, composed of vital signs -- including factors like pulse, temperature, and blood pressure -- type and quantity of blood product infusions, blood pH, and blood electrolyte levels, among other rich but usually messy data. These characteristics are measured and recorded repeatedly over the course of treatment, so the data set swells. The problem is that no human being working in a tense operating room can assess the time-varying impact of all of these factors on the mortality outcome of a traumatic injury.
Algorithms, however, may be able to do just that: tuned on a training dataset composed of hundreds of trauma patients’ complete clinical and outcome data, they can assess associations between patient characteristics, treatment options, and mortality outcomes. This is the aim of Hubbard’s prognosis tool. As doctoral student Chris Kennedy describes it, the prognosis tool -- likely iPad-based -- would take in a patient’s key clinical characteristics and quickly analyze connections between her specific, current condition and data on the prior treatments and mortality outcomes of patients with similar injuries. The attending physician would then use the tool’s analysis to inform decisions about the best course of medical action for the patient on the table.
The analysis described above involves the application of “complex statistical algorithms accompanied by powerful computers able to carry out a large number of computations in large data sets within clinically-relevant time frames” (Hubbard et al., 2013). Hubbard’s team is now focused on encapsulating a collection of statistical algorithms in a single program. The framework they use for this work is a machine learning technique called super learning, an ensemble learning method developed within the Division of Biostatistics, with major contributions from Prof. van der Laan. Super learning combines a collection of prediction algorithms into an optimally weighted ensemble, yielding a more accurate prediction than any single constituent algorithm could produce on its own. Hubbard’s team is training its predictive model on the PRospective Observational Multicenter Major Trauma Transfusion (PROMMTT) clinical dataset (as well as several other multi-trauma-center data sources), composed of the clinical and outcome data from a selected 980 trauma patients treated at one of ten trauma centers across the U.S. between 2009 and 2010 (Holcomb et al., 2013). These data ‘train’ the software to recognize patterns in a specific patient’s clinical characteristics and, from those patterns, to offer an injury outcome prediction, establish which characteristics have the greatest influence on that outcome, and present an optimized treatment plan that can guide the decisions of the practitioner.
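The core idea of super learning -- fit several candidate prediction algorithms and let cross-validation decide how to combine them -- can be illustrated with off-the-shelf tools. The minimal Python sketch below uses scikit-learn’s stacking ensemble as a rough stand-in for a super learner; the feature names, candidate models, and synthetic data are hypothetical illustrations, not the team’s actual pipeline or the PROMMTT data.

```python
# Minimal sketch of a super-learning-style stacked ensemble (illustrative only).
# Feature names, models, and data are hypothetical stand-ins, not the team's software.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical clinical training data: vital signs and labs per patient,
# with a binary mortality outcome.
rng = np.random.default_rng(0)
n = 980
X = pd.DataFrame({
    "pulse": rng.normal(95, 20, n),
    "systolic_bp": rng.normal(110, 25, n),
    "temperature": rng.normal(36.5, 1.0, n),
    "blood_ph": rng.normal(7.3, 0.1, n),
    "rbc_units_transfused": rng.poisson(4, n),
})
y = rng.integers(0, 2, n)  # stand-in for the observed mortality outcome

# Candidate learners; a super learner combines them via cross-validation,
# approximated here by scikit-learn's stacking meta-learner.
base_learners = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]
ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,                            # internal cross-validation to fit the combination
    stack_method="predict_proba",
)

# Cross-validated discrimination (AUC) of the combined predictor.
auc = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc")
print("mean cross-validated AUC:", auc.mean())
```

Where a full super learner weights its candidates by their cross-validated risk, this sketch delegates the combination to a logistic-regression meta-learner; the practical point is the same: the ensemble typically predicts at least as well as its best single constituent.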
Because the clinical training datasets are complex and broad, and because super learning is “extremely computationally intensive, but trivially parallelizable,” as Hubbard says, his team needs powerful, scalable computing resources. With Professor Mark van der Laan, Hubbard is a condo contributor on Savio, the campus-shared HPC cluster provided by BRC. Chris Kennedy, one of the key graduate students working with Hubbard, describes the benefits of the group’s computational investment: “It’s tough to iterate on these algorithms if you’re computing on a laptop that you need for other purposes. We have 8 nodes in the condo cluster, and now we’re looking to explore the Low Priority queue,” a service that allows condo contributors to use any of the available nodes on Savio, in addition to their own, under certain circumstances. This option is worth the small wait of “between ten minutes or a day,” Kennedy continues, referring to delays imposed by Savio’s queueing system, as these combined computational resources “help a lot in research and development mode. We can fix problems faster, get papers out the door, and get results to doctors.”
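The parallel structure Hubbard refers to falls out of cross-validation: each fold-by-learner fit is an independent job, so the work can be spread across the cores of a single Savio node or, through the scheduler, across many condo and low-priority nodes. The sketch below is a hypothetical local-core illustration using joblib, not the group’s actual Savio workflow, and its data and learners are synthetic stand-ins.

```python
# Illustrative sketch of why super learning is "trivially parallelizable":
# each (cross-validation fold, candidate learner) pair can be fit independently.
# All names and data are hypothetical.
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X = rng.normal(size=(980, 5))        # stand-in clinical feature matrix
y = rng.integers(0, 2, size=980)     # stand-in mortality outcome

learners = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
]

def fit_one(learner, X, y, train_idx):
    """Fit one candidate learner on one cross-validation training fold."""
    model = clone(learner)
    model.fit(X[train_idx], y[train_idx])
    return model

folds = list(KFold(n_splits=10, shuffle=True, random_state=0).split(X))

# One independent job per (fold, learner) pair; n_jobs=-1 uses every local core.
# On a cluster, the same pattern is scaled out by submitting jobs to the scheduler.
fitted = Parallel(n_jobs=-1)(
    delayed(fit_one)(learner, X, y, train_idx)
    for train_idx, _ in folds
    for learner in learners
)
print(f"fit {len(fitted)} fold-by-learner models in parallel")
```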
What’s Next, and the Biomedical Big Data Training Program
Hubbard and his team plan to do “more of the same, just more,” as he puts it, in the immediate future. This means they will continue to develop and improve their precision medicine software, using the large and growing patient data sets they now receive from collaborators in Paris, in Colorado, and at UCSF. Hubbard and van der Laan’s research groups also recently received a grant from The Healthy Birth, Growth, and Development (HBGD) knowledge integration (HBGDki) initiative, with van der Laan as principal investigator, and hope “to develop more software in the realm of precision public health in developing countries,” with a focus on child health. This software would be geared specifically toward predicting child growth stunting and impaired cognitive development, and would, like the traumatic injury prognosis tool, examine optimal treatment plans and their possible outcomes. For both software projects, the team has globally impactful goals: programs that are accessible to non-statisticians, that can reliably recommend the best intervention options by drawing on big data, and that can move the global medical and public health communities closer to standardized treatment of traumatic injury and of impaired child neurocognitive development, in both clinical and population health settings.
With principal investigator van der Laan, Hubbard is co-director of the Biomedical Big Data Training Program, recently funded by the National Institutes of Health (NIH). The program will train a select group of multidisciplinary Ph.D. students at UC Berkeley -- in fields like Molecular and Cell Biology, Computer Science, and Epidemiology, among others -- in the biomedical applications of biostatistics, machine learning techniques, and big data tools and computing. BRC will support the program with computational resources and consultation, and Chris Paciorek, a statistical computing consultant with BRC, will provide training to graduate students on the grant.
The Biomedical Big Data Training Program, in promoting the power of big data and computing to advance biomedical knowledge and practice, echoes Hubbard’s conclusions about the utility of big data drawn from his investigation of mass extinction patterns in the Phanerozoic eon. The program will encourage the development of tools that emulate Hubbard’s use of clinical datasets to guide treatment of traumatic injury -- which will not only advance biomedical knowledge, but may be crucial to saving lives in the future. The program will also lay groundwork for other scientists to pursue academic journeys akin to Hubbard’s own, as it trains a new cohort of big data scientists. Berkeley Research Computing looks forward to supporting their contributions to science and medicine as well.
If you’d like to learn more about the services BRC provides, please contact research-it@berkeley.edu.