“I have always believed in following the science,” says Stephen Floor, a postdoctoral researcher in the Doudna Lab, led by UC Berkeley Professor of Biochemistry, Biophysics, and Structural Biology Jennifer Doudna. Floor’s self-described “circuitous” academic journey has taken him from computer science to physics and now to molecular biology. En route, he has picked up a deep sense of the power of computing to explore and make discoveries about the natural world: “it was the most powerful thing I personally felt I could do with computers,” he says, referring to an undergraduate research stint in which he sifted through gigabytes of data from an experiment that involved “slamming atomic nuclei into sheets of gold.”
These roots in scientific exploration and computation have surfaced in Floor’s current research efforts to understand translational efficiency, the phenomenon by which two strands of genetic material encoding exactly the same protein may produce different amounts of that protein, and to uncover the role in this phenomenon of the DEAD-box protein DDX3, one of a broad class of proteins that act as RNA chaperones.
Floor has been using Savio, the High Performance Computing (HPC) cluster maintained by Berkeley Research Computing (BRC), to process data from RNA sequencing experiments. Savio has allowed him to reduce this processing step from 4 days to 3 hours. Floor considers the accessibility of this powerful computing resource remarkable: “I’ve been in science for 10 to 15 years now, and it is extremely rare to see clusters like [Savio] administered through universities. It’s not like every university has this. This is pretty special.”
Regulation of Protein Translation and DEAD-box Proteins
The central dogma of molecular biology, a defining principle of the field, describes a two-step process by which basic genetic information yields a functional product -- a protein -- in the cell. The genetic instructions encoded in a gene are initially expressed via transcription, whereby a double strand of DNA yields a single strand of RNA. During translation, the second step in gene expression, the strand of RNA interacts with a ribosome, the molecular machine that catalyzes this step, to build a chain of amino acids, the protein, from the sequence of nucleotides in the RNA. Inefficient translation, the focus of Floor’s work, stems from the inherent structure of messenger RNA (mRNA), the modified product of transcription. Floor says, “in the mRNA there is an open reading frame [ORF], the part of the strand of mRNA that has the potential to be translated, and thereby form a protein.” The ORF is bounded, on each end, by an untranslated region, a segment of the mRNA that doesn’t encode a protein. Floor explains that these bounding regions “act as signals to the ribosome, telling the ribosome how many proteins to actually make from the ORF” by forming secondary structures or binding specific RNA-binding proteins that “either promote or repress translation” of the mRNA strand. In this way, translation is regulated by the activity of untranslated regions. But how does this benefit the cell?
Floor offers an example of this regulation mechanism at work in neurons: while waiting for neurotransmitters to blast out from the preceding neuron, “individual mRNAs will go to the base of a synapse, and pause their translation.” Translation is reactivated once neurotransmitters are released onto the waiting neuron, signaling the mRNA to reboot protein production at the synapse, strengthening that synapse and even developing new brain cells.
The terms “inefficient” and “regulation” may suggest a harmful limit on gene expression, or a reduction of RNA’s capacity to perform its biological function. However, Floor emphasizes that the upregulation of translation is the real danger, as a majority of cancers, and numerous brain disorders, “involve hyperactivated translation.” Returning to neurons, Floor explains the adverse effects of upregulation in individuals with the rare fragile X syndrome. He says, “a certain protein slows down ribosomes” as they carry out translation at the synapse, bringing the process to a paused state. “This protein is insufficiently expressed” in individuals with the syndrome, causing excessive translation in neurons and inappropriate hyperactivity in the brain, from which many adverse developmental effects ensue.
Recognizing the importance of inefficient translation, and “how often translation is dysregulated in disease,” Floor is now investigating the role of the DEAD-box protein DDX3 in this process. He explains that DEAD-box proteins are enzymes that act as “RNA chaperones,” or motor proteins that convert chemical energy (ATP) into mechanical work in order to change the shape of RNA and which structures or proteins are bound to it. DEAD-box proteins “promote translation of specific classes of mRNA” by altering structures bound to the mRNA that inhibit translation. Floor has shown that, in the absence of DDX3, a subset of genes in transformed human cells shows significantly reduced expression, suggesting that DDX3 is a kind of “regulatory switch” for those genes.
Floor is now investigating exactly which genes are regulated by DDX3, how this regulation works at the molecular level, and what the adverse effects of DDX3 mutations may be, especially given that hyperactive translation is a major driving force behind tumor growth. Along these lines, Floor notes, “DDX3 mutations are present in ten to fifteen different kinds of cancer.”
Deep Sequencing and Computational Support
One of the fundamental techniques Floor employs in his investigation of inefficient translation is deep sequencing, by which fragmented strands of RNA are read out hundreds or even thousands of times. Researchers can encode biological properties of RNA in these data, including levels of translation or RNA structure, by preparing samples in different ways. Sequencing experiments generate vast amounts of data: Floor says it’s typical for “experiments to generate a billion reads,” that is, a billion spelled-out sequences of a unique section of RNA. Analysis of this data on Floor’s personal computer would take “days to a week,” he says.
Despite its inconvenience, this computation remains possible. In fact, because he can use his own computer to complete the task, Floor says “there isn’t an incentive [...] to spend a lot of money on computer hardware to make this process faster.” Savio resolves these opposing concerns of time and money. As Floor describes it, Savio is “an infrastructure provided by the university, there when you need to process big data quickly, but requires neither huge investment nor maintenance upkeep.” Processing RNA sequencing data is an easily parallelized effort, and because of this, Floor only needs about 3 hours to finish a task on Savio, compared to the previous 4 days on his own computer. And because the Doudna Lab’s aggregate computing needs fit within the free BRC Faculty Computing Allowance, research funding remains available for other necessary expenditures.
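This workload parallelizes easily because each read can be processed without reference to any other. The following is a minimal Python sketch of that idea, not Floor’s actual pipeline: the function names, the toy sequences, and the choice to simply count identical reads are all assumptions made for illustration. It uses threads to keep the example self-contained; on a cluster such as Savio the same split, process, and merge pattern would more likely run as separate processes or jobs. The last line works out the speedup implied by the figures above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, n_chunks):
    """Split a list of reads into n_chunks contiguous, roughly equal pieces."""
    size, extra = divmod(len(reads), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional read.
        end = start + size + (1 if i < extra else 0)
        chunks.append(reads[start:end])
        start = end
    return chunks


def count_reads(chunk):
    """Tally how many times each sequence occurs in one chunk."""
    return Counter(chunk)


def count_reads_parallel(reads, n_workers=4):
    """Count each chunk independently, then merge the partial tallies."""
    chunks = split_into_chunks(reads, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_reads, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Toy reads: made-up sequences for illustration, not real data.
toy_reads = ["ACGU", "GGCA", "ACGU", "UUAG", "GGCA", "ACGU"]
totals = count_reads_parallel(toy_reads, n_workers=3)

# Speedup implied by the article's figures: 4 days reduced to 3 hours.
speedup = (4 * 24) / 3
```

With the toy input, totals maps ACGU to 3, GGCA to 2, and UUAG to 1, and speedup is 32.0, a roughly 32-fold reduction in processing time.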
Perhaps more importantly, this computational resource allows Floor to cycle between “molecular and biophysical approaches” to his research and “big data experiments” that recall his undergraduate days of slamming nuclei into sheets of gold. This combination of techniques is powerful, and has brought Floor closer to answering a question that has preoccupied the Doudna Lab, and many other scientists across the world, for years: “how do cells decide how much protein to make from a strand of mRNA?”
The answer to this question holds huge medical implications for “biologics, or protein therapies,” Floor says. To treat diabetes, for example, patients have to inject themselves with insulin, a protein. If translation could be controlled, however, “biologics could potentially be replaced by mRNA therapeutics,” with the mRNA acting as a factory for production of the needed protein. Floor and his colleagues have shown that inefficient translation offers a means to control translation of any gene, if the right sequences of the untranslated regions of the associated mRNA can be uncovered and leveraged. With this potential application, and others, in mind, Floor continues to seek a deeper understanding of the mechanism of inefficient translation. The Berkeley Research Computing Program is proud to support his efforts.