Increasingly, the scale of data analysis means that computation can no longer be carried out on a laptop — often the only resource easily available to researchers. And while the Berkeley Research Computing (BRC) Program offers many powerful resources to researchers, it’s not always clear where to get started.
Earlier this year, Heather Haveman, Professor of Sociology and Business, and Rachel Wetts, Sociology PhD candidate, reached out to BRC Consulting to ask for guidance about which campus resource might best fit processing a corpus of 20,000 articles from major newspapers. Professor Haveman wrote:
“One of my PhD students, Rachel Wetts, needs help with a text-analysis project. She is using plagiarism-detection software (WCopyFind) to link news-media reports (about climate change) to press releases from business concerns, government agencies, scientists, and activists. She has been doing this analysis on her laptop, but it is taking far, far, far too long -- a week or more for each news source.”
The analysis software, WCopyFind, runs only on Windows machines, so the BRC team recommended its new Analytics Environment on Demand (AEoD) service. AEoD provides a remote Windows-based research desktop environment to campus researchers who need to scale up beyond the capabilities of their laptops. BRC offers the AEoD service to departments or organizations that can provide direct support to researchers. Though the service is not currently offered to individual researchers, BRC staff are exploring other support models for it. Rachel Wetts’ computational needs were an opportunity for the providers to learn more about how this service could be extended, while supporting a concrete UC Berkeley research project.
To process data quickly at the scale Rachel needed, BRC consultants provisioned an AEoD research desktop during the first in-person consultation. On her laptop, splitting the data into smaller batches kept the program from crashing but made processing take much longer, while larger batches ran faster but needed more memory to avoid crashing. On the AEoD platform, administrators can quickly add up to 128 GB of RAM if needed. With some back-and-forth, we worked out the right amount of RAM for Rachel’s work. She says that the system “has been a great help and saved me a LOT of time on my dissertation work.”
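The batch-size trade-off described above can be illustrated with a minimal Python sketch. It is not Rachel's actual workflow and it does not call WCopyFind. It only shows how a list of documents might be split into batches and how the batch size determines the number of separate runs. The function name `make_batches`, the file names, and the two batch sizes are all hypothetical.

```python
from math import ceil
from typing import Iterator, List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical file names standing in for a 20,000-article corpus.
articles = [f"article_{i:05d}.txt" for i in range(20_000)]

# Smaller batches mean more separate runs; larger batches mean fewer runs,
# but each run has to hold more documents in memory at once.
for batch_size in (500, 5_000):
    n_runs = ceil(len(articles) / batch_size)
    print(f"batch size {batch_size}: {n_runs} separate runs")
```

With more RAM available, the batch size can be raised so that fewer runs are needed, which is the adjustment the consultation settled by trial and error.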