For researchers running computation on the Savio high-performance compute cluster, data transfer can be a challenge. A new IPython notebook simplifies data transfer from the free Box collaboration platform to a Savio user’s scratch folder, and provides a template for users to develop their own algorithms that analyze data stored in Box.
UC Berkeley affiliates receive unlimited storage through bConnected Box. Storing research data with a cloud storage provider such as Box helps guard against data loss from disk failure or other local storage problems. Beyond free, unlimited storage, Box offers encryption and fine-grained access control, along with a user-friendly interface that can be accessed from a web browser.
Moving data from Box to Savio in an ad hoc workflow can be cumbersome. First, the user needs to download the data from Box via an FTP client or web browser. They then need to install and configure the Globus Connect Personal client to transfer the data to Savio, where the IPython notebook can access it. Finally, the same multi-step transfer may have to be repeated in reverse once the notebook’s code has run, in order to move results back to Box.
Maurice Manning, Research IT’s Cyberinfrastructure Engineer, recently developed an IPython notebook that leverages the Box API to pull data directly from Box into Savio, and return the results of an analysis to Box. After performing some initial configuration steps, users can freely transfer data to and from any Box folder where they have access.
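The round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: it assumes the official `boxsdk` Python package and a `client` object that has already been authenticated against Box, and the helper names `pull_folder` and `push_file` are hypothetical.

```python
import os


def pull_folder(client, folder_id, dest_dir):
    """Download every file directly inside a Box folder into dest_dir.

    `client` is assumed to be an authenticated boxsdk-style client
    (an assumption of this sketch, not taken from the notebook).
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for item in client.folder(folder_id).get_items():
        if item.type != "file":
            continue  # this simple sketch skips subfolders and web links
        local_path = os.path.join(dest_dir, item.name)
        with open(local_path, "wb") as stream:
            # Stream the file's bytes from Box into the local file
            client.file(item.id).download_to(stream)
        saved.append(local_path)
    return saved


def push_file(client, folder_id, local_path):
    """Upload one local result file into a Box folder."""
    return client.folder(folder_id).upload(local_path)
```

In use, `dest_dir` would point at the user's scratch folder on Savio and `folder_id` at any Box folder the user can access; both values are left to the reader here.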
Running analysis workflows in an IPython notebook on Savio’s JupyterHub server requires both an account on Savio and registration as a pilot participant for the cluster’s JupyterHub notebook server (see the information at the end of this article). In addition, a few setup steps are required to use the Box API, including Box user authentication; these are described in detail in the IPython notebook itself.
Research IT hopes that the Box-enabled notebook will be used for more than data transfer. It is well suited as scaffolding for developing new notebooks that include analytical processing steps. To facilitate extension and reuse, the notebook includes a clearly marked place for inserting additional processing logic.
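One way such a scaffold might be laid out is shown below. The function names, the marker comments, and the placeholder line-count analysis are all hypothetical and are not taken from the notebook; the sketch only illustrates keeping the user-supplied step separate from the surrounding plumbing.

```python
from pathlib import Path


def process(input_path, output_dir):
    """Placeholder analysis step: replace the marked section with real logic."""
    # ---- insert additional processing logic here ----
    text = Path(input_path).read_text()
    line_count = len(text.splitlines())
    # ---- end of custom logic ----
    result_path = Path(output_dir) / (Path(input_path).stem + "_summary.txt")
    result_path.write_text(f"{Path(input_path).name}: {line_count} lines\n")
    return result_path


def run_analysis(input_paths, output_dir, process_fn=process):
    """Apply process_fn to each downloaded file and collect the result paths."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return [process_fn(p, output_dir) for p in input_paths]
```

Because `run_analysis` accepts any `process_fn`, a new notebook could swap in its own analysis without touching the transfer steps on either side.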
The IPython notebook configured for Box access, TransferFilesFromBoxToSavioScratch.ipynb, is currently available in the notebooks folder of the brc-cyberinfrastructure GitHub repository. We welcome feedback from campus users, and would be interested to see examples of new notebooks that build upon this work.
To register to use Jupyter notebooks on Savio, please write to research-it@berkeley.edu.