UC Berkeley’s Research IT group was invited to participate in the 2017 Binder workshop hosted at UC Davis in October of last year. The workshop’s goal was to “enhance and extend the functionality of the binder notebook computing platform,” and specifically “to brainstorm and prototype support for credentials so that private resources can be used to source and execute binders (on, e.g., AWS accounts and private repositories).”
The workshop was funded by a Sloan Foundation grant, based on a proposal submitted by C. Titus Brown, Associate Professor in the UC Davis School of Veterinary Medicine, where he leads the Lab for Data Intensive Biology.
What is Binder?
Binder is a concept and technology that makes it easy for researchers to reproduce and share their code, data, and computational environments.
The concept is about making a repository (collection) of Jupyter notebooks and data easy to share with collaborators in a ready-to-run executable environment, so that contributors’ code is immediately reproducible by anyone, anywhere.
The technology works as an always-on web service that anyone can connect to, running at: https://mybinder.org/
This publicly-available infrastructure is (currently) offered without authentication or payment requirements, which makes it a truly zero configuration/single-click experience for the user. The service is operated on the Google Cloud Platform by the Project Jupyter team, and Binder is supported through a grant from the Gordon and Betty Moore Foundation.
Why Binder?
The workshop proposal explains why researchers are motivated to use Binder: “Fully specified and perfectly repeatable computational data analyses have long been a goal of open scientists - after all, if we can eliminate the dull parts of reproducibility, we can get on with the more exciting bits of arguing about significance, interpretation and meaning.”
How does Binder work?
tl;dr: Just try it and see:
When an end-user (e.g. a research collaborator or the general public) uses Binder:
- the user supplies a link to a GitHub repository;
- Binder clones the repository and builds a custom Docker image based on a simple set of configuration files added to the GitHub repository that list software dependencies;
- then, Binder spins up the Docker container and redirects the user’s web browser to a Jupyter Notebook-based interface;
- at some point, Binder detects a lack of activity and shuts down the container.
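The first step above, supplying a link to a GitHub repository, can be sketched in a few lines of Python. This is only an illustration: the function name binder_launch_url is hypothetical, the example repository (binder-examples/requirements) and the "master" default branch are assumptions, and the v2/gh/<owner>/<repo>/<ref> path layout reflects the launch links used by mybinder.org at the time of writing.

```python
from urllib.parse import quote


def binder_launch_url(owner, repo, ref="master", base="https://mybinder.org"):
    """Build a launch link for a public GitHub repository (illustrative helper).

    Assumption: the service accepts links of the form
    <base>/v2/gh/<owner>/<repo>/<ref>, where <ref> is a branch, tag, or commit.
    """
    # Percent-encode each piece so that, for example, a branch name that
    # contains a slash does not get mistaken for extra path segments.
    parts = [quote(piece, safe="") for piece in (owner, repo, ref)]
    return f"{base}/v2/gh/" + "/".join(parts)


# Hypothetical usage: the repository named here is only an example.
print(binder_launch_url("binder-examples", "requirements"))
```

Opening the printed link in a browser is what triggers the clone, build, and launch steps described above.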
A researcher who wishes to use Binder to share their work with collaborators or the public does the following:
- creates a GitHub repository populated with Jupyter Notebooks
- lists Python package dependencies in a requirements.txt file added to the GitHub repository
- (optional) for more complex dependencies beyond Python, uses additional files such as environment.yml for Anaconda, apt.txt for Linux packages installed with apt-get, or even a Dockerfile to run RStudio.
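Before publishing a repository, it can be handy to check which of these configuration files are actually present. The following is a minimal sketch, not how Binder itself detects them: the function name find_binder_config and the role descriptions are made up for illustration, and only the four file names mentioned above are checked.

```python
from pathlib import Path

# The configuration files named in the list above, with a short note on
# what each one is for. The descriptions are informal, not official.
CONFIG_FILES = {
    "requirements.txt": "Python packages",
    "environment.yml": "Anaconda environment",
    "apt.txt": "Linux packages installed with apt-get",
    "Dockerfile": "fully custom image (for example, RStudio)",
}


def find_binder_config(repo_dir):
    """Return {file name: role} for each known config file found in repo_dir."""
    root = Path(repo_dir)
    return {
        name: role
        for name, role in CONFIG_FILES.items()
        if (root / name).is_file()
    }


# Report on the current directory, assumed to be a local clone of the repository.
found = find_binder_config(".")
if found:
    for name, role in found.items():
        print(f"{name}: {role}")
else:
    print("No dependency files found; consider adding a requirements.txt.")
```

A repository with only a requirements.txt is enough for a pure-Python project; the other files come into play only for the more complex cases.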
Binder provides a low barrier-to-entry way to both publish and execute Jupyter notebooks in a fully customizable environment, where dependencies and software requirements can be specified in a simple, standard way tied directly to the project. Today Binder is available via the public mybinder.org service, which provides a limited amount of computing capacity (1 CPU core, 2GB RAM) to each user. This workshop, however, enabled individual researchers to run their own instance of Binder in a more capable compute environment, and explored questions of sustainability and how library and research computing organizations might provide institutionally supported deployments.
For a detailed explanation of the internals, please see the newly released Binder 2.0, a Tech Guide.
Who attended the workshop?
The workshop attendees were from a variety of communities — researchers from various disciplines in the sciences and humanities (including Ecology, Statistics, Neuroscience, Microbiology, and German Literature), librarians, instructors, data scientists, programmers, HPC admins, and research infrastructure specialists.
The organizing committee included Abby Cabunoc Mayes (Mozilla Foundation), Tim Head (Wild Tree Tech), Chris Holdgraf (UC Berkeley/Jupyter team) and Yuvi Panda (UC Berkeley/Jupyter team).
Research IT was represented by Research Computing Architect Aaron Culich.
In support of a diverse community
The workshop format was based on a hackfest-like unconference that was organized on the principle of small, diverse groups of people in a friendly environment.
To help support this kind of environment, the organizers used the Software Carpentry Code of Conduct. They also took some time at the start of the workshop to call everyone’s attention to the code of conduct and to summarize the key expectations for the audience, an important step that helped set the tone of safety and inclusivity for everyone in the workshop.
Join the community
You can check out the official blog post from the workshop for additional highlights, use cases, and notes. Developers may be interested in joining the binderhub-dev Google Group, and can find real-time community support via the Binder channel on gitter.im.