Researchers at large, well-funded institutions have a lot of options when it comes to support. They may have a supercomputing center embedded on campus or access to funds to purchase time on private HPC resources, like AWS. They probably have experts on hand to help guide them through the process of utilizing these resources and integrating them into their research. Smaller, less well-funded teams don’t necessarily have the same level of access to these resources, yet they are still performing important research. This resource gap is something lead authors Tyler McIntosh and Erick Verleye, and their team, are attempting to bridge with their work. “A digital divide exists between scientists who have the privilege to accelerate their science with large cyberinfrastructure and those who don’t,” said McIntosh, project manager at the University of Colorado Boulder’s Cooperative Institute for Research in Environmental Sciences (CIRES) Earth Lab. “Our work contributes to the suite of tools available to help scientists cross this divide.”
To solve this issue, McIntosh and Verleye, a software developer at the Earth Lab Analytics Hub and the Environmental Data Science Innovation & Inclusion Lab (ESIIL), were looking to create scalable virtual machines running on public cloud infrastructure. The goal was to make cyberinfrastructure (CI) an accessible resource for collaboration and education, and as easy to use for participants as a personal computer. Once they finalized the parameters of their project, they needed a way to test and deploy it. ACCESS was an ideal solution. “ACCESS allowed us to provision the large number of computing resources that our CI required and which may have been cost prohibitive on the commercial cloud. We didn’t feel the cost pressure during testing and development that is usually felt when using a commercial cloud provider,” said Verleye.
When looking at the options for ACCESS allocations, the team decided to go with the Explore allocation – a brand-new allocation option within ACCESS. Explore ACCESS allocations are intended for purposes that require small resource amounts. These kinds of allocations are great for testing benchmarks, classroom activities and developing code. Explore allocations are even open to graduate students for their thesis or dissertation work.
“We chose an Explore allocation because it struck the right balance between the amount of resources we would have access to and how much we would use,” said Verleye. “The Explore allocation allowed us to host multiple workshops for between 40 and 200 users without having to worry about going over our limit. We also had access to GPU and high-memory nodes for specialty workflows, which was great. I would recommend the Explore allocation for anyone looking to recreate the deployment described in our paper.”
The research team was pleased with the simple application process. “Applying for an Explore allocation was very straightforward,” said Verleye. “After submitting an overview of ESIIL’s purpose, goals and plans for the resources that would be provided, we were given an allocation. We were happy with the short amount of time that the review process took.” They were also very happy with the level of support they received from ACCESS. “ACCESS support has been very helpful as well,” Verleye said. “When we needed to increase our usage limits for certain resources like RAM, vCPUs and floating IPs, ACCESS and Jetstream2 were quick to respond.”
Participant confidence completing tasks independently before and after the case study working group. Credit: McIntosh et al., Cyberinfrastructure deployments on public research clouds enable accessible Environmental Data Science education, PEARC23.
The team created their scalable cyberinfrastructure, orchestrated in partnership with CyVerse, and tested it in a working group as a case study. The results are promising: participants gave highly positive evaluations after using the CI to work through computation-intensive educational modules. The figure above shows a marked increase in participant confidence after using the scalable CI created by the research team. The team published a paper and presented their findings at the recent PEARC conference in Portland, Oregon. Research like this could have far-reaching benefits, allowing researchers of all types to take advantage of HPC resources regardless of funding.
“Our cyberinfrastructure architecture, which is built on public research resources, can be easily replicated and scaled,” said McIntosh. “This will allow other teams to more easily run cyberinfrastructure-based workshops, working groups and classes. We’ve test-driven the system with an event focused on collaborative data synthesis in the field of environmental data science, which was hosted by Earth Lab and ESIIL at CU Boulder. Such spaces will increase equity in the scientific community by providing easier access to computation resources and helping participants in any discipline develop fundamental data science and CI skills.”
If you’re a researcher interested in cyberinfrastructure resources, consider an ACCESS allocation. You can find out more about applying for an allocation here.
Project Details
Resource Provider Institution: Indiana University (Jetstream2)
Affiliations: University of Colorado Boulder, Cooperative Institute for Research in Environmental Sciences (CIRES) Earth Lab and Environmental Data Science Innovation & Inclusion Lab (ESIIL)
Funding Agency: NSF
Grant or Allocation Number: BIO220085
The science story featured here was enabled by the ACCESS program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. This work used Jetstream2 at Indiana University through allocation BIO220085. CyVerse is based upon work supported by the NSF under Grant Nos. DBI-0735191, DBI-1265383, and DBI-1743442. The working group was supported by the Environmental Data Science Innovation and Inclusion Lab (ESIIL, NSF award DBI-2153040) and additional NSF grants DEB-2017889 and DEB-1846384. Additional funding was provided by Earth Lab through CU Boulder’s Grand Challenge Initiative and the Cooperative Institute for Research in Environmental Sciences (CIRES).