Understanding Cancer with Big Data

An extreme close-up view of cancer cells, tinted blue.

Research produces a ton of data. You may think researchers inherently know what to do with this data, but when it comes to big data, the scale of information is so vast that tackling it requires some training. That is what the BigCare 2023 Summer Workshop aims to provide: it shows cancer researchers how to use all that data, specifically by incorporating cyberinfrastructure resources into their work. This year, the 10-day workshop, held at Purdue, used the ACCESS resource Anvil to support both its in-person and online sessions.

“All of our participants are cancer researchers, so they have zero experience in computing and minimal experience in statistics,” said Min Zhang, founder of BigCare and a professor of epidemiology and biostatistics at the University of California. “The data we use in the workshop is essentially the data they collected in their own research, and a lot of times, the participants have already sent it to a company and paid money to have it analyzed for them. Then, every time they need something else from that data, the company always asks them to pay more. So the idea is, since they generated the data by themselves, why not enable them to analyze the data too?”

Empowering researchers is also a goal of ACCESS. By giving researchers cyberinfrastructure (CI) resources for their work, along with the support they need to use these high-performance machines, ACCESS broadens who can achieve powerful research outcomes and makes CI an affordable, accessible part of research. Training workshops like BigCare also let researchers focus on their own expertise rather than spending time learning the basics of supercomputing.

“We are trying to break the barrier between the researchers and the data, to at least give them an opportunity to see their data, to work with their data, and to extract information from their data.”

Min Zhang, founder of BigCare

“We don’t want to turn everyone into a computer scientist because they have more important things to do,” said Zhang. “Previously, we had to teach users the front end, back end, command lines, all this kind of stuff, and now it’s all gone! Life is so much easier. And everyone was so excited that they wanted to take Anvil to their own institution. Some of them would even say, ‘We do have HPC, we do have cloud, but it’s not as user-friendly as Anvil.’”

You can read more about this story here: Cancer researchers learn about big data analysis using Anvil


Project Details

Resource Provider Institution: Rosen Center for Advanced Computing (RCAC)
Affiliations: National Cancer Institute, University of California, Purdue University
Funding Agency: NSF
Grant or Allocation Number(s): Anvil is funded under NSF award No. 2005632

The science story featured here was enabled by the ACCESS program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
