On dealing with large volumes of sensitive data

How the De Preter lab became the first users of VIB Data Core’s new secure compute cluster

When researchers at the De Preter lab set out to benchmark tools for detecting cancer biomarkers in cell-free DNA, they knew the science would be demanding—large volumes of sensitive patient data, GDPR compliance, and months of intensive computation. They also knew they would need infrastructure beyond what their own lab could provide. The VIB Data Core stepped in with the compute, storage, and data transfer services that carried the project from first sample to published preprint.

The challenge: measuring tumor burden in cell-free DNA

Cancer patients release fragments of DNA into their body fluids from both healthy and tumor cells. This circulating cell-free DNA preserves tumor-specific methylation patterns, making it a promising tool for non-invasive cancer detection and monitoring. Computational deconvolution methods can estimate the tumor-derived fraction of cell-free DNA, but doing so accurately remains difficult. Several tools exist, yet none had been specifically benchmarked for sensitivity and accuracy when estimating tumor fraction in cell-free DNA.

That was the gap Edoardo Giuili, working in Katleen De Preter's lab at VIB, set out to fill. His benchmarking study required generating large in silico datasets by mixing methylation sequencing data from the DNA of human tumor samples with data from the cell-free DNA of healthy donors.

"This posed two main challenges," Edoardo explains: "the substantial storage capacity required and the need for a secure, GDPR-compliant environment due to the sensitive nature of the data."

It was exactly the kind of challenge the VIB Data Core was built for.
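
The benchmarking idea can be illustrated in miniature: mix a tumor and a healthy methylation profile at a known fraction, then check whether an estimator recovers that fraction. The sketch below is a simplification and not the method used in the study. It assumes per-CpG methylation values (fractions between 0 and 1) rather than sequencing reads, a single tumor profile and a single healthy profile, and a plain least-squares estimator. The function names `mix_methylation` and `estimate_tumor_fraction` and all values are hypothetical.

```python
import random


def mix_methylation(tumor, healthy, tumor_fraction):
    """Per-CpG methylation of an in silico mixture with a known tumor fraction."""
    return [tumor_fraction * t + (1 - tumor_fraction) * h
            for t, h in zip(tumor, healthy)]


def estimate_tumor_fraction(mixture, tumor, healthy):
    """Least-squares estimate of f in: mixture = f * tumor + (1 - f) * healthy.

    The result is clipped to the valid range [0, 1].
    """
    numerator = sum((m - h) * (t - h) for m, t, h in zip(mixture, tumor, healthy))
    denominator = sum((t - h) ** 2 for t, h in zip(tumor, healthy))
    if denominator == 0:
        raise ValueError("tumor and healthy profiles are identical")
    return min(1.0, max(0.0, numerator / denominator))


# Toy reference profiles (random values, purely illustrative).
rng = random.Random(0)
tumor_profile = [rng.random() for _ in range(200)]
healthy_profile = [rng.random() for _ in range(200)]

for true_fraction in (0.001, 0.01, 0.1):
    mixture = mix_methylation(tumor_profile, healthy_profile, true_fraction)
    # Add small measurement noise so that recovery is not trivially exact.
    noisy = [min(1.0, max(0.0, m + rng.gauss(0, 0.01))) for m in mixture]
    estimate = estimate_tumor_fraction(noisy, tumor_profile, healthy_profile)
    print(f"true fraction {true_fraction:.3f} -> estimated {estimate:.3f}")
```

Because the true fraction is known by construction, the gap between the true and estimated values gives a direct measure of a tool's accuracy, and repeating the exercise at ever lower fractions probes its sensitivity.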

A different kind of core facility

Most VIB core facilities offer specialized lab techniques—sequencing, imaging, proteomics... The Data Core is different. Established in 2023 and hosted by VIB.AI, it provides the digital backbone that researchers need once the data has been generated.

Rafael Buono, Team Lead Compute of the Data Core, describes the mission as "providing data infrastructure and making it possible to use, and hopefully even easy to use." Its services include compute clusters for both standard and GDPR-sensitive workloads, storage at multiple tiers (from active compute storage to long-term project buckets), and secure data transfer tools, among others.

The Data Core does not build pipelines or direct the science. "The users themselves tend to be the domain experts," Rafael says. "There is no point meddling there." Instead, when a lab approaches them with a project, the team considers which combination of services might help—sometimes in ways the researchers hadn't initially considered. "It's part of our internal thinking: which parts of what we can offer can they actually plug in?" Rafael explains. That might mean suggesting a different approach to storage, or a configuration that scales more efficiently on VIB's systems.

Moving sensitive data safely with Globus

With large volumes of patient data to bring into the secure cluster, the lab needed a reliable and compliant way to move it. That's where Globus came in: an encrypted data transfer service that the Data Core makes available to its users. Rather than requiring researchers to download data locally and re-upload it, Globus orchestrates direct transfers between systems, checking file integrity, retrying dropped connections, and logging exactly who triggered each movement for GDPR compliance.

Maria Tsontaki, a Data Core project manager who worked closely with the De Preter lab, notes that Globus makes things easy for researchers at every stage: from external ingestion into the secure cluster, through internal moves during processing, to the final move into long-term project storage. "Through a simple web interface, researchers can select their data, choose a destination, and the transfer takes care of itself, including integrity checks and retries."
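
The integrity checks and retries mentioned above can be illustrated with a minimal local sketch. It does not use Globus or its API; it only shows the general pattern of comparing checksums after a copy and retrying on failure, using the Python standard library. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_verification(source, destination, max_attempts=3):
    """Copy a file, verify its checksum, and retry on failure.

    Returns the number of attempts that were needed.
    """
    expected = sha256_of(source)
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(source, destination)
        except OSError:
            continue  # treat an I/O error like a dropped connection and retry
        if sha256_of(destination) == expected:
            return attempt
    raise RuntimeError(f"transfer failed after {max_attempts} attempts")


# Hypothetical usage with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "sample.dat")
    dst = os.path.join(workdir, "sample_copy.dat")
    with open(src, "wb") as handle:
        handle.write(os.urandom(4096))
    attempts = transfer_with_verification(src, dst)
    print(f"verified copy after {attempts} attempt(s)")
```

A managed transfer service performs this kind of bookkeeping on the researcher's behalf, which is what makes it practical at the scale of a full patient dataset.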

From left to right: Maria Tsontaki (VIB Data Core), Edoardo Giuili, and Sofie Van de Velde (both De Preter lab, VIB-UGent Center for Medical Biotechnology)

Testing, configuring, and learning together

The Data Core had recently rolled out a new secure compute cluster, and the De Preter lab became its first users. "They were the first testers, actually, of the cluster," Rafael says. "Since they were dealing with human data, they were the perfect candidate."

With their data securely in the cluster, the lab needed to organize their analysis into a reproducible workflow. They built a pipeline using Nextflow, a workflow management system widely used in bioinformatics. The Data Core contributed custom configuration files so that the pipeline could run efficiently on VIB's systems.

"If they can spend more time with their research instead of figuring that out, it's better for everybody involved," Maria says.

A full lifecycle

The VIB Data Core provided the infrastructure that carried the project from first data ingestion to published preprint. "Their support made it possible to carry out this computationally demanding and sensitive analysis smoothly," Edoardo says.

The lab's data has since migrated from the active compute cluster into the Data Core's GDPR-compliant longer-term project storage, where it remains accessible for reviewer requests or future work. From ingestion to computation to publication, the project used a range of Data Core services.

Rafael, characteristically, frames it in terms of the researchers.

"The really exciting thing here is that they got to use the full system, and they got to push out a very nice piece of work. As an enabler, this sounds great."

For any VIB research group working with sensitive data: the Data Core's door is open. As Rafael puts it: "Come in. Let's see what we can do."
