"Is This Real Everywhere?" — Testing Your Gene Across Thousands of Datasets with Megasampler

May 05, 2026

Here's a scenario every researcher knows.

You've found something exciting in a neuroblastoma dataset. Your gene of interest is high in aggressive tumours and low in the low-risk ones. The statistics are solid, the plot looks beautiful, and your PI is cautiously optimistic. Then comes the question you were dreading: "But does it hold up in other cancers? Other cohorts? Other platforms?"

In the old world, answering that question meant emailing collaborators, hunting down GEO accession numbers, and waiting for someone with a computing cluster to process it all. In R2, it takes about ten minutes — using a module called Megasampler.

The premise is simple: R2 hosts over 3,000 public datasets, covering dozens of cancer types, normal tissues, developmental series, and disease models. The Megasampler lets you query your gene of interest across any combination of those datasets simultaneously, producing a single integrated overview plot.

You start by assembling your collection: maybe a handful of neuroblastoma cohorts, a few breast cancer datasets, some normal tissue references, and a paediatric pan-cancer set you've been meaning to explore. You type your gene name, click through, and what appears is a single overview of expression distributions: one column per dataset, ordered however you like, with each dataset's spread of values shown as a box plot or dot cloud.
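
R2 does all of this through its web interface, so there is nothing to code. Still, it can help to see what each column of the overview summarises. The sketch below is a minimal illustration, assuming you already had one gene's expression values per dataset in a plain Python dictionary. The dataset names and numbers are made-up toy values, and `box_summary` and `overview` are hypothetical helpers, not part of R2. It computes the five-number summary behind each box plot and orders datasets by median, as one possible choice of ordering.

```python
from statistics import quantiles

def box_summary(values):
    """Five-number summary (min, Q1, median, Q3, max) behind one box plot."""
    # method="inclusive" treats the values as the whole population of samples
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}

def overview(expression_by_dataset):
    """One summary per dataset, sorted from highest to lowest median."""
    summaries = {name: box_summary(vals) for name, vals in expression_by_dataset.items()}
    return sorted(summaries.items(), key=lambda item: item[1]["median"], reverse=True)

# Toy values for one gene (hypothetical, not real R2 data)
toy = {
    "Breast cancer cohort B": [4.0, 5.0, 6.0, 7.0, 8.0],
    "Normal tissue C": [2.0, 3.0, 3.0, 4.0, 5.0],
    "Neuroblastoma cohort A": [8.0, 9.0, 10.0, 11.0, 12.0],
}

for name, summary in overview(toy):
    print(f"{name}: median {summary['median']}, IQR {summary['q1']}-{summary['q3']}")
```

Each tuple in the result corresponds to one column of the overview; swapping the sort key is all it takes to reorder the columns.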

The patterns that emerge can be humbling or electrifying, often both. Your gene might be universally high across all tumour types — interesting, but perhaps not specific enough for a targeted therapy angle. Or it might be strikingly elevated in just two or three cancer types, with near-silence everywhere else. That specificity is a story worth telling.
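
To make the "specific versus universal" distinction concrete, here is a rough heuristic in the same hypothetical setup. It uses toy values rather than R2 output, and it is not how R2 itself decides anything. It flags datasets whose median sits well above the median of all dataset medians. It assumes log2-scale expression and an arbitrary gap of 2 log2 units (four-fold); `elevated_datasets` and `min_log2_gap` are illustrative names.

```python
from statistics import median

def elevated_datasets(expression_by_dataset, min_log2_gap=2.0):
    """Datasets whose median is at least `min_log2_gap` log2 units above
    the median of all dataset medians (a hypothetical specificity rule)."""
    medians = {name: median(vals) for name, vals in expression_by_dataset.items()}
    baseline = median(medians.values())  # typical level across the collection
    flagged = [name for name, m in medians.items() if m - baseline >= min_log2_gap]
    # Highest-expressing datasets first
    return sorted(flagged, key=lambda name: medians[name], reverse=True)

# Toy log2 expression values for one gene (hypothetical)
toy = {
    "Neuroblastoma A": [9.0, 10.0, 11.0],
    "Medulloblastoma B": [9.0, 9.5, 10.0],
    "Breast C": [3.0, 4.0, 5.0],
    "Colon D": [4.0, 4.5, 5.0],
    "Lung E": [4.0, 5.0, 6.0],
    "Normal F": [3.0, 3.5, 4.0],
}

print(elevated_datasets(toy))
```

If the gene is uniformly high everywhere, the baseline rises with it and nothing is flagged, which mirrors the "universally high but not specific" case above.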

The Megasampler also has a Megasearch function: rather than looking at one gene across many datasets, you ask which genes show a particular expression pattern across your selected collection. It interrogates the entire transcriptome across multiple cohorts at once, the kind of analysis that used to require a bioinformatics grant.

From any point in the overview, you can click through directly to a single-dataset view and drill down. The Megasampler is the wide-angle lens. The rest of R2 is the zoom.

This is part of an ongoing series on the R2 Genomics Analysis and Visualization Platform, developed at Amsterdam UMC. All analyses can be freely performed at r2.amc.nl. Full tutorials are available at r2-tutorials.readthedocs.io.

R2platform Data Science Annals