Posts

Showing posts from May, 2026

"From Molecule to Medicine" — Using Target Actionability Reviews to Bridge Data and Drug Development

 Most research stories end with a publication. The best ones end with a treatment. Between those two endpoints lies a long and difficult journey: from a statistically significant finding in a patient cohort, through mechanistic validation, to a druggable target, through preclinical models, to a clinical trial. It's a journey that requires not just good science, but an organised, evidence-based case for why this particular target, in this particular cancer, deserves the investment of a drug development programme. R2's Target Actionability Review (TAR) module is a tool built specifically to support that case-building process — and it is unlike almost anything else in a genomics platform. A TAR is a manually curated, structured literature review focused on a single gene target in a single cancer context. It brings together evidence from multiple domains: the genomic prevalence of alterations in the target, the functional evidence linking it to disease biology, the availability...

"Making It Yours" — Uploading Data, Building Tracks and Collaborating in R2

 At some point, the public datasets stop being enough. You've spent months characterising a cohort of your own — patient samples collected through your clinical network, cell lines you've treated and profiled, an in vivo experiment that generated a dataset unlike anything in the public domain. The analysis tools in R2 are exactly what you need. But your data isn't there yet. The Adapting R2 tools are what bridge that gap — and they're more capable than most users realise. Uploading your own dataset to R2 is a structured process, but a manageable one. You prepare your expression matrix and sample annotation file in the required formats, submit them through the platform's data addition workflow, and within a defined turnaround time, your dataset is live in your private R2 workspace — fully accessible through every analysis module the platform offers, while remaining invisible to anyone outside your authorised group. Once your data is in, you can enrich it with cu...

"The Whole Genome Story" — Visualising Structural Variation with WGS Data

Gene expression tells you what a cell is doing right now. But cancer is, at its heart, a disease of the genome: of broken chromosomes, rearranged sequences, amplified oncogenes and deleted tumour suppressors. To understand why a gene is expressed the way it is, sometimes you need to see the structural context: the copy number landscape, the chromosomal rearrangements, the mutations that preceded everything else. R2's WGS/NGS integration tools bring whole-genome sequencing data into the same analytical space as expression data, and the entry point is one of the most visually striking displays in the platform: the Circos plot. A Circos plot is a circular representation of the entire genome. Each chromosome occupies an arc of the circle, and lines drawn across the interior of the circle connect genomic regions that have been rearranged relative to each other (translocations, inversions, insertions of one chromosome into another). For complex cancer genomes, these plots can look ...

"Seeing the Panel at Once" — Comparing Multiple Genes Across Your Cohort

 Research rarely lives at the level of a single gene for long. Within a few weeks of finding your gene of interest, you're building a panel — related family members, known interactors, upstream regulators, downstream targets. The question shifts from "what is this gene doing?" to "how does the whole set behave together?" R2's Multiple Genes View is built for exactly this moment in a project. Rather than clicking through each gene individually, you type a list — or paste it in from a spreadsheet — and R2 generates a side-by-side expression overview for all of them simultaneously. Each gene gets its own column of dots, arranged by sample, so you can scan across the panel and immediately see which genes are high, which are low, and crucially, whether they go up and down together or in opposition. The track annotation system brings this to life. You split all samples by a clinical variable — tumour subtype, for instance — and suddenly each column of dots is ...

"Finding the Axes of Variation" — Understanding Your Data with Principal Component Analysis

 There's a thought experiment that helps explain what Principal Component Analysis does. Imagine you're trying to describe the differences between a large group of people, and you have a thousand measurements for each person — height, weight, age, dozens of blood markers, hundreds more. That's an impossibly high-dimensional space. PCA's job is to find the most important directions of variation in that space and let you look along those directions instead. In genomics, the same logic applies. You have thousands of gene expression measurements per sample. PCA collapses that complexity into a small number of "principal components" — axes that capture the most variance in the dataset. The first principal component captures the most variation of all. The second captures the most of what remains, and so on. When you plot your samples along the first two or three of these axes, you're looking at a compressed but surprisingly faithful summary of the whole datase...
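To make the idea of "axes that capture the most variance" concrete, here is a minimal pure-Python sketch of PCA using power iteration on the covariance matrix. It is an illustration only, not R2's implementation; the function names and the six-sample, three-gene toy matrix are invented for the example.

```python
from math import sqrt

def center(matrix):
    """Subtract each column (gene) mean; rows are samples."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]

def covariance(centered):
    """Gene-by-gene sample covariance matrix of centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]

def leading_component(cov, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector) of cov."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca(matrix, n_components=2):
    """Return (sample scores, fraction of variance explained per component)."""
    centered = center(matrix)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, explained = [], []
    for _ in range(n_components):
        lam, v = leading_component(cov)
        components.append(v)
        explained.append(lam / total_variance)
        # Deflation: remove the variance already captured by this component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, explained

# Toy data (made up): 6 samples x 3 genes; genes 1 and 2 rise together.
matrix = [
    [2.0, 1.9, 0.5],
    [4.0, 4.2, 0.4],
    [6.0, 5.8, 0.6],
    [8.0, 8.1, 0.5],
    [10.0, 10.0, 0.3],
    [12.0, 12.1, 0.7],
]
scores, explained = pca(matrix)
```

In this toy matrix almost all of the variance sits along the shared trend of the first two genes, so the first component dominates. Real analyses would normally use an established numerical library rather than hand-rolled power iteration.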

"Two Truths About the Same Sample" — Integrating Expression and Methylation Data

There's a question that keeps coming up in modern cancer biology, and it goes something like this: we know this gene is silenced in these tumours, but why? Is it a genetic event, a deletion, a mutation? Or is the promoter methylated, the gene quietly switched off by an epigenetic mechanism that leaves no trace in the sequence itself? DNA methylation and gene expression are two different measurements of the same biological reality, made on different platforms, stored in different formats, analysed by different tools. Or at least, they used to be. In R2, they can live side by side in the same analysis. The Cross-Platform Integration module is designed for exactly the scenario where the same patient samples have been profiled on multiple technologies: expression measured by RNA-seq or microarray, methylation measured by an Illumina methylation array, perhaps copy number from a third platform. R2 treats these as a "collection", a set of datasets that share samples in ...

"The Clinical Picture" — Using Annotation Analyses to Understand Your Cohort

 Before you can interpret expression data, you need to understand the samples it came from. How old were the patients? What were their tumour stages? Was there a relationship between age and stage in this particular cohort? Do the survival outcomes cluster in ways that align — or don't align — with the clinical categories you've been using as grouping variables? These questions live one level below the gene expression data, in the clinical annotation — the metadata that describes your samples. In R2, that annotation is stored as "tracks," and the Annotation Analyses module is where you interrogate those tracks directly, before a single gene comes into view. The module handles three fundamental types of comparison, and it routes you to the right one automatically based on what you're comparing. If you want to know whether two categorical variables co-occur — are MYCN-amplified tumours more likely to be stage 4? — R2 computes a Fisher's exact test and shows ...
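For readers curious about what a Fisher's exact test on two categorical tracks involves, the following is a small sketch of the two-sided test on a 2x2 table, written with only the standard library. The counts (MYCN status against stage 4) are invented for illustration, and the function name is hypothetical; it does not describe how R2 computes the test internally.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so floating-point ties count as "as extreme".
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Made-up counts:        stage 4   other stages
# MYCN amplified            8          2
# MYCN not amplified        1          5
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

The exact test is a natural choice for small cohorts like this one, where the expected counts are too low for a chi-squared approximation to be reliable.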

"Reading the Switches" — Integrating ChIP-seq with Gene Expression

Gene expression is the readout. But what controls the switches? If you work on transcription factors, chromatin remodelling, or gene regulation, you've probably done (or dreamed of doing) a ChIP-seq experiment. ChIP-seq tells you where proteins bind on the genome: where your transcription factor of interest sits, which promoters are decorated with active histone marks, where the cell has placed its regulatory bets. When you combine that information with expression data from the same system, you start to see evidence of direct regulation rather than just correlation. R2 makes that integration surprisingly accessible, even if ChIP-seq data processing is not your area. The platform hosts a growing collection of pre-processed ChIP-seq datasets, covering transcription factor binding, histone modifications and chromatin accessibility across a range of cell lines and tumour types. You don't need to run a single alignment or peak-calling pipeline. The processed data is already there, waiting to be i...

"Seeing the Neighbourhood" — Exploring Your Gene in the Genome Browser

 Every gene lives somewhere. It has neighbours — other genes upstream and downstream, regulatory elements scattered around it, histone marks that signal whether the local chromatin is open or closed. When something unusual is happening with your gene of interest, sometimes the explanation isn't in the gene itself — it's in the surrounding landscape. R2's integrated Genome Browser is where you go to see that landscape. Unlike external genome browsers that feel disconnected from your expression data, R2's genome browser is woven directly into the platform. You can arrive there naturally — from a gene expression plot, a differential expression result, a correlation list — and the browser opens up showing you the genomic context of whatever you were just looking at, with your data already loaded. The visual logic is familiar to anyone who has used a genome browser before: horizontal tracks stacked on top of each other, each one showing a different layer of information ...

"Watching Biology Happen" — Analysing Gene Expression Across Time

 Most genomics datasets are snapshots. A tumour biopsy, a cell line harvested at a single moment — they tell you the state of the system, but not how it got there. Time series experiments are different. They follow a biological process as it unfolds: a cell line treated with a drug, sampled at 0, 6, 12, 24 and 48 hours. A differentiation protocol tracked from stem cell to mature neuron, hour by hour. The transcriptome not as a photograph, but as a film. Analysing that film used to be technically demanding. In R2, it's built right in. The Time Series module is designed specifically for experiments where multiple measurements have been taken from the same system across successive time points. When you load a time series dataset in R2 — say, a cell line treated with a differentiation agent and profiled at six intervals — the module presents the data in a way that respects its sequential nature. Each gene gets a trajectory: a line connecting its expression values across time, rising...

"A Figure Worth a Thousand Tables" — Building Publication-Ready Heatmaps with Genesets

 The grant deadline is in three weeks. Your collaborator has just asked for "a figure showing the expression of those immune genes across the cohort." Your PI wants it colour-coded by subtype, with hierarchical clustering, looking "like the ones in that Nature paper." You open R2. The Genesets and Heatmaps module is where R2 shifts from analytical tool to presentation machine — and the quality of what it produces is genuinely publication-ready without any post-processing in Illustrator. The starting point is your gene list. Maybe it's a set of immune checkpoint genes you've curated from the literature. Maybe it's the output of your differential expression analysis from earlier in the week. Maybe it's a published signature from a paper your PI circled in red pen. In R2, you save this list as a geneset, and from that point it becomes a reusable object — available for heatmaps, signature scoring, pathway analyses, and more. To build the heatmap, yo...

"It's Not Just About Genes" — Navigating Biological Meaning with the Pathway Finder

 There's a conceptual leap that happens partway through most research projects, and it goes something like this: you stop thinking about individual genes and start thinking about programmes. A single gene rarely acts alone. It sits inside a network — a pathway — where it communicates with dozens of partners, responds to upstream signals, and drives downstream consequences. Understanding your gene's pathway context is often the difference between a result that feels isolated and one that connects to a broader biological narrative. R2's Pathway Finder is built for exactly this transition. At its core, the Pathway Finder asks: which known biological pathways are behaving differently between your groups of samples? Rather than reporting a list of genes, it reports a list of processes — Wnt signalling, DNA damage response, cell cycle regulation, MAPK activity — ranked by how strongly their constituent genes are deregulated in your dataset. The starting point is familiar. Y...

"Boiling It Down to a Number" — Using Gene Signatures to Score Biological Programmes

 You've identified a set of genes that you believe represents a biological programme — maybe it's a hypoxia response signature from the literature, or a list of targets you identified from your own ChIP-seq experiment, or simply the genes that came out of your differential expression analysis. The question now is: can I reduce all of that complexity to a single meaningful score for each patient? That's exactly what R2's gene signatures module is designed to do. In R2, a signature is defined as a collection of genes — grouped together because they share something: a functional role, a genomic location, a co-expression pattern, or a biological programme you've defined yourself. Once you've assembled your gene list and saved it as a signature in R2, the platform can calculate an activity score for that signature across every sample in your dataset. The mechanics are intuitive. R2 computes a score that reflects the overall expression activity of your gene set in...
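One common way to reduce a gene set to a single number per sample is to average per-gene z-scores. The sketch below shows that approach; it is an assumption chosen for illustration and is not necessarily the scoring method R2 uses. The gene values and function names are made up.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardise one gene's expression across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a flat gene contributes nothing
    return [(v - mu) / sd for v in values]

def signature_scores(expression, signature):
    """Mean z-score of the signature genes, one score per sample.

    expression: dict mapping gene name -> list of values (one per sample).
    signature:  list of gene names; genes missing from the data are skipped.
    """
    genes = [g for g in signature if g in expression]
    if not genes:
        raise ValueError("no signature genes found in the dataset")
    standardised = [zscores(expression[g]) for g in genes]
    n_samples = len(standardised[0])
    return [mean(row[i] for row in standardised) for i in range(n_samples)]

# Toy data (made up): 4 samples, a 3-gene "hypoxia" set plus one other gene.
expression = {
    "HIF1A": [1, 2, 3, 4],
    "VEGFA": [2, 4, 6, 8],
    "CA9": [10, 10, 30, 30],
    "ACTB": [5, 5, 5, 5],
}
scores = signature_scores(expression, ["HIF1A", "VEGFA", "CA9"])
```

Standardising each gene first keeps a single highly expressed gene from dominating the score, which is the main reason to prefer this over a plain average of raw values.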

"Carving Nature at Its Joints" — Discovering Tumour Subtypes with K-Means Clustering

 The dataset has been sitting on your desk — metaphorically speaking — for months. One hundred and seventy-five medulloblastoma tumours. Clinically, they're all "medulloblastoma." But you've read enough papers to suspect that name is hiding several very different diseases underneath it. The question is whether the data agrees, and if so, how many distinct groups actually exist. This is the job of K-means clustering — and in R2, it's a surprisingly tactile, exploratory experience. The idea behind K-means is straightforward: you tell the algorithm how many groups (K) you want to find, and it sorts your samples into those groups by minimising the expression differences within each group while maximising the differences between them. The result is a coloured heatmap where each row is a gene, each column is a sample, and the colour — from deep blue through white to vivid red — reflects whether that gene is low, average, or high in that sample. What makes R2's i...
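The K-means procedure described above can be sketched in a few lines of plain Python. This is Lloyd's algorithm on a tiny invented dataset, offered as an illustration of the idea rather than as R2's implementation; the function names and the six toy samples are assumptions.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids).

    points: list of samples, each a list of expression values.
    k:      number of groups requested by the user.
    """
    rng = random.Random(seed)
    # Start from k randomly chosen samples as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (made up): 6 samples x 2 genes forming two obvious groups.
points = [
    [1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],
]
labels, centroids = kmeans(points, k=2)
```

Because the result depends on the starting centroids and on the chosen K, it is usual to rerun with several values of K and compare how stable the groups are.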

"Is This Real Everywhere?" — Testing Your Gene Across Thousands of Datasets with Megasampler

Here's a scenario every researcher knows. You've found something exciting in a neuroblastoma dataset. Your gene of interest is high in aggressive tumours and low in the benign ones. The statistics are solid, the plot looks beautiful, and your PI is cautiously optimistic. Then comes the question you were dreading: "But does it hold up in other cancers? Other cohorts? Other platforms?" In the old world, answering that question meant emailing collaborators, hunting down GEO accession numbers, and waiting for someone with a computing cluster to process it all. In R2, it takes about ten minutes, using a module called Megasampler. The premise is simple: R2 hosts over 3,000 public datasets, covering dozens of cancer types, normal tissues, developmental series, and disease models. The Megasampler lets you query your gene of interest across any combination of those datasets simultaneously, producing a single integrated overview plot. You start by assembling your collect...

"The Map of Your Data" — Seeing Hidden Structure with t-SNE and UMAP

Sometimes you don't have a hypothesis. Sometimes you just have a dataset and a feeling that there's structure in it that you haven't found yet. Maybe you've collected 150 tumour samples from three different clinical sites. Maybe you're working with a published dataset covering multiple subtypes of a disease. You know, intellectually, that these samples aren't all the same — but where exactly the boundaries are, and whether they reflect biology or clinical annotation or something else entirely, is unclear. You need to see the data as a whole. This is what dimensionality reduction is for. And R2 makes it surprisingly approachable. t-SNE (t-distributed Stochastic Neighbour Embedding) and UMAP are algorithms that take a dataset with thousands of gene expression measurements per sample and compress that information down to two dimensions — a map where samples that look similar genomically end up close together, and samples that are different end up far apart. T...

"Guilt by Association" — Finding Genes That Travel With Yours

Science loves a good collaborator. In the molecular world, genes that are co-expressed (that go up and down together across many samples) are often part of the same biological programme. If you understand your gene's "company," you understand something deeper about what it actually does. R2's correlating genes module is built on this principle, and it's one of the most unexpectedly delightful tools in the platform. Here's the scenario: you've been working on a transcription factor for two years. You know it matters. You have good evidence it regulates a handful of known targets. But you have a nagging suspicion there's more to the story, that it's coordinating a much broader programme that you haven't fully mapped. Your collaborator in computational biology just went on sabbatical. What now? You go to R2, select your dataset, and run Find Correlated Genes. R2 calculates the Pearson correlation between your gene of interest and every oth...
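As a rough illustration of what a Pearson correlation search does, the sketch below correlates one query gene against every other gene in a small invented expression table and ranks the results by strength. The gene names, values and function names are hypothetical; the code is not taken from R2.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a flat profile; use 0 here
    return cov / sqrt(var_x * var_y)

def rank_correlated_genes(expression, query_gene):
    """Return (gene, r) pairs for every other gene, strongest |r| first."""
    query = expression[query_gene]
    results = [(gene, pearson(query, values))
               for gene, values in expression.items() if gene != query_gene]
    return sorted(results, key=lambda pair: abs(pair[1]), reverse=True)

# Toy data (made up): 5 samples per gene.
expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TARGET_1": [2.1, 3.9, 6.2, 8.0, 9.9],   # rises with TF_A
    "TARGET_2": [5.0, 4.1, 2.9, 2.2, 1.0],   # falls as TF_A rises
    "UNRELATED": [3.0, 1.0, 4.0, 1.0, 3.0],  # no linear relationship
}
ranked = rank_correlated_genes(expression, "TF_A")
```

Ranking by absolute value keeps strongly anti-correlated genes near the top, which matters when the gene of interest may act as a repressor.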

"Something Changed — But What?" — Finding Differentially Expressed Genes Between Two Groups

 You've done the experiment. Two groups of samples — maybe treated versus untreated, maybe two tumour subtypes, maybe pre- and post-therapy biopsies. The RNA-seq or array data is back from the facility, processed, and sitting in R2 as a dataset. Now comes the real question: what actually changed? This is the moment that used to separate wet-lab researchers from computational ones. Not anymore. R2's differential expression tools let you ask, across all genes simultaneously, which ones are significantly up or down between your groups. But rather than dumping an intimidating table of 20,000 rows on you, R2 walks you through it in a structured, visual way. Start with a single gene to get your bearings. Say you suspect MYCN behaves differently between INSS stage 4 tumours and the rest. You select "View a Gene in Groups," choose your grouping track (tumour stage), and R2 produces a grouped box plot with statistics. It even runs the appropriate statistical test automaticall...

"Who Survives, and Why?" — Using Kaplan-Meier Analysis to Link Gene Expression to Patient Outcomes

There's a moment in every cancer biologist's research when the question shifts from "what is this gene doing?" to "does it actually matter for patients?" That shift used to require a statistician, a clinical dataset, and a lot of back-and-forth. R2 collapses all of that into a single afternoon. The Kaplan-Meier module in R2 is built for exactly this question. A Kaplan-Meier curve is one of medicine's most powerful visual tools: it shows you, over time, what fraction of patients in a group are still alive (or relapse-free). When you split patients into two groups (say, high versus low expression of your favourite gene) and their survival curves diverge dramatically, you feel it in your stomach. That divergence is a signal. Here's how a typical session goes. You've been studying a receptor gene that you suspect is linked to poor prognosis. You open R2, select a neuroblastoma dataset with survival annotation, and navigate to the Kaplan-Meier module. You choose to ...
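The survival fraction behind a Kaplan-Meier curve can be computed by hand. The sketch below splits an invented eight-patient cohort at the median expression of a gene and estimates a curve for each half. The median cutoff, the data and the function names are assumptions made for the example, and no significance test (such as a log-rank test) is included.

```python
from statistics import median

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival_probability), ...].

    times:  follow-up time for each patient.
    events: 1 if the event (death or relapse) was observed, 0 if censored.
    A point is recorded at each time where at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set too
    return curve

def split_by_median(expression, times, events):
    """Split patients into high and low groups at the median expression."""
    cutoff = median(expression)
    rows = list(zip(expression, times, events))
    high = [(t, e) for x, t, e in rows if x > cutoff]
    low = [(t, e) for x, t, e in rows if x <= cutoff]
    return high, low

# Toy cohort (made up): expression, follow-up in months, event indicator.
expression = [9.1, 8.7, 8.2, 7.9, 3.1, 2.8, 2.5, 2.0]
times = [5, 8, 12, 20, 30, 42, 50, 60]
events = [1, 1, 1, 0, 1, 0, 0, 0]

high, low = split_by_median(expression, times, events)
high_curve = kaplan_meier(*zip(*high))
low_curve = kaplan_meier(*zip(*low))
```

In this toy cohort the high-expression group's curve drops much faster than the low-expression group's, which is the kind of divergence the plot makes visible.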

"My Gene, Front and Centre" — Visualising a Single Gene Across Hundreds of Tumours

It starts, as it so often does, with a hunch. You've spent months in the lab chasing a gene; let's call it MYCN. You've seen it behave strangely in your cell lines. It seems to go up when it shouldn't, or maybe it's suspiciously quiet in samples where you expected it to roar. The question nagging at you over your morning coffee is: is this pattern real? And does it matter clinically? The old answer was: ask your bioinformatics colleague, wait two weeks, receive a spreadsheet you can't interpret. The new answer is: open a browser, go to r2.amc.nl, and spend fifteen minutes finding out yourself. R2's View a Gene module is the front door to the platform, and it's a remarkably welcoming one. You select a dataset (say, a public neuroblastoma cohort with 88 tumour samples), type your gene name into the search box, and within seconds you're looking at a dot plot of expression levels across every single sample. Each dot is a patient. The Y-axis is ex...