R2platform Data Science Annals

Posts

Showing posts from May, 2026

The galectin family: 15 glycan readers reshaping cancer biology

May 22, 2026

They bind sugar. They talk to tumours. And they may be the next frontier in cancer immunotherapy. Every cell in your body is coated in a dense forest of sugar chains — glycans. These structures are not decoration. They are read, constantly, by a class of proteins called lectins. Among them, the galectins stand out: a family of 15 human genes whose products specifically recognise β-galactoside motifs and translate glycan patterns into cellular decisions about growth, survival, and immune response. In the language of glycobiology, galectins are glycan readers — they do not build or break sugar chains (that is the job of glycosyltransferase writers and glycosidase erasers), but they interpret them. And increasingly, it is clear that what they read in tumours spells trouble for the immune system. The family tree: three structural archetypes. All 15 members share a conserved carbohydrate recognition domain (CRD), but differ in how many CRDs they carry and how those domains are arranged. T...

The SOX Family: Master Regulators of Development

May 22, 2026

Few protein families in molecular biology are as versatile — or as consequential — as the SOX transcription factors. From the earliest moments of embryonic development to the maintenance of adult tissues, these proteins quietly orchestrate some of the most fundamental decisions a cell ever makes. What Are SOX Factors? SOX proteins take their name from SRY-box, a reference to SRY (Sex-determining Region Y), the founding member of the family discovered in 1990. SRY turned out to encode a transcription factor with a distinctive DNA-binding domain called the HMG (High Mobility Group) box. When researchers began scanning the genome for proteins sharing this domain, they found not one or two relatives, but an entire family — 20 members in humans, now classified into subgroups A through H. What all SOX proteins share is that HMG box: a roughly 80-amino-acid domain that grips the minor groove of DNA and bends it sharply, sometimes by as much as 70–85 degrees. This bending is not i...

Unlocking Cancer Insights: Introducing the R2 Platform

May 20, 2026

Every year, millions of people are diagnosed with cancer — a disease that, despite its name, is not one condition but thousands. Understanding why tumors behave differently from patient to patient is one of the most pressing challenges in modern medicine. A major breakthrough came with The Cancer Genome Atlas (TCGA), a landmark project that molecularly mapped over 30 types of cancer across thousands of patients. But having data is only half the battle. Making sense of it — quickly, reliably, and without needing a team of bioinformaticians — is where progress stalls. That's the problem R2 was built to solve. R2 is a free, browser-based platform that puts powerful cancer genomics analysis in the hands of any researcher or clinician, no coding required. Drawing on TCGA RNA-seq and clinical data from 31 cancer types, R2 lets you ask — and answer — complex biological questions in minutes. Here's what you can do with it: Compare gene activity across cancer types and patient g...

A Multi-Omics Resource for Neuroblastoma Research: Molecular Profiles and Drug Response Data for Classical NB Cell Lines

May 10, 2026

Neuroblastoma remains one of the most challenging pediatric cancers to treat. Despite decades of research, high-risk disease still carries a poor prognosis, and finding effective therapies requires a deep understanding of the molecular landscape driving each tumor. To support that mission, we are excited to announce the release of a comprehensive multi-omics dataset covering the most widely used classical neuroblastoma cell lines — an openly accessible resource designed to accelerate discovery across the field. What Is in the Dataset? This resource brings together multiple layers of molecular and pharmacological data for a panel of classical neuroblastoma cell lines. In one place, researchers can now access: Transcriptomics (mRNA expression) Genome-wide gene expression profiles capturing the transcriptional state of each cell line. These data allow researchers to explore pathway activity, subtype classification, and gene regulatory networks. DNA copy number variation Genome-scal...

"From Molecule to Medicine" — Using Target Actionability Reviews to Bridge Data and Drug Development

May 05, 2026

Most research stories end with a publication. The best ones end with a treatment. Between those two endpoints lies a long and difficult journey: from a statistically significant finding in a patient cohort, through mechanistic validation, to a druggable target, through preclinical models, to a clinical trial. It's a journey that requires not just good science, but an organised, evidence-based case for why this particular target, in this particular cancer, deserves the investment of a drug development programme. R2's Target Actionability Review (TAR) module is a tool built specifically to support that case-building process — and it is unlike almost anything else in a genomics platform. A TAR is a manually curated, structured literature review focused on a single gene target in a single cancer context. It brings together evidence from multiple domains: the genomic prevalence of alterations in the target, the functional evidence linking it to disease biology, the availability...

"Making It Yours" — Uploading Data, Building Tracks and Collaborating in R2

May 05, 2026

At some point, the public datasets stop being enough. You've spent months characterising a cohort of your own — patient samples collected through your clinical network, cell lines you've treated and profiled, an in vivo experiment that generated a dataset unlike anything in the public domain. The analysis tools in R2 are exactly what you need. But your data isn't there yet. The Adapting R2 tools are what bridge that gap — and they're more capable than most users realise. Uploading your own dataset to R2 is a structured process, but a manageable one. You prepare your expression matrix and sample annotation file in the required formats, submit them through the platform's data addition workflow, and within a defined turnaround time, your dataset is live in your private R2 workspace — fully accessible through every analysis module the platform offers, while remaining invisible to anyone outside your authorised group. Once your data is in, you can enrich it with cu...
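The excerpt mentions preparing an expression matrix and a sample annotation file before submission. R2's exact required formats are not described here, so the sketch below assumes a hypothetical tab-separated layout (gene IDs in the first matrix column, one column per sample; one annotation row per sample with the ID first) and simply cross-checks the two files for the mismatches that most often break an upload. The gene values are toy numbers.

```python
import csv
import io

def check_sample_ids(matrix_tsv, annotation_tsv):
    """Cross-check sample IDs between an expression matrix and a sample annotation table.

    Assumed (hypothetical) layout, not R2's documented format:
      matrix:     header = gene-ID column name, then one column per sample
      annotation: header row, then one row per sample with the sample ID first
    """
    matrix_rows = list(csv.reader(io.StringIO(matrix_tsv), delimiter="\t"))
    annot_rows = list(csv.reader(io.StringIO(annotation_tsv), delimiter="\t"))

    matrix_samples = matrix_rows[0][1:]                         # drop the gene-ID header cell
    annot_samples = [row[0] for row in annot_rows[1:] if row]   # drop header, skip blank lines
    width = len(matrix_rows[0])

    return {
        "missing_annotation": sorted(set(matrix_samples) - set(annot_samples)),
        "missing_expression": sorted(set(annot_samples) - set(matrix_samples)),
        "duplicate_columns": sorted({s for s in matrix_samples if matrix_samples.count(s) > 1}),
        # data rows whose column count differs from the header (1-based line number in the file)
        "ragged_rows": [i + 2 for i, row in enumerate(matrix_rows[1:]) if row and len(row) != width],
    }

matrix = "gene\tS1\tS2\tS3\nMYCN\t5.1\t9.8\t4.2\nALK\t3.3\t7.7\n"
annotation = "sample\tsubtype\nS1\tA\nS2\tB\nS4\tA\n"
print(check_sample_ids(matrix, annotation))
```

Here S3 has expression but no annotation, S4 has annotation but no expression, and line 3 of the matrix is one value short.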

"The Whole Genome Story" — Visualising Structural Variation with WGS Data

May 05, 2026

Gene expression tells you what a cell is doing right now. But cancer is, at its heart, a disease of the genome — of broken chromosomes, rearranged sequences, amplified oncogenes and deleted tumour suppressors. To understand why a gene is expressed the way it is, sometimes you need to see the structural context: the copy number landscape, the chromosomal rearrangements, the mutations that preceded everything else. R2's WGS/NGS integration tools bring whole-genome sequencing data into the same analytical space as expression data, and the entry point is one of the most visually striking displays in the platform: the Circos plot. A Circos plot is a circular representation of the entire genome. Each chromosome occupies an arc of the circle, and lines drawn across the interior of the circle connect genomic regions that have been rearranged relative to each other — translocations, inversions, insertions of one chromosome into another. For complex cancer genomes, these plots can look ...
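The geometry behind a Circos plot can be sketched in a few lines: each chromosome gets an arc proportional to its length, a genomic coordinate becomes an angle on that arc, and a rearrangement is a chord between two such angles. This is a generic illustration, not R2's plotting code; the chromosome names and lengths are toy values, and `gap_deg` is a hypothetical parameter for the white space between arcs.

```python
import math

def chromosome_arcs(lengths, gap_deg=2.0):
    """Assign each chromosome an arc (start, end) in degrees, proportional to its length."""
    usable = 360.0 - gap_deg * len(lengths)    # degrees left after the inter-chromosome gaps
    total = sum(lengths.values())
    arcs, angle = {}, 0.0
    for name, length in lengths.items():        # dicts keep insertion order (Python 3.7+)
        span = usable * length / total
        arcs[name] = (angle, angle + span)
        angle += span + gap_deg
    return arcs

def position_to_angle(arcs, lengths, chrom, pos):
    """Convert a genomic coordinate on `chrom` to an angle on the circle."""
    start, end = arcs[chrom]
    return start + (end - start) * pos / lengths[chrom]

def chord_endpoints(arcs, lengths, bp1, bp2, radius=1.0):
    """(x, y) endpoints of a chord joining two breakpoints, e.g. a translocation."""
    points = []
    for chrom, pos in (bp1, bp2):
        theta = math.radians(position_to_angle(arcs, lengths, chrom, pos))
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points

# Toy genome: lengths are illustrative, not real assembly sizes.
lengths = {"chrA": 200, "chrB": 100, "chrC": 60}
arcs = chromosome_arcs(lengths, gap_deg=0.0)
print(arcs)
print(chord_endpoints(arcs, lengths, ("chrA", 100), ("chrB", 50)))
```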

"Seeing the Panel at Once" — Comparing Multiple Genes Across Your Cohort

May 05, 2026

Research rarely lives at the level of a single gene for long. Within a few weeks of finding your gene of interest, you're building a panel — related family members, known interactors, upstream regulators, downstream targets. The question shifts from "what is this gene doing?" to "how does the whole set behave together?" R2's Multiple Genes View is built for exactly this moment in a project. Rather than clicking through each gene individually, you type a list — or paste it in from a spreadsheet — and R2 generates a side-by-side expression overview for all of them simultaneously. Each gene gets its own column of dots, arranged by sample, so you can scan across the panel and immediately see which genes are high, which are low, and crucially, whether they go up and down together or in opposition. The track annotation system brings this to life. You split all samples by a clinical variable — tumour subtype, for instance — and suddenly each column of dots is ...
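Whether panel genes "go up and down together or in opposition" is, numerically, a question about pairwise correlation. The sketch below computes Pearson correlations for every gene pair across samples, optionally within each group of a clinical track such as tumour subtype. It is an illustration of the idea, not how R2's Multiple Genes View is implemented; the gene names and values are toy data.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def panel_correlations(expr, groups=None):
    """Pairwise gene-gene correlations across samples, optionally split by group.

    expr:   {gene: [value per sample]}, all lists in the same sample order
    groups: optional list of group labels, one per sample (e.g. tumour subtype)
    """
    labels = groups or ["all"] * len(next(iter(expr.values())))
    result = {}
    for label in sorted(set(labels)):
        idx = [i for i, g in enumerate(labels) if g == label]
        for g1, g2 in combinations(expr, 2):
            result[(label, g1, g2)] = pearson([expr[g1][i] for i in idx],
                                              [expr[g2][i] for i in idx])
    return result

expr = {"GENE_A": [1, 2, 3, 4], "GENE_B": [2, 4, 6, 8], "GENE_C": [4, 3, 2, 1]}
for key, r in panel_correlations(expr).items():
    print(key, round(r, 2))
```

GENE_A and GENE_B move together (r close to 1) while GENE_C moves in opposition (r close to -1). In real cohorts, per-group correlations need enough samples per group to be meaningful.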

"Finding the Axes of Variation" — Understanding Your Data with Principal Component Analysis

May 05, 2026

There's a thought experiment that helps explain what Principal Component Analysis does. Imagine you're trying to describe the differences between a large group of people, and you have a thousand measurements for each person — height, weight, age, dozens of blood markers, hundreds more. That's an impossibly high-dimensional space. PCA's job is to find the most important directions of variation in that space and let you look along those directions instead. In genomics, the same logic applies. You have thousands of gene expression measurements per sample. PCA collapses that complexity into a small number of "principal components" — axes that capture the most variance in the dataset. The first principal component captures the most variation of all. The second captures the most of what remains, and so on. When you plot your samples along the first two or three of these axes, you're looking at a compressed but surprisingly faithful summary of the whole datase...
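The description above maps directly onto a few lines of linear algebra: treat samples as observations and genes as variables, centre each gene, and take a singular value decomposition. The singular values give the variance captured by each component, in decreasing order. This is a generic NumPy sketch, not R2's internal code; the matrix is simulated noise with one planted group difference.

```python
import numpy as np

def pca(expr, n_components=2):
    """PCA of a genes x samples matrix; returns sample scores and fraction of variance explained."""
    X = np.asarray(expr, dtype=float).T              # samples x genes
    X = X - X.mean(axis=0)                           # centre each gene across samples
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on each PC
    explained = S ** 2 / np.sum(S ** 2)              # PC1 >= PC2 >= ... by construction
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 10))   # 200 genes x 10 samples of noise
expr[:50, 5:] += 5.0                # 50 genes higher in the last 5 samples
scores, explained = pca(expr)
print(explained.round(3))
print(scores[:, 0].round(1))        # PC1 separates the two groups of samples
```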

"Two Truths About the Same Sample" — Integrating Expression and Methylation Data

May 05, 2026

There's a question that keeps coming up in modern cancer biology, and it goes something like this: we know this gene is silenced in these tumours — but why? Is it a genetic event, a deletion, a mutation? Or is the promoter methylated, the gene quietly switched off by an epigenetic mechanism that leaves no trace in the sequence itself? DNA methylation and gene expression are two different measurements of the same biological reality, made on different platforms, stored in different formats, analysed by different tools — or at least, they used to be. In R2, they can live side by side in the same analysis. The Cross-Platform Integration module is designed for exactly the scenario where the same patient samples have been profiled on multiple technologies: expression measured by RNA-seq or microarray, methylation measured by an Illumina methylation array, perhaps copy number from a third platform. R2 treats these as a "collection" — a set of datasets that share samples in ...
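The core of such a cross-platform comparison is two steps: keep only samples measured on both platforms, then correlate the two measurements for a gene. The sketch below uses a rank-based (Spearman) correlation between promoter methylation beta values and expression; that choice, the function names and the toy values are all assumptions for illustration, not R2's documented method. A strongly negative correlation is consistent with methylation-driven silencing but does not by itself prove it.

```python
from math import sqrt

def shared_samples(*datasets):
    """Sample IDs profiled on every platform (each dataset maps sample ID -> value)."""
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)
    return sorted(common)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

# Toy values for one gene: promoter methylation (beta, 0-1) and expression
methylation = {"P1": 0.9, "P2": 0.8, "P3": 0.2, "P4": 0.1, "P5": 0.5}
expression = {"P1": 1.0, "P2": 1.5, "P3": 6.0, "P4": 7.2, "P6": 3.0}
samples = shared_samples(methylation, expression)   # P5 and P6 are dropped
rho = spearman([methylation[s] for s in samples], [expression[s] for s in samples])
print(samples, round(rho, 3))
```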

"The Clinical Picture" — Using Annotation Analyses to Understand Your Cohort

May 05, 2026

Before you can interpret expression data, you need to understand the samples it came from. How old were the patients? What were their tumour stages? Was there a relationship between age and stage in this particular cohort? Do the survival outcomes cluster in ways that align — or don't align — with the clinical categories you've been using as grouping variables? These questions live one level below the gene expression data, in the clinical annotation — the metadata that describes your samples. In R2, that annotation is stored as "tracks," and the Annotation Analyses module is where you interrogate those tracks directly, before a single gene comes into view. The module handles three fundamental types of comparison, and it routes you to the right one automatically based on what you're comparing. If you want to know whether two categorical variables co-occur — are MYCN-amplified tumours more likely to be stage 4? — R2 computes a Fisher's exact test and shows ...

"Reading the Switches" — Integrating ChIP-seq with Gene Expression

May 05, 2026

Gene expression is the readout. But what controls the switches? If you work on transcription factors, chromatin remodelling, or gene regulation, you've probably done — or dreamed of doing — a ChIP-seq experiment. ChIP-seq tells you where proteins bind on the genome: where your transcription factor of interest sits, which promoters are decorated with active histone marks, where the cell has placed its regulatory bets. When you combine that information with expression data from the same system, you start to move beyond correlation towards evidence of direct regulation. R2 makes that integration surprisingly accessible, even if ChIP-seq data processing is not your area. The platform hosts a growing collection of pre-processed ChIP-seq datasets, covering transcription factor binding, histone modifications and chromatin accessibility across a range of cell lines and tumour types. You don't need to run a single alignment or peak-calling pipeline. The processed data is already there, waiting to be i...
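One common way to connect binding to expression is to call a gene "bound" when a peak summit falls within a fixed window around its transcription start site (TSS), then compare expression of bound and unbound genes. The excerpt does not say which rule R2 uses, so the ±1 kb window, the function names and all coordinates and values below are assumptions for illustration.

```python
from statistics import median

def bound_genes(peaks, tss, window=1000):
    """Genes with at least one peak summit within +/- `window` bp of their TSS.

    peaks: list of (chrom, summit_position)
    tss:   {gene: (chrom, tss_position)}
    """
    by_chrom = {}
    for chrom, pos in peaks:
        by_chrom.setdefault(chrom, []).append(pos)
    return {
        gene for gene, (chrom, t) in tss.items()
        if any(abs(p - t) <= window for p in by_chrom.get(chrom, []))
    }

def compare_expression(expression, bound):
    """Median expression of bound versus unbound genes."""
    b = [v for g, v in expression.items() if g in bound]
    u = [v for g, v in expression.items() if g not in bound]
    return median(b), median(u)

tss = {"G1": ("chr1", 10_000), "G2": ("chr1", 50_000),
       "G3": ("chr2", 20_000), "G4": ("chr2", 90_000)}
peaks = [("chr1", 10_400), ("chr2", 19_200), ("chr2", 70_000)]
expression = {"G1": 8.0, "G2": 2.0, "G3": 6.0, "G4": 3.0}

bound = bound_genes(peaks, tss)
print(sorted(bound), compare_expression(expression, bound))
```

A real analysis would test the difference statistically and ideally use perturbation data, since proximity of a peak does not guarantee that the bound factor regulates that gene.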

"Seeing the Neighbourhood" — Exploring Your Gene in the Genome Browser

May 05, 2026

Every gene lives somewhere. It has neighbours — other genes upstream and downstream, regulatory elements scattered around it, histone marks that signal whether the local chromatin is open or closed. When something unusual is happening with your gene of interest, sometimes the explanation isn't in the gene itself — it's in the surrounding landscape. R2's integrated Genome Browser is where you go to see that landscape. Unlike external genome browsers that feel disconnected from your expression data, R2's genome browser is woven directly into the platform. You can arrive there naturally — from a gene expression plot, a differential expression result, a correlation list — and the browser opens up showing you the genomic context of whatever you were just looking at, with your data already loaded. The visual logic is familiar to anyone who has used a genome browser before: horizontal tracks stacked on top of each other, each one showing a different layer of information ...
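Under the hood, a stacked-track view boils down to one query repeated per track: which features overlap the visible window? The sketch below shows that query with half-open coordinates. The track names, feature layout and coordinates are hypothetical; a production browser would use an indexed structure rather than a linear scan.

```python
def overlaps(start, end, q_start, q_end):
    """True if half-open intervals [start, end) and [q_start, q_end) overlap."""
    return start < q_end and q_start < end

def features_in_view(tracks, chrom, view_start, view_end):
    """For each track, the features that overlap the visible window."""
    return {
        track: [f for f in feats
                if f["chrom"] == chrom and overlaps(f["start"], f["end"], view_start, view_end)]
        for track, feats in tracks.items()
    }

tracks = {
    "genes": [
        {"name": "GENE_X", "chrom": "chr2", "start": 15_000, "end": 22_000},
        {"name": "GENE_Y", "chrom": "chr2", "start": 40_000, "end": 45_000},
    ],
    "H3K27ac": [
        {"name": "peak1", "chrom": "chr2", "start": 14_000, "end": 14_800},
        {"name": "peak2", "chrom": "chr3", "start": 15_000, "end": 16_000},
    ],
}
view = features_in_view(tracks, "chr2", 10_000, 30_000)
print({track: [f["name"] for f in feats] for track, feats in view.items()})
```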

"Watching Biology Happen" — Analysing Gene Expression Across Time

May 05, 2026

Most genomics datasets are snapshots. A tumour biopsy, a cell line harvested at a single moment — they tell you the state of the system, but not how it got there. Time series experiments are different. They follow a biological process as it unfolds: a cell line treated with a drug, sampled at 0, 6, 12, 24 and 48 hours. A differentiation protocol tracked from stem cell to mature neuron, hour by hour. The transcriptome not as a photograph, but as a film. Analysing that film used to be technically demanding. In R2, it's built right in. The Time Series module is designed specifically for experiments where multiple measurements have been taken from the same system across successive time points. When you load a time series dataset in R2 — say, a cell line treated with a differentiation agent and profiled at six intervals — the module presents the data in a way that respects its sequential nature. Each gene gets a trajectory: a line connecting its expression values across time, rising...
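A per-gene trajectory is often summarised relative to the first time point. The sketch below turns one gene's values into log2 fold changes versus baseline and reports when expression peaks, using the 0, 6, 12, 24 and 48 hour schedule from the example above. It assumes linear-scale values (for instance normalised counts) and a pseudocount of 1; both are assumptions, and the expression values are toy numbers rather than anything R2 reports.

```python
from math import log2

def trajectory(values, times, pseudocount=1.0):
    """log2 fold change at each time point relative to the first (baseline) time point."""
    base = values[0] + pseudocount
    return {t: log2((v + pseudocount) / base) for t, v in zip(times, values)}

def peak_time(values, times):
    """Time point at which expression is highest (first one if tied)."""
    return times[values.index(max(values))]

times = [0, 6, 12, 24, 48]                # hours after treatment
gene = [3.0, 7.0, 15.0, 31.0, 15.0]       # toy expression values for one gene
print(trajectory(gene, times))
print(peak_time(gene, times))
```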

"A Figure Worth a Thousand Tables" — Building Publication-Ready Heatmaps with Genesets

May 05, 2026

The grant deadline is in three weeks. Your collaborator has just asked for "a figure showing the expression of those immune genes across the cohort." Your PI wants it colour-coded by subtype, with hierarchical clustering, looking "like the ones in that Nature paper." You open R2. The Genesets and Heatmaps module is where R2 shifts from analytical tool to presentation machine — and the quality of what it produces is genuinely publication-ready without any post-processing in Illustrator. The starting point is your gene list. Maybe it's a set of immune checkpoint genes you've curated from the literature. Maybe it's the output of your differential expression analysis from earlier in the week. Maybe it's a published signature from a paper your PI circled in red pen. In R2, you save this list as a geneset, and from that point it becomes a reusable object — available for heatmaps, signature scoring, pathway analyses, and more. To build the heatmap, yo...
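The colours in an expression heatmap usually encode per-gene z-scores, so that every gene is shown on the same scale regardless of its absolute level. The sketch below covers that scaling step and a simple reordering of sample columns by subtype; it leaves out hierarchical clustering and is not a description of how R2 renders its heatmaps. Function names and values are illustrative.

```python
import numpy as np

def row_zscore(matrix):
    """Scale each gene (row) to mean 0 and SD 1 across samples."""
    m = np.asarray(matrix, dtype=float)
    mu = m.mean(axis=1, keepdims=True)
    sd = m.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0          # constant genes become all zeros instead of dividing by zero
    return (m - mu) / sd

def order_by_group(z, groups):
    """Reorder sample columns so samples of the same subtype sit together (stable within group)."""
    order = sorted(range(len(groups)), key=lambda i: groups[i])
    return z[:, order], [groups[i] for i in order]

expr = [[1, 2, 3],     # gene 1 across three samples
        [5, 5, 5]]     # gene 2 is flat
z = row_zscore(expr)
z_sorted, labels = order_by_group(z, ["B", "A", "B"])
print(z_sorted, labels)
```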

"It's Not Just About Genes" — Navigating Biological Meaning with the Pathway Finder

May 05, 2026

There's a conceptual leap that happens partway through most research projects, and it goes something like this: you stop thinking about individual genes and start thinking about programmes. A single gene rarely acts alone. It sits inside a network — a pathway — where it communicates with dozens of partners, responds to upstream signals, and drives downstream consequences. Understanding your gene's pathway context is often the difference between a result that feels isolated and one that connects to a broader biological narrative. R2's Pathway Finder is built for exactly this transition. At its core, the Pathway Finder asks: which known biological pathways are behaving differently between your groups of samples? Rather than reporting a list of genes, it reports a list of processes — Wnt signalling, DNA damage response, cell cycle regulation, MAPK activity — ranked by how strongly their constituent genes are deregulated in your dataset. The starting point is familiar. Y...
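The excerpt does not say which statistic the Pathway Finder uses to rank pathways. One widely used option is an over-representation test: given a list of deregulated genes, how surprising is the overlap with each pathway under the hypergeometric distribution? The sketch below implements that generic test; the gene and pathway names are hypothetical, and real tools usually also correct for multiple testing.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): drawing n genes from N, of which K belong to the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def rank_pathways(deregulated, pathways, universe):
    """Rank pathways by over-representation of deregulated genes (smallest p first)."""
    universe = set(universe)
    deg = set(deregulated) & universe
    N, n = len(universe), len(deg)
    results = []
    for name, genes in pathways.items():
        members = set(genes) & universe
        k = len(members & deg)
        results.append((name, k, len(members), hypergeom_sf(k, N, len(members), n)))
    return sorted(results, key=lambda r: r[3])

universe = [f"G{i}" for i in range(20)]
deregulated = ["G0", "G1", "G2", "G3", "G4"]
pathways = {
    "P_up": ["G0", "G1", "G2", "G3", "G10"],       # 4 of 5 members deregulated
    "P_none": ["G10", "G11", "G12", "G13", "G14"],  # no overlap
}
for name, hits, size, p in rank_pathways(deregulated, pathways, universe):
    print(name, f"{hits}/{size}", round(p, 5))
```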

"Boiling It Down to a Number" — Using Gene Signatures to Score Biological Programmes

May 05, 2026

You've identified a set of genes that you believe represents a biological programme — maybe it's a hypoxia response signature from the literature, or a list of targets you identified from your own ChIP-seq experiment, or simply the genes that came out of your differential expression analysis. The question now is: can I reduce all of that complexity to a single meaningful score for each patient? That's exactly what R2's gene signatures module is designed to do. In R2, a signature is defined as a collection of genes — grouped together because they share something: a functional role, a genomic location, a co-expression pattern, or a biological programme you've defined yourself. Once you've assembled your gene list and saved it as a signature in R2, the platform can calculate an activity score for that signature across every sample in your dataset. The mechanics are intuitive. R2 computes a score that reflects the overall expression activity of your gene set in...
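The exact formula R2 uses for its signature score is cut off in this excerpt. A common, simple choice is to z-score each signature gene across samples and average those z-scores per sample, which keeps highly expressed genes from dominating. The sketch below implements that assumed approach; the gene names and values are toy data.

```python
from statistics import mean, stdev

def signature_scores(expr, signature):
    """Per-sample score: mean of per-gene z-scores over the signature genes present in expr.

    expr: {gene: [value per sample]}, all lists in the same sample order
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in the dataset")
    z = []
    for g in genes:
        vals = expr[g]
        m, s = mean(vals), stdev(vals)
        z.append([(v - m) / s if s > 0 else 0.0 for v in vals])
    return [mean(row[i] for row in z) for i in range(len(z[0]))]

expr = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [9, 9, 9]}
print(signature_scores(expr, ["A", "B", "MISSING"]))   # genes absent from the data are skipped
```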

"Carving Nature at Its Joints" — Discovering Tumour Subtypes with K-Means Clustering

May 05, 2026

The dataset has been sitting on your desk — metaphorically speaking — for months. One hundred and seventy-five medulloblastoma tumours. Clinically, they're all "medulloblastoma." But you've read enough papers to suspect that name is hiding several very different diseases underneath it. The question is whether the data agrees, and if so, how many distinct groups actually exist. This is the job of K-means clustering — and in R2, it's a surprisingly tactile, exploratory experience. The idea behind K-means is straightforward: you tell the algorithm how many groups (K) you want to find, and it sorts your samples into those groups by minimising the expression differences within each group while maximising the differences between them. The result is a coloured heatmap where each row is a gene, each column is a sample, and the colour — from deep blue through white to vivid red — reflects whether that gene is low, average, or high in that sample. What makes R2's i...
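The procedure described above is Lloyd's K-means: alternate between assigning each sample to its nearest centroid and moving each centroid to the mean of its samples, which lowers the within-cluster sum of squares at every step. The sketch below adds k-means++ seeding and several restarts, since a single random start can get stuck in a poor solution. It is a generic NumPy implementation, not R2's; rows are samples and columns would be genes (here, two toy features).

```python
import numpy as np

def _sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def _init_plus_plus(X, k, rng):
    """k-means++ seeding: pick new centroids with probability proportional to squared distance."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centroids)).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Cluster the rows of X (samples x genes) into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = _init_plus_plus(X, k, rng)
        for _ in range(n_iter):
            labels = _sq_dists(X, centroids).argmin(axis=1)            # assignment step
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])                        # update step
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        d = _sq_dists(X, centroids)
        labels, inertia = d.argmin(axis=1), d.min(axis=1).sum()        # within-cluster SS
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)
    return best[1], best[2]

rng = np.random.default_rng(1)
centres = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centres])
labels, centroids = kmeans(X, 3)
print(labels)
```

Because K must be chosen up front, it is common to run this for several values of K and compare how stable and interpretable the resulting groups are.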