"The Clinical Picture" — Using Annotation Analyses to Understand Your Cohort
Before you can interpret expression data, you need to understand the samples it came from. How old were the patients? What were their tumour stages? Is there a relationship between age and stage in this particular cohort? Do the survival outcomes cluster in ways that align, or fail to align, with the clinical categories you've been using as grouping variables?
These questions live one level below the gene expression data, in the clinical annotation — the metadata that describes your samples. In R2, that annotation is stored as "tracks," and the Annotation Analyses module is where you interrogate those tracks directly, before a single gene comes into view.
The module handles three fundamental types of comparison, and it routes you to the right one automatically based on what you're comparing. If you want to know whether two categorical variables co-occur — are MYCN-amplified tumours more likely to be stage 4? — R2 computes a Fisher's exact test and shows you the overlap as a contingency table and visual summary. If both variables are numerical — does patient age correlate with a particular clinical score? — you get a scatter plot with a regression line and correlation statistics. And if you're relating a categorical variable to a numerical one — does survival time differ between molecular subtypes? — R2 applies the appropriate test and produces grouped plots with p-values.
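To make the three-way routing concrete, here is a minimal Python sketch of the idea. It is not R2's implementation: the function names (compare_tracks, fisher_exact_2x2, pearson_r) are hypothetical, the categorical case is restricted to 2x2 tables, and the mixed case only reports per-group means, because the description above does not say which test R2 applies there.

```python
from math import comb, sqrt
from statistics import mean


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def is_numeric(values):
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def compare_tracks(x, y):
    """Pick a comparison based on the types of two annotation tracks."""
    x_num, y_num = is_numeric(x), is_numeric(y)
    if x_num and y_num:
        return ("correlation", pearson_r(x, y))
    if not x_num and not y_num:
        levels_x, levels_y = sorted(set(x)), sorted(set(y))
        if len(levels_x) != 2 or len(levels_y) != 2:
            raise ValueError("this sketch only handles 2x2 tables")
        # Build the contingency table by counting co-occurrences.
        (a, b), (c, d) = [
            [sum(1 for xi, yi in zip(x, y) if xi == lx and yi == ly) for ly in levels_y]
            for lx in levels_x
        ]
        return ("fisher_exact", fisher_exact_2x2(a, b, c, d))
    # One categorical and one numerical track: summarise the numbers per group.
    labels, numbers = (y, x) if x_num else (x, y)
    groups = {}
    for label, value in zip(labels, numbers):
        groups.setdefault(label, []).append(value)
    return ("group_means", {label: mean(vals) for label, vals in groups.items()})


# Made-up toy tracks for eight samples, for illustration only.
mycn = ["amp", "amp", "amp", "amp", "non", "non", "non", "non"]
stage = ["4", "4", "4", "other", "4", "other", "other", "other"]
print(compare_tracks(mycn, stage))
```

The point of the sketch is the dispatch, not the statistics: the caller names two tracks, and the variable types alone decide which comparison is run.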
The Cohort Overview function goes one step further: it generates a visual dashboard of all the annotation tracks in your dataset simultaneously, giving you a bird's-eye view of the cohort composition. How many male versus female patients? What's the distribution of ages? How do the molecular subtypes map onto the clinical stage categories? This overview is invaluable when you pick up a new public dataset and need to orient yourself quickly — and it's the kind of figure that belongs in the supplementary materials of any paper describing a cohort.
There's a practical wisdom to starting here. Many apparently biological findings in expression data are actually reflections of clinical confounding. A gene that looks like a marker of tumour aggressiveness may really be a marker of patient age, which happens to correlate with prognosis. The annotation analyses help you spot those confounds before you've built a whole hypothesis around them.
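The confounding scenario above can be illustrated with a small Python sketch. It uses partial correlation, which is one common way to adjust for a third variable and is not described here as an R2 feature. All values are synthetic and deliberately constructed so that age drives both the gene and the risk score; the names gene, risk, age, and partial_r are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_r(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic, standardised values for eight samples (not real patient data).
age = [-1, -1, -1, -1, 1, 1, 1, 1]
noise_gene = [1, 1, -1, -1, 1, 1, -1, -1]
noise_risk = [1, -1, 1, -1, 1, -1, 1, -1]

# Both the gene and the risk score depend on age plus independent noise.
gene = [2 * a + n for a, n in zip(age, noise_gene)]
risk = [2 * a + n for a, n in zip(age, noise_risk)]

raw = pearson_r(gene, risk)
adjusted = partial_r(gene, risk, age)
print(f"raw correlation: {raw:.3f}, adjusted for age: {adjusted:.3f}")
```

In this constructed example the raw gene-to-risk correlation is 0.8, yet it drops to zero (up to floating-point rounding) once age is accounted for. Real cohorts are rarely this clean, but the pattern is the one to look for.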
Know your cohort. Then interrogate the genes.
This is part of an ongoing series on the R2 Genomics Analysis and Visualization Platform, developed at Amsterdam UMC. All analyses can be freely performed at r2.amc.nl. Full tutorials at r2-tutorials.readthedocs.io.