The central dogma – DNA to RNA to protein – underpins all of biology. And while RNA keeps unveiling XKCD-worthy surprises, it’s proteins that drive cellular function and thus are the predominant targets for most drugs. Yet, for drug discovery and development, RNA-based biomarkers are still very common. Why the disparity?
The genomic era, fueled by advances in sequencing and oligonucleotide technologies, has put DNA and RNA in the spotlight. While these tools excel at reconstructing and predicting cellular processes, they fall short of directly observing and quantifying proteins with the precision needed to connect science to medicine.
In drug development and patient treatment, especially for antigen-targeted therapies like Antibody-Drug Conjugates (ADCs), leaning on RNA alone is like buying a new home from its blueprints and mockups without ever stepping inside. A gene might produce mRNA, but that’s no guarantee its protein is made or functions as expected, especially over time. Factors like translation efficiency, RNA degradation, protein turnover, and post-translational modifications keep the correlation between mRNA and protein levels moderate, typically ranging from 0.3 to 0.6.¹
The evidence is consistent on this point. Studies show significant variation depending on cell type or context – for example, Jovanovic and colleagues reported a correlation of 0.77-0.82 in activated mouse dendritic cells,² while others found a lower correlation of about 0.6 in fibroblasts.³ These differences in expression matter for drug developers: RNA might suggest a target, but without protein data, the picture’s incomplete.
Table 1: Key studies examining the RNA-protein relationship
| Study | Correlation (R) | Tissue/Cell Type | Methods Used | Conclusion |
| --- | --- | --- | --- | --- |
| Jovanovic et al., 2015² | ~0.77-0.82 | Mouse dendritic cells | Ribosome profiling, RNA-seq, pulsed-SILAC proteomics | mRNA abundance explains 59-68% of steady-state protein variance, with translational efficiency contributing significantly to protein synthesis regulation. |
| Schwanhäusser et al., 2011³ | ~0.6 | NIH 3T3 fibroblasts | RNA-seq, SILAC-based proteomics | mRNA accounts for ~40% of protein variation, with post-transcriptional regulation key. |
| Shalek et al., 2013⁴ | Not directly reported | Mouse dendritic cells | Single-cell RNA-seq, flow cytometry | Transcriptional noise and burst kinetics add variability to RNA-protein links. |
| Schwartz et al., 2014⁵ | Not directly reported | Yeast | Ψ-seq for pseudouridylation mapping | RNA pseudouridylation impacts stability, indirectly affecting protein levels. |
| Schulz et al., 2018⁶ | Variable (e.g., 0.68 for HER2 population, 0.16–0.45 for CK19) | Breast cancer tissue | Imaging mass cytometry | Gene-specific patterns emerge, with HER2 showing stronger correlation than CK19. |
| Ingolia et al., 2014⁷ | Not directly reported | Mouse ESCs, HEK293 cells (validation in mouse tissues) | Ribosome profiling | Pervasive translation outside annotated genes shapes RNA-protein dynamics. |
| Magnusson et al., 2022⁸ | ~0.21 (raw), ~0.86 (modeled) | Human T cells | RNA-seq, mass spectrometry, time-series modeling | Raw correlation low (0.21), splice variant and time-delayed models boost it to 0.86, highlighting temporal effects. |
| Taniguchi et al., 2010⁹ | 0.77 (FISH) or 0.54 (RNA-seq). Single-cell correlations are ~0. | E. coli | Single-molecule imaging, transcriptomics | Ensemble shows moderate correlation, single-cell shows none, due to stochastic expression and protein/mRNA turnover differences. |
| Edfors et al., 2016¹⁰ | 0.39-0.79 (direct), 0.93 (with RTP) | Human cell lines (e.g., U2OS, A431, SH-SY5Y) and tissues | RNA-seq, targeted proteomics | Moderate direct correlation (0.39-0.79) improves to 0.93 with gene-specific RTP factors, reflecting translational and stability differences. |
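The gene-specific RTP (RNA-to-protein) factors mentioned for Edfors et al. in Table 1 can be illustrated with a short sketch. This is a simplified illustration, not the published method: it assumes each gene has one constant protein-per-transcript ratio, estimates it as the median ratio across reference samples, and multiplies a new mRNA measurement by that factor. The gene names, abundances, and function names are all hypothetical.

```python
from statistics import median


def estimate_rtp(mrna_ref, protein_ref):
    """Per-gene RNA-to-protein factor: median protein/mRNA ratio across reference samples."""
    factors = {}
    for gene, mrna_values in mrna_ref.items():
        # Skip samples with zero mRNA, where the ratio is undefined
        ratios = [p / m for m, p in zip(mrna_values, protein_ref[gene]) if m > 0]
        if ratios:
            factors[gene] = median(ratios)
    return factors


def predict_protein(mrna_new, factors):
    """Predict protein abundance for genes that have an estimated factor."""
    return {g: m * factors[g] for g, m in mrna_new.items() if g in factors}


# Hypothetical linear-scale abundances (assumed units: TPM and protein copies per cell)
mrna_ref = {"GENE_A": [10.0, 20.0, 40.0], "GENE_B": [10.0, 20.0, 40.0]}
protein_ref = {"GENE_A": [1000.0, 2100.0, 3900.0], "GENE_B": [50.0, 90.0, 210.0]}

factors = estimate_rtp(mrna_ref, protein_ref)
predicted = predict_protein({"GENE_A": 30.0, "GENE_B": 30.0}, factors)
print(factors)
print(predicted)
```

In this toy data the two genes have identical mRNA levels but a 20-fold difference in predicted protein, which is the kind of gene-to-gene difference a raw RNA-protein comparison cannot see and a per-gene factor can absorb.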
A correlation of 0.6 might sound solid, but correlation overstates predictive power: squared, it explains only about 36% of the variance in protein levels, and in fibroblasts mRNA accounted for roughly 40% of protein variation, leaving the majority unexplained.³ For ADCs, where success hinges on proteins like HER2 or EGFR being present on tumor cells, that’s a gamble. RNA-seq can flag a lead, but if the protein doesn’t materialize, the therapy flops. Even when the protein is present, heterogeneity within a tumor and across the patient cohort dilutes the observable response. That mandates larger cohorts and longer trials, and it raises the risk of costly trial failures when patient stratification and treatment biomarkers are missing.¹¹ Proteomics isn’t optional; it’s the reality check. For solid tumors, understanding the spatial relationship among the protein targets, drugs, and immune modifiers can be critical.⁶
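The link between a correlation and the share of variation it accounts for is simple arithmetic. The sketch below squares a Pearson correlation to get the fraction of variance a simple linear fit would explain; the function name and labels are illustrative, and it assumes a linear relationship on whatever scale the correlation was computed.

```python
def variance_explained(r: float) -> float:
    """Fraction of variance explained by a simple linear fit with Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return r * r


# Correlations quoted in the tables of this article
examples = [
    ("fibroblasts", 0.60),
    ("CCLE cell lines (mean)", 0.48),
    ("dendritic cells (low end)", 0.77),
    ("dendritic cells (high end)", 0.82),
]
for label, r in examples:
    explained = variance_explained(r)
    print(f"{label}: r={r:.2f} -> explained {explained:.0%}, unexplained {1 - explained:.0%}")
```

A correlation of 0.48 squares to about 0.23, which lines up with the ~23% of protein variance attributed to RNA in the CCLE data, and 0.6 squares to 0.36, so nearly two thirds of the variation remains unaccounted for.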
Again, this relationship isn’t just academic – it directly impacts drug development, especially in drug classes like radioligand-based therapies and ADCs. These drugs marry antibodies to toxic payloads, relying on the target protein being expressed as expected in order to confer specificity. Researchers have mined datasets like TCGA and GTEx to see how RNA and protein align for ADC targets. The pattern holds: moderate correlations with some standouts, shaped by cancer type, immune context, and molecular quirks.
Table 2: Examining the RNA-protein relationship in the context of drug development
| Study | Corr. (R) | Dataset/Tissue | Methods | Conclusion |
| --- | --- | --- | --- | --- |
| Zhang et al., 2014¹² | 0.47 (avg, steady-state), 0.23 (avg, gene variation) | TCGA colorectal cancer | RNA-seq, mass spectrometry | RNA-protein correlations (median 0.39) vary by subtype-specific proteomic patterns; HER2-enriched cancers show tight mRNA-protein links for ERBB2 (r=0.84). |
| Mertins et al., 2016¹³ | 0.39 (median, global), e.g., 0.84 (ERBB2) | TCGA (breast cancer) | RNA-seq, mass spectrometry | RNA-protein correlations vary by subtype; HER2-enriched cancers show tighter links. |
| Gholami et al., 2013¹⁴ | ~0.5-0.76 | Human cancer cell lines | RNA-seq, quantitative proteomics | Moderate to high correlations between transcriptome and proteome, with post-transcriptional regulation influencing drug-response proteins potentially relevant to cancer therapeutics. |
| Coscia et al., 2016¹⁵ | Not directly reported | Human ovarian cancer tissues and cell-line xenografts | Mass spectrometry, transcriptomics | Protein-level validation critical for identifying immunotherapy targets, including potential ADC-relevant antigens, in ovarian cancer. |
| Nusinow et al., 2020¹⁶ | 0.48 (mean) | Human cell lines (CCLE) | RNA-seq, mass spectrometry | RNA explains ~23% of protein variance on average; cell cycle and posttranscriptional regulation, including protein stability, contribute to differences between RNA and protein expression. |
| Jiang et al., 2019¹⁷ | ~0.36-0.66 | TCGA (hepatocellular carcinoma) | RNA-seq, proteomics | Immune infiltration and tumor heterogeneity modulate RNA-protein correlations. |
| Clark et al., 2019¹⁸ | 0.43-0.44 (median) | CPTAC (ccRCC), TCGA | RNA-seq, mass spectrometry | RNA-protein correlations in ccRCC are similar to other cancers; N-linked glycosylation is upregulated in aggressive subtypes but not linked to detection issues. |
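Table 2 quotes two kinds of correlation for some datasets: a "steady-state" value computed across genes within a sample, and a "gene variation" value computed for each gene across samples. The sketch below shows how the two can diverge. It is a minimal illustration with made-up log2 abundances and hypothetical function names, and it uses Pearson correlation, whereas the cited studies may use rank-based measures such as Spearman.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def across_gene_correlations(mrna, protein):
    """One value per sample: do abundant transcripts go with abundant proteins?"""
    genes = sorted(mrna)
    n_samples = len(mrna[genes[0]])
    return [
        pearson([mrna[g][s] for g in genes], [protein[g][s] for g in genes])
        for s in range(n_samples)
    ]


def across_sample_correlations(mrna, protein):
    """One value per gene: does a gene's mRNA track its protein between samples?"""
    return {g: pearson(mrna[g], protein[g]) for g in mrna}


# Hypothetical log2 abundances: three genes (rows) measured in four samples (columns)
mrna = {
    "GENE_A": [2.0, 2.2, 1.9, 2.1],
    "GENE_B": [6.0, 5.8, 6.1, 6.2],
    "GENE_C": [10.0, 10.3, 9.9, 10.1],
}
protein = {
    "GENE_A": [3.0, 2.9, 3.2, 3.1],
    "GENE_B": [7.1, 7.0, 6.8, 7.2],
    "GENE_C": [11.0, 11.1, 11.2, 10.8],
}

per_sample = across_gene_correlations(mrna, protein)
per_gene = across_sample_correlations(mrna, protein)
print([round(r, 3) for r in per_sample])
print({g: round(r, 3) for g, r in per_gene.items()})
```

In this toy data every sample shows a near-perfect across-gene correlation because the genes differ widely in baseline abundance, while the per-gene correlations across samples are weak or negative. For patient selection it is the second view that matters: whether a given target's mRNA tracks its protein from one tumor to the next.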
HER2 is a standout here, syncing mRNA and protein tightly (think R > 0.7 in breast cancer data⁶), making it a rare case where RNA might suffice. Most targets, though, linger in that 0.4-0.6 range, with factors like glycosylation or immune infiltration adding twists.
RNA research has flipped biology’s script. RNA-seq has unlocked research into gene regulation, cell states, and disease mechanisms with speed and breadth proteomics can’t match. It’s a discovery engine – dynamic and scalable. But when it’s time to act clinically, especially with protein-targeting therapies like ADCs, RNA alone won’t cut it.
A blueprint’s crucial for dreaming up your new home, but you’d never move in without inspecting the real thing. RNA hints at what might be; proteins prove what is. In drug development and clinical medicine, that gap is everything. For picking patients, predicting outcomes, or driving precision medicine, protein data’s the clincher.