The Bethesda Handbook of Clinical Hematology, 3 Ed.

30. Interpretation of Functional Genomics

Adrian Wiestner and Louis M. Staudt

The paradigm of “nature or nurture” juxtaposes genetically determined traits with the formative influence of the environment. When we consider gene expression in a given cell or organism, these apparent opposites converge: genes required for cell lineage determination and genes required for cellular responses to environmental conditions are both transcribed into RNA. Not all genes, however, are expressed in all cells at all times. Thus, the “transcriptome,” the set of genes expressed in a given cell at a given time, represents only a fraction of the genome. The transcriptome integrates cell lineage, cellular functions, activity of regulatory or oncogenic pathways, and response to external factors. In hematologic malignancies, quantitative analysis of the transcriptome has refined disease classification and provided powerful prognostic information. In addition, transcriptome profiling has revealed activation of distinct oncogenic pathways, whose importance can be tested experimentally by targeted genetic interventions using short complementary RNAs that reduce expression of a specific gene. In some cases, these approaches have led to the discovery of oncogenic mutations, thus linking “structural” genetic information to the “functional” genomic characteristics of the sample under study. In recent years, whole-genome sequencing technologies have been widely applied in oncology and are rapidly generating a comprehensive map of tumor mutations. Functional genomics will likely remain a powerful tool for studying the role of these mutations in tumor biology.

Here, we discuss general concepts of functional genomic methods and illustrate their application with examples drawn primarily from the study of lymphoid malignancies.

GENE EXPRESSION PROFILING TO CAPTURE THE TRANSCRIPTOME

Two major techniques are now available to capture the complement of genes expressed in a cell: DNA microarrays and RNA sequencing. DNA microarrays consist of a solid support to which probes that detect specific RNAs have been attached. Each array carries thousands of such probes, and each probe hybridizes specifically to one distinct RNA. A commonly used type of microarray employs oligonucleotide probes attached to the solid support. Affymetrix GeneChip® arrays are commercially available oligonucleotide arrays that, depending on the array type, can quantify the expression of approximately 47,000 transcripts (Human Genome U133 Plus 2.0). Novel sequencing technologies have made it possible to determine the sequence of all RNAs in a given sample. In addition to the sequence itself, this technology provides a highly quantitative measure of the relative abundance of each RNA in the sample.
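To illustrate how sequencing read counts become a relative abundance measure, the sketch below converts raw counts to transcripts per million (TPM), one widely used normalization that corrects for transcript length and sequencing depth. The choice of TPM, the function name, and the gene names, counts, and lengths are illustrative assumptions, not part of the original text.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw RNA-seq read counts to transcripts per million (TPM).

    counts: dict gene -> number of reads mapped to that gene (hypothetical)
    lengths_bp: dict gene -> transcript length in base pairs
    """
    # Step 1: reads per kilobase; longer transcripts attract more reads
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    # Step 2: rescale so that all values in the sample sum to one million
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}


# Hypothetical example: three genes measured in one sample
counts = {"GENE_A": 500, "GENE_B": 1000, "GENE_C": 250}
lengths = {"GENE_A": 1000, "GENE_B": 4000, "GENE_C": 500}
tpm = counts_to_tpm(counts, lengths)
print(tpm)
```

GENE_B has twice as many reads as GENE_A but a transcript four times as long, so after normalization its relative abundance is half that of GENE_A.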

GENE EXPRESSION SIGNATURES IN MOLECULAR DIAGNOSIS, OUTCOME PREDICTION, AND TARGETED CANCER THERAPY

Microarray experiments typically yield thousands to tens of thousands of data points per sample. The amount of data generated in such studies can easily overwhelm researcher and statistician alike and makes “eyeball” analysis of the data virtually impossible. A number of analytical techniques aid in the interpretation of microarray data.1-4 In a so-called unsupervised analysis, statistical methods are used to visualize patterns of shared gene expression and to identify distinct groups of samples without reference to any external information about the samples. “Supervised” approaches, in contrast, rely on statistical tests to relate gene expression characteristics to known biologic or clinical characteristics.

Unsupervised Analysis: Pattern Discovery by Hierarchical Clustering

One commonly used unsupervised strategy is hierarchical clustering.1 This analysis identifies genes that share a similar expression pattern across all samples. For example, hierarchical clustering will group together genes that are highly expressed in one group of samples and expressed at low levels in a second group. Genes that are involved in the same cellular function are often coordinately expressed and thus form a distinct “gene expression signature” of a particular biologic process.5 Gene expression signatures capture biologic characteristics, including cell type, differentiation state, cellular functions, and activity of signaling pathways, and thereby provide a framework in which the complexity of microarray data can be related to the biology of the study samples. Hierarchical clustering is a valuable tool for discovering such patterns of coordinately expressed genes. The strength of this analysis is its focus on distinct biologic functions, represented by sets of genes contributing to the same process, rather than on isolated genes. For example, to proliferate, a cell simultaneously expresses hundreds of genes involved in cell cycle progression, DNA replication, and metabolism, which upon hierarchical clustering can be visualized as a proliferation signature. Gene expression studies often use a single array per sample, and this apparent lack of replicates is sometimes thought to make such data inferior. However, signature-based analysis is intrinsically built on numerous replicates, because each signature comprises many genes that report on the same process; these are biologic rather than technical replicates and are therefore more valuable.
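The replicate argument can be made concrete with a simple signature score: each gene in a signature is standardized across samples, and the standardized values are averaged within each sample, so that noise in any single gene measurement is diluted. The sketch below assumes this mean-of-z-scores scheme; the function names, gene names, and values are hypothetical, and published signature analyses may select or weight genes differently.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's expression across samples (mean 0, SD 1)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def signature_score(expr, signature_genes):
    """Average z-scored expression of the signature genes in each sample.

    expr: dict gene -> list of log2 expression values (one per sample)
    signature_genes: genes assumed to belong to the signature
    """
    z = [zscore(expr[g]) for g in signature_genes]
    n_samples = len(z[0])
    # Each gene acts as a biologic replicate of the same underlying process
    return [mean(gene_z[s] for gene_z in z) for s in range(n_samples)]


# Hypothetical log2 expression of three proliferation-type genes in four samples
expr = {
    "PROLIF_1": [6.0, 6.2, 9.1, 9.0],
    "PROLIF_2": [5.1, 4.8, 7.9, 8.3],
    "PROLIF_3": [7.0, 7.4, 9.6, 9.2],
}
scores = signature_score(expr, ["PROLIF_1", "PROLIF_2", "PROLIF_3"])
print([round(s, 2) for s in scores])  # samples 3 and 4 score higher
```

Because the score averages several genes, a single mismeasured gene shifts it only modestly, which is the practical benefit of biologic replication within a signature.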

Hierarchical clustering can not only identify genes with coordinate expression across samples but also group samples that share a common pattern of gene expression. In this way, hierarchical clustering can dissect heterogeneity among tumor samples that may be clinically important.6-8 Because it requires no prior assumptions about which groups exist, hierarchical clustering is an especially useful tool for “question-driven,” as opposed to “hypothesis-driven,” analysis of a data set and can uncover unexpected associations.
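Because the same procedure applies to either axis of the data matrix, one function can group genes (rows) or samples (columns). The following minimal sketch implements agglomerative clustering with average linkage and a correlation distance (1 minus the Pearson correlation), loosely in the spirit of Eisen et al.1 The function names, the stopping rule (a fixed number of clusters), and the toy data are assumptions; for real data sets, optimized library code such as `scipy.cluster.hierarchy` would normally be used.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage and distance 1 - r.

    profiles: dict name -> list of values (a sample's expression across
    genes, or a gene's expression across samples).
    Merging stops when n_clusters groups remain.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[n] for n in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between two groups
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters


# Hypothetical tumor samples, each profiled across five genes
samples = {
    "S1": [9.0, 8.5, 3.0, 2.5, 5.0],
    "S2": [8.8, 8.9, 3.2, 2.9, 5.1],
    "S3": [3.1, 2.7, 8.9, 9.2, 5.0],
    "S4": [2.9, 3.0, 9.3, 8.8, 4.9],
}
groups = average_linkage(samples, 2)
print(groups)
```

The two groups recovered here correspond to samples with opposite expression of the first and last gene pairs, the kind of subgroup structure that sample clustering is meant to reveal.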

Experimentally defined gene expression signatures are catalogued and made available for statistical analysis.3,5,9 Signature-based analysis algorithms can provide molecular classifications of cancer types, establish prognoses, identify cancer subtypes that are sensitive to specific pharmacologic interventions, establish optimal drug combinations, and facilitate the discovery of novel pathway inhibitors. Two strategies that have proven particularly effective are gene set enrichment analysis (GSEA) and the connectivity map. GSEA provides a statistical measure of whether the members of a predefined functional signature are concentrated at the top or bottom of a list of genes ranked by their differential expression between two conditions.4 This method can test whether the gene expression differences between two tumor types are due, for example, to differential activity of the nuclear factor kappa B (NF-κB) signaling pathway; similarly, the effect of a drug can be related to a distinct signaling pathway. The connectivity map was developed to link these characteristics of cancer and drug.3 In essence, the gene expression profile is used to match tumor biology with the mechanism of action of a pharmaceutical agent. Such methods aid in drug development and may guide the clinical use of cancer therapy by identifying patient populations likely to benefit from a given intervention.
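The core statistic of GSEA is a running-sum enrichment score. The sketch below follows the weighted form described by Subramanian et al.4 with weight exponent p = 1: it walks down a ranked gene list, stepping up at signature genes and down at all others, and reports the maximum deviation from zero. It deliberately omits the permutation test used to judge significance, and the ranked list and gene set are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style, no permutation test).

    ranked: list of (gene, ranking_metric), sorted from most up- to most
            down-regulated in condition A versus condition B
    gene_set: set of genes in the predefined signature; assumed to be a
              proper subset of the ranked genes with nonzero metrics
    """
    n_total = len(ranked)
    n_hits = sum(1 for g, _ in ranked if g in gene_set)
    # Weighted sum over signature genes normalizes the upward steps
    norm_hit = sum(abs(r) ** p for g, r in ranked if g in gene_set)
    miss_step = 1.0 / (n_total - n_hits)
    running, best = 0.0, 0.0
    for gene, r in ranked:
        if gene in gene_set:
            running += abs(r) ** p / norm_hit
        else:
            running -= miss_step
        if abs(running) > abs(best):  # keep the largest deviation from zero
            best = running
    return best


# Hypothetical ranking of genes by differential expression (tumor A vs. B)
ranked = [("NFKBIA", 3.2), ("TNFAIP3", 2.9), ("G1", 1.5), ("CCL22", 1.1),
          ("G2", 0.3), ("G3", -0.4), ("G4", -1.2), ("G5", -2.0)]
nfkb_signature = {"NFKBIA", "TNFAIP3", "CCL22"}   # hypothetical NF-kB gene set
es = enrichment_score(ranked, nfkb_signature)
print(round(es, 3))
```

A positive score indicates that the signature genes cluster near the top of the list (higher in tumor A); in practice, significance is assessed by comparing the observed score with scores obtained after permuting sample labels or gene sets.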

Supervised Analysis: Building Molecular Predictors of Diagnosis, Prognosis, and Treatment Response

“Supervised” analytical methods use biologic or clinical data to search for the gene expression differences that are most informative for diagnosis or prognosis. To derive a molecular predictor of survival, one can, for instance, use the Cox proportional hazards model to identify gene expression characteristics associated with a particular outcome. This initial step may yield several hundred genes, depending on the sample size and the significance cutoffs chosen. To further organize the data, hierarchical clustering can be used to identify specific gene expression signatures that reflect the biologic processes that affect survival. The gene expression signatures represented in a molecular outcome predictor, and the optimal number of genes, vary between diseases and analytical techniques. In a large study of diffuse large B-cell lymphoma, 17 genes representing several signatures related to differentiation, tumor proliferation, and tumor–host interactions were combined to form the best prognostic score.10 In chronic lymphocytic leukemia (CLL), in contrast, a single gene, ZAP-70, was the most differentially expressed gene between biologically and prognostically distinct subtypes of the disease.11
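Screening genes for association with survival amounts to fitting a separate one-covariate Cox model for each gene. The sketch below shows that step for a single gene with hypothetical survival data. It solves the partial-likelihood score equation by bisection, a choice made here for simplicity and robustness because the score decreases monotonically; dedicated survival software commonly uses Newton-Raphson and offers many more options. Function names and data are assumptions.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the one-covariate Cox
    partial likelihood (Breslow handling of tied event times)."""
    score, info = 0.0, 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at this event time
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        m1 = sum(wj * x[j] for wj, j in zip(w, risk)) / s0       # weighted mean
        m2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
        score += x[i] - m1
        info += m2 - m1 ** 2                                      # weighted variance
    return score, info


def cox_univariate(times, events, x, lo=-10.0, hi=10.0, tol=1e-8):
    """Log hazard ratio per unit of x and its Wald z statistic.

    A root outside [lo, hi] (estimate pinned at a bound) would suggest
    complete separation, where the estimate is not finite.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cox_score_info(mid, times, events, x)[0] > 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    _, info = cox_score_info(beta, times, events, x)
    return beta, beta * math.sqrt(info)  # z = beta / SE, SE = 1 / sqrt(I)


times = [5, 8, 12, 20, 25, 30, 40, 50]            # months (hypothetical)
events = [1, 1, 1, 0, 1, 1, 0, 0]                 # 1 = event, 0 = censored
expr = [9.1, 8.7, 7.9, 8.5, 6.2, 7.5, 6.0, 6.8]   # log2 expression of one gene
beta, z = cox_univariate(times, events, expr)
print(f"log hazard ratio = {beta:.2f}, Wald z = {z:.2f}")
```

A positive log hazard ratio means higher expression is associated with earlier events. Repeating the fit for every gene and ranking by z would give the kind of candidate list that is then organized into signatures.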

CHALLENGES OF GENE EXPRESSION PROFILING

Not surprisingly, a method that yields quantitative data concerning many thousands of genes in hundreds of samples poses statistical obstacles. We focus briefly on three aspects; a recent review provides a more detailed discussion.12

Data Reproducibility: The Value of Training and Validation Sets

The large amount of data derived from gene expression studies increases the likelihood of finding chance associations between clinical variables and gene expression, and hence the risk that a model derived in one data set will not be reproducible in an independent data set. One safeguard against such overfitting is to randomly assign the cases in a study to two independent sets. The “training set” is used to derive the model, while the “validation set” is used to test the general applicability of the model.
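A random split of the kind described above takes only a few lines. The sketch assumes a 50:50 split and an arbitrary fixed random seed, so that the assignment can be reproduced and documented; the function name and patient identifiers are hypothetical.

```python
import random


def split_training_validation(case_ids, seed=2012, training_fraction=0.5):
    """Randomly assign cases to non-overlapping training and validation sets."""
    ids = list(case_ids)
    rng = random.Random(seed)  # fixed, arbitrary seed for reproducibility
    rng.shuffle(ids)
    n_train = round(len(ids) * training_fraction)
    return ids[:n_train], ids[n_train:]


cases = [f"patient_{i:03d}" for i in range(240)]  # hypothetical cohort
training, validation = split_training_validation(cases)
print(len(training), len(validation))
```

Every modeling decision, including gene selection and score weights, should be made using only the training cases; the validation cases are then used once, to test the final model.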

Multiple Testing Corrections: The Concept of False Discoveries

To analyze gene expression data, conventional significance testing has to be corrected for the very large number of tests that can be performed on such data sets. Because a p-value is designed to test a single hypothesis, a correction for multiple testing is necessary to avoid numerous false-positive calls. The false discovery rate (FDR) estimates the expected proportion of false-positive discoveries among the variables called nominally significant. The FDR is computed as the number of findings expected by chance at a given p-value cutoff divided by the number of findings actually observed at this cutoff.
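The calculation described above can be written directly. The sketch assumes that p-values are uniformly distributed under the null hypothesis, so that about m × α of m tests fall below a cutoff α by chance. It also includes the widely used Benjamini-Hochberg adjustment, which applies the same ratio at each observed p-value while keeping the adjusted values monotone. Function names and example p-values are hypothetical.

```python
def estimated_fdr(p_values, cutoff):
    """Expected chance findings at `cutoff` divided by observed findings."""
    m = len(p_values)
    observed = sum(1 for p in p_values if p <= cutoff)
    expected_by_chance = m * cutoff  # uniform p-values under the null
    return min(1.0, expected_by_chance / observed) if observed else 0.0


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical screen: 10,000 genes, 400 of them nominally significant at p <= 0.01
p_values = [0.001] * 400 + [0.5] * 9600
print(estimated_fdr(p_values, 0.01))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In the first example, 100 genes (10,000 × 0.01) are expected to pass the cutoff by chance, so 400 observed calls correspond to an estimated FDR of 25%.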

Real and Apparent Discrepancies between Different Gene Expression Studies

A quality control study sponsored by the Food and Drug Administration, involving different laboratories and different array platforms, found excellent reproducibility of microarray measurements. Although this establishes the robustness of the method in principle, apparent and real discrepancies between reported studies can have many causes, including faulty annotation of probes, lack of specificity of spotted array features, technical differences in hybridization and signal detection, and different analysis strategies that seemingly yield discrepant gene lists.13 The use of different platforms may result in gene lists that only partially overlap but nevertheless capture the same biologic characteristics: a gene expression signature that identifies a distinct diagnostic entity or cellular process may be composed of several hundred genes, not all of which are equally represented and equally well measured on different platforms. Therefore, the rank order of these genes can differ substantially between studies. Comparing the entire set of differentially expressed genes, not only the top-ranked genes, may be required to detect commonality.14 With increasing standardization of technical platforms and more robust analysis algorithms, the reproducibility of microarray studies may already have surpassed that of immunohistochemical or flow cytometric methods.
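One way to see why comparing entire lists can reveal commonality that the top-ranked genes hide is to compute the overlap of two ranked gene lists at increasing depths. The sketch below uses the fraction of genes shared by the top k entries of each list; the metric, function name, and toy lists are illustrative assumptions and not the method used in the cited studies.

```python
def overlap_at_depth(list_a, list_b, k):
    """Fraction of genes shared by the top-k entries of two ranked lists."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k


# Two hypothetical platforms rank the same 8-gene signature in different orders
platform_1 = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
platform_2 = ["G7", "G5", "G8", "G6", "G2", "G1", "G4", "G3"]
for k in (2, 4, 8):
    print(k, overlap_at_depth(platform_1, platform_2, k))
```

In this toy case the top-ranked genes do not overlap at all, yet both platforms capture exactly the same eight-gene signature once the full lists are compared.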

Genetic Interference Studies with Short Complementary RNAs to Identify Essential Pathways in Cancer Biology

The discovery that short RNAs (in the 20–30 bp range) can regulate the stability and translation of mRNAs has been transformed into a powerful screening tool for identifying genes essential in cancer biology.15 Introducing such short RNAs into a cell can effectively “knock down” expression of the target gene, making it possible to assess the importance of that gene for the phenotype, proliferation, and survival of the cell. Two major approaches are used: either short RNAs are synthesized in vitro and transfected into cells, or the short RNA is encoded in a viral expression vector that is then used to transduce the cells. The latter strategy has the advantage of providing stable expression of the short RNA, whereas transfected RNA molecules typically show effects only in the first 2 to 3 days. The approach can be scaled up to permit functional screens across essentially the whole genome. Some recent studies using this approach are listed in Table 30.1.
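In a pooled loss-of-function screen, each short RNA can be tracked (for example, by a sequence barcode or by its own sequence), and short RNAs that target genes required for proliferation or survival become depleted from the culture over time. The sketch below computes a library-size-normalized log2 ratio between a late and an early time point. The scoring scheme, pseudocount, two-fold threshold, identifiers, and counts are hypothetical and are not taken from the screens listed in Table 30.1.

```python
import math


def log2_depletion(early, late, pseudocount=1.0):
    """log2(late / early) relative abundance of each short RNA.

    early, late: dict shRNA id -> read (or signal) count at each time point
    Strongly negative values flag short RNAs lost from the culture, i.e.
    candidates whose target gene may be needed for proliferation/survival.
    """
    total_early, total_late = sum(early.values()), sum(late.values())
    scores = {}
    for sh in early:
        # Scale by library size; the pseudocount avoids log(0)
        f_early = (early[sh] + pseudocount) / total_early
        f_late = (late.get(sh, 0) + pseudocount) / total_late
        scores[sh] = math.log2(f_late / f_early)
    return scores


early = {"sh_GENE_X_1": 1000, "sh_GENE_X_2": 900,
         "sh_CTRL_1": 1100, "sh_CTRL_2": 1000}
late = {"sh_GENE_X_1": 120, "sh_GENE_X_2": 150,
        "sh_CTRL_1": 1600, "sh_CTRL_2": 1500}
scores = log2_depletion(early, late)
hits = sorted(sh for sh, s in scores.items() if s < -1.0)  # >2-fold depleted
print(hits)
```

Both hypothetical short RNAs against GENE_X are depleted while the controls are not; requiring concordant results from independent short RNAs against the same gene is a common way to reduce off-target false positives.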

CLINICAL APPLICATION OF GENE EXPRESSION PROFILING

Gene expression profiling is in rapid transition from a research tool to clinical application. Table 30.2 summarizes selected informative studies, and recent reviews provide more detailed discussions.5,38-40 Table 30.3 lists some prospective clinical trials incorporating gene expression profiling. Because tumor tissue is readily available, most clinical gene expression studies to date have been performed in malignancies. Many of the pioneering studies were retrospective, often based on archival material, and focused on diagnosis.

More recently, gene expression profiling has been used as a tool to capture dynamic changes in tumor biology. One study, for example, analyzed changes in CLL cells as a function of tumor location in blood, lymph node, or bone marrow.28 Using gene expression profiling to analyze sequential tumor samples as a patient undergoes treatment provides a comprehensive pharmacodynamic assessment that can validate a hit on the intended target and characterize the ensuing (stress) response in the tumor cell.29,33
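For sequential samples, one simple pharmacodynamic readout is the within-patient change in the average expression of genes expected to respond to the drug. The sketch assumes log2-scale data, a hypothetical target-pathway gene set, and made-up baseline and on-treatment values; it is meant only to illustrate the paired design, not a validated pharmacodynamic assay.

```python
from statistics import mean


def paired_signature_change(pre, post, gene_set):
    """Per-patient mean log2 fold change (on-treatment minus baseline)
    across the genes of a signature.

    pre, post: dict patient -> dict gene -> log2 expression
    """
    return {
        patient: mean(post[patient][g] - pre[patient][g] for g in gene_set)
        for patient in pre
    }


pathway = ["TARGET_1", "TARGET_2", "TARGET_3"]  # hypothetical pathway genes
pre = {
    "pt1": {"TARGET_1": 8.0, "TARGET_2": 7.5, "TARGET_3": 9.0},
    "pt2": {"TARGET_1": 7.2, "TARGET_2": 6.9, "TARGET_3": 8.1},
}
post = {
    "pt1": {"TARGET_1": 6.5, "TARGET_2": 6.0, "TARGET_3": 7.9},
    "pt2": {"TARGET_1": 7.1, "TARGET_2": 7.0, "TARGET_3": 8.0},
}
change = paired_signature_change(pre, post, pathway)
print({pt: round(v, 2) for pt, v in change.items()})
```

Here the pathway signature falls markedly in pt1 but is essentially unchanged in pt2; in a real study, such a pattern could prompt the question of whether the drug reached its target in the second patient.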

References

  1. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95(25):14863-14868.
  2. Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531-537.
  3. Lamb J, Crawford ED, Peck D, et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929-1935.
  4. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.
  5. Shaffer AL, Wright G, Yang L, et al. A library of gene expression signatures to illuminate normal and pathological lymphoid biology. Immunol Rev. 2006;210:67-85.
  6. Alizadeh AA, Eisen MB, Davis RE, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503-511.
  7. Valk PJ, Delwel R, Lowenberg B. Gene expression profiling in acute myeloid leukemia. Curr Opin Hematol. 2005;12(1):76-81.
  8. Zhan F, Huang Y, Colla S, et al. The molecular classification of multiple myeloma. Blood. 2006;108(6):2020-2028.
  9. Nevins JR, Potti A. Mining gene expression profiles: expression signatures as cancer phenotypes. Nat Rev Genet. 2007;8(8):601-609.
  10. Rosenwald A, Wright G, Chan WC, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002;346(25):1937-1947.
  11. Wiestner A, Rosenwald A, Barry TS, et al. ZAP-70 expression identifies a chronic lymphocytic leukemia subtype with unmutated immunoglobulin genes, inferior clinical outcome, and distinct gene expression profile. Blood. 2003;101(12):4944-4951.
  12. Tinker AV, Boussioutas A, Bowtell DD. The challenges of gene expression microarrays for the study of human cancer. Cancer Cell. 2006;9(5):333-339.
  13. Sotiriou C, Piccart MJ. Taking gene-expression profiling to the clinic: when will molecular signatures become relevant to patient care? Nat Rev Cancer. 2007;7(7):545-553.
  14. Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003;100(17):9991-9996.
  15. Ngo VN, Davis RE, Lamy L, et al. A loss-of-function RNA interference screen for molecular targets in cancer. Nature. 2006;441(7089):106-110.
  16. Davis RE, Ngo VN, Lenz G, et al. Chronic active B-cell-receptor signalling in diffuse large B-cell lymphoma. Nature. 2010;463(7277):88-92.
  17. Ngo VN, Young RM, Schmitz R, et al. Oncogenically active MYD88 mutations in human lymphoma. Nature. 2011;470(7332):115-119.
  18. Annunziata CM, Davis RE, Demchenko Y, et al. Frequent engagement of the classical and alternative NF-kappa B pathways by diverse genetic abnormalities in multiple myeloma. Cancer Cell. 2007;12(2):115-130.
  19. Shaffer AL, Emre NC, Lamy L, et al. IRF4 addiction in multiple myeloma. Nature. 2008;454(7201):226-231.
  20. Valk PJ, Verhaak RG, Beijen MA, et al. Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med. 2004;350(16):1617-1628.
  21. Bullinger L, Dohner K, Bair E, et al. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N Engl J Med. 2004;350(16):1605-1616.
  22. Holleman A, Cheok MH, den Boer ML, et al. Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. N Engl J Med. 2004;351(6):533-542.
  23. Yeoh E-J, Ross ME, Shurtleff SA, et al. Classification, subtype discovery and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002;1(2):133-143.
  24. Dave SS, Fu K, Wright GW, et al. Molecular diagnosis of Burkitt’s lymphoma. N Engl J Med. 2006;354(23):2431-2442.
  25. Hummel M, Bentink S, Berger H, et al. A biologic definition of Burkitt’s lymphoma from transcriptional and genomic profiling. N Engl J Med. 2006;354(23):2419-2430.
  26. Klein U, Tu Y, Stolovitzky GA, et al. Gene expression profiling of B cell chronic lymphocytic leukemia reveals a homogeneous phenotype related to memory B cells. J Exp Med. 2001;194(11):1625-1638.
  27. Rosenwald A, Alizadeh AA, Widhopf G, et al. Relation of gene expression phenotype to immunoglobulin mutation genotype in B cell chronic lymphocytic leukemia. J Exp Med. 2001;194(11):1639-1647.
  28. Shipp MA, Ross KN, Tamayo P, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med. 2002;8:68-74.
  29. Dave SS, Wright G, Tan B, et al. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. N Engl J Med. 2004;351(21):2159-2169.
  30. Rosenwald A, Wright G, Wiestner A, et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell. 2003;3(2):185-197.
  31. Shaughnessy JD Jr, Zhan F, Burington BE, et al. A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1. Blood. 2007;109(6):2276-2284.
  32. Shaughnessy JD Jr, Qu P, Usmani S, et al. Pharmacogenomics of bortezomib test-dosing identifies hyperexpression of proteasome genes, especially PSMD4, as novel high-risk feature in myeloma treated with Total Therapy 3. Blood. 2011;118(13):3512-3524.
  33. Rosenwald A, Wright G, Leroy K, et al. Molecular diagnosis of primary mediastinal B cell lymphoma identifies a clinically favorable subgroup of diffuse large B cell lymphoma related to Hodgkin lymphoma. J Exp Med. 2003;198(6):851-862.
  34. Savage KJ, Monti S, Kutok JL, et al. The molecular signature of mediastinal large B-cell lymphoma differs from that of other diffuse large B-cell lymphomas and shares features with classical Hodgkin lymphoma. Blood. 2003;102(12):3871-3879.
  35. Bullinger L. Gene expression profiling in acute myeloid leukemia. Haematologica. 2006;91(6):733-738.
  36. Shaffer AL III, Young RM, Staudt LM. Pathogenesis of human B cell lymphomas. Annu Rev Immunol. 2012;30:565-610.
  37. Johnson SK, Heuck CJ, Albino AP, et al. The use of molecular-based risk stratification and pharmacogenomics for outcome prediction and personalized therapeutic management of multiple myeloma. Int J Hematol. 2011;94(4):321-333.
  38. Herishanu Y, Perez-Galan P, Liu D, et al. The lymph node microenvironment promotes B-cell receptor signaling, NF-kappaB activation, and tumor proliferation in chronic lymphocytic leukemia. Blood. 2011;117(2):563-574.
  39. Rosenwald A, Chuang EY, Davis RE, et al. Fludarabine treatment of patients with chronic lymphocytic leukemia induces a p53-dependent gene expression response. Blood. 2004;104(5):1428-1434.
  40. Weniger MA, Rizzatti EG, Perez-Galan P, et al. Treatment-induced oxidative stress and cellular antioxidant capacity determine response to bortezomib in mantle cell lymphoma. Clin Cancer Res. 2011;17(15):5101-5112.

