Introduction: Integration of viral genes into the host chromosome is an essential step in the life cycle of several viruses, such as retroviruses and adeno-associated viruses. Herpesvirus genomes, by contrast, are maintained as extrachromosomal circular episomes in the nuclei of infected cells and do not require integration. However, a number of studies have reported chromosomally integrated human herpesvirus (CIHHV) DNA, which shows that herpesviruses can integrate into the host chromosome under certain circumstances. Furthermore, HHV-6 has been found integrated in the germ line of about 1% of the global population, suggesting that integration is not merely a sporadic or anecdotal event. Integration causes host genetic instability, as shown by elevated mutation rates adjacent to integration sites. In addition, the integrated viral genome consists of subgenomic fragments, so infectious viral particles cannot be produced from it. This chapter reviews the latest data on EBV integration, the consequences of integration, and the methods used to analyze it.
4.1 EBV Genome and Site of Integration: Epstein-Barr virus (EBV) is a human herpesvirus capable of immortalizing human B lymphocytes in vitro. In EBV-infected B cells, the virus usually persists in an episomal state as multiple copies of covalently closed circular DNA. A number of malignancies, such as Burkitt’s lymphoma, nasopharyngeal carcinoma, and Hodgkin’s disease, are associated with EBV infection. In individuals lacking efficient T-cell function, for example AIDS patients or transplant recipients, EBV-immortalized B lymphocytes can grow into immunoblastic lymphoma. In its 2008 classification of tumors of hematopoietic and lymphoid tissues, the WHO recognized another entity, EBV-positive diffuse large B-cell lymphoma (DLBCL) of the elderly. Chronic active EBV infection (CAEBV) is associated with prolonged fever, wasting, hepatosplenomegaly, and cytopenia; in some cases, patients with CAEBV develop a fulminant course with lymphoid malignancy.
The EBV genome (Figure 1) is a linear double-stranded DNA molecule of about 170 kb that contains at least 86 open reading frames (ORFs). It consists of unique sequence regions interspersed with four major internal repeats (IR1 to IR4) and flanked by terminal repeats (TR). Nine latent proteins, namely Epstein-Barr nuclear antigen 1 (EBNA1), EBNA2, EBNA3A, -3B, -3C, EBNA-LP, latent membrane protein 1 (LMP1), and LMP2A and -2B, are encoded mainly by genes located in the unique regions of the genome. Other ORFs encode capsid proteins, transcription factors, and lytic proteins with various functions. In addition to its protein-coding genes, the EBV genome encodes non-coding RNAs, including EBV-encoded small RNA 1 (EBER1) and 2 (EBER2), BART-derived microRNAs (miRNAs-BART), and BHRF1 microRNAs (miRNAs-BHRF1).
Figure 1: Schematic diagram of the linear EBV genome
Four complete or partial EBV genome sequences have been described: B95-8, AG876, GD1, and GD2. B95-8 was the first complete genome to be sequenced and was derived from an individual with infectious mononucleosis. AG876 originated from a case of Burkitt’s lymphoma in Ghana and is the only complete type 2 EBV sequence available to date. GD1 and GD2 are EBV genomes derived from nasopharyngeal carcinoma (NPC) patients in Guangdong province, Southern China: GD1 was isolated from the saliva of an NPC patient, whereas GD2 was isolated from an NPC tumor.
The cis-acting element that mediates viral DNA replication during latency has been identified as OriP (plasmid origin of replication). Latent viral DNA replication takes place once per cell cycle, proceeds bidirectionally from OriP, and depends on cellular proteins and EBNA1. Studies have shown that EBNA1 binding to OriP is essential for plasmid DNA replication and episome maintenance, and that EBNA1-bound OriP can also function as a transcriptional enhancer of the C promoter (Cp). A second origin of replication, OriLyt, is associated with amplification of the viral genome during lytic replication. Replication assays showed that seven EBV proteins are required for OriLyt-dependent replication: DNA polymerase (BALF5), polymerase processivity factor (BMRF1), single-stranded DNA-binding protein (BALF2), primase (BSLF1), helicase (BBLF4), helicase/primase-associated protein (BBLF2/3), and the transactivator EB1 (Zta, encoded by BZLF1). In addition, the virus encodes several non-essential proteins with enzymatic activities, for example in nucleotide metabolism.
Integration of EBV has been proposed as a mechanism for viral persistence and for interaction with cellular genes, especially genes involved in the regulation of cell growth and tumorigenesis. Elucidating integration sites is therefore important for understanding the mechanism of persistence in EBV-associated malignancies. Analysis of integrated EBV DNA is complicated by heavy methylation of the viral DNA, which hinders mapping of the EBV genome, and by the presence of multiple copies of viral episomes, which create background noise when searching for integration sites. Nevertheless, a number of studies have sought to elucidate EBV integration sites and the mechanism of integration. Luo et al, using the NAB-2 cell line, reported that EBV was integrated via the terminal repeat at chromosome 2p13, between two oncogenes, REL and BCL11A. Takakuwa et al reported integration of EBV into chromosome 6q15 in the Raji cell line, resulting in loss of BACH2 expression. Whether integration occurs randomly remains debated, but Lestou et al reported that EBV integration is nonrandom, involving bands 1p31, 1q43, 2p22, 3q28, 4q13, 5p14, 5q12, and 11p15 in most of the cell lines studied. Another study reported that EBV integrates into G-band-positive chromosomal material. G-positive bands are regions that stain darkly with Giemsa; they are generally associated with heterochromatin, which is rich in repeats and poor in genes. However, integration does not occur exclusively in gene-poor regions: integration sites have occasionally been reported to overlap with or lie close to bona fide cellular genes, such as MACF1 in Namalwa cells, BACH2 in Raji cells, and BCL11A in NAB-2 cells. These data indicate that EBV can integrate at many different sites in the host genome.
Integration of the virus into the host genome can lead to novel fusion transcripts and/or local genomic instability, resulting in secondary deletions, rearrangements, duplications, or inversions of host and/or viral genomic sequences. Integration is also associated with tumorigenesis.
4.2 Methods of Identifying Viral Integration: Mapping of oncoviral integration sites is a powerful tool for identifying cellular oncogenes. Copeland and Jenkins, for example, used retroviral integration in murine myeloid tumors to identify potential oncogenes by determining the viral integration sites in tumor tissue. This approach led to the development of databases of cancer-associated genes, such as the Retroviral Tagged Cancer Gene Database (RTCGD). Early methods for mapping integration sites relied on PCR-based capture and amplification assays, which were inefficient and highly labor-intensive. High-throughput next-generation sequencing technologies subsequently enabled efficient identification of integration sites. Peter et al and other researchers developed web-based bioinformatics tools that facilitated the identification of integration sites by mapping sequence data generated with Sanger technology. However, these tools were not sufficient to quickly map and characterize integration sites from high-throughput data. Hawkins et al therefore introduced Seqmap 2.0, a methodology that quickly maps integration sites to a reference genome from extremely large datasets. Seqmap 2.0 provides scalable sequence matching, clustering, and alignment, and addresses two challenges of 454 pyrosequencing output: base stutter and redundant coverage of each integration site. The Seqmap 2.0 workflow has three phases: 1. sequence processing, which includes identification and masking of vector features and distribution of sequence reads into identifier/barcode-specific groups; 2. sequence clustering and alignment; and 3. data visualization and storage for future analysis. Figure 2 shows a typical representation of mapped integration sites in a sequence viewer.
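The first two phases of such a workflow can be illustrated with a toy example. The Python sketch below is not Seqmap 2.0's code. It assumes, hypothetically, that every read begins with a 4-base sample barcode, followed by a known vector end sequence and then host DNA, and it takes already-mapped host positions as input instead of running an aligner. The barcodes, the vector sequence `VECTOR_END`, and the functions `demultiplex`, `mask_vector`, and `cluster_sites` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical barcode -> sample table (a "savable list" of barcodes).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
BARCODE_LEN = 4
# Hypothetical vector feature to mask (e.g. the end of a viral LTR).
VECTOR_END = "GGTCTCTCTA"


def demultiplex(reads):
    """Sort reads into barcode-specific groups; unknown barcodes are dropped."""
    groups = defaultdict(list)
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LEN])
        if sample is not None:
            groups[sample].append(read[BARCODE_LEN:])
    return dict(groups)


def mask_vector(read, vector=VECTOR_END):
    """Return the host-derived part of a read, i.e. the bases after the vector.

    Returns None when the vector feature is absent from the read.
    """
    idx = read.find(vector)
    if idx == -1:
        return None
    return read[idx + len(vector):]


def cluster_sites(positions, window=5):
    """Collapse mapped positions into putative integration sites.

    Sorted positions are chained into one cluster while each lies within
    `window` bp of the previous one. Each cluster is reported as
    (most frequent position, number of reads); ties go to the smallest
    position.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [(max(sorted(set(c)), key=c.count), len(c)) for c in clusters]


# Toy usage with made-up reads and mapped positions.
reads = ["ACGTNNGGTCTCTCTAAATTTCCC", "TGCAGGTCTCTCTAGGGCCC", "CCCCGGTCTCTCTATTT"]
groups = demultiplex(reads)
host_parts = {s: [mask_vector(r) for r in rs] for s, rs in groups.items()}
print(host_parts)  # {'sample_A': ['AATTTCCC'], 'sample_B': ['GGGCCC']}
print(cluster_sites([1000, 1001, 1000, 1002, 5000, 5003, 9000]))  # [(1000, 4), (5000, 2), (9000, 1)]
```

Clustering nearby positions, rather than counting every read as a separate site, is one simple way to collapse redundant coverage of the same integration site and small positional offsets between reads; the 5 bp window is an arbitrary assumption.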
Figure 2: Graphical representation of mapped integration sites in a sequence viewer
Seqmap 2.0 can analyze data from the major PCR techniques, such as ligation-mediated PCR (LM-PCR) and non-restrictive linear amplification-mediated PCR (nrLAM-PCR). The methodology allows users to: 1. upload full sets of 454 pyrosequencing reads; 2. create savable lists of barcodes and identifiers; 3. create savable lists of vector features to mask from each read; and 4. select the appropriate reference genome to which integration sites are mapped. Other approaches for identifying integration sites include ViralFusionSeq, VirusFinder, and Virana. More recently, Wang et al introduced VERSE, an approach that detects virus integration sites through reference sequence customization.
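Many integration-detection tools rely, at least in part, on sequencing reads that span the virus-host junction. The toy sketch below illustrates only that general idea and does not reproduce the algorithm of ViralFusionSeq, VirusFinder, Virana, or VERSE. It assumes a single chimeric read with the viral segment first, uses exact string matching on made-up sequences, and reports the first breakpoint at which the left part of the read occurs in the viral genome and the right part in the host genome. The function `find_junction` and its `min_len` parameter are hypothetical.

```python
def find_junction(read, viral, host, min_len=4):
    """Locate a viral->host breakpoint in a chimeric read by exact matching.

    Tries every breakpoint that leaves at least `min_len` bases on each side.
    Returns (breakpoint_in_read, viral_end, host_start) using 0-based
    coordinates (viral_end is exclusive) for the first breakpoint whose left
    part occurs in `viral` and whose right part occurs in `host`, or None.
    """
    for bp in range(min_len, len(read) - min_len + 1):
        left, right = read[:bp], read[bp:]
        v = viral.find(left)  # first occurrence only; repeats are ambiguous
        h = host.find(right)
        if v != -1 and h != -1:
            return bp, v + len(left), h
    return None


# Made-up sequences: the read joins the end of the viral sequence
# to the start of the host sequence.
viral = "ATGCGTACGTTAGC"
host = "CCGATTGACCTAGGA"
read = viral[6:] + host[:8]  # "ACGTTAGCCCGATTGA"
print(find_junction(read, viral, host))  # (8, 14, 0)
```

Real data require alignment that tolerates sequencing errors, both read orientations, and repeated sequences such as the EBV terminal repeats, where `str.find` would report only the first of several equally good positions; short microhomology at a junction can also make the exact breakpoint ambiguous.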
References
1. Akagi K, Li J, et al (2014): Genome-wide analysis of HPV integration in human cancers reveals recurrent, focal genomic instability, Genome Res; 24: 185-199.
2. Akagi K, et al (2004): RTCGD: retroviral tagged cancer gene database, Nucleic Acids Res; 32:D523-D527.
3. Altmann P, Pich D, et al (2006): Transcriptional activation by EBV nuclear antigen 1 is essential for the expression of EBV’s transforming genes, PNAS USA; 103:4654-4661.
4. Appelt JY, et al (2009): Quickmap: a public tool for large scale gene therapy vector insertion site mapping and analysis, Gene Ther; 16:885-893.
5. Baer R, Bankier AT, et al (1984): DNA sequencing and expression of the B95-8 Epstein-Barr virus genome, Nature; 310: 207-211.
6. Buchberg AM, et al (1990): Evi-2, a common integration site involved in murine myeloid leukemogenesis, Mol Cell Biol; 10:4658-4666.
7. Chen SJ, Chen GH, et al (2010): Characterization of Epstein-Barr virus miRNAome in nasopharyngeal carcinoma by deep sequencing, PLoS One; 5: e12745.
8. Copeland NG, Jenkins NA (1990): Retroviral integration in murine myeloid tumors to identify Evi-1, a novel locus encoding a zinc-finger protein, Adv Cancer Res; 54:141-157.
9. Gabriel R, et al (2009): Comprehensive genomic access to vector integration in clinical gene therapy, Nat Med; 15: 1431-1436.
10. Giordano FA, et al (2007): New bioinformatic strategies to rapidly characterize retroviral integration sites of gene therapy vectors, Methods Inf Med; 40:542-547.
11. Hawkins TB, Dantzer J, et al (2011): Identifying viral integration sites using Seqmap 2.0, Bioinformatics; 27:720-722.
12. Ishihara S, Okada S, et al (1995): Chronic active Epstein-Barr virus infection in children in Japan, Acta Paediatr; 84:1271-1275.
13. Lestou VS, De Braekeleer M, et al (1993): Non-random integration of Epstein-Barr virus in lymphoblastoid cell lines, Genes Chromosomes Cancer; 8:38-48.
14. Li J-W, Wan R, Yu C-S, et al (2013): ViralFusionSeq: accurately discover viral integration events and reconstruct fusion transcripts at single-base resolution, Bioinformatics; 29:649-651.
15. Liu P, Tang X, et al (2011): Direct sequencing and characterization of a clinical isolate of Epstein-Barr virus from nasopharyngeal carcinoma tissue using next generation sequencing technology, J Virol; 85:11291-11299.
16. Lu F, Wikramasinghe P, et al (2010): Genome-wide analysis of host-chromosome binding sites for Epstein-Barr virus nuclear antigen 1 (EBNA1), Virol J; 7:62.
17. Luo W-J, Takakuwa T, et al (2004): Epstein-Barr virus is integrated between REL and BCL-11A in American Burkitt lymphoma cell line (NAB-2), Lab Invest; 84:1193-1199.
18. Knipe DM, Howley PM, et al (2007): Fields Virology, Lippincott Williams and Wilkins.
19. Kwok H, Tang AHY, et al (2012): Genome sequencing and comparative analysis of Epstein-Barr virus genome isolated from primary nasopharyngeal carcinoma biopsy, PLoS One; 7:e36939.
20. Matsuo T, Heller M, et al (1984): Persistence of the entire Epstein-Barr virus genome integrated into human lymphocyte DNA, Science; 226:1322-1325.
21. Parker BD, Bankier A, et al (1990): Sequence and transcription of Raji Epstein-Barr virus DNA spanning the B95 deletion region, Virology; 179:339-346.
22. Peter B, et al (2008): Automated analysis of viral integration sites in gene therapy research using the Seqmap web resources, Gene Ther; 15:1294-1298.
23. Pizzo PA, Magrath IT, et al (1978): A new tumor-derived transforming strain of Epstein-Barr virus, Nature; 272: 629-631.
24. Dolan A, Addison C, et al (2006): The genome of Epstein-Barr virus type 2 strain AG876, Virology; 350:164-170.
25. Portes-Sentis S, Sergeant A, Gruffat H (1997): A particular DNA structure is required for the function of a cis-acting component of the Epstein-Barr virus OriLyt origin of replication, Nucleic Acids Res; 7:1347-1354.
26. Rickinson AB (1986): Chronic, symptomatic Epstein-Barr infection, Immunology Today; 7:13-14.
27. Schelhorn S-E, Fischer M, et al (2013): Sensitive detection of viral transcripts in human tumor transcriptomes, PLoS Comput Biol; 9:e1003228.
28. Schmidt M, et al (2003): Efficient characterization of retro-, lenti-, and foamy vector-transduced cell populations by high-accuracy insertion site sequencing, Ann NY Acad Sci; 996: 112-121.
29. Schmidt M, et al (2007): High resolution insertion site analysis by linear amplification-mediated PCR (LAM-PCR), Nat Methods; 4:1051-1057.
30. Smith DR (1992): Ligation-mediated PCR of restriction fragments from large DNA molecules, PCR Methods Appl; 2: 21-27.
31. Sung W-K, Zheng H, et al (2012): Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma, Nat Genet; 44:765-769.
32. Swaminathan S (2008): Noncoding RNAs produced by oncogenic human herpesviruses, J Cell Physiol; 216: 321-326.
33. Takakuwa T, Luo W-J, et al (2004): Integration of Epstein-Barr virus into chromosome 6q15 of Burkitt’s lymphoma cell line (Raji) induces loss of BACH2 expression, Am J Pathol; 164:967-974.
34. Tarbouriech N, Buisson M, et al (2006): Structural genomics of the Epstein-Barr virus, Acta Crystallogr D Biol Crystallogr; 62: 1276-1285.
35. Wang Q, Jia P, Zhao Z (2013): VirusFinder: software for efficient and accurate detection of viruses and their integration sites in host genomes through next generation sequencing data, PLoS One; 8:e64465.
36. Weiss LM, Movahed LA, et al (1989): Detection of Epstein-Barr viral genomes in Reed-Sternberg cells of Hodgkin’s disease, NEJM; 32:502-506.
37. Yates JL, Warren N, et al (1984): A cis-acting element from the Epstein-Barr viral genome that permits stable replication of recombinant plasmids in latently infected cells, PNAS USA; 81: 3806-3810.
38. Zeng MS, Li DJ, et al (2005): Genomic sequence analysis of Epstein-Barr virus strain GD1 from a nasopharyngeal carcinoma patient, J Virol; 79:15323-15330.
39. Zimber U, Adldinger HK, et al (1986): Geographical prevalence of two types of Epstein-Barr virus, Virology; 154:56-66.