Medical Genetics 1st Ed

Chapter 4

The Structure and Function of Genes

CHAPTER SUMMARY

When one thinks about the genetic makeup of a human, or indeed any organism, it is natural to focus on the protein-coding genes. After all, that is the part of the genome that controls biochemical activities of cells and the processes of growth and development. But the protein-coding genes whose function is summarized in the “Central Dogma” (DNA → mRNA → polypeptide) account for only about 3% of the DNA in a human cell. The genome also contains a large array of DNA sequences that have other functions (Figure 4-1) or that perhaps have no function at all. Some sequences represent the no-longer functional copies of duplicated genes, pseudogenes, produced at an earlier time in a species’ history. In other cases, the regulatory functions of sequences like those encoding microRNAs have only recently been recognized. Thus, the genome must be understood as a package of informational, historical, and noncoding DNA along with regions that hold secrets that researchers continue to unravel with the tools of molecular biology.

image

Figure 4-1. Overview of the kinds of DNA sequences found in the human genome (after Strachan and Read, Garland Science, NCBI Bookshelf). For additional details, see Tables 4-1 and 4-2. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Table 4-1. Categories of Genome Sequence Complexity

images

Table 4-2. Types of Noncoding RNA

images

In Chapter 1 we saw that the chromosomes of eukaryotes (Figure 4-2) are made up of DNA complexed with proteins to form a nucleoprotein structure. The DNA molecule in each chromosome is a single, very long double helix. If one took each of the 23 chromosomes in one haploid set of human chromosomes, removed the protein, and stretched the DNA molecules out end-to-end, they would reach about a meter in total length. On average, then, each human chromosome’s DNA strand is about 4.3 cm long (100 cm/23 linkage groups) and can be composed of as many as several hundred million nucleotide base pairs. Within this molecule, some genes follow the diploid organization we have assumed to this point, with one copy of each gene per haploid genome. But many genes are actually found in multi-gene families that often have large numbers of copies, and in fact the number of copies can change over time.

image

Figure 4-2. A typical eukaryotic chromosome showing some of the genetic structures and activities it can carry. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The first step is to understand the kinds of sequences present in the genome and their functions, if any, for the cell or their use to researchers, which is not necessarily the same thing. We will then explore how this vast amount of DNA is packed within the tiny confines of a nucleus and how packing can influence the process of gene regulation. This will lead into our discussion of normal and aberrant cytogenetic organization in Chapter 5.

Part 1: Background and Systems Integration

Categories of Sequence Complexity

Sequence complexity refers to the number of times a particular nucleotide sequence is found in the genome. Some sequences are unique in the haploid genome, while other short- or medium-length sequences are repeated dozens, hundreds, or even millions of times. One way to estimate the proportion of the genome at various levels of sequence complexity is to measure the rate of DNA reannealing (Figure 4-3). Genomic DNA is first broken up into short pieces of several hundred base pairs (bp) each. The reaction temperature is then raised so the hydrogen bonds between strands break, yielding single-stranded molecules. In other words, the DNA becomes denatured. You will recognize this as being similar to the first DNA melting step of the PCR technique we described earlier. When the temperature is lowered again, complementary strands begin to renature into stable double strands, i.e., they reassociate or reanneal.

image

Figure 4-3. Detecting levels of sequence complexity by measuring rates of reassociation of melted DNA fragments. (a) The hydrogen bonds between complementary DNA strands are broken at high temperature. When fragmented DNA is denatured by heat, it yields single-stranded DNA fragments that reassociate (renature or reanneal) when cooled. (b) Rates of renaturation are measured in a C0t curve that plots the percentage of DNA that has reannealed against the DNA concentration C0 times the incubation time, t. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

For repetitive sequences, where there are large numbers of a particular sequence in the genome, it takes less time for two complementary strands to encounter each other and pair than it does for the parts that are present in lower repeat numbers or as single copies in the haploid genome. A C0t curve plots a metric derived from concentration (C0) and time (t) versus the percentage of DNA that has reannealed. From such experiments, it is estimated that about 60% of the human genome is made up of slowly reannealing DNA, representing largely unique or low-copy sequences. About 30% reanneals at an intermediate rate (middle-repetitive) and 10% is fast-annealing (highly repetitive). The proportion of middle- and highly-repetitive DNA can be even larger in other organisms.
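The shape of a C0t curve can be illustrated with a small numerical sketch. It assumes that each kinetic class follows ideal second-order reassociation, so that the fraction of a class that has reannealed is C0t / (C0t + C0t½). The three genome fractions are the estimates quoted above; the half-reaction values (`1e-3`, `1e0`, `1e3`) are hypothetical numbers chosen only to separate the three classes, not measured constants.

```python
# Fractions of the genome in each kinetic class are taken from the text.
# The half-reaction C0t values are illustrative assumptions only.
COMPONENTS = [
    # (label, fraction of genome, assumed C0t at half-reassociation)
    ("fast (highly repetitive)", 0.10, 1e-3),
    ("intermediate (middle repetitive)", 0.30, 1e0),
    ("slow (unique or low copy)", 0.60, 1e3),
]


def fraction_reannealed(c0t, components=COMPONENTS):
    """Fraction of all DNA that is double-stranded at a given C0t value.

    Each class follows ideal second-order kinetics, so the fraction of that
    class that has reannealed is c0t / (c0t + c0t_half).
    """
    return sum(frac * c0t / (c0t + half) for _, frac, half in components)


if __name__ == "__main__":
    # Step through C0t on a log scale, as on the x-axis of a C0t curve.
    for exponent in range(-5, 7):
        c0t = 10.0 ** exponent
        print(f"C0t = 1e{exponent:+d}: {fraction_reannealed(c0t):6.1%} reannealed")
```

Under these assumptions the curve rises in three steps, one per class, which is how the relative sizes of the fast, intermediate, and slow components can be read from experimental data.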

Most protein-coding genes are part of the unique-sequence component. But it is misleading to think that all of the unique-sequence DNA is doing something useful. Similarly, the multiple-copy DNA is not just filler. It can have functions that are critical to the individual, like the tandem repeats that make up the telomeres at the tips of each chromosome (Figure 4-4). In Table 4-1 we summarize some of the roles (or non-roles) played by representative levels of sequence diversity, and we will expand upon some important examples in the next few sections. To avoid confusion, this list will address sequence copies per haploid genome. For example, so-called “unique” or single-copy sequences are of major importance, because they code for many of the key proteins controlling cell structure and function. But since a diploid carries two copies, it can be confusing to call such unique-sequence genes “single-copy” without any qualification.

images

Figure 4-4. An example of a FISH hybridized metaphase spread. Four probe signals are visible: subtelomeric probes for 2p (green signal) and 2q (red signal). (Reprinted from Wise JL, Crout RJ, McNeil DW, et al: Cryptic subtelomeric rearrangements and X chromosome mosaicism: a study of 565 apparently normal individuals with fluorescent in situ hybridization. PLoS One. 2009 Jun 10;4(6):e5855. doi: 10.1371/journal.pone.0005855.)

Anatomy of a Protein-Coding Gene

For the purposes of this chapter, we will tend to take for granted the single-copy genes that code for proteins. But since this chapter focuses on DNA structural components, it might be helpful to review briefly the anatomy of a typical protein-coding gene. The term “structural gene” is often used to describe the nucleotide sequence that defines the amino acid composition of a protein. Upstream of the coding region will be a stretch of noncoding DNA that includes regulatory functions, such as the promoter that binds RNA polymerase and the binding sites for various transcription factors (Figure 4-5). Downstream past the end of the coding region is the terminator that ends transcription. While these regulatory sequences are not part of the structural gene per se, they are critical elements in its functional environment.

images

Figure 4-5. Correspondence between the functional regions of the genome and of the final mRNA product following transcription and RNA processing. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Coded within the transcribed region are nucleotides that define the ribosome-binding domain, translational start site, the codons corresponding to the polypeptide’s amino acid chain, and stop codons that terminate translation. But the gene as it is found on the chromosome has intervening noncoding regions (introns) in addition to the portions (exons) that are part of the mature messenger RNA (mRNA). The dystrophin gene associated with Duchenne muscular dystrophy, for example, has about 80 exons. Splicing the introns out of the 2500 kilobase (kb) initial transcript yields a mature mRNA that is only about 14 kb long.

Deciphering the human genome and the genomes of model organisms has clearly demonstrated the range of unusual patterns in what we might call the “molecular geography of functions” in a DNA strand. First, the synthesis of an mRNA transcript always reads the DNA template strand from the 3′ toward the 5′ end (the mRNA strand grows in the antiparallel direction, adding new nucleotides at its growing 3′ end). But the template strand’s orientation can differ from one gene to another (Figure 4-6), with some genes reading one of the DNA strands and others being transcribed in the opposite direction using the other strand. The orientation is determined by which strand has the proper nucleotide sequence of a promoter.

images

Figure 4-6. Transcription of three different genes, showing that the direction of transcription depends on the placement and orientation of the promoter region. The transcript is synthesized in a 5′ to 3′ direction by reading the template strand from 3′ to 5′. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)
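The relationship between promoter orientation, template strand, and transcript can be sketched in a few lines of code. The 10-nucleotide sequence below is hypothetical and far too short to be a real gene; the helper names are ours. The point is only that whichever strand serves as the template is read 3′ to 5′, so the transcript matches the opposite strand (with U in place of T).

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna_5to3):
    """Return the complementary DNA strand, also written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna_5to3))


def transcribe(template_5to3):
    """Return the RNA (written 5'->3') copied from a template strand.

    The template is supplied 5'->3', but RNA polymerase reads it 3'->5',
    so the string is walked backwards while complementary bases are added.
    """
    return "".join(RNA_COMPLEMENT[base] for base in reversed(template_5to3))


# Hypothetical stretch of double-stranded DNA.
top = "ATGGCCTTAG"                 # top strand, 5'->3'
bottom = reverse_complement(top)   # bottom strand, 5'->3'

# A gene whose promoter directs use of the bottom strand as template
# yields an RNA that matches the top strand.
rna_a = transcribe(bottom)

# A gene transcribed in the opposite direction uses the top strand as
# template and yields an RNA that matches the bottom strand.
rna_b = transcribe(top)

print(rna_a)  # AUGGCCUUAG
print(rna_b)  # CUAAGGCCAU
```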

Furthermore, a strand of DNA is not always the exclusive territory of only one gene. Rare examples are known where genes overlap. Often this involves transcribing the complementary strands in opposite directions, with part of the mRNA transcripts overlapping (using the same portion of the DNA molecule) at one end. But occasionally two genes will transcribe the same strand in different reading frames, as seen for example in the mitochondrial ATPase gene (Figure 4-7). Genes can also be nested as, for example, in the neurofibromatosis type I (NF1) gene that has three smaller genes (OMGP, EVI2A, and EVI2B) transcribed within one of its introns (Figure 4-8).

images

Figure 4-7. Example of overlapping genes.

images

Figure 4-8. Neurofibromatosis type I (NF1) with three small genes transcribed from within one of its introns.

The information flow summarized by the Central Dogma is, therefore, deceptively simple until we look at DNA from the viewpoint of its functional geography. No doubt the picture will become more complex. But, by the same token, seeing the complications that are encoded within DNA gives us the foundation to explain important mechanisms at work in medical conditions. So, from this complexity will ultimately come greater understanding and order.

Varieties of RNA

Early understanding of RNA and its function was mainly focused on the actual transcription and translation of DNA into a protein product. Messenger RNA (mRNA) was recognized as the copy resulting from transcription which then functioned as the template for translation. Ribosomal RNA (rRNA) was appreciated as a major component of the translational machinery, and transfer RNA (tRNA) was known to be the major shuttle vehicle for transporting amino acids to the translational complex. Advances in the understanding of RNA and its various functions have revealed many other “types” of RNA that do not code for a specific product, but that play many important roles in the process of gene expression and regulation. Besides tRNA and rRNA, several more types of noncoding RNA have been discovered. It is sometimes confusing that this group of molecules can be referred to by many other names such as non-protein-coding RNA (npcRNA), non-messenger RNA (nmRNA), small non-messenger RNA (snmRNA), and functional RNA (fRNA). The list of noncoding RNAs in humans described thus far totals over 15 types, each with numerous subtypes that have specific regulatory functions in processes such as translation, transcription, posttranscriptional modification, DNA replication, epigenesis, and gene regulation/expression (Table 4-2). The importance of these RNAs cannot be overstated. Not surprisingly, mutations in these RNAs lead to clinical problems. A few are listed later. Much more important is the fact that these molecules have an ongoing role in gene regulation and expression beyond embryogenesis, in fact for the life of the individual. This fact, then, gives noncoding RNAs tremendous potential for use in the development of genetic therapies, where the actual DNA code would not have to be changed.

MicroRNAs: Single-Copy Sequences That Are Not Protein Coding

MicroRNA (miRNA) is not translated. Instead, it is a recently identified mechanism for gene regulation. MicroRNAs are small RNA molecules that participate in gene regulation by RNA interference. Each is about 21 to 23 nucleotides long and is at least partly complementary to one or more mRNA molecules. By binding to an mRNA molecule, miRNA inhibits translation or degrades mRNA and, thus, down-regulates the expression of that gene. MicroRNAs can be produced from a variety of sources. Some are produced from genes, while others are processed from introns, noncoding RNAs, transposons, or other sources. The initial, or primary, transcript for a miRNA is a pri-miRNA with 5′-cap and poly-A tail that is processed in the nucleus to a 70-nucleotide pre-miRNA with a stem-loop structure (Figure 4-9). The pre-miRNA is processed into mature double-stranded miRNA in the cytoplasm by the endonuclease called Dicer. This RNA then associates with protein to form the RNA-induced silencing complex (RISC), and one of the RNA strands is broken down. When the remaining RNA strand of the RISC binds to an mRNA, it can cause the mRNA to degrade or it can block translation.

images

Figure 4-9. The processing of microRNA that is involved in gene regulation by RNA interference. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)
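What it means for a miRNA to be “at least partly complementary” to an mRNA can be made concrete with a short sketch. The 21-nucleotide miRNA and the mRNA below are invented sequences, and the scoring is a deliberate simplification: it counts only Watson-Crick pairs (A-U and G-C) in an antiparallel alignment and ignores G-U wobble pairing and any positional weighting, both of which are assumptions of this example rather than statements about how RISC selects targets.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(rna):
    """Return the perfectly complementary RNA, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def paired_fraction(mirna, site):
    """Fraction of miRNA bases that pair with an equal-length mRNA site.

    Both sequences are written 5'->3'; the site is reversed so that the
    two strands are compared in antiparallel orientation.
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    paired = sum((m, s) in WATSON_CRICK for m, s in zip(mirna, reversed(site)))
    return paired / len(mirna)


def best_site(mirna, mrna):
    """Slide the miRNA along the mRNA and return (start, fraction) of the
    best-pairing window; ties go to the window nearest the 5' end."""
    n = len(mirna)
    scored = [(paired_fraction(mirna, mrna[i:i + n]), i)
              for i in range(len(mrna) - n + 1)]
    fraction, start = max(scored, key=lambda item: (item[0], -item[1]))
    return start, fraction


# Hypothetical sequences for illustration only.
mirna = "ACGUACGUUAGCCGAUAGCUA"
perfect = reverse_complement_rna(mirna)
mrna = "GGGAAA" + perfect + "CCCUUU"

print(best_site(mirna, mrna))
```

A site with a few mismatches would score below 1.0 but could still represent the partial complementarity described in the text.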

Functional Multi-Copy Genes

Several proteins in eukaryotic cells are coded for by families of genes that are distributed at dispersed locations among the chromosomes. Examples of dispersed gene families include actin, with 5 to 30 copies in eukaryotes, and the tubulin proteins, with 3 to 15 copies. In some cases, members of a family can diverge, allowing slightly different functions to arise within the group.

Some gene products are needed in large amounts, and having multiple copies is one way to accomplish this. Such genes can sometimes be found in tandemly duplicated arrays. Histone genes, for example, are present in 100 to 1000 copies in duplicated arrays. The genes for tRNAs and rRNAs, in which the final gene product is the RNA itself, are also duplicated tandemly. In humans, there are about 50 chromosomal locations for the different tRNA genes, with from 10 to 100 copies at each. Another example is the rRNA coding structure associated with the nucleolar organizer (NO), which forms a cytologically distinct region, the nucleolus, in the nucleus. A human NO region can contain about 250 copies of tandemly arranged rRNA genes (located on the “p” arms of the acrocentric chromosomes) to produce the large amount of these RNAs needed to synthesize ribosomes. The processing of one of the repeated units is shown in Figure 4-10.

images

Figure 4-10. Tandemly repeated sequences like the one shown here from the nucleolar organizer (NO) region code for three rRNAs needed for ribosome synthesis. This sequence is repeated about 250 times in the human NO region. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Pseudogenes

Pseudogenes have sometimes been described as “gene ghosts.” Some are duplicated genes that accumulate mutations that make them nonfunctional since selection no longer acts effectively against mutations in the extra copy. Others may have arisen from the activity of retrotransposons, like SINEs and LINEs described later. These transposable elements reverse transcribe, or retrotranspose, from RNA back into DNA that then inserts into a chromosome. In the process, some random mRNA can become involved and yield a pseudogene that generally lacks typical features like a promoter or introns that had already been processed out of the RNA molecule. If such a sequence is inserted into an intron, it may be neutral or it can possibly have an effect due to alternative intron splicing. Other rarer mechanisms have also been documented. But it is also possible that presumed pseudogenes might acquire a new function or be mistakenly classified because of incomplete information about their function. If their promoter is intact, some presumed pseudogenes can be transcribed and may potentially play some role in gene regulation and gene expression.

Repetitive Sequences Having an Uncertain Function

Some repetitive sequences likely affect the biology of the cell, but the mechanism by which they act, if any, is still uncertain. A good example is the repetitive alpha-satellite DNA in the heterochromatic regions around human chromosomal centromeres that makes up about 3% to 4% of the genome. Its repetitive nature makes centromeric DNA very difficult to sequence for comparative genomic studies. In contrast to the conserved sequence similarity found in genes that share a critical function in different organisms, the sequence of centromeric repeats appears to differ extensively from one species to another.

Human alpha-satellite DNA is composed of tandemly repeating units (monomers) of about 171 bp. There are two forms of alpha satellite (Figure 4-11). Higher order repeat arrays (HOR) are chromosome-specific arrays composed of hundreds or thousands of copies per chromosome totaling 3 to 5 Mb in size and with only about 2% sequence divergence between the repeat units. The array size varies from one individual to another due to unequal crossing-over during meiosis. This can yield some submicroscopic chromosome length polymorphism. In addition, a second type of alpha-satellite DNA has been found in the areas of transition between the HOR region flanking the centromere and the coding euchromatic portion of the chromosome. Unlike higher order structure, the repeats within this so-called “monomeric” alpha-satellite DNA have a lot of sequence divergence among individuals.

images

Figure 4-11. Human alpha-satellite DNA showing tandemly repeating units from the centromeric region of a chromosome. (Reprinted with permission from Alkan et al: 2007. PLoS Computational Biology 3(9): e181. doi: 10.1371/journal.pcbi.0030181.)

Minisatellites and Microsatellites

The variable number of tandem repeats (VNTRs), or minisatellites, form a class of tandemly-repeated sequences that can vary from one location or one individual to another. Each is between about 1 and 5 kb in length with repeated units of about a dozen to perhaps 100 nucleotides. Since they can readily change in the number of repeated units, minisatellites can be a useful marker to assess chromosome relationships, such as those among geographically separated populations.

As far as genome content is concerned, therefore, lacking a biological function is not the same as lacking a use. Even when they have no particular influence on the biochemistry of cellular processes, VNTRs have a use in applications like forensic DNA fingerprinting. The Blooding, by Joseph Wambaugh, for example, is an historical novel recounting the true story of the first murder case solved with the involvement of genetic fingerprinting. The now-familiar process has played a role in an increasingly large number of legal cases. Given the legal profession’s awareness of this aspect of genetic diversity in human populations, physicians with knowledge of medical genetics might expect inquiries in this field.

To obtain a DNA fingerprint, total genomic DNA is digested with a restriction enzyme that cleaves DNA at specific nucleotide target sequences (Figure 4-12). The population of fragments is then separated by size with electrophoresis. Once the fragments have been transferred from the electrophoretic gel to a nylon membrane by a process called Southern blotting, they can be visualized by hybridizing with a radioactively tagged oligonucleotide probe that complements regions within the minisatellite. The resulting autoradiograph (Figure 4-13) shows the variation in fragment lengths that occurs when the number of tandem repeats lying between restriction enzyme target sites differs from one individual to another. Similar approaches using fluorescent tags are also in use.

images

Figure 4-12. DNA fingerprints are produced by digesting chromosomes with a restriction enzyme to yield fragments that differ in the number of tandem repeats, and thus the relative migration rates of the fragments on an electrophoretic gel. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)
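The way repeat number translates into a banding pattern can be sketched with simple arithmetic. In this minimal model a restriction fragment consists of fixed flanking DNA plus the tandem array, so its length is the flank length plus the repeat count times the unit length. The locus parameters (`UNIT_BP`, `FLANK_BP`) and the three genotypes are hypothetical, chosen only to fall within the size ranges given above for minisatellites.

```python
# Hypothetical minisatellite locus (values are assumptions for illustration).
UNIT_BP = 30     # length of one repeated unit
FLANK_BP = 400   # non-repeated DNA between the array and the two cut sites


def fragment_length(repeat_count, unit_bp=UNIT_BP, flank_bp=FLANK_BP):
    """Length in bp of the restriction fragment carrying one allele."""
    return flank_bp + repeat_count * unit_bp


def banding_pattern(alleles):
    """Distinct fragment sizes for a genotype, largest first.

    Larger fragments migrate more slowly, so the first entry would sit
    nearest the top of the gel. A homozygote shows a single band.
    """
    return sorted({fragment_length(n) for n in alleles}, reverse=True)


# Repeat counts on the two homologous chromosomes of each person.
genotypes = {
    "individual 1": (40, 95),
    "individual 2": (62, 62),
    "individual 3": (40, 120),
}

for person, alleles in genotypes.items():
    print(person, banding_pattern(alleles))
```

Probing several such loci at once multiplies the number of distinguishable patterns, which is what makes the combined profile useful for identification.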

images

Figure 4-13. An autoradiograph showing DNA fragments which show different migration speeds, interpreted in Figure 4-29.

Microsatellites have become an even more widely used category of tandemly repeated sequences. They differ from minisatellites in the length of the repeated unit. Microsatellites are tandem repeats of units two to six nucleotides long (often di-, tri-, and tetranucleotide repeats). Changes in microsatellite copy number can have serious medical consequences. In other cases, microsatellites serve as genetic markers that can segregate along with (i.e., cosegregate with) a condition of interest and lead to its molecular localization.

Transposable Elements

The study of transposable elements (TEs), sometimes called “jumping genes,” was pioneered by Barbara McClintock, who received the Nobel Prize for her work in 1983. They are common in most organisms and move by a variety of mechanisms. By transposing from one chromosome location to another, they can insert directly into a gene, causing a mutation or disease, including cancers. Most insertions, however, are into noncoding regions and do not affect development. In the human genome, there are several hundred different transposable element families and subfamilies that together account for about 44% of the total human genome. But most of these copies are inactive. Only about 0.05% of the more than 4 million annotated TEs in the human genome are still capable of transposition.
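To put the last figure in perspective, the percentage can be converted to an approximate count. This is plain arithmetic on the two numbers quoted above; because the text says “more than 4 million,” the result should be read as a rough lower bound.

```python
ANNOTATED_TES = 4_000_000   # "more than 4 million", so a lower bound
ACTIVE_PER_10_000 = 5       # 0.05% written as 5 per 10,000


def active_elements(total, per_10_000=ACTIVE_PER_10_000):
    """Number of elements still capable of transposition (integer math)."""
    return total * per_10_000 // 10_000


print(active_elements(ANNOTATED_TES))  # 2000
```

In other words, on the order of two thousand elements remain potentially mobile, while the remaining millions are inactive copies.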

Transposons carry out transposition as DNA copies, while retrotransposons spread after reverse transcription of an RNA molecule into DNA. These include short interspersed nuclear elements, called SINEs, that are less than about 500 nucleotides in length. Another type of transposable element in mammals, including humans, is the long interspersed nuclear elements (LINEs), which have some DNA sequence homology to retroviruses and encode enzymes used in transposition.

The most common human TEs are the retrotransposon Alu and the LINE-1 (L1) dispersed repetitive sequences. The complete Alu sequence is about 200 nucleotides long, and L1 is 1 to 5 kb in length. Alu is present in hundreds of thousands of full and partial copies in the human genome, and L1 is found in 20,000 to 40,000 copies. But most Alu and L1 elements are truncated and thus inactive. There is functionally important genetic variation in L1 copies, but only rarely have active L1 copies been documented to transpose and cause disease. Transposition by Alu is dependent on the L1-transposition mechanism. Examples of human diseases traced to transposable elements include occurrences of hemophilia A (L1 insertion) and neurofibromatosis type 1 (Alu insertion). The genetics and biomedical importance of transposable elements will be explored in more detail in Chapter 12.

Eukaryotic Chromosome Packing in the Nucleus

This introduction to categories of genetic function brings home the point that there is a very large amount of DNA in each cell. The mechanism for packaging this DNA in the nucleus is critical to maintaining its integrity and organization (Figure 4-14). Packaging also influences gene expression. The basic element of chromosome structure is the nucleosome, a repeating unit composed of double-stranded DNA wrapped almost twice around a complex of histone proteins (Figure 4-15). This compacts DNA by reducing its length about 7-fold.

images

Figure 4-14. The hierarchy of DNA packing in a chromosome. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

images

Figure 4-15. Nucleosomes are composed of DNA wrapped around the positively-charged histone core. (a) The core of eight proteins includes two each of histone H2A, H2B, H3, and H4. (b) Histone H1 and various non-histone proteins bind to the linker DNA between adjacent nucleosomes. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

The nucleosome’s protein core is composed of two molecules of each of four different histone proteins, H2A, H2B, H3, and H4. These contain a large number of lysine and arginine amino acids, making the protein core highly basic. This helps it bind to the negatively-charged phosphate groups in DNA. A short unbound stretch of linker DNA is found between consecutive nucleosomes. A fifth histone, H1, binds to the linker DNA and may help connect adjacent nucleosomes during early chromosome compaction (Figure 4-16), although there are competing models of the resulting 30-nm fiber. This shortens DNA length another 7-fold, to a total of almost 50-fold over the initial DNA molecular length.

images

Figure 4-16. (a) When H1 histones are not present, the nucleosomes appear like a beaded string. (b) The H1 histones may link together adjacent nucleosomes to yield a first-order level of compaction seen in the 30-nanometer (nm) fibers. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

In addition to these histone proteins, there is a large non-histone protein component in a chromosome. Non-histone proteins are highly diverse. The largest class is the transcription factors that regulate gene expression. Other examples are the non-histone proteins in the centromeric kinetochores that function in chromosome movement during cell division and the structural proteins of the nuclear matrix and supporting scaffold of condensed chromosomes.

At a third level of compaction, the 30-nm fibers are hypothesized to attach as radial loops to filaments of the dynamic network of proteins that make up the nuclear matrix (Figure 4-17). These looped domains of between 25,000 bp and 200,000 bp in size are anchored to the matrix filaments by other kinds of non-histone proteins. Matrix-attachment regions (MARs) or scaffold-attachment regions (SARs) are dispersed at intervals throughout the genome. This results in a further 200- to 250-fold shortening of the chromosomes to a total of about 10,000-fold above the naked DNA.
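The cumulative figures follow from multiplying the fold-compaction at each level. The sketch below uses only numbers quoted in this chapter: 7-fold for nucleosomes, a further 7-fold for the 30-nm fiber, 200- to 250-fold for the looped domains, and the earlier estimate of 100 cm of DNA shared among 23 chromosomes. The resulting packed length is an average for illustration, not a measurement of any particular chromosome.

```python
import math

# Fold-compaction at each level, as quoted in the text.
NUCLEOSOME_FOLD = 7
FIBER_30NM_FOLD = 7
LOOP_FOLD_RANGE = (200, 250)


def cumulative_fold(*folds):
    """Total shortening when successive packing levels multiply together."""
    return math.prod(folds)


# Average DNA length per chromosome from the earlier estimate (100 cm / 23).
mean_dna_cm = 100 / 23

for loop_fold in LOOP_FOLD_RANGE:
    total = cumulative_fold(NUCLEOSOME_FOLD, FIBER_30NM_FOLD, loop_fold)
    packed_um = mean_dna_cm * 1e4 / total  # 1 cm = 10,000 micrometers
    print(f"loops x{loop_fold}: {total:,}-fold total, about {packed_um:.1f} um long")
```

The two totals (9,800-fold and 12,250-fold) bracket the “about 10,000-fold” figure, and they reduce an average DNA molecule of roughly 4.3 cm to a structure a few micrometers long.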

images

Figure 4-17. (a-d): The nuclear matrix. (a, d: Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008. b, c: Nickerson et al: “The nuclear matrix revealed by eluting chromatin from a cross-linked nucleus.” PNAS 94: 4446-4450. Figure 2ab. © 1997 National Academy of Sciences, USA.)

Formation of the chromosome scaffold from the nuclear matrix causes additional compaction of the radial looped domains (Figure 4-18). By the end of prophase in nuclear division, all chromosomes are highly condensed, and gene transcription is almost completely halted since transcription factors cannot easily bind the DNA. Each chromosome also has its own characteristic patterns of regional compaction that can be seen by pretreating prometaphase chromosomes with heat and then staining with Giemsa (G-banding technique; Figure 4-19).

images

Figure 4-18. The protein scaffold of a metaphase chromosome. Banding at the level of the electron micrograph is shown.

images

Figure 4-19. Individual patterns of density following G-banding allow cytogeneticists to identify individual chromosomes and large-scale changes in structure. The image on the left shows chromosomes arranged in homologous pairs; the right image is the way they appear in the original spread. (Reproduced with permission of Warren G. Sanger, PhD, University of Nebraska Medical Center, Omaha, Nebraska.)

About 850 G-bands can be distinguished in a human karyotype, which provides cytogeneticists with a fine degree of structural resolution. Given their intimate association with DNA, it should not be too surprising that nucleosomes can influence gene expression. Enzyme-controlled chromatin remodeling involves the partial or complete displacement of histones to allow access by transcription factors to promoter regions (Figure 4-20). Medical examples of chromatin remodeling and other conditions associated with the various categories of sequence complexity are discussed in Part 2.

images

Figure 4-20. Partial or complete histone protein removal during chromatin remodeling. (Reprinted with permission from Brooker RJ: Genetics: Analysis & Principles, 3rd ed. New York: McGraw-Hill, 2008.)

Part 2: Medical Genetics

The concept that a gene exerts its effects by coding for a protein that has a specific function was first formally proposed by Archibald Garrod in 1909 in his study of the inborn error of metabolism alkaptonuria. George Beadle and Edward Tatum were awarded the Nobel Prize in Physiology or Medicine in 1958 for their experimental work in 1941 documenting the relationship between the gene and the protein (enzyme). This relationship has been described as the “one gene, one enzyme” hypothesis. This was the first identified mechanism of how a change in DNA could result in a heritable trait. For a brief time it seemed that genetics made sense.

Of course, the natural world is not that simple. Discoveries over the past several decades, as will be discussed in detail throughout this book, have expanded the understanding of how changes in nucleic acids lead to observable differences in the physiology of the individual. Mechanisms such as epigenetic influences, posttranslational protein modifications, differential DNA processing, gene-gene interactions, gene-environment interactions, gene regulation (promoters/enhancers), and so forth, all have a role in the normal workings of the organism. A common theme throughout this book is that of genotype-phenotype correlation. Specifically, by what mechanisms do changes in the genome produce human diseases?

In the first section of this chapter we discussed the many different ways that nucleic acids in the form of DNA and RNA are organized and function in the genome. The relationship is much more complicated than a straight sequence of a coding DNA that translates into a single exact protein sequence. The primary point here is that genetics is not just about protein coding.

A better understanding of DNA, how it is organized and arranged, what it does, what influences it, and what interacts with it leads to an enhanced understanding of disease. Many medical conditions simply cannot be understood without this knowledge base. For many conditions, the DNA relationship would never have been elucidated by an observation of the phenotype with a predictive deduction of the genotype. Powerful molecular tools have identified the genetic basis of many conditions where no physiologic clues led to answers. For example, cystic fibrosis (CF) is a complex medical condition characterized by chronic progressive pulmonary disease and pancreatic insufficiency (Figure 4-21). The condition is still considered a lethal disorder, although the life expectancy has improved dramatically from 5 years old in the 1960s to almost 40 years old now. Cystic fibrosis is inherited as an autosomal recessive condition. It is one of the more common single-gene disorders in humans, with an estimated frequency of 1 per 1600 persons of Northern European descent. Early on, people had appreciated a salty taste when kissing the foreheads of babies with this condition. To this day the “sweat test,” which measures sodium and chloride levels in sweat, is the gold standard diagnostic test for this condition. Many of the features of CF were suggestive of some type of problem with exocrine glands.
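For an autosomal recessive condition, the frequency of affected individuals can be turned into a rough estimate of how common carriers are. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection, a single disease allele class), which this chapter has not introduced and which real populations only approximate; it also assumes the 1 per 1600 figure refers to affected individuals. The function name is ours.

```python
import math


def carrier_frequency(affected_frequency):
    """Estimated heterozygote (carrier) frequency for a recessive condition.

    Assumes Hardy-Weinberg proportions: affected = q**2, carriers = 2pq.
    """
    q = math.sqrt(affected_frequency)  # frequency of the disease allele
    p = 1.0 - q                        # frequency of the normal allele
    return 2.0 * p * q


affected = 1 / 1600
carriers = carrier_frequency(affected)

print(f"allele frequency q = {math.sqrt(affected):.3f}")
print(f"carrier frequency 2pq = {carriers:.4f} (about 1 in {1 / carriers:.1f})")
```

Under these assumptions roughly 1 person in 20 would carry a single CF allele, which illustrates how a relatively uncommon recessive disorder can coexist with a large pool of unaffected carriers.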

images

Figure 4-21. (a) Young girl with cystic fibrosis requiring oxygen therapy for chronic pulmonary disease. (b) Adolescent male with cystic fibrosis receiving aggressive pulmonary therapy. (c) Chest X-rays showing chronic obstructive pulmonary disease in patients with cystic fibrosis. (d) Mucus cast of a bronchus removed at autopsy on a patient with cystic fibrosis.

The pathophysiology of the condition seemed to indicate problems with inspissated mucus in these glands. Hundreds if not thousands of studies were performed to try to identify the cause of CF using these bits of information. Ultimately, the CF gene was discovered by the chance identification of genetic linkage of the condition to an unrelated enzyme, paraoxonase. Sequencing of the gene and sequence homology predictions identified the gene as coding for a membrane chloride transport function. With these insights, the following physiological cascade was revealed as the “cause” of CF:

1. abnormal chloride transport

2. leading to increased chloride content in the exocrine glands

3. which resulted in thickened mucus in the glands

4. leading to obstruction of the glands and impaired function.

It is highly unlikely that this mechanism would have been discovered in anywhere close to this time frame by non-molecular genetic approaches.

Another critical part of the understanding of these variations on the “one gene one polypeptide” hypothesis is that many of these alternative mechanisms have a greater potential for genetic therapies than do simple changes in the code. In cystic fibrosis, for example, therapy can now be targeted to the primary source of the disorder (impaired chloride transport across membranes) rather than the secondary clinical expressions. Each variation is unique and has its own implications for human health.

Table 4-3 lists some examples of human medical conditions associated with mutations in different “types” of DNA. In the text that follows we discuss a few of these in more detail. This discussion is not meant to be a comprehensive list of all such conditions, nor does it provide a detailed description of each condition. Rather we are reviewing the breadth of the medical implications in the broader context.

Table 4-3. Human Medical Conditions Associated With Mutations in Different Types of DNA

images

Changes in the DNA-Coding Sequence

It is intuitive that changes (mutations) in the DNA-coding sequence can result in disease. One way in which this may occur is if the mutation results in a missing protein product. The classic example of this would be the enzymatic defects seen in the inborn errors of metabolism (Chapter 8). Mutations that lead to poorly functioning enzymes will cause problems from the lack of normal enzymatic activity. Alternatively, changes in the coding sequence may produce a structurally abnormal product that is not deficient but that interferes with other proteins, such as in sickle cell anemia and many of the connective tissue disorders.

It is important to note that not all DNA changes, even in the coding sequence, necessarily lead to problems. Due to the “degenerate” nature of the genetic code, there is some “wobble” in protein translation in which the third position of a codon is less important than the first two in determining the tRNA binding to the ribosome. Thus certain nucleotide changes will not produce changes in the expected amino acids at that position. In addition, even if an amino acid change occurs due to a mutation, there still may not be a phenotypic problem. Depending upon where in the protein the amino acid change occurs and what that amino acid is doing at that position, an amino acid substitution may not cause any appreciable alteration in the protein function. These types of silent changes are referred to as benign polymorphisms. Clinically benign polymorphisms present a challenge in DNA testing. The identification of a specific nucleotide change that has not been seen before (a polymorphism of “unknown significance”) has to be interpreted cautiously. A mutation identified in a gene that is associated with a specific condition is not always the cause of that condition in a particular patient. These concepts will be discussed more in Chapter 7 on Mutations and discussions on pathogenesis in Chapter 16.
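The distinction between a silent change and an amino acid substitution can be illustrated with a small sketch. The code below is only an illustration under stated assumptions: it uses a hand-picked fragment of the standard genetic code rather than the full table, and the function name `classify_change` is hypothetical. The second example uses the GAG to GTG change (glutamic acid to valine), the substitution seen in sickle cell anemia.

```python
# Partial codon table (standard genetic code); only the codons needed
# for this illustration are listed, not all 64.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",                  # two codons, one amino acid
    "GTG": "Val",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
}


def classify_change(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same amino acid,
    otherwise 'missense'. Hypothetical helper for illustration only."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "missense"


# Third-position change: Glu remains Glu, so the protein is unchanged.
print(classify_change("GAA", "GAG"))
# Second-position change: Glu becomes Val.
print(classify_change("GAG", "GTG"))
```

Note that a "missense" result says nothing about clinical significance; as described above, many amino acid substitutions are themselves benign.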

Changes in the Gene Outside of the Coding Sequence

As has already been described, the components of a particular gene extend well beyond the actual coding sequence. Thus just looking at the coding sequences may not identify the cause of a genetic disorder. Sometimes the mutations may be in the promoter or enhancer regions. More recent insights have also shown that mutations in the noncoding regions (introns) are not always benign. Changes in the intronic sequence may change such things as splice sites or other recognition points. Beta-thalassemia (Figure 4-22) is a disorder of the hemoglobin molecule. Hemoglobin is a multimeric protein composed of equal amounts of two proteins, the alpha and beta chains. The human genome has six genes (four alpha chain and two beta chain) coding for these proteins. The thalassemias represent a group of related conditions seen with a decrease in the production of one of the hemoglobin chains. The clinical presentation of thalassemias can vary from a stillborn infant to a person with mild asymptomatic anemia. Beta-thalassemia occurs with mutations that result in a decreased production of structurally normal beta-globin. Many of the mutations that lead to beta-thalassemia have been shown to be in the promoter region or at splice sites; hence a structurally normal protein is produced in decreased quantities.

images

Figure 4-22. Beta-thalassemia is a disorder of the hemoglobin molecule. (Reprinted with permission from Hartwell LH, et al. Genetics: From genes to genomes. 3rd ed. New York: McGraw-Hill, 2008.)

Dispersed Gene Family

The actin proteins are highly conserved cytoskeletal proteins. They play a role in many cell functions such as migration, division, endocytosis, contraction, and structural integrity. Across species almost 30 different actin genes have been noted. In humans there are three major isoforms: the alpha form, which is found in muscle with different subtypes in striated and in smooth muscle, the beta form found in all cells, and the gamma form also seen in all cells. These isoforms represent a “family” of structurally and functionally related proteins. The genes for the different isoforms are located at different, dispersed loci.

The best explanation for the existence of these related forms is that a protein can take on slightly different functions due to mutations in duplicated copies of the gene. As each protein in a family has a different function, mutations in each specific gene will produce different clinical entities. For example, the actin genes carry the designation ACT. Mutations in different members of the actin family produce different phenotypes depending upon the specific gene that has been changed. Mutations in ACTA1, for example, result in various muscle diseases (myopathies). Alternatively, changes in ACTG1 produce non-syndromic deafness, changes in ACTC1 produce cardiomyopathies, and changes in ACTB produce juvenile dystonia. It is, therefore, important in genetic diagnostics that isoforms are recognized when testing is performed.

Pseudogenes

Scattered throughout the genome are DNA sequences that are very similar to those of known functioning genes, but are non-functioning themselves. These sequences are referred to as “pseudogenes.” Pseudogenes are thought to be the historic relatives of functioning genes that have lost their coding ability or no longer express RNA. They are thought to be either duplicated or disabled copies of an original coding gene. They are thus characterized by:

1. sequence homology to a “parent gene” and

2. nonfunctionality.

The current count for human pseudogenes is around 24,000 to 25,000. The most important role of pseudogenes in medicine is that these very similar sequences can create confusion during genetic testing.

Early descriptions of pseudogenes often referred to them as part of the “junk DNA.” Pseudogenes can almost be perceived as the “appendix” of the genome. As with most vestigial structures their importance may be falsely discounted. Pseudogenes may actually have a role in gene regulation and gene expression. They may in fact play a part in the regulation of protein-coding transcripts. It is also felt that some silencing RNAs (sRNA) may be derived from pseudogenes.

Tandem Repeats

As first introduced in Part 1, the term tandem repeats means clustered repeated nucleotides (next to each other and oriented in the same direction). They may be subcategorized as satellites, minisatellites, or microsatellites based upon their overall size and the length of the repeated unit. The name “satellites” comes from a pattern of optical density seen when DNA is separated by density-gradient centrifugation, where DNA with tandem repeats appears as bands set apart from the main band (consisting of the majority of the “regular” DNA; Figure 4-23).

images

Figure 4-23. Optical density banding of nuclear DNA showing satellite bands corresponding to repetitive sequences.

Satellites

Satellite DNA ranges in size from 100 kb to over 1 Mb. Most satellites in humans are located at the centromere. The centromere of human chromosomes has no precisely defined DNA sequence but in fact is largely made up of large arrays of satellite DNA. The primary centromeric repeated unit is referred to as the alpha satellite. The repeat unit of the alpha satellite is 171 bp and the repetitive region accounts for 3% to 5% of the overall DNA content. Centromeric DNA usually occurs in a heterochromatic state and is associated with a unique histone protein, CENP-A.

Deletion of the centromeric portion of a chromosome will lead to loss of further replication and migration of that particular chromosome. Some cancers have been associated with problems of centromeric function. In addition, several autoimmune diseases, such as systemic sclerosis, systemic lupus erythematosus, rheumatoid arthritis, and Sjögren syndrome, have been associated with anti-centromeric antibodies.

Minisatellites

Minisatellites typically are variant regions of the genome consisting of repeats rich in guanine and cytosine (GC) that range from 10 to 100 bp. The overall size of a minisatellite ranges from 1 to 20 kb. The vast majority (over 90%) of minisatellites are found in the subtelomeric regions of the chromosomes. Some minisatellites have an exceptionally high mutation rate approaching 20%. These hypervariable minisatellites then represent the most unstable loci of the genome.

One type of minisatellite is called variable number of tandem repeats (VNTRs). Individual repeats can be duplicated in or deleted from the VNTR via recombination or replication errors. This leads to variants that act as heritable alleles characterized by the different numbers of repeated DNA sequences. VNTRs are extremely useful in many settings of DNA diagnostics as a natural source of readily identifiable genomic variation between individuals. VNTRs can be used to generate an individual “DNA fingerprint.” Common applications using such VNTRs in molecular diagnostics include forensic investigations, paternity testing, personal identification, and population migration tracking.
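The idea of a VNTR-based “DNA fingerprint” can be sketched as a comparison of repeat counts locus by locus. This is a simplified illustration, not a forensic method: the locus names, the repeat counts, and the function `matching_loci` are all hypothetical, and each person is assumed to carry two alleles per locus recorded as a pair of repeat counts.

```python
def matching_loci(profile_a, profile_b):
    """Return the loci typed in both profiles at which the two samples
    carry the same pair of alleles (order within a pair is ignored)."""
    shared = sorted(set(profile_a) & set(profile_b))
    return [
        locus for locus in shared
        if sorted(profile_a[locus]) == sorted(profile_b[locus])
    ]


# Hypothetical profiles: locus name -> (repeat count allele 1, allele 2)
sample_1 = {"locus_A": (7, 9), "locus_B": (12, 12), "locus_C": (5, 8)}
sample_2 = {"locus_A": (9, 7), "locus_B": (12, 12), "locus_C": (5, 8)}
sample_3 = {"locus_A": (7, 9), "locus_B": (11, 12), "locus_C": (6, 8)}

print(matching_loci(sample_1, sample_2))  # same genotype at every locus
print(matching_loci(sample_1, sample_3))  # agreement at only one locus
```

Because each locus has many possible alleles, agreement across a whole panel of loci is far less likely between unrelated individuals than agreement at any single locus, which is what makes the approach useful for identification.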

Alterations in minisatellites have been associated with several human medical conditions. Minisatellites have been associated with chromosome fragile sites and are proximal to a number of recurrent translocation breakpoints. Polymorphisms in the VNTR region of certain genes have been associated with neurobehavioral disorders. VNTRs of the 5-hydroxytryptamine/serotonin transporter (5-HTT) gene have been associated with anxiety disorders and changes in the dopamine transporter 1 (DAT1) gene with attention deficit hyperactivity disorder. Unstable expansion of a 15- to 18-mer minisatellite in the promoter region of the cystatin B (CSTB) gene has been shown to cause autosomal recessive myoclonic epilepsy.

Another type of minisatellite is the telomere of the human chromosome. All eukaryotic chromosomes are capped at the end with repeat telomere sequences that protect the ends from damage and rearrangement. The size of a telomere is about 15 kb in the germ cells. It is somewhat smaller in somatic cells. The human telomere contains the tandemly repeated sequence GGGTTA. The handling of telomeres has a specific role in the aging process with removal of telomere sequences functioning as the biologic “timekeeper” of cell cycles.

Dyskeratosis congenita (DC) is a multisystem disorder characterized by a classic triad of features that include nail dystrophy, skin hyperpigmentation, and mucosal leukoplakia (Figure 4-24). Patients also typically have bone marrow failure, premature aging, ataxia, hypoplasia of the cerebellum, and learning disabilities. Cytogenetic studies in patients with dyskeratosis congenita show shortening of the telomeres. One form of DC is X-linked and is associated with a gene, dyskeratosis congenita 1 (DKC1). The protein product of this gene is dyskerin. This is a component of the small nucleolar ribonucleoprotein (snoRNP) particles. Dyskerin plays an important role in the processing of telomere complexes. The autosomal dominant form of DC is associated with mutations in the gene telomerase RNA component (TERC). The product of TERC is an RNA that is a component of telomerase and functions as the template for synthesis of the telomere repeats. Thus the principal pathology of DC appears to be related to abnormalities of the processing of telomeres.

images

Figure 4-24. Adult male with dyskeratosis congenita. This is a rare multisystem disorder caused by defective telomere maintenance. Clinical features include abnormal ‘reticular’ pigmentation of the skin, ectodermal changes (brittle nails, scant hair, and poor dentition), osteoporosis, premalignant lesions of the oral mucosa, absent fingerprints, missing lacrimal ducts, hyperkeratosis of the palms, anemia, and immune deficiency. A finding of ‘endoreduplication’ is seen on chromosome studies.

Microsatellites

Microsatellites represent the smallest size of tandem repeats. They may also be referred to as short tandem repeats (STRs). In humans the size of the repeated unit is 2 to 6 bp, most commonly di-, tri-, or tetranucleotide repeats. They are usually clustered in groups of 10 to 100 repeats yielding an overall size of around 100 to 150 bp.
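A microsatellite as defined here (a 2 to 6 bp unit repeated in tandem at least 10 times) can be located in a sequence with a short pattern search. The sketch below is illustrative only: the function name `find_microsatellites`, its parameters, and the example sequence are assumptions made for this example, and real repeat-finding tools also handle imperfect repeats, which this one does not.

```python
import re


def find_microsatellites(sequence, min_unit=2, max_unit=6, min_repeats=10):
    """Return (start, unit, copies) for each perfect tandem repeat whose
    unit is min_unit..max_unit bp long and occurs at least min_repeats times."""
    # Group 1 captures the shortest candidate unit; \1 requires further
    # back-to-back copies of exactly that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence: 12 copies of the dinucleotide CA between flanks.
example = "TTGAGT" + "CA" * 12 + "GGATTC"
for start, unit, copies in find_microsatellites(example):
    print(start, unit, copies, "total bp:", len(unit) * copies)
```

The overall size of each repeat tract is simply the unit length multiplied by the number of copies, which is the relationship the size ranges above describe.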

Microsatellites play an important role in human disorders. The trinucleotide repeat disorders are a group of neurogenetic disorders that share a common pathophysiology. Abnormal expansion of normally occurring microsatellites (trinucleotide repeats) results in neurologic problems such as mental retardation, ataxia, and movement disorders (Table 4-4). These conditions demonstrate a set of novel mechanisms in the basis of genetic disorders. Most show genetic anticipation, i.e., a worsening of the condition as it passes through the generations. (For more details on trinucleotide repeat disorders, see Chapter 12 and the clinical correlation section that follows in this chapter.)

Table 4-4. Clinical and Molecular Characteristics of Selected Trinucleotide Repeat Disorders

images

Lynch syndrome, also known as hereditary non-polyposis colorectal cancer (HNPCC), is a hereditary cancer syndrome associated with cancer of the colon and other abdominal/pelvic organs (Figure 4-25). HNPCC is genetically heterogeneous with five genes known to cause the condition. All five of these genes are mismatch repair genes, that is, genes involved in identifying and correcting errors in DNA replication. Genetic testing for mutations in these DNA mismatch repair (MMR) genes is laborious and expensive. As a prescreen, laboratories can quantify the degree of microsatellite instability (MSI) in colonic tumor specimens. If increased microsatellite instability is identified in a tumor, there is a significantly increased risk that the patient has a mismatch repair abnormality associated with his or her cancer. This group of selected patients is thus “flagged” for further studies such as sequencing of the MMR genes. A combined strategy of immunohistochemical staining and MSI screening is the currently recommended first-step approach for the evaluation of possible Lynch syndrome in a family.
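The logic of an MSI prescreen can be sketched as a comparison of microsatellite allele lengths in tumor tissue against matched normal tissue from the same patient. Everything specific in this example is an assumption made for illustration: the marker names, the repeat lengths, and the function `unstable_markers` are hypothetical, and no diagnostic cutoff is implied, since the text does not give one.

```python
def unstable_markers(normal, tumor):
    """Return the markers at which the tumor shows at least one allele
    length that is absent from the matched normal tissue."""
    return sorted(
        marker for marker in normal
        if marker in tumor and set(tumor[marker]) - set(normal[marker])
    )


# Hypothetical data: marker name -> set of observed repeat counts
normal_tissue = {
    "marker_1": {24, 26}, "marker_2": {18},
    "marker_3": {21, 23}, "marker_4": {15, 16},
}
tumor_tissue = {
    "marker_1": {24, 26}, "marker_2": {16, 18},
    "marker_3": {19, 21, 23}, "marker_4": {15, 16},
}

shifted = unstable_markers(normal_tissue, tumor_tissue)
fraction = len(shifted) / len(normal_tissue)
print(shifted, fraction)
```

New allele lengths appearing only in the tumor suggest that replication slippage at repeats is going uncorrected, which is why a high fraction of unstable markers prompts further study of the MMR genes.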

images

Figure 4-25. Lynch syndrome or hereditary non-polyposis colorectal cancer (HNPCC). (a) Endoscopic images of colorectal cancer in Lynch syndrome. (b) Diagram showing the predominance of right-sided colonic tumors in Lynch syndrome (as compared to predominantly left sided tumors in sporadic cases).

Transposable Elements

Transposable elements are mobile segments of DNA that occur in all eukaryotic cells. They are nonrandomly distributed throughout the genome. Potentially, a third to half of the entire genome is composed of repetitive sequences that are degenerate copies of transposable elements. By the very nature of being migratory, these segments of DNA may effect an “insertional mutagenesis.” In other words, they may produce mutations by disruption of a gene or by exerting effects on its promoter or enhancer. In the scope of population genetics, this is likely a prime source of generating genetic variation.

Hemophilia A is a clotting disorder due to a deficiency of a protein (factor VIII) in the clotting cascade. Factor VIII deficiency results in problems with effective blood coagulation. The gene for factor VIII is on the X chromosome, and thus the condition typically affects males. Men with this condition experience problems, often severe, with excess bleeding and bruising. If one looks at the spontaneous occurrence of hemophilia A, a 3-fold higher than expected spontaneous mutation rate is found as compared to other coding sequences. This increased mutation rate resulting in hemophilia appears related to the insertion of a truncated L1 (LINE) into the gene. Other conditions associated with a higher mutation rate felt to be related to transposable elements include neurofibromatosis and breast/ovarian cancer due to mutations in the BRCA2 gene. This is discussed further in Chapter 12 (Atypical Inheritance).

Other Changes in RNA

Transfer RNA (tRNA)

The main function of tRNA is its role in the transport of amino acids to the translational complex (RNA to protein). “Mutations” in tRNA produce problems with tRNA synthesis and coupling. Clinical conditions that have been reported with abnormalities of tRNA include Charcot-Marie-Tooth disease and other peripheral neuropathies, Alzheimer disease, Parkinson disease, and atherosclerosis.

Ribosomal RNA (rRNA)

Changes in rRNA result in defects in ribosome biogenesis. Interestingly, clinical disorders associated with abnormal rRNA typically involve problems with red blood cell production. Blackfan Diamond syndrome is a multiple anomaly syndrome associated with abnormal thumbs, short stature, and a congenital anemia. Over 25% of the patients with Blackfan Diamond syndrome have mutations in ribosomal protein S19. Abnormalities of rRNA have also been seen with macrocytic anemia and a predisposition to leukemia.

MicroRNAs (miRNA)

There are over 500 miRNAs described in mammals. The key feature of miRNAs is the stem-loop structure. They serve a “fine tuning” function in gene expression. Abnormalities in miRNA have been implicated in cancer, especially leukemia. Other abnormalities of miRNA include cardiac problems (cardiogenesis, hypertrophic growth response, and abnormal cardiac conductance). Neurologic changes have also been seen with changes in miRNAs including a role in the pathogenesis of schizophrenia and Alzheimer disease.

Part 3: Clinical Correlation

Significant cognitive deficits (mental retardation, MR) occur in 3% to 4% of the US population. The vast majority of mental retardation can be attributed to genetic factors. In the population, MR occurs about four times as often in males as in females. It has been known for a long time that much of this male predominance can be attributed to mutations in X-linked genes. In fact, the first report of X-linked MR in a kindred was published in 1943. Before molecular tests were readily available, all that the clinician could ascertain was that MR was appearing in the family in an X-linked pattern (see Chapter 6). In 1969 a laboratory directed by Dr. Herb Lubs studying X-linked mental retardation discovered a cytogenetic marker designated as a “fragile site” on the X chromosome (Figure 4-26). This marker was only observed under specific cell culture conditions such as a folic acid-deficient medium. Using this marker, a subset of families with X-linked mental retardation could be identified. Thus, the clinical phenotype of fragile X syndrome was defined. Men with fragile X syndrome were noted to have cognitive deficits and mild craniofacial changes (macrocephaly in early childhood, a prominent jaw, a broad nasal bridge, large/protuberant ears, light blue irises, and epicanthal folds). Other features included large testicles (macroorchidism) after puberty, lax joints, other skeletal changes, and neurobehavioral/neuropsychiatric problems (Figure 4-27).

images

Figure 4-26. Karyotype of fragile-X syndrome. Note the “fragile” site indicated by the arrow.

images

Figure 4-27. Many men with fragile-X syndrome show: (a) common characteristic facial features and (b) other traits such as macroorchidism.

Further examination of kindreds with fragile X syndrome began to identify a more complex inheritance pattern. Intervening females were often found to have a partial phenotype. Many were noted to have lesser degrees of cognitive impairment and a pattern of neurobehavioral changes as well. Some showed early ovarian failure. In addition, genetic anticipation (a worsening of the condition as it is passed through generations) was seen in a review of the pedigrees. This pattern of X-linked semi-dominant inheritance with genetic anticipation was described by Dr. Stephanie Sherman and subsequently became known as the “Sherman paradox” (Figure 4-28). Ultimately, an exciting discovery was made that uncovered the mechanism of this unusual inheritance pattern. Fragile X syndrome was found to be caused by an expanding trinucleotide repeat (expansion of a microsatellite region) of a gene ultimately designated as FMR1 on the X chromosome at position Xq27.3. In the case of fragile X syndrome, the specific repeat is a CGG nucleotide triplet. The repeat region is in the 5′ untranslated region of the FMR1 gene. The typical size of the repeat in the general population is 29 or 30 tandem copies. After an initiating event (mutation), the size of the repeat begins to expand as it progresses through the generations. In fragile X syndrome, expansion occurs only if the abnormal allele is transmitted by the mother. The expansion of the repeats can take many generations. Ultimately, when the size of the expansion exceeds 200 repeats, transcription of the FMR1 gene is turned off, the protein product of this gene is not produced, and the affected individual demonstrates clinical fragile X syndrome (Figure 4-29). But the inheritance of fragile X turns out to be even more complex. Over the past several years additional amazing insights into the pathogenesis of this condition have been reported. For more information see Chapter 12, Atypical Inheritance.
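The relationship between repeat number and gene silencing described above can be expressed as a simple count. The sketch below assumes that the sequence of the repeat region is already in hand and that the CGG tract is uninterrupted, which is a simplification. The function names and example sequences are hypothetical, and the only threshold used is the figure of 200 repeats given in the text.

```python
import re

# From the text: expansions beyond 200 CGG repeats turn off FMR1 transcription.
FULL_MUTATION_THRESHOLD = 200


def longest_cgg_run(sequence):
    """Return the number of copies in the longest uninterrupted CGG tract."""
    runs = re.findall(r"(?:CGG)+", sequence)
    return max((len(run) // 3 for run in runs), default=0)


def is_full_mutation(sequence):
    """True when the longest CGG tract exceeds the full-mutation threshold."""
    return longest_cgg_run(sequence) > FULL_MUTATION_THRESHOLD


# Hypothetical alleles: arbitrary flanking bases around the repeat tract.
typical_allele = "GCT" + "CGG" * 30 + "CTG"
expanded_allele = "GCT" + "CGG" * 230 + "CTG"

print(longest_cgg_run(typical_allele), is_full_mutation(typical_allele))
print(longest_cgg_run(expanded_allele), is_full_mutation(expanded_allele))
```

In practice, expansions of this size are measured by methods such as the Southern blot shown in Figure 4-29 rather than read directly from sequence.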

images

Figure 4-28. A sample pedigree showing semi-dominant inheritance with genetic anticipation. This pattern illustrates the Sherman Paradox in fragile-X expression. The percentages denote the proportion of affected persons.

images

Figure 4-29. Southern blot showing differing sizes of trinucleotide repeats in fragile-X.

Board-Format Practice Questions

1. The “one-gene, one-enzyme” hypothesis as put forth by Beadle and Tatum

A. has stood the test of time and still remains a solid working model.

B. has been proven to be completely incorrect.

C. has now been shown to be an overly simple representation.

D. is true for plants, but not humans.

E. applies to most human diseases.

2. Genes are expressed

A. almost exclusively through protein coding.

B. usually in isolation, not by interacting with other genes.

C. by a variety of different mechanisms—some of which do not entail protein coding.

D. only in the nucleus.

E. only due to information in the coding sequence.

3. Satellite DNA

A. is composed of tandem repeats of nucleotide sequences.

B. excludes the centromere.

C. is an interesting genetic phenomenon, but has little clinical significance.

D. is subclassified as macro- and megasatellites.

E. is typically very homogeneous.

4. Fragile X syndrome

A. is caused by a change in a gene on chromosome 17.

B. demonstrates recessive inheritance.

C. is caused by an expanding trinucleotide repeat.

D. is caused by folic acid deficiency.

E. affects exclusively men.


