B. A. Malyarchuk
Institute of the Biological Problems of the North, Far East Division,
Russian Academy of Sciences, Magadan, 685000.
Abstract—The nucleotide sequence variation of hypervariable segment 1 (HVS1) was analyzed for mtDNAs of 88 phylogeographical clusters characteristic of African, West Eurasian, or East Eurasian populations. A significant difference in the distribution of mutations was revealed for the mitochondrial gene pools of the regional human populations. The HVS1 positions were identified whose instability is explained by strand displacement during mtDNA replication. Strand displacement was assumed to be a major mechanism of context-dependent mutagenesis associated with the regional differentiation of human populations.
Key words: human mitochondrial DNA, major noncoding region, hypervariable segment 1, nucleotide substitutions, hot points, mechanisms of mutagenesis.
INTRODUCTION
The human mitochondrial genome is a circular DNA of 16,569 bp that shows a clonal maternal inheritance and experiences no recombination [1, 2]. Mitochondrial genome variation has recently attracted particular interest in relation to human evolution, genetic history of ethnic and racial groups, and molecular medicine.
As population genetic studies have revealed, the human mitochondrial gene pool displays a continent-specific distribution of monophyletic clusters (groups and subgroups) of mtDNAs, with gene pools of particular ethnic and racial populations containing different sets of mtDNA groups [2]. On the strength of the phylogeographical data, all Eurasian mtDNA groups belong to three macrogroups (M, N, and R), which originated from one African group, L3, and spread through Eurasia since its colonization by modern-type humans started about 60,000 years ago [3-5]. The mitochondrial gene pools of the West and East Eurasian populations differ in the origin and composition of mtDNA groups. While mtDNAs of macrogroups N and R prevail in the West Eurasian gene pool, the East Eurasian one mostly consists of mtDNAs of macrogroup M and specific variants of macrogroups N and R. Gene pools of African populations consist of mtDNA groups belonging to macrogroup L.
Several explanations are possible for the geographical difference in mtDNA groups prevailing in the Eurasian and in the African populations. One is that the difference results from random processes. This implies that various Eurasian regions were initially colonized by small populations and, consequently, genetic drift played a crucial role in fixing mtDNA variants of particular macrogroups. On the other hand, genetic differences between regional Eurasian populations might result from selection, which accompanied the adaptation of populations to new climatic and geographical conditions. Evidence has already been obtained for the possible effect of selection on mitochondrial gene pools of various regional populations [6-11]. Thus, an analysis of full-length mtDNA sequences has revealed differences in the variation of mitochondrial genes for populations of the tropical, temperate, and Arctic zones [10]. This shows that climatic factors might contribute to the structural variation of the mitochondrial gene pool among regional populations of the world. Yet it is still unclear how selection affects the variation of the major noncoding mtDNA region, which harbors all structural and functional elements responsible for the initiation and regulation of transcription and replication of the mitochondrial genome. With the existing phylogeographical structure of the mitochondrial gene pool, it is necessary first to study the mutation spectra of particular fragments of the major noncoding mtDNA region in various regional populations. Sequence variation has been most extensively studied for hypervariable segment 1 (HVS1), which is at the 5' end of the major noncoding region. The objective of this work was to analyze the distribution of HVS1 nucleotide substitutions in mtDNA types of the phylogeographical groups characteristic of three regions: Africa, Western Eurasia, and Eastern Eurasia.
EXPERIMENTAL
Variable positions of mtDNA were identified using a phylogenetic approach, which reveals the unstable nucleotide positions where identical mutations arose repeatedly and independently in different mtDNA clusters [12]. Data were used on the variation of HVS1 sequences in mtDNA groups identified by the polymorphism of coding mitochondrial genome regions, which have a far lower mutation rate than noncoding regions. A database included 7482 variants of the HVS1 sequence (region 16,092-16,365), which represented 88 mtDNA groups. Of these, 28 (H, HV*, pre-V, pre-HV, R*, T1, T*, J*, J1a, J1b, J2, K, U*, U1, U2, U3, U4, U5, U7, U8a, U8b, N1a, N1b, N1c, N*, I, W, and X) are common in the West Eurasian population (total 3834 HVS1 sequences [13]); 34 (C, Z, M8a, D4 (including D*), D5, G2, G3, G4, E, M*, M7*, M7b, M7c, M9, M10, A, N9a, N2, N*, Y, R9a, R*, F*, F1a, F1b, F1c, F2, B*, B4*, B4a, B4b, B5*, B5a, and B5b), in the East Eurasian populations (total 801 sequences [14-18]); and 26 (L1a1, L1a2, L1b, L1c*, L1c1, L1c2, L1c3, L1d, L1e, L2a*, L2a1a, L2a1b, L2b, L2c, L2d1, L2d2, L3b1, L3b2 (including L3b*), L3d, L3e1, L3e2, L3e3, L3e4, L3f*, L3f1, and L3g), in African populations (total 2847 sequences [5]).
The analysis of variation and the search for variable positions in mtDNA HVS1 included several steps. First, median networks were used to cluster the HVS1 nucleotide sequences into groups of related mtDNA types on evidence of the distribution of group-specific variants of coding regions. Second, variable HVS1 positions were identified for each group of mtDNA types. Third, the portion of variable positions in HVS1 was estimated for each group of mtDNA types. Mutations were identified against the Cambridge human mitochondrial DNA sequence (L strand) [1], which is commonly used as a reference in mitochondrial genetics. Only direct mutations were considered (with the only exception of position 16,223). This was because individual mtDNA subgroups must be analyzed to identify reverse mutations and to diagnose their independent generation in different mtDNA lines; such analysis is feasible only for a few mtDNA groups with well-characterized substructures [5]. Both direct and reverse mutations were considered in the case of position 16,223, since the Cambridge sequence, which belongs to cluster R, is known to differ from evolutionarily younger (N-, M-, and L3-root) mtDNA sequences in having 16,223C. Hence transition T → C, which led to the origins of cluster R and of several mtDNA types in macrogroups M and L, was considered to be a direct mutation, while transition C → T, which was detectable in various mtDNA groups of cluster R (H, J*, K, pre-HV, R*, U2, U3, U4, U5, B4), was considered to be a reverse mutation.
The homoplastic direct mutation rate was computed as a ratio between the number of independent identical mutations having arisen in different phylogenetic clusters and the number of clusters examined. The significance of the differences in the mutation rate for HVS1 mutation spectra was estimated with the t test, using the STATISTICA/w 5.0 program. Statistical analysis of the model of dislocation mutagenesis was performed with the CONSEN program [19, 20]. The model is based on the assumption that DNA strand displacement arising at repetitive sequences during replication quickly leads to sequence alignment, with nonpaired bases generated in one strand at the site of displacement [21].
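The rate defined above can be illustrated with a minimal sketch. It assumes that each independent origin of a mutation is recorded as a (cluster, position) pair; the function name `homoplastic_rate` and the toy data are hypothetical and are not taken from the paper's tables or from the programs cited.

```python
from collections import defaultdict


def homoplastic_rate(events, n_clusters):
    """Return {position: independent origins / clusters examined}.

    `events` holds one (cluster, position) pair per independent origin
    of the same direct mutation (an assumed input layout).
    """
    if n_clusters <= 0:
        raise ValueError("n_clusters must be positive")
    counts = defaultdict(int)
    for _cluster, position in events:
        counts[position] += 1
    return {pos: n / n_clusters for pos, n in counts.items()}


# Hypothetical toy data: three origins at 16,311 and one at 16,189,
# scored against the 28 West Eurasian groups mentioned in the text.
toy_events = [("H", 16311), ("K", 16311), ("U5", 16311), ("H", 16189)]
rates = homoplastic_rate(toy_events, n_clusters=28)
```

Dividing by the number of clusters examined makes the rates comparable across regional spectra that contain different numbers of mtDNA groups (28, 34, and 26 here).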
RESULTS AND DISCUSSION
A large body of data on the variation of mtDNA HVS1 in human populations has been accumulated by now. For instance, the mtradius database harbors more than 17,000 individual nucleotide sequences [17]. Yet published data demonstrate that there are far fewer HVS1 sequences having a phylogenetic status confirmed by additional analysis of the polymorphism of coding regions of the mitochondrial genome. For this work, a database was constructed of 7482 HVS1 nucleotide sequences (region 16,092-16,365) that belong to 88 phylogenetic mtDNA groups widespread in the African and Eurasian populations. West Eurasian samples represent various regional populations of Europe and Western Asia; East Eurasian samples mostly represent the North Asian (Siberian and Central Asian) population, whose gene pool has been studied in the most detail; and African samples represent various regional populations of Africa.
The gene pools of West Eurasian populations are characterized by mtDNA types belonging to monophyletic clusters HV, TJ, and U of macrogroup R or clusters N1, W, and X of macrogroup N [13]. These macrogroups were respectively designated as R(WEA) and N(WEA). In the gene pools of East Eurasian populations, mtDNA types of macrogroup M are most common and those of macrogroups R and N occur at relatively high frequencies. However, the R and N lines of the East Eurasian gene pool belong to R(EEA) clusters B and R9 and N(EEA) clusters A and N9, which are unusual for the European and West Asian populations [16, 17]. The gene pools of African populations mostly consist of mtDNA groups belonging to macrogroup L; groups U6 and M1 occur only in populations of northeastern Africa [5]. Yet these groups were excluded from the analysis because of their Eurasian origin [4, 23].
The HVS1 mutation spectrum was reconstructed on the basis of the L-strand of the Cambridge mtDNA sequence and analyzed in the above three regional populations. Of the 274 positions examined, 202 proved to be variable, affected by 2212 mutations. Most mutations were transitions: the transition-to-transversion ratio was 14:1. The ratio varies slightly with region, being 13.5:1 in West Eurasia, 14:1 in Africa, and 15.8:1 in East Eurasia. An analysis of the distribution of transitions among the individual phylogeographical groups of mtDNA demonstrated that pyrimidine transitions are most common (Table 1).
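How substitutions are sorted into the classes counted above can be sketched as follows. This is an illustration only: the helper names `classify` and `ts_tv_ratio` and the four toy substitutions are hypothetical, and the sketch does not reproduce the counts behind Table 1.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify(ref, alt):
    """Label a single-base substitution on the L strand."""
    pair = {ref, alt}
    if ref == alt or not (pair <= (PURINES | PYRIMIDINES)):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    if pair <= PURINES:
        return "purine transition"        # A<->G
    if pair <= PYRIMIDINES:
        return "pyrimidine transition"    # C<->T
    return "transversion"                 # purine<->pyrimidine


def ts_tv_ratio(substitutions):
    """Transitions divided by transversions for (ref, alt) pairs."""
    kinds = [classify(ref, alt) for ref, alt in substitutions]
    transversions = kinds.count("transversion")
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return (len(kinds) - transversions) / transversions


# Hypothetical toy spectrum: two pyrimidine transitions, one purine
# transition, and one transversion.
toy = [("T", "C"), ("C", "T"), ("A", "G"), ("C", "A")]
ratio = ts_tv_ratio(toy)
```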
Compared with purine transitions, pyrimidine transitions are 3.4 times more frequent in the African and West Eurasian populations and 2.9 times more frequent in the East Eurasian population. Another informative index is the ratio of the number of mutations to the number of variable positions in which they occur. This ratio reflects the mutation pressure on particular nucleotides. High ratios were obtained for all nucleotides but adenine. In addition, the mutation pressure on cytosine in the East Eurasian group was lower than in the other regional groups (2.9 vs. 6.0). It should be noted that the variability of individual nucleotides (as estimated from the number of polymorphic positions) correlates significantly with their content in HVS1 (r = 0.99): the higher the content of a particular nucleotide, the higher its variability. However, the distribution of mutation pressure was not associated with the nucleotide composition of HVS1. Thus, adenines, which, along with cytosines, are most common in the L-strand of mtDNA HVS1 (34.7 and 35%, respectively), are least affected by mutation pressure compared to the other nucleotides (Table 1). In contrast, guanines occur at a low (9.1%) frequency and experience a high mutation pressure (at least five independent mutations per polymorphic G).
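The mutation pressure index can be written out as a short sketch. It assumes the index is read as mutations per variable position, in line with the phrase "independent mutations per polymorphic G". The function name and the per-nucleotide counts are hypothetical; only the totals 2212 and 202 come from the text.

```python
def mutation_pressure(mutations, variable_positions):
    """Mean number of independent mutations per variable position."""
    if variable_positions <= 0:
        raise ValueError("need at least one variable position")
    return mutations / variable_positions


# Totals reported in the text: 2212 mutations at 202 variable positions.
overall = mutation_pressure(2212, 202)

# Hypothetical per-nucleotide counts as (mutations, variable positions);
# these are NOT the values of Table 1.
toy_counts = {"A": (40, 20), "G": (60, 10)}
per_base = {
    base: mutation_pressure(m, v) for base, (m, v) in toy_counts.items()
}
```

With the reported totals the overall value is roughly 11 mutations per variable position, which gives a scale for the per-nucleotide figures discussed above.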
Comparison of the mutation spectrum of mtDNA HVS1 for the three regional human populations showed that some nucleotide positions are variable in all three populations.
Table 2 shows 18 such positions. Each of these is a hot point in at least one phylogeographical group of mtDNA sequences; i.e., it is characterized by more than ten mutations having arisen independently in different mtDNA groups [24].
Among the 18 positions, the hottest points are 16,093, 16,129, 16,189, 16,311, and 16,362. For each of these positions, more than ten identical mutations were found in each regional mutation spectrum, that is, in 28 mtDNA groups of the West Eurasian population, in 34 mtDNA groups of the East Eurasian population, and in 26 mtDNA groups of the African population. Importantly, the three mutation spectra differ significantly (P < 0.05) in the mutation rate at certain positions (Table 3).
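The hot-point criterion used above (more than ten independent mutations at a position) can be sketched as a simple count. The function name `hot_points` and the toy counts are hypothetical and serve only to show where the threshold falls.

```python
from collections import Counter

HOT_POINT_THRESHOLD = 10  # "more than ten" independent mutations


def hot_points(positions, threshold=HOT_POINT_THRESHOLD):
    """Return sorted positions hit by more than `threshold` mutations.

    `positions` lists one entry per independent mutation event.
    """
    counts = Counter(positions)
    return sorted(pos for pos, n in counts.items() if n > threshold)


# Hypothetical toy events: 12 at 16,311, 11 at 16,189, exactly 10 at
# 16,223 (not a hot point under a strict ">" rule), and 2 at 16,300.
toy_positions = [16311] * 12 + [16189] * 11 + [16223] * 10 + [16300] * 2
hot = hot_points(toy_positions)
```

Applying the same function to each regional set of events separately would correspond to asking whether a position is a hot point in every regional spectrum, as is reported for the five hottest positions.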
In Table 3, horizontal lines show the nucleotide positions with mutations significantly more frequent in one of the three regional groups. The highest number of such positions was observed in the West Eurasian mutation spectrum. The data of Table 3 demonstrate that, in the West Eurasian gene pool, mutations in 35 positions arose at a higher frequency than in the East Eurasian gene pool and mutations at 21 positions were more frequent than in the African gene pool. The African gene pool significantly differed in the mutation rate at 3 and 20 positions from the West and East Eurasian gene pools, respectively. The East Eurasian mutation spectrum contains only one position (16,319) in which mutations arose more frequently than in the African gene pool. The diagonal shows the nucleotide positions that are variable only in one regional group. Consequently, mutations in these positions are unique. Their numbers were 15, 7, and 2 in the West Eurasian, African, and East Eurasian spectra, respectively. Such unique, region-specific mutations arising independently in at least two mtDNA clusters were found in each of the three HVS1 mutation spectra. It is noteworthy that transversions account for a considerable portion of total unique region-specific mutations. Their frequency reached about 70% in the African spectrum, where five of the seven unique mutations were transversions.
Thus, the results clearly demonstrate that the mtDNA HVS1 mutation spectra of the three regional human populations significantly differ in mutation rate at certain nucleotide positions. Moreover, some nucleotide positions became unstable only in particular phylogeographical (regional) mtDNA groups in the course of individual evolution of the mitochondrial gene pools, whose divergence started several tens of thousands of years ago. The causes of this are still obscure. As mentioned above, phylogeographical mtDNA groups of the tropical, temperate, and Arctic zones have earlier been reported to differ in distribution of nonsynonymous mutations of mitochondrial genes [10]. The difference has been explained by selection associated with adaptation of the corresponding populations. Both findings, however, require a better understanding of the molecular mechanisms generating mutation hot points in the mitochondrial genome.
As previous studies have revealed, the mechanisms of mutation in the major noncoding mtDNA region depend to a great extent on the DNA context [24, 25]. One of the most important mechanisms is strand displacement (dislocation) at mononucleotide repeats or regions of secondary structures (hairpins, loops) in the course of DNA replication. The model of dislocation mutagenesis explains the generation of 20% of hot points in HVS1 [24].
An analysis of the HVS1 mutation spectrum showed that 23.4% (517 of 2212) of the mutations arising in 34.7% (70 of 202) of the variable positions may be explained in terms of dislocation mutagenesis. All HVS1 positions prone to dislocation mutagenesis are listed in Table 4. In the case of mutations arising in the vicinity of position 16,223, context variants were analyzed both with T and with C located in this position. For instance, the root sequences of groups Z and K are respectively 16,185-16,223-16,224-16,260-16,298 and 16,224-16,311 (transitions relative to the Cambridge mtDNA sequence). The mutation in position 16,224 arose twice and in different nucleotide contexts, since position 16,223 is occupied by T in group Z and by C in group K. In both cases, dislocation mutagenesis is the most probable mechanism of transition in position 16,224, but its mechanisms differ. In the case of group K, transition (16,224) arose in the context of the R-root sequence and changed CCCTCAA to CCCCCAA (nucleotide 16,224 is boldfaced, the dislocation site is underlined). In the case of group Z, transition (16,224) arose in the context of the M-root sequence and changed CCTTCAA to CCTCCAA.
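The two context changes quoted above can be reproduced with a small sketch that applies the T → C transition and lists the resulting mononucleotide runs. It assumes that position 16,224 is the fourth base (index 3) of each heptamer, which is inferred from the fact that the two starting contexts differ only at the third base (position 16,223). The helper names are hypothetical, and the sketch does not implement the CONSEN analysis.

```python
from itertools import groupby


def substitute(context, index, new_base):
    """Return `context` with the base at `index` replaced."""
    return context[:index] + new_base + context[index + 1:]


def runs(seq):
    """List consecutive identical bases as (base, run length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


IDX_16224 = 3  # assumed offset of position 16,224 within the heptamer

# Group K: R-root context, 16,223C.
k_after = substitute("CCCTCAA", IDX_16224, "C")
# Group Z: M-root context, 16,223T.
z_after = substitute("CCTTCAA", IDX_16224, "C")

k_runs = runs(k_after)
z_runs = runs(z_after)
```

In the R-root context the same transition merges the flanking cytosines into one run of five, whereas in the M-root context it yields two separate CC runs, which illustrates why the two events are treated as arising in different contexts.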
In the list of dislocation mutations (Table 4), those observed in the regional mutation spectra (Table 3) are shown in boldface. It should be noted that the portion of positions prone to dislocation mutations was 34.5% (19 of 55) in the West Eurasian mutation spectrum and 16.7% (4 of 24) in the African mutation spectrum (dislocation mutations corresponding to particular regional spectra are indicated in Table 4). The only position (16,319) with a mutation rate significantly higher in the East Eurasian than in the African mtDNA groups is also prone to dislocation mutagenesis. Thus, the model of strand displacement during replication of the mitochondrial genome explains the mechanism of some mtDNA mutations found in regional spectra. It is clear, however, that a variety of mechanisms are responsible for mutations arising in the mitochondrial genome. This important problem deserves further investigation.
ACKNOWLEDGMENTS
I am grateful to I.B. Rogozin (Institute of Cytology and Genetics, Siberian Division, Russian Academy of Sciences) and M.V. Derenko (Institute of the Biological Problems of the North) for their help in the research. This work was supported by the Far East Division of the Russian Academy of Sciences (project nos. 03-3-A-06-096, 04-3-A-06-039).
REFERENCES
1. Anderson S., Bankier A.T., Barrell B.G., et al. 1981. Sequence and organization of the human mitochondrial genome. Nature. 290, 457-465.
2. Wallace D.C. 1995. Mitochondrial DNA variation in human evolution, degenerative disease and aging. Am. J. Hum. Genet. 57, 201-223.
3. Watson E., Forster P., Richards M., Bandelt H.-J. 1997. Mitochondrial footprints of human expansions in Africa. Am. J. Hum. Genet. 61, 691-704.
4. Quintana-Murci L., Semino O., Bandelt H.-J., et al. 1999. Genetic evidence for an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nature Genet. 23, 437-441.
5. Salas A., Richards M., De la Fe T., et al. 2002. The making of the African mtDNA landscape. Am. J. Hum. Genet. 71, 1082-1111.
6. Excoffier L. 1990. Evolution of human mitochondrial DNA: Evidence for departure from a pure neutral model of populations at equilibrium. J. Mol. Evol. 30, 125-139.
7. Malyarchuk B.A., Derenko M.V. 1995. Polymorphism of the mtDNA V region in the indigenous and adventive populations of northeastern Asia. Genetika. 31, 1308-1313.
8. Malyarchuk B.A., Solovenchuk L.L. 1997. Negative correlation between the degrees of diversity of the nuclear and mitochondrial genomes in the Arctic Mongoloid populations of northeastern Asia. Genetika. 33, 532-538.
9. Torroni A., Rengo C., Guida V., et al. 2001. Do the four clades of the mtDNA haplogroup L2 evolve at different rates? Am. J. Hum. Genet. 69, 1348-1356.
10. Mishmar D., Ruiz-Pesini E., Golik P., et al. 2003. Natural selection shaped regional mtDNA variation in humans. Proc. Natl. Acad. Sci. USA. 100, 171-176.
11. Moilanen J.S., Majamaa K. 2003. Phylogenetic network and physicochemical properties of nonsynonymous mutations in the protein-coding genes of human mitochondrial DNA. Mol. Biol. Evol. 20, 1195-1210.
12. Malyarchuk B.A., Derenko M.V. 2001. Variation of human mitochondrial DNA: Distribution of hot points in the hypervariable segment 1 of the major noncoding region. Genetika. 37, 991-1001.
13. Richards M., Macaulay V., Hickey E., et al. 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am. J. Hum. Genet. 67, 1251-1276.