A common assumption in comparative sequence analysis is that the sequences have evolved with the same pattern of nucleotide substitution (homogeneity of the evolutionary process). Violation of this assumption is known to adversely impact the accuracy of phylogenetic inference and tests of evolutionary hypotheses. Here we propose a disparity index, ID, which measures the observed difference in evolutionary patterns for a pair of sequences. On the basis of this index, we have developed a Monte Carlo procedure to test the homogeneity of the observed patterns. This test does not require a priori knowledge of the pattern of substitutions, the extent of rate heterogeneity among sites, or the evolutionary relationship among sequences. Computer simulations show that the ID-test is more powerful than the commonly used χ²-test under a variety of biologically realistic models of sequence evolution. An application of this test in an analysis of 3789 pairs of orthologous human and mouse protein-coding genes reveals that the observed evolutionary patterns in neutral sites are not homogeneous in 41% of the genes, apparently due to shifts in G + C content. Thus, the proposed test can be used as a diagnostic tool to identify genes and lineages that have evolved with substantially different evolutionary processes, as reflected in the observed patterns of change. Identification of such genes and lineages is an important early step in comparative genomics and molecular phylogenetic studies to discover evolutionary processes that have shaped organismal genomes.
|Original language|English (US)|
|Number of pages|7|
|State|Published - Aug 1 2001|