Single column discrepancy and dynamic max-mini optimizations for quickly finding the most parsimonious evolutionary trees

P. W. Purdom, P. G. Bradford, K. Tamura, S. Kumar

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Motivation: In the maximum parsimony (MP) method, the tree requiring the minimum number of changes (discrepancy) to explain the given set of DNA or amino acid sequences is chosen to represent their evolutionary relationships. To find the MP tree, the branch-and-bound algorithm is normally used. For a partial phylogenetic tree (one that contains only a subset of the organisms), the traditional algorithm assigns a cost equal to the discrepancy of the partial phylogenetic tree. We propose a single column discrepancy heuristic which increases this cost by predicting a minimum additional discrepancy needed to attach the sequences yet to be added to the partial phylogenetic tree. A dynamic Max-mini order of sequence addition is also proposed to quickly terminate branch-and-bound search paths that are guaranteed to lead to suboptimal solutions.

Results: We studied the running time of 47 problems generated from 17 data sets. The single column discrepancy heuristic sped up the computation 2.4-fold for the static search order and 18.2-fold for the dynamic search order. The improvement appeared to increase exponentially with the number of sequences. The proposed strategies are also likely to be useful in speeding up the MP tree search using heuristic searches that are based on branch-and-bound-like algorithms.

Contact: s.kumar@asu.edu
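The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one simple reading of the single column bound (every character state that appears in a column only among the sequences not yet placed costs at least one additional change) and one common reading of Max-mini ordering (add next the sequence whose cheapest attachment is most expensive). The function names and the `attach_costs` mapping are hypothetical and introduced only for illustration.

```python
from typing import Mapping, Sequence


def new_state_bound(in_tree: Sequence[str], remaining: Sequence[str]) -> int:
    """Lower bound on extra changes, computed one column at a time.

    Assumption: aligned sequences of equal length. For each column, count
    the states that occur only in sequences not yet in the partial tree.
    """
    extra = 0
    for col in range(len(in_tree[0])):
        seen = {seq[col] for seq in in_tree}
        unseen = {seq[col] for seq in remaining} - seen
        extra += len(unseen)
    return extra


def should_prune(partial_cost: int, in_tree: Sequence[str],
                 remaining: Sequence[str], best_cost: int) -> bool:
    """Abandon a search path whose bounded cost already exceeds the best tree."""
    return partial_cost + new_state_bound(in_tree, remaining) > best_cost


def max_mini_choice(attach_costs: Mapping[str, Sequence[int]]) -> str:
    """Pick the sequence whose minimum attachment cost is largest.

    attach_costs maps each unplaced sequence name to the tree costs obtained
    by attaching it to each branch of the current partial tree (hypothetical
    input; computing these costs is outside this sketch).
    """
    return max(attach_costs, key=lambda name: min(attach_costs[name]))


if __name__ == "__main__":
    placed = ["AC", "AG"]
    unplaced = ["TC", "AT"]
    print(new_state_bound(placed, unplaced))
    print(should_prune(4, placed, unplaced, best_cost=5))
    print(max_mini_choice({"x": [5, 7], "y": [6, 9], "z": [4, 10]}))
```

The pruning test uses a strict inequality so that partial trees which could still tie the best known cost are kept; a search that needs only one MP tree could prune ties as well.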

Original language: English (US)
Pages (from-to): 140-151
Number of pages: 12
Journal: Bioinformatics
Volume: 16
Issue number: 2
State: Published - Feb 2000

Fingerprint

Minimax Optimization
Evolutionary Tree
Discrepancy
Maximum Parsimony
Phylogenetic Tree
Partial
Set theory
Fold
Amino acids
Costs
Heuristics
DNA
Heuristic Search
Search Trees
Branch and Bound Algorithm
Amino Acids
Costs and Cost Analysis
Branch-and-bound
Terminate
Amino Acid Sequence

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Single column discrepancy and dynamic max-mini optimizations for quickly finding the most parsimonious evolutionary trees. / Purdom, P. W.; Bradford, P. G.; Tamura, K.; Kumar, S.

In: Bioinformatics, Vol. 16, No. 2, 02.2000, p. 140-151.

@article{fd2fec88869a4c5292a21db3a4b452b1,
title = "Single column discrepancy and dynamic max-mini optimizations for quickly finding the most parsimonious evolutionary trees",
abstract = "Motivation: In the maximum parsimony (MP) method, the tree requiring the minimum number of changes (discrepancy) to explain the given set of DNA or amino acid sequences is chosen to represent their evolutionary relationships. To find the MP tree, the branch-and-bound algorithm is normally used. For a partial phylogenetic tree (one that contains only a subset of the organisms), the traditional algorithm assigns a cost equal to the discrepancy of the partial phylogenetic tree. We propose a single column discrepancy heuristic which increases this cost by predicting a minimum additional discrepancy needed to attach the sequences yet to be added to the partial phylogenetic tree. A dynamic Max-mini order of sequence addition is also proposed to quickly terminate branch-and-bound search paths that are guaranteed to lead to suboptimal solutions. Results: We studied the running time of 47 problems generated from 17 data sets. The single column discrepancy heuristic sped up the computation 2.4-fold for the static search order and 18.2-fold for the dynamic search order. The improvement appeared to increase exponentially with the number of sequences. The proposed strategies are also likely to be useful in speeding up the MP tree search using heuristic searches that are based on branch-and-bound-like algorithms. Contact: s.kumar@asu.edu",
author = "Purdom, {P. W.} and Bradford, {P. G.} and Tamura, K. and Kumar, S.",
year = "2000",
month = "2",
language = "English (US)",
volume = "16",
pages = "140--151",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Single column discrepancy and dynamic max-mini optimizations for quickly finding the most parsimonious evolutionary trees

AU - Purdom, P. W.

AU - Bradford, P. G.

AU - Tamura, K.

AU - Kumar, S.

PY - 2000/2

Y1 - 2000/2

AB - Motivation: In the maximum parsimony (MP) method, the tree requiring the minimum number of changes (discrepancy) to explain the given set of DNA or amino acid sequences is chosen to represent their evolutionary relationships. To find the MP tree, the branch-and-bound algorithm is normally used. For a partial phylogenetic tree (one that contains only a subset of the organisms), the traditional algorithm assigns a cost equal to the discrepancy of the partial phylogenetic tree. We propose a single column discrepancy heuristic which increases this cost by predicting a minimum additional discrepancy needed to attach the sequences yet to be added to the partial phylogenetic tree. A dynamic Max-mini order of sequence addition is also proposed to quickly terminate branch-and-bound search paths that are guaranteed to lead to suboptimal solutions. Results: We studied the running time of 47 problems generated from 17 data sets. The single column discrepancy heuristic sped up the computation 2.4-fold for the static search order and 18.2-fold for the dynamic search order. The improvement appeared to increase exponentially with the number of sequences. The proposed strategies are also likely to be useful in speeding up the MP tree search using heuristic searches that are based on branch-and-bound-like algorithms. Contact: s.kumar@asu.edu

UR - http://www.scopus.com/inward/record.url?scp=0342424769&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0342424769&partnerID=8YFLogxK

M3 - Article

C2 - 10842736

AN - SCOPUS:0342424769

VL - 16

SP - 140

EP - 151

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 2

ER -