Motivation: In the maximum parsimony (MP) method, the tree requiring the minimum number of changes (discrepancy) to explain the given set of DNA or amino acid sequences is chosen to represent their evolutionary relationships. To find the MP tree, the branch-and-bound algorithm is normally used. For a partial phylogenetic-tree (one that has a subset of the organisms) the traditional algorithm assigns a cost equal to the discrepancy of the partial phylogenetic-tree. We propose a single column discrepancy heuristic which increases this cost by predicting a minimum additional discrepancy needed to attach the sequences yet to be added to the partial phylogenetic-tree. A dynamic Max-mini order of sequence addition is also proposed to quickly terminate branch-and-bound search paths that are guaranteed to lead to suboptimal solutions. Results: We studied the running time of 47 problems generated from 17 data sets. The use of single column discrepancy heuristic speeded up the computation to 2.4-fold for static and 18.2-fold for dynamic search order The improvement appeared to increase exponentially with the number of sequences. The proposed strategies are also likely to be useful in speeding up the MP tree search using heuristic searches that are based on banch-and-bound-like algorithms. Contact email@example.com.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics