### Abstract

Background: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has lead several scientists to propose that logarithmic gap costs, G (k) = a + c ln k, are more biologically realistic than affine gap costs, G (k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough the merit their use over the faster affine gap costs. Results: A group of simulated sequences pairs were globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing resulting alignments to actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently a model was developed to explain the results. Conclusion: In contrast to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence for the biological relevance of existing alignment algorithms.

Original language | English (US) |
---|---|

Article number | 527 |

Journal | BMC Bioinformatics |

Volume | 7 |

DOIs | |

State | Published - 2006 |

Externally published | Yes |

### Fingerprint

### ASJC Scopus subject areas

- Medicine(all)
- Structural Biology
- Applied Mathematics

### Cite this

**Logarithmic gap costs decrease alignment accuracy.** / Cartwright, Reed.

Research output: Contribution to journal › Article

}

TY - JOUR

T1 - Logarithmic gap costs decrease alignment accuracy

AU - Cartwright, Reed

PY - 2006

Y1 - 2006

N2 - Background: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has lead several scientists to propose that logarithmic gap costs, G (k) = a + c ln k, are more biologically realistic than affine gap costs, G (k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough the merit their use over the faster affine gap costs. Results: A group of simulated sequences pairs were globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing resulting alignments to actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently a model was developed to explain the results. Conclusion: In contrast to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence for the biological relevance of existing alignment algorithms.

AB - Background: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has lead several scientists to propose that logarithmic gap costs, G (k) = a + c ln k, are more biologically realistic than affine gap costs, G (k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough the merit their use over the faster affine gap costs. Results: A group of simulated sequences pairs were globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing resulting alignments to actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently a model was developed to explain the results. Conclusion: In contrast to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence for the biological relevance of existing alignment algorithms.

UR - http://www.scopus.com/inward/record.url?scp=33846602331&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846602331&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-7-527

DO - 10.1186/1471-2105-7-527

M3 - Article

VL - 7

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 527

ER -