DNA-Residual is an efficient lossless DNA compression algorithm that significantly decreases the average bit rate required to losslessly code correlated DNA sequences. The algorithm consists of two parts: modeling and coding. In the modeling part, the DNA bases are mapped into a binary representation, and a forward linear prediction filter then predicts the current input from the previous ones. The prediction error is transformed into a binary error sequence, which is coded using an adaptive binary arithmetic coder. On benchmark DNA sequences, the proposed algorithm achieves a significantly higher compression ratio than state-of-the-art compressors whenever the correlation between bases is high.
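The modeling and coding pipeline above can be illustrated with a minimal sketch. Several details are assumptions, not the authors' published method. The 2-bit mapping (A=00, C=01, G=10, T=11), the ±1 signal used for prediction, the order-`p` predictor fitted once per block with Levinson-Durbin (one way to read "forward" linear prediction, where the coefficients would be sent as side information), and the XOR construction of the binary error sequence are all hypothetical choices. `adaptive_code_length` does not implement an arithmetic coder. It computes the ideal code length of an adaptive binary model using Krichevsky-Trofimov counts, which a real adaptive binary arithmetic coder would approach. All function names are illustrative.

```python
import math

# Hypothetical 2-bit mapping; the paper's actual mapping may differ.
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def dna_to_signal(seq):
    """Map bases to bits, then bits to -1.0/+1.0 so a linear filter can act on them."""
    return [1.0 if b else -1.0 for base in seq for b in BASE_TO_BITS[base]]


def autocorr(x, order):
    """Biased autocorrelation r[0..order] of the signal."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Coefficients a[1..order] such that x[n] is predicted by sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:]


def predict_bit(coeffs, x, n):
    """Linear prediction of sample n from earlier samples, thresholded to a bit."""
    pred = sum(c * x[n - 1 - j] for j, c in enumerate(coeffs) if n - 1 - j >= 0)
    return 1 if pred > 0 else 0


def residual_bits(seq, order=4):
    """Encoder side: fit the predictor on the whole block, emit error bits (actual XOR predicted)."""
    x = dna_to_signal(seq)
    coeffs = levinson_durbin(autocorr(x, order), order)
    errors = [(1 if x[n] > 0 else 0) ^ predict_bit(coeffs, x, n) for n in range(len(x))]
    return coeffs, errors


def reconstruct(coeffs, errors):
    """Decoder side: rerun the same predictor and undo the XOR to recover the bases."""
    x = []
    for n, e in enumerate(errors):
        bit = predict_bit(coeffs, x, n) ^ e
        x.append(1.0 if bit else -1.0)
    bits = [1 if v > 0 else 0 for v in x]
    return "".join(BITS_TO_BASE[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2))


def adaptive_code_length(bits):
    """Ideal code length (bits) of an adaptive binary model with KT counts (start at 1/2)."""
    c0 = c1 = 0.5
    total = 0.0
    for b in bits:
        total -= math.log2((c1 if b else c0) / (c0 + c1))
        if b:
            c1 += 1
        else:
            c0 += 1
    return total


if __name__ == "__main__":
    seq = "ACGT" * 50
    coeffs, errs = residual_bits(seq, order=4)
    assert reconstruct(coeffs, errs) == seq  # the scheme is lossless
    print(len(errs), sum(errs), round(adaptive_code_length(errs), 1))
```

The decoder reruns the same predictor on already-decoded samples, so the error bits and the coefficients are enough to recover the input exactly. When bases are strongly correlated, the predictor is usually right, and the error sequence is dominated by zeros that an adaptive binary coder can represent in far fewer bits than the raw 2 bits per base.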
Original language: English (US)
State: Published, May 8, 2006