A stochastic de novo assembly algorithm for viral-sized genomes obtains correct genomes and builds consensus

Research output: Contribution to journal › Article › Academic › peer-review

1 Citation (Scopus)

Abstract

A genetic algorithm with stochastic macro mutation operators that merge, split, move, reverse and align DNA contigs on a scaffold is shown to accurately and consistently assemble raw DNA reads from an accurately sequenced single-read library into a contiguous genome. A candidate solution is a permutation of DNA reads, segmented into contigs. An interleaved merge operator for contigs allows a fitness function, which measures the string length of a candidate solution, to be minimized quickly. This study assembles read libraries for three genomic fragments from different organisms, five complete virus genomes, and one complete bacterial genome; the largest genome is 159 kbp long. To evaluate assembly accuracy, test libraries of DNA reads are generated from reference genomes, and each assembly is compared to its reference. Assembly accuracy is very high: over repeated assemblies of each input genome, the original genome was constructed optimally in more than 85% of the runs. Because the algorithm is consistent, the method is suitable for determining the consensus genome in de novo assembly problems. The method has two limitations: genomes with long repeats may be overcompressed, and its computational complexity is high.
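The abstract describes the representation (a permutation of reads segmented into contigs) and the objective (the string length of a candidate solution) only in words. The Python sketch below illustrates one way these two ideas can fit together. It is a simplification under stated assumptions: reads are error-free single-strand strings; consecutive reads in a contig are joined by their longest suffix-prefix overlap of at least `min_len` bases; only a plain merge and a split operator are used (the interleaved merge, move, reverse and align operators are not reproduced); and a single-candidate, accept-if-not-worse loop stands in for the full genetic algorithm. All function names and parameters (`overlap`, `contig_sequence`, `fitness`, `merge`, `split`, `assemble`, `simulate_reads`, `min_len`) are hypothetical.

```python
import random

# Hypothetical sketch: names, parameters, and the simplified operators below are
# illustrative assumptions, not the authors' implementation.


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def contig_sequence(contig: list[int], reads: list[str], min_len: int = 3) -> str:
    """Spell a contig (ordered read indices) by overlapping each read onto the growing string."""
    seq = reads[contig[0]]
    for idx in contig[1:]:
        read = reads[idx]
        seq += read[overlap(seq, read, min_len):]
    return seq


def fitness(solution: list[list[int]], reads: list[str], min_len: int = 3) -> int:
    """Total spelled length of all contigs; shorter is better."""
    return sum(len(contig_sequence(c, reads, min_len)) for c in solution)


def merge(solution, rng):
    """Join two random contigs end to end (a plain merge, NOT an interleaved merge)."""
    if len(solution) < 2:
        return solution
    i, j = rng.sample(range(len(solution)), 2)
    rest = [c for k, c in enumerate(solution) if k not in (i, j)]
    return rest + [solution[i] + solution[j]]


def split(solution, rng):
    """Cut one contig holding at least two reads into two contigs at a random point."""
    candidates = [k for k, c in enumerate(solution) if len(c) >= 2]
    if not candidates:
        return solution
    k = rng.choice(candidates)
    cut = rng.randrange(1, len(solution[k]))
    return solution[:k] + [solution[k][:cut], solution[k][cut:]] + solution[k + 1:]


def assemble(reads, iterations=20000, seed=0, min_len=3):
    """(1+1)-style loop: apply a random operator, keep the child if it is not longer."""
    rng = random.Random(seed)
    best = [[i] for i in range(len(reads))]  # start with every read as its own contig
    best_fit = fitness(best, reads, min_len)
    for _ in range(iterations):
        child = rng.choice([merge, split])(best, rng)
        child_fit = fitness(child, reads, min_len)
        if child_fit <= best_fit:  # accept ties so neutral rearrangements can drift
            best, best_fit = child, child_fit
    return best, best_fit


def simulate_reads(reference: str, read_len: int, step: int) -> list[str]:
    """Error-free tiling reads cut from a reference, mimicking a generated test library."""
    return [reference[s:s + read_len] for s in range(0, len(reference) - read_len + 1, step)]


if __name__ == "__main__":
    reference = "ATGCGTACCTTAGGCATCGA"
    reads = simulate_reads(reference, read_len=8, step=3)
    contigs, length = assemble(reads)
    # Compare the assembly with the reference, as in the evaluation described above.
    print(length, len(reference))
    print(any(contig_sequence(c, reads) == reference for c in contigs))
```

Because this objective rewards the shortest spelled string, a genome containing a repeat longer than the reads' overlaps can be spelled shorter than the reference, which is one way to see why long repeats may be overcompressed.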
Original language: English
Pages (from-to): 184-199
Journal: Information Sciences
Volume: 420
Publication status: Published - 2017

Keywords

  • De novo, DNA, Assembly, Genetic algorithm, Consensus genome
