De Novo DNA Assembly with a Genetic Algorithm Finds Accurate Genomes Even with Suboptimal Fitness

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

2 Citations (Scopus)

Abstract

We design an evolutionary heuristic for the combinatorial problem of de-novo DNA assembly with short, overlapping, accurately sequenced single DNA reads of uniform length, from both strands of a genome without long repeated sequences. The representation of a candidate solution is a novel segmented permutation: an ordering of DNA reads into contigs, and of contigs into a DNA scaffold. Mutation and crossover operators work at the contig level. The fitness function minimizes the total length of the scaffold (i.e., the sum of the lengths of the overlapped contigs) and the number of contigs on the scaffold. We evaluate the algorithm with read libraries uniformly sampled from genomes 3835 to 48502 base pairs long, with genome coverage between 5 and 7, and verify the biological accuracy of the scaffolds obtained by comparing them against reference genomes. We find the correct genome as a contig string on the DNA scaffold in over 95% of all assembly runs. For the smaller read sets, the scaffold obtained consists of only the correct contig. For the larger read libraries, the fitness of the solution is suboptimal, with chaff contigs present; however, a simple post-processing step can realign the chaff onto the correct genome. The results support the idea that this heuristic can be used for consensus building in de-novo assembly.
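The segmented permutation and the two fitness terms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `overlap`, `contig_length`, and `fitness` are hypothetical, strand orientation (reverse complements) is ignored for brevity, and returning the two terms as a tuple is an assumption, since the abstract does not say how they are combined.

```python
from typing import List, Tuple

Contig = List[int]        # ordered indices into the read library
Scaffold = List[Contig]   # ordered contigs; together a "segmented permutation"


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def contig_length(contig: Contig, reads: List[str]) -> int:
    """Length of the string obtained by overlapping consecutive reads."""
    length = len(reads[contig[0]])
    for prev, nxt in zip(contig, contig[1:]):
        length += len(reads[nxt]) - overlap(reads[prev], reads[nxt])
    return length


def fitness(scaffold: Scaffold, reads: List[str]) -> Tuple[int, int]:
    """Return (total scaffold length, number of contigs); both are minimized.

    Assumption: the terms are reported separately rather than combined.
    """
    total = sum(contig_length(c, reads) for c in scaffold)
    return total, len(scaffold)


# Toy read library of uniform length 5, sampled from "ACGTACCGG".
reads = ["ACGTA", "GTACC", "ACCGG"]

print(fitness([[0, 1, 2]], reads))      # (9, 1): one contig spelling the genome
print(fitness([[0], [1], [2]], reads))  # (15, 3): fully fragmented scaffold
```

In this toy case the scaffold with a single, well-ordered contig has both the shortest total length and the fewest contigs, which is the direction in which the fitness function described above pushes the search.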
Original language: English
Title of host publication: Applications of Evolutionary Computation
Subtitle of host publication: 20th European Conference, EvoApplications 2017, Amsterdam, The Netherlands, April 19-21, 2017, Proceedings, Part I
Editors: Giovanni Squillero, Kevin Sim
Place of Publication: Cham
Publisher: Springer
Pages: 67-82
Number of pages: 16
ISBN (Electronic): 978-3-319-55849-3
ISBN (Print): 978-3-319-55848-6
Publication status: Published - Apr 2017
Event: 20th European Conference on the Applications of Evolutionary Computation (EvoApplications 2017) - De Bazel, Amsterdam, Netherlands
Duration: 19 Apr 2017 to 21 Apr 2017
Conference number: 20
http://www.evostar.org/2017/cfp_evoapps.php

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 10199
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th European Conference on the Applications of Evolutionary Computation (EvoApplications 2017)
Abbreviated title: EvoApplications 2017
Country: Netherlands
City: Amsterdam
Period: 19/04/17 to 21/04/17