
Ready Solutions Engineering Test Results
Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be
the property of their respective owners. Published in the USA. Dell EMC believes the information in this document is accurate as of its publication date. The information is
subject to change without notice.
De Novo Assembly with SPAdes assembler
Overview
We published the whitepaper, “Dell EMC PowerEdge R940 makes De Novo Assembly easier”, last year to study the behavior of
SOAPdenovo2 [1]. However, that whitepaper is limited to a single De Novo assembly application, so we want to expand our application
coverage a little further. We decided to test SPAdes (2012) since it is a relatively new application and is reported to improve on
the Euler-Velvet-SC assembler (2011) and SOAPdenovo. SPAdes is also based on the de Bruijn graph algorithm, like most of the
assemblers targeting Next Generation Sequencing (NGS) data. De Bruijn graph-based assemblers are more appropriate for
larger datasets with more than a hundred million short reads.
As shown in Figure 1, Greedy-Extension and overlap-layout-consensus (OLC) approaches were used in the very early next-generation
assemblers [2]. The Greedy-Extension heuristic is to extend a read or contig with the read that has the highest-scoring overlap, and to repeat this step.
However, this approach is vulnerable to imperfect overlaps and multiple matches among the reads, which can lead to an incomplete
or arrested assembly. The OLC approach works better for long reads, such as Sanger or other technologies generating reads longer
than 100 bp (454, Ion Torrent, PacBio, and so on), because of its minimum overlap threshold. De Bruijn graph-based assemblers are more
suitable for short-read sequencing technologies such as Illumina. The approach breaks the sequencing reads into successive k-mers,
and the graph maps the k-mers. Each k-mer forms a node, and edges are drawn between successive k-mers in a read.
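The k-mer decomposition described above can be illustrated with a minimal Python sketch. It follows the node-per-k-mer description in this paragraph and is not the SPAdes implementation; the function names (kmers, build_graph), the value k=3, and the two toy reads are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Return the successive k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_graph(reads, k):
    """Map each k-mer (node) to the set of k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        parts = kmers(read, k)
        # Draw an edge between each pair of successive k-mers in the read.
        for left, right in zip(parts, parts[1:]):
            graph[left].add(right)
    return graph


# Toy reads (assumed for illustration, not real sequencing data).
reads = ["ACGTAC", "GTACGG"]
graph = build_graph(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy example the two reads share the k-mers GTA, TAC, and ACG, so they are merged through common nodes without any pairwise read alignment. Only nodes with at least one outgoing edge appear as keys in the dictionary.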
SPAdes is a relatively recent application based on the de Bruijn graph for both single-cell and multicell data. It improves on the recently
released Euler Velvet Single Cell (E+V-SC) assembler (specialized for single-cell data) and on the popular assemblers Velvet and
SOAPdenovo (for multicell data).
Figure 1 Overview of de novo short reads assemblers.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3056720/
