A Method and Software Significantly Improving the Accuracy Of Genome Assemblies: SEQuel
University of California System: University of California, San Diego
posted on 07/27/2012
Assemblies of next generation sequencing (NGS) data, while accurate, still contain a substantial number of errors that need to be corrected after the assembly process. Earlier assembly algorithms developed for Sanger sequencing follow an “overlap – layout – consensus” paradigm, where consensus refers to fixing errors in the contigs. Since this paradigm faces difficulties in short read assembly, most NGS assemblers employ a de Bruijn graph approach that effectively deals with large amounts of data. However, most NGS assemblers neglect the consensus step, i.e., there is no postprocessing of the contigs in Velvet and many other popular assemblers. Relying on high and uniform coverage, NGS assembly algorithms push the burden of producing high quality assemblies onto the construction of the de Bruijn graph. Our work demonstrates that NGS assemblers can benefit from the use of a consensus step. There are currently no tools that aim to accomplish this same goal.
Correcting errors in contigs from high throughput sequencing (HTS) assemblies. These might include bacterial/plant/vertebrate genomes that have not been previously sequenced, or the products of transcript assembly.
- Removed 35% to 96% of small-scale assembly errors.
- Introduced positional de Bruijn graph for contig refinement.
- Demonstrated utility in hard (single-cell) assembly.
- SEQuel can be used in combination with any NGS assembler.
UCSD researchers have recently developed a method and companion software, SEQuel, to correct errors (i.e., insertions, deletions, and substitutions) in the assembled contigs of NGS data. Fundamental to SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. SEQuel takes as input an assembled contig, the paired-end reads that align to that contig, and the approximate positions where they aligned, and returns a refined contig.
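As a rough illustration of the positional de Bruijn graph idea, the Python sketch below labels each k-mer with the approximate contig position implied by its read's alignment, and treats two occurrences of the same k-mer as one node only when their positions lie within a tolerance. This is not SEQuel's implementation: the function names, the `delta` tolerance parameter, and the greedy first-match merging rule are assumptions made for the example.

```python
from collections import defaultdict


def positional_kmers(read, start, k):
    """Yield (kmer, approximate contig position) pairs for one aligned read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], start + i


def build_positional_dbg(aligned_reads, k, delta):
    """Build a simplified positional de Bruijn graph.

    aligned_reads: iterable of (sequence, approximate_start) pairs.
    delta: hypothetical tolerance; equal k-mers whose positions differ by
           at most delta share a node (an assumption of this sketch).
    """
    nodes = defaultdict(list)  # kmer -> representative positions seen so far
    edges = defaultdict(int)   # (node_u, node_v) -> number of supporting reads

    def node_for(kmer, pos):
        # Greedy merge: reuse the first representative within delta.
        for rep in nodes[kmer]:
            if abs(rep - pos) <= delta:
                return (kmer, rep)
        nodes[kmer].append(pos)
        return (kmer, pos)

    for seq, start in aligned_reads:
        prev = None
        for kmer, pos in positional_kmers(seq, start, k):
            cur = node_for(kmer, pos)
            if prev is not None:
                edges[(prev, cur)] += 1  # consecutive k-mers in a read
            prev = cur
    return nodes, edges
```

In an ordinary de Bruijn graph, every occurrence of a given k-mer collapses into a single node, so repeated sequence can tangle the graph. Keeping the position in the node label, as in this sketch, leaves distant copies of the same k-mer as separate nodes while still pooling the evidence from reads that overlap at the same location.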
File Number: 22625
Copyright: ©2012, The Regents of the University of California
This innovation is not currently available for online licensing. Please contact the University of California, San Diego Technology Transfer Office for more information.