Innovation

A Method and Software Significantly Improving the Accuracy of Genome Assemblies: SEQuel

University of California System: University of California, San Diego
posted on 07/27/2012

Assemblies of next-generation sequencing (NGS) data, while accurate, still contain a substantial number of errors that need to be corrected after the assembly process. Earlier assembly algorithms developed for Sanger sequencing follow an “overlap – layout – consensus” paradigm, where consensus refers to fixing errors in the contigs. Because this paradigm faces difficulties in short-read assembly, most NGS assemblers employ a de Bruijn graph approach that deals effectively with large amounts of data. However, most NGS assemblers neglect the consensus step; that is, Velvet and many other popular assemblers perform no postprocessing of the contigs. Relying on high and uniform coverage, NGS assembly algorithms push the burden of producing high-quality assemblies onto the construction of the de Bruijn graph. Our work demonstrates that NGS assemblers can benefit from the use of a consensus step. There are currently no other tools that aim to accomplish this goal.

Suggested Uses

Correcting errors in contigs from high-throughput sequencing (HTS) assemblies. These might include bacterial, plant, or vertebrate genomes that have not been previously sequenced, or the products of transcript assembly.

Advantages

  • Removed 35% to 96% of small-scale assembly errors.
  • Introduced the positional de Bruijn graph for contig refinement.
  • Demonstrated utility in hard (single-cell) assembly.
  • SEQuel can be used in combination with any NGS assembler.

Innovation Details
 

Detailed Description

UCSD researchers have developed a method and companion software, SEQuel, to correct errors (i.e., insertions, deletions, and substitutions) in the assembled contigs of NGS data. Fundamental to SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. SEQuel takes as input an assembled contig, the paired-end reads that align to that contig, and the approximate positions where they align, and returns a refined contig.
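
To make the idea of a positional de Bruijn graph more concrete, the following is a minimal illustrative sketch, not SEQuel's actual implementation. It assumes each read arrives as a (sequence, approximate start position) pair, and it uses a hypothetical tolerance parameter `delta`: two occurrences of the same k-mer share a node only if their positions are within `delta` of each other. The function name, the greedy gluing rule, and the edge multiplicity counts are all assumptions made for illustration.

```python
from collections import defaultdict


def build_positional_de_bruijn(aligned_reads, k, delta):
    """Build a toy positional de Bruijn graph.

    aligned_reads: iterable of (sequence, approx_start) pairs (assumed input shape).
    k:             k-mer length.
    delta:         hypothetical position tolerance for gluing equal k-mers.

    Returns (nodes, edges):
      nodes maps each k-mer to the list of representative positions it occupies;
      edges maps ((kmer, pos), (kmer, pos)) pairs to how many reads support them.
    """
    nodes = defaultdict(list)
    edges = defaultdict(int)

    def node_for(kmer, pos):
        # Greedy gluing (an assumption of this sketch): reuse the first
        # existing node with the same k-mer whose position is close enough.
        for rep in nodes[kmer]:
            if abs(rep - pos) <= delta:
                return (kmer, rep)
        nodes[kmer].append(pos)
        return (kmer, pos)

    for seq, start in aligned_reads:
        prev = None
        for i in range(len(seq) - k + 1):
            cur = node_for(seq[i:i + k], start + i)
            if prev is not None:
                edges[(prev, cur)] += 1
            prev = cur

    return dict(nodes), dict(edges)


# Toy data: the third read repeats the first read's sequence far away.
reads = [("ACGTAC", 0), ("CGTACG", 1), ("ACGTAC", 100)]
nodes, edges = build_positional_de_bruijn(reads, k=3, delta=2)
print(nodes["ACG"])
```

In a plain de Bruijn graph every occurrence of the k-mer `ACG` would collapse into a single node. Here, occurrences at distant positions stay separate, which is what lets position information keep repeated sequence from tangling the graph.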

File Number: 22625 


IP Protection

Copyright: ©2012, The Regents of the University of California

License Online

This innovation is not currently available for online licensing. Please contact the University of California, San Diego Technology Transfer Office for more information.
