Suggested Uses:

Automated discovery and analysis of single-nucleotide polymorphisms (SNPs) in redundant DNA sequences.

 

ITEMS TO LICENSE

End-User License Agreement for PolyBayes Software

The PolyBayes software license costs $8,000 upfront plus an $8,000 annual maintenance fee.

$0.00

End-User License Agreement for academic/non-profit entities

$0.00

ADDITIONAL INFORMATION

File Number:

CK0006 

Detailed Description:

PolyBayes is a computer program for the automated discovery and analysis of single-nucleotide polymorphisms (SNPs) in redundant DNA sequences. It was developed primarily to provide a general and reliable tool for discovering genetic variations in the exponentially increasing volume of sequence data in public and private databases. The software integrates algorithmic solutions to three of the main challenges in sequence-based SNP discovery:

1. Multiple sequence alignment. We have developed an anchored approach that enables computationally efficient creation of multiple sequence alignments, provided that a reliable anchor sequence (e.g., a genomic reference sequence) is available.

2. Paralog identification. We use a probabilistic discrimination algorithm to identify likely sequence paralogs (highly similar duplicated sequences from disparate genomic origins). If paralogs go unidentified, sequence differences between them can lead to false SNP predictions, so it is advantageous to remove them from the analysis as early as possible.

3. SNP detection. We have derived and implemented a novel, fully probabilistic SNP detection algorithm that calculates the probability (SNP score) that discrepancies at a given location of a multiple alignment represent true sequence variations as opposed to sequencing errors. The calculation is based on a rigorous Bayesian statistical formulation that takes into account the alignment depth, the base calls in each of the sequences, the associated base quality values (such as those generated by the Phred trace analysis program or the Phrap fragment assembler), the base composition in the region, and the expected a priori polymorphism rate. Because base quality values are accounted for, all available data can be mined in a statistically rigorous manner, without restrictions on data quality or a need for heuristic considerations.
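
As a rough illustration of the kind of calculation described in item 3, the following Python sketch computes a posterior SNP probability for a single alignment column. It is a deliberately simplified model and not the PolyBayes algorithm itself: it assumes uniform base composition, at most two alleles per site at equal frequency, independent reads, and sequencing errors spread evenly over the three alternative bases. The function names and the default prior of 0.003 are illustrative assumptions.

```python
from itertools import combinations
from math import prod

BASES = "ACGT"


def phred_to_error(qual):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qual / 10)


def base_likelihood(call, qual, true_base):
    """P(observed call | true base), with errors spread evenly over 3 bases."""
    err = phred_to_error(qual)
    return 1 - err if call == true_base else err / 3


def snp_probability(calls, quals, prior_snp=0.003):
    """Posterior probability that one alignment column is polymorphic.

    Simplified model (an assumption of this sketch): the site is either
    monomorphic for one base or biallelic with both alleles equally likely.
    """
    reads = list(zip(calls, quals))

    # Monomorphic hypotheses: every read derives from the same true base.
    mono = sum(
        (1 - prior_snp) / len(BASES)
        * prod(base_likelihood(c, q, a) for c, q in reads)
        for a in BASES
    )

    # Biallelic hypotheses: each read derives from either allele with p = 0.5.
    pairs = list(combinations(BASES, 2))
    poly = sum(
        prior_snp / len(pairs)
        * prod(
            0.5 * base_likelihood(c, q, a) + 0.5 * base_likelihood(c, q, b)
            for c, q in reads
        )
        for a, b in pairs
    )

    return poly / (mono + poly)


# Same discrepancy, different base quality for the discordant read.
print(snp_probability("AAAG", [40, 40, 40, 40]))  # high-quality mismatch
print(snp_probability("AAAG", [40, 40, 40, 5]))   # low-quality mismatch
```

In this toy model, a discordant base call with a high quality value pushes the score up, while the same call with a low quality value is explained more readily as a sequencing error. That is the role the text attributes to base quality values.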

As its main output, the PolyBayes program produces a list of candidate polymorphic sites, each with an associated SNP probability score that has been demonstrated to accurately forecast the true positive rate in subsequent validation experiments. A selectable score threshold lets the user strike a balance between highly accurate predictions and the recovery of additional rare polymorphisms or SNPs in low-quality sequences.
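
The threshold trade-off can be sketched in a few lines of Python. The record type, function names, and example scores below are hypothetical and do not reflect the actual PolyBayes output format. The sketch also assumes the scores are well calibrated, so that the mean score of the selected sites approximates the fraction expected to validate.

```python
from typing import NamedTuple


class CandidateSite(NamedTuple):
    """Hypothetical record for one candidate polymorphic site."""
    position: int
    snp_score: float


def select_candidates(sites, threshold):
    """Keep only sites whose SNP score meets the chosen threshold."""
    return [site for site in sites if site.snp_score >= threshold]


def expected_true_positive_rate(selected):
    """Mean score of the selection; meaningful only if scores are calibrated."""
    if not selected:
        return 0.0
    return sum(site.snp_score for site in selected) / len(selected)


# Made-up example scores, for illustration only.
sites = [
    CandidateSite(101, 0.99),
    CandidateSite(245, 0.85),
    CandidateSite(310, 0.42),
    CandidateSite(777, 0.97),
]

strict = select_candidates(sites, threshold=0.95)   # fewer, more reliable calls
lenient = select_candidates(sites, threshold=0.40)  # more calls, lower accuracy

print(len(strict), expected_true_positive_rate(strict))
print(len(lenient), expected_true_positive_rate(lenient))
```

A strict threshold returns fewer candidates with a higher expected validation rate, while a lenient one recovers more sites at the cost of more false positives.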

Web site:

ABOUT THIS INNOVATION

Organization:
Washington University in St. Louis

CASE MANAGER

Erin Brosnahan