Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/16968

Title: Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model
Authors: De Beuf, Kristof
De Schrijver, Joachim
Thas, Olivier
Van Criekinge, Wim
Irizarry, Rafael A.
Clement, Lieven
Issue Date: 2012
Abstract:
Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform.
Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores.
Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies.
Notes: [De Beuf, Kristof; De Schrijver, Joachim; Thas, Olivier; Van Criekinge, Wim] Univ Ghent, Dept Math Modelling Stat & Bioinformat, B-9000 Ghent, Belgium. [Thas, Olivier] Univ Wollongong, Sch Math & Appl Stat, Ctr Stat & Survey Methodol, Wollongong, NSW 2522, Australia. [Irizarry, Rafael A.] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD USA. [Clement, Lieven] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium. [Clement, Lieven] Katholieke Univ Leuven, Interuniv Inst Biostat & Stat Bioinformat, B-3000 Louvain, Belgium. [Clement, Lieven] Univ Hasselt, B-3000 Louvain, Belgium.
URI: http://hdl.handle.net/1942/16968
DOI: 10.1186/1471-2105-13-303
ISI #: 000312894900001
ISSN: 1471-2105
Category: A1
Type: Journal Contribution
Validation: ecoom, 2014
Appears in Collections: Research publications

Files in This Item:

Description: N/A
Size: 487.58 kB
Format: Adobe PDF
