Probabilistic modeling of systematic errors in two-hybrid experiments

David Sontag, Rohit Singh, Bonnie Berger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a novel probabilistic approach to estimating errors in two-hybrid (2H) experiments. Such experiments are frequently used to elucidate protein-protein interaction networks in a high-throughput fashion; however, a significant challenge with these is their relatively high error rate, specifically, a high false-positive rate. We describe a comprehensive error model for 2H data, accounting for both random and systematic errors. The latter arise from limitations of the 2H experimental protocol: in theory, the reporting mechanism of a 2H experiment should be activated if and only if the two proteins being tested truly interact; in practice, even in the absence of a true interaction, it may be activated by some proteins, either by themselves or through promiscuous interaction with other proteins. We describe a probabilistic relational model that explicitly models the above phenomenon and use Markov Chain Monte Carlo (MCMC) algorithms to compute both the probability of an observed 2H interaction being true and the probability of individual proteins being self-activating/promiscuous. This is the first approach that explicitly models systematic errors in protein-protein interaction data; in contrast, previous work on this topic has modeled errors as being independent and random. By explicitly modeling the sources of noise in 2H systems, we find that we are better able to make use of the available experimental data. In comparison with Bader et al.'s method for estimating confidence in 2H predicted interactions, the proposed method performed 5-10% better overall, and in particular regimes improved prediction accuracy by as much as 76%.
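
The idea in the abstract can be illustrated with a toy sketch. The code below is not the authors' probabilistic relational model: it assumes a simple noisy-OR reporter, a self-activation flag on bait proteins only, and a Gibbs sampler (one kind of MCMC). The function names (p_obs, gibbs), the priors p_t and p_s, and the rates sens, act and noise are all hypothetical values chosen for illustration.

```python
import random

def p_obs(o, t, s, sens=0.8, act=0.9, noise=0.01):
    """P(reporter readout o | true interaction t, bait self-activation s).

    Assumed noisy-OR: the reporter fires through a true interaction (sens),
    a self-activating bait (act), or background noise. All rates are made up.
    """
    p_on = 1.0 - (1.0 - noise) * (1.0 - sens) ** t * (1.0 - act) ** s
    return p_on if o else 1.0 - p_on

def gibbs(obs, n_iter=2000, burn=500, p_t=0.1, p_s=0.1, seed=0):
    """Toy Gibbs sampler.

    obs maps (bait, prey) -> 0/1 reporter readout.
    Returns (posterior mean of t per pair, posterior mean of s per bait).
    """
    rng = random.Random(seed)
    baits = sorted({bait for bait, _ in obs})
    t = {pair: 0 for pair in obs}      # latent: pair truly interacts
    s = {bait: 0 for bait in baits}    # latent: bait is self-activating
    t_sum = {pair: 0 for pair in obs}
    s_sum = {bait: 0 for bait in baits}
    for it in range(n_iter):
        # Resample each true-interaction flag given its bait's current state.
        for pair, o in obs.items():
            w1 = p_t * p_obs(o, 1, s[pair[0]])
            w0 = (1.0 - p_t) * p_obs(o, 0, s[pair[0]])
            t[pair] = int(rng.random() < w1 / (w1 + w0))
        # Resample each bait's flag given all readouts that share the bait.
        for bait in baits:
            w1, w0 = p_s, 1.0 - p_s
            for pair, o in obs.items():
                if pair[0] == bait:
                    w1 *= p_obs(o, t[pair], 1)
                    w0 *= p_obs(o, t[pair], 0)
            s[bait] = int(rng.random() < w1 / (w1 + w0))
        # Accumulate samples after the burn-in period.
        if it >= burn:
            for pair in obs:
                t_sum[pair] += t[pair]
            for bait in baits:
                s_sum[bait] += s[bait]
    n = n_iter - burn
    return ({pair: v / n for pair, v in t_sum.items()},
            {bait: v / n for bait, v in s_sum.items()})

# Made-up data: bait "A" lights up with every prey, bait "B" with only one.
obs = {("A", f"p{i}"): 1 for i in range(8)}
obs.update({("B", f"p{i}"): int(i == 0) for i in range(8)})
t_post, s_post = gibbs(obs)
```

Under these assumed parameters, bait A should come out as almost certainly self-activating, so its positives receive low confidence, while the single positive for bait B keeps high confidence. A model that treats errors as independent would score the two positives ("A", "p0") and ("B", "p0") identically.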

Original language: English (US)
Title of host publication: Pacific Symposium on Biocomputing 2007, PSB 2007
Pages: 445-457
Number of pages: 13
State: Published - 2007
Event: Pacific Symposium on Biocomputing, PSB 2007 - Maui, HI, United States
Duration: Jan 3 2007 to Jan 7 2007

Fingerprint

Systematic errors
Proteins
Experiments
Protein Interaction Maps
Markov Chains
Statistical Models
Random errors
Noise
Markov processes
Throughput

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering
  • Medicine (all)

Cite this

Sontag, D., Singh, R., & Berger, B. (2007). Probabilistic modeling of systematic errors in two-hybrid experiments. In Pacific Symposium on Biocomputing 2007, PSB 2007 (pp. 445-457).

@inproceedings{01a720335ffb4b46b6898a073ca471a0,
title = "Probabilistic modeling of systematic errors in two-hybrid experiments",
abstract = "We describe a novel probabilistic approach to estimating errors in two-hybrid (2H) experiments. Such experiments are frequently used to elucidate protein-protein interaction networks in a high-throughput fashion; however, a significant challenge with these is their relatively high error rate, specifically, a high false-positive rate. We describe a comprehensive error model for 2H data, accounting for both random and systematic errors. The latter arise from limitations of the 2H experimental protocol: in theory, the reporting mechanism of a 2H experiment should be activated if and only if the two proteins being tested truly interact; in practice, even in the absence of a true interaction, it may be activated by some proteins, either by themselves or through promiscuous interaction with other proteins. We describe a probabilistic relational model that explicitly models the above phenomenon and use Markov Chain Monte Carlo (MCMC) algorithms to compute both the probability of an observed 2H interaction being true and the probability of individual proteins being self-activating/promiscuous. This is the first approach that explicitly models systematic errors in protein-protein interaction data; in contrast, previous work on this topic has modeled errors as being independent and random. By explicitly modeling the sources of noise in 2H systems, we find that we are better able to make use of the available experimental data. In comparison with Bader et al.'s method for estimating confidence in 2H predicted interactions, the proposed method performed 5-10{\%} better overall, and in particular regimes improved prediction accuracy by as much as 76{\%}.",
author = "David Sontag and Rohit Singh and Bonnie Berger",
year = "2007",
language = "English (US)",
isbn = "9812704175",
pages = "445--457",
booktitle = "Pacific Symposium on Biocomputing 2007, PSB 2007",

}

TY - GEN

T1 - Probabilistic modeling of systematic errors in two-hybrid experiments

AU - Sontag, David

AU - Singh, Rohit

AU - Berger, Bonnie

PY - 2007

Y1 - 2007

N2 - We describe a novel probabilistic approach to estimating errors in two-hybrid (2H) experiments. Such experiments are frequently used to elucidate protein-protein interaction networks in a high-throughput fashion; however, a significant challenge with these is their relatively high error rate, specifically, a high false-positive rate. We describe a comprehensive error model for 2H data, accounting for both random and systematic errors. The latter arise from limitations of the 2H experimental protocol: in theory, the reporting mechanism of a 2H experiment should be activated if and only if the two proteins being tested truly interact; in practice, even in the absence of a true interaction, it may be activated by some proteins, either by themselves or through promiscuous interaction with other proteins. We describe a probabilistic relational model that explicitly models the above phenomenon and use Markov Chain Monte Carlo (MCMC) algorithms to compute both the probability of an observed 2H interaction being true and the probability of individual proteins being self-activating/promiscuous. This is the first approach that explicitly models systematic errors in protein-protein interaction data; in contrast, previous work on this topic has modeled errors as being independent and random. By explicitly modeling the sources of noise in 2H systems, we find that we are better able to make use of the available experimental data. In comparison with Bader et al.'s method for estimating confidence in 2H predicted interactions, the proposed method performed 5-10% better overall, and in particular regimes improved prediction accuracy by as much as 76%.

UR - http://www.scopus.com/inward/record.url?scp=38449103843&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38449103843&partnerID=8YFLogxK

M3 - Conference contribution

C2 - 17990509

AN - SCOPUS:38449103843

SN - 9812704175

SN - 9789812704177

SP - 445

EP - 457

BT - Pacific Symposium on Biocomputing 2007, PSB 2007

ER -