An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis

Godfried Toussaint, P. M. Sharpe

Research output: Contribution to journal (Article)

Abstract

The problem of estimating the performance of a given classifier on a given data set is discussed for the case when no knowledge is available concerning the underlying distributions. A new method of estimating the probability of misclassification is proposed which yields essentially unbiased results similar to Lachenbruch's U-method with far less computation involved. While no theoretical work is presented, a practical rule of thumb is given for choosing the parameters of the estimator. The results are based on experiments performed with a data set concerning six diseases related to epigastric pain, and underline the importance of reporting performance on both the testing data and the training data. Whereas previous papers have continually reported results with a probability of correct classification as high as 74.3 per cent on the raw data and 92.0 per cent on "processed" data, in this paper it is shown that a much more significant estimate of the probability of correct classification based on this data set is 51.0 per cent.
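The gap the abstract describes between training-data and testing-data performance can be illustrated with a small sketch. The abstract does not specify the proposed estimator, so the code below does not attempt to reproduce it. It only contrasts the two reference points the abstract names: the error measured on the training data itself (resubstitution) and Lachenbruch's U-method (leave-one-out), both using the nearest neighbour rule listed in the keywords. The function names and the synthetic two-class data are assumptions made for illustration; they are not the epigastric pain data set.

```python
import random


def nearest_neighbour_label(x, points, labels, skip=None):
    """Label of the stored point closest to x (1-NN rule, squared Euclidean distance).

    If `skip` is given, the stored point at that index is ignored, which is
    how a sample is "left out" of the design set.
    """
    best_label, best_dist = None, float("inf")
    for i, (p, lab) in enumerate(zip(points, labels)):
        if i == skip:
            continue
        d = sum((a - b) ** 2 for a, b in zip(x, p))
        if d < best_dist:
            best_dist, best_label = d, lab
    return best_label


def resubstitution_error(points, labels):
    """Error rate when every sample is classified using a design set that contains it."""
    errors = sum(
        nearest_neighbour_label(x, points, labels) != y
        for x, y in zip(points, labels)
    )
    return errors / len(points)


def leave_one_out_error(points, labels):
    """U-method: classify each sample using all the other samples, then average."""
    errors = sum(
        nearest_neighbour_label(x, points, labels, skip=i) != y
        for i, (x, y) in enumerate(zip(points, labels))
    )
    return errors / len(points)


def make_toy_data(n_per_class=50, seed=0):
    """Hypothetical data: two overlapping Gaussian classes in two dimensions."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, centre in enumerate([(0.0, 0.0), (1.0, 1.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    pts, labs = make_toy_data()
    print("resubstitution error:", resubstitution_error(pts, labs))
    print("leave-one-out error: ", leave_one_out_error(pts, labs))
```

With the nearest neighbour rule, every training sample is its own nearest neighbour, so the resubstitution error is zero whenever no two samples coincide. This is an extreme case of the optimism that comes from reporting performance on the training data alone. The leave-one-out estimate avoids that optimism but needs one classification pass per sample with a different design set each time, which is the computational cost the abstract's method aims to reduce.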

Original language: English (US)
Pages (from-to): 269-278
Number of pages: 10
Journal: Computers in Biology and Medicine
Volume: 4
Issue number: 3-4
DOI: 10.1016/0010-4825(75)90038-4
State: Published - Jan 1 1975

Fingerprint

Classifiers
Pain
Testing
Datasets
Experiments

Keywords

  • Classification
  • Epigastric pain
  • Feature size
  • Nearest Neighbour rule
  • Nonparametric
  • Pattern recognition
  • Probability of misclassification
  • Sample size
  • Symptom diagnosis
  • Testing sets
  • Training sets

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis. / Toussaint, Godfried; Sharpe, P. M.

In: Computers in Biology and Medicine, Vol. 4, No. 3-4, 01.01.1975, p. 269-278.

@article{d10a108db5514b958882da3fa465105b,
title = "An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis",
keywords = "Classification, Epigastric pain, Feature size, Nearest Neighbour rule, Nonparametric, Pattern recognition, Probability of misclassification, Sample size, Symptom diagnosis, Testing sets, Training sets",
author = "Godfried Toussaint and Sharpe, {P. M.}",
year = "1975",
month = "1",
day = "1",
doi = "10.1016/0010-4825(75)90038-4",
language = "English (US)",
volume = "4",
pages = "269--278",
journal = "Computers in Biology and Medicine",
issn = "0010-4825",
publisher = "Elsevier Limited",
number = "3-4",

}

TY - JOUR

T1 - An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis

AU - Toussaint, Godfried

AU - Sharpe, P. M.

PY - 1975/1/1

Y1 - 1975/1/1

KW - Classification

KW - Epigastric pain

KW - Feature size

KW - Nearest Neighbour rule

KW - Nonparametric

KW - Pattern recognition

KW - Probability of misclassification

KW - Sample size

KW - Symptom diagnosis

KW - Testing sets

KW - Training sets

UR - http://www.scopus.com/inward/record.url?scp=0016466461&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0016466461&partnerID=8YFLogxK

U2 - 10.1016/0010-4825(75)90038-4

DO - 10.1016/0010-4825(75)90038-4

M3 - Article

C2 - 1095287

AN - SCOPUS:0016466461

VL - 4

SP - 269

EP - 278

JO - Computers in Biology and Medicine

JF - Computers in Biology and Medicine

SN - 0010-4825

IS - 3-4

ER -