Comments on Statistical Issues in May 2014

Article information

Korean J Fam Med. 2014;35(3):167-168

Publication date (electronic) : 2014 May 22

doi : https://doi.org/10.4082/kjfm.2014.35.3.167

Department of Biostatistics, The Catholic University of Korea College of Medicine, Seoul, Korea.

In this section, we explain Cohen's kappa coefficient, a measure of agreement that appeared in the article 'Impact of Clinical Performance Examination on Incoming Interns' Clinical Competency in Differential Diagnosis of Headache' by Park et al.,1) published in March 2014.

KAPPA, A MEASURE OF AGREEMENT

Cohen's kappa coefficient is a statistical measure of interrater or inter-instrument agreement for qualitative (categorical) items. It is generally considered more robust than a simple percent agreement calculation because kappa takes into account agreement occurring by chance. A chance-corrected measure, originally introduced by Scott,2) was extended by Cohen3) and has come to be known as Cohen's kappa. It rests on the notion that the observed cases of agreement include some cases in which the agreement arose by chance alone.

Let us assume that there are two raters, who independently rate n subjects into one of two mutually exclusive and exhaustive nominal categories. Let p_ij be the proportion of subjects that are placed in the i, jth cell, i.e., assigned to the ith category by the first rater and to the jth category by the second rater (i, j = 1, 2). Also, let p_i+ = p_i1 + p_i2 denote the proportion of subjects placed in the ith row (i.e., the ith category by the first rater), and let p_+j = p_1j + p_2j denote the proportion of subjects placed in the jth column (i.e., the jth category by the second rater). Then the kappa coefficient is

κ = (p_o - p_e) / (1 - p_e),

where p_o = p_11 + p_22 is the observed proportion of agreement and p_e = p_1+ p_+1 + p_2+ p_+2 is the proportion of agreement expected by chance.
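As a minimal sketch, the definition κ = (p_o - p_e) / (1 - p_e) can be computed directly from a 2 × 2 table of counts. The function name `cohen_kappa_2x2` and the counts in the usage line are illustrative assumptions, not part of the original article.

```python
def cohen_kappa_2x2(table):
    """Return (p_o, p_e, kappa) for a 2x2 table of counts.

    table[i][j] is the number of subjects placed in category i by the
    first rater (rows) and in category j by the second rater (columns).
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one subject")

    # Cell proportions p_ij.
    p = [[count / n for count in row] for row in table]

    # Row marginals p_i+ (first rater) and column marginals p_+j (second rater).
    row = [p[i][0] + p[i][1] for i in range(2)]
    col = [p[0][j] + p[1][j] for j in range(2)]

    p_o = p[0][0] + p[1][1]                  # observed agreement
    p_e = row[0] * col[0] + row[1] * col[1]  # agreement expected by chance

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical counts, chosen only to illustrate the call.
p_o, p_e, kappa = cohen_kappa_2x2([[45, 5], [15, 35]])
print(f"p_o={p_o:.2f}, p_e={p_e:.2f}, kappa={kappa:.2f}")
# prints p_o=0.80, p_e=0.50, kappa=0.60
```

Working with proportions rather than counts keeps the code term-by-term parallel to the definitions of p_ij, p_i+, and p_+j.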

If there is complete agreement, κ = 1. If observed agreement is greater than or equal to chance agreement, κ ≥ 0, and if observed agreement is less than chance agreement, κ < 0. The minimum value of κ depends on the marginal proportions. If they are such that p_e = 0.5, then the minimum equals -1. Otherwise, the minimum is between -1 and 0.
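One way to see where the lower bound comes from is the following short derivation. It is a sketch under the assumption that the marginals permit no observed agreement at all (p_o = 0), so that every subject falls in an off-diagonal cell.

```latex
% Assumption: p_o = 0, i.e. p_{11} = p_{22} = 0.
\begin{align*}
p_{1+} &= p_{12}, \quad p_{+1} = p_{21}, \quad
p_{2+} = p_{21}, \quad p_{+2} = p_{12}, \quad p_{12} + p_{21} = 1, \\
p_e &= p_{1+}\,p_{+1} + p_{2+}\,p_{+2} = 2\,p_{12}\,p_{21} \le \tfrac{1}{2}, \\
\kappa &= \frac{0 - p_e}{1 - p_e} = -\frac{p_e}{1 - p_e} \ge -1 .
\end{align*}
```

Equality holds when p_12 = p_21 = 1/2, that is, when p_e = 0.5. With any other split of the off-diagonal cells, p_e < 0.5 and κ lies strictly between -1 and 0. If the marginals force some agreement (p_o > 0), the attainable minimum is higher than this expression gives.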

Example. Two doctors independently classified 100 people into one of two diagnostic categories, abnormal or normal, as shown in Table 1.

Table 1

Diagnoses on n = 100 people by two doctors

                      Second doctor
First doctor       Abnormal   Normal   Total
Abnormal              40        10       50
Normal                20        30       50
Total                 60        40      100

Observed agreement (p_o) = (40 + 30) / 100 = 0.7

Chance agreement (p_e) = (40 + 10) / 100 × (40 + 20) / 100 + (20 + 30) / 100 × (10 + 30) / 100 = 0.5 × 0.6 + 0.5 × 0.4 = 0.3 + 0.2 = 0.5

κ = (0.7 - 0.5) / (1 - 0.5) = 0.4.
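The same arithmetic can be reproduced from per-subject ratings rather than from a cross-tabulation. The sketch below expands the counts used in the calculation above (40 and 30 agreements, 10 and 20 disagreements) into paired labels. The function name `kappa_from_ratings` and the assignment of "abnormal" to the first category are assumptions made for illustration.

```python
from collections import Counter


def kappa_from_ratings(first, second):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)

    # Observed agreement: share of subjects given the same label by both raters.
    p_o = sum(a == b for a, b in zip(first, second)) / n

    # Chance agreement: sum over categories of the product of the two marginals.
    first_counts = Counter(first)
    second_counts = Counter(second)
    p_e = sum(first_counts[c] * second_counts[c] for c in first_counts) / n**2

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Assumed labelling: the first category is "abnormal", the second "normal".
pairs = (
    [("abnormal", "abnormal")] * 40
    + [("abnormal", "normal")] * 10
    + [("normal", "abnormal")] * 20
    + [("normal", "normal")] * 30
)
first_doctor = [a for a, _ in pairs]
second_doctor = [b for _, b in pairs]

p_o, p_e, kappa = kappa_from_ratings(first_doctor, second_doctor)
print(f"p_o={p_o:.1f}, p_e={p_e:.1f}, kappa={kappa:.1f}")
# prints p_o=0.7, p_e=0.5, kappa=0.4
```

Swapping the two label names for both doctors would change the table layout but not the value of kappa, since the coefficient depends only on the diagonal cells and the marginals.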

However, kappa should not be used as a measure of agreement when the raters or devices cannot all be treated symmetrically. When one source of ratings can be viewed as superior or as a standard, e.g., one rater is senior to the other or one medical device is a more precise measuring instrument than the other, kappa may no longer be appropriate.

Notes

No potential conflict of interest relevant to this article was reported.

References

1. Park SM, Song YM, Kim BK, Kim H. Impact of clinical performance examination on incoming interns' clinical competency in differential diagnosis of headache. Korean J Fam Med 2014;35:56–64. 24724000.

2. Scott WA. Reliability of content analysis: the case of nominal scale coding. Public Opin Q 1955;19:321–325.

3. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.


This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
