Comments on Statistical Issues in May 2014

Yong Gyu Park

doi:10.4082/kjfm.2014.35.3.167

In this section, we explain Cohen's kappa coefficient, a measure of agreement used in the article 'Impact of Clinical Performance Examination on Incoming Interns' Clinical Competency in Differential Diagnosis of Headache' by Park et al.,¹⁾ published in March 2014.

KAPPA, A MEASURE OF AGREEMENT

Cohen's kappa coefficient is a statistical measure of inter-rater or inter-instrument agreement for qualitative (categorical) items. It is generally considered more robust than a simple percent agreement because it takes into account agreement occurring by chance. A chance-corrected measure originally introduced by Scott²⁾ was extended by Cohen³⁾ and has come to be known as Cohen's kappa. The correction rests on the notion that some of the observed cases of agreement would have occurred by chance alone.
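To make the chance-correction point concrete, the following Python sketch compares two hypothetical 2×2 tables of counts (invented for illustration, not data from the article) that share the same 90% raw agreement. It uses the standard two-category formula κ = (p_o - p_e) / (1 - p_e), with p_e built from each rater's marginal proportions; the function and variable names are our own.

```python
def percent_agreement(table):
    """Raw proportion of subjects on the diagonal of a 2x2 count table."""
    n = sum(sum(row) for row in table)
    return (table[0][0] + table[1][1]) / n


def kappa(table):
    """Two-category Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = sum(sum(row) for row in table)
    p_o = (table[0][0] + table[1][1]) / n
    rows = [sum(r) / n for r in table]                          # first rater's marginals
    cols = [(table[0][j] + table[1][j]) / n for j in range(2)]  # second rater's marginals
    p_e = rows[0] * cols[0] + rows[1] * cols[1]                 # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Hypothetical tables (rows: rater 1, columns: rater 2), both with 90/100 agreements.
balanced = [[45, 5], [5, 45]]
skewed = [[85, 5], [5, 5]]

for name, t in (("balanced", balanced), ("skewed", skewed)):
    print(name, round(percent_agreement(t), 3), round(kappa(t), 3))
# balanced 0.9 0.8
# skewed 0.9 0.444
```

In the skewed table both raters place almost everyone in the first category, so chance agreement is already 0.82 and κ falls to about 0.44, while the balanced table, with chance agreement of 0.5, keeps κ at 0.8.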

Suppose that two raters independently classify n subjects into one of two mutually exclusive and exhaustive nominal categories. Let p_ij be the proportion of subjects placed in cell (i, j), i.e., assigned to the ith category by the first rater and to the jth category by the second rater (i, j = 1, 2). Also, let p_i+ = p_i1 + p_i2 denote the proportion of subjects in the ith row (i.e., placed in the ith category by the first rater), and let p_+j = p_1j + p_2j denote the proportion of subjects in the jth column (i.e., placed in the jth category by the second rater). Then the kappa coefficient is

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where p_o = p₁₁ + p₂₂ is the observed proportion of agreement and p_e = p₁₊p₊₁ + p₂₊p₊₂ is the proportion of agreement expected by chance.
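These definitions translate directly into code. The following is a minimal Python sketch, assuming the table is supplied as counts with rows for the first rater and columns for the second; the name `cohen_kappa_2x2` is hypothetical, not a library function.

```python
from typing import Sequence


def cohen_kappa_2x2(counts: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa for a 2x2 table of counts.

    Rows are the first rater's categories and columns the second rater's,
    so counts[i][j] corresponds to cell (i+1, j+1) in the text.
    """
    n = sum(sum(row) for row in counts)
    p = [[c / n for c in row] for row in counts]     # cell proportions p_ij
    p_row = [p[i][0] + p[i][1] for i in range(2)]     # p_1+, p_2+
    p_col = [p[0][j] + p[1][j] for j in range(2)]     # p_+1, p_+2
    p_o = p[0][0] + p[1][1]                           # observed agreement
    p_e = p_row[0] * p_col[0] + p_row[1] * p_col[1]   # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when p_e = 1")
    return (p_o - p_e) / (1 - p_e)


# Counts from the worked example in this section (40, 10, 20, 30).
print(round(cohen_kappa_2x2([[40, 10], [20, 30]]), 6))  # 0.4
```

Rounding is only for display; the raw floating-point result may differ from 0.4 in the last few digits.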

If there is complete agreement, κ = 1. If observed agreement is greater than or equal to chance agreement, κ ≥ 0, and if observed agreement is less than chance agreement, κ < 0. The minimum value of κ depends on the marginal proportions. It equals -1 only when both raters assign half of the subjects to each category (so that p_e = 0.5); otherwise, the minimum lies between -1 and 0.
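For two categories, the smallest attainable κ can be computed from the two raters' marginal proportions alone: with the margins fixed, p_e is fixed, so κ is smallest when the diagonal cells are as small as the margins allow. The Python sketch below implements that reasoning (the function name is hypothetical). For example, the marginals of the worked example later in this section (p₁₊ = 0.5, p₊₁ = 0.6) give a minimum of -0.8, while 50/50 splits by both raters give -1.

```python
def min_kappa_2x2(p1_row: float, p1_col: float) -> float:
    """Smallest kappa attainable in a 2x2 table with fixed marginals.

    p1_row: proportion the first rater places in category 1 (p_1+)
    p1_col: proportion the second rater places in category 1 (p_+1)
    """
    p_e = p1_row * p1_col + (1 - p1_row) * (1 - p1_col)
    if p_e == 1:
        raise ValueError("kappa is undefined when p_e = 1")
    # With the margins fixed, p_11 can fall no lower than max(0, p_1+ + p_+1 - 1),
    # and p_22 = 1 - p_1+ - p_+1 + p_11, so the smallest attainable
    # p_o = p_11 + p_22 equals |p_1+ + p_+1 - 1|.
    p_o_min = abs(p1_row + p1_col - 1)
    return (p_o_min - p_e) / (1 - p_e)


print(min_kappa_2x2(0.5, 0.5))            # -1.0
print(round(min_kappa_2x2(0.5, 0.6), 3))  # -0.8
print(round(min_kappa_2x2(0.8, 0.7), 3))  # -0.316
```

The more unbalanced the marginals, the closer the attainable minimum moves toward 0, which is why a negative κ should be read against its marginal-dependent floor rather than against -1.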

Example. Two doctors independently classified 100 people into one of two diagnostic categories, abnormal or normal, as shown in Table 1.

Observed agreement (p_o) = (40 + 30) / 100 = 0.7

Chance agreement (p_e) = (40 + 10) / 100 × (40 + 20) / 100 + (20 + 30) / 100 × (10 + 30) / 100 = 0.5 × 0.6 + 0.5 × 0.4 = 0.3 + 0.2 = 0.5

κ = (0.7 - 0.5) / (1 - 0.5) = 0.4.
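The same result can be reproduced from per-subject ratings, which is often how agreement data are recorded. This Python sketch rebuilds 100 hypothetical rating pairs from the four cell counts above (assuming the first doctor's ratings form the rows of Table 1 and 'abnormal' is category 1), tallies agreements and marginals, and uses exact fractions so the intermediate values match the hand calculation. The function name `kappa_from_ratings` is our own.

```python
from collections import Counter
from fractions import Fraction


def kappa_from_ratings(r1, r2):
    """Return (p_o, p_e, kappa) for two equal-length lists of category labels.

    Exact Fraction arithmetic is used so results match hand calculations.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(r1)
    categories = set(r1) | set(r2)
    pairs = Counter(zip(r1, r2))       # joint counts for each (label1, label2)
    m1, m2 = Counter(r1), Counter(r2)  # marginal counts for each rater
    p_o = Fraction(sum(pairs[(c, c)] for c in categories), n)
    p_e = sum(Fraction(m1[c] * m2[c], n * n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when p_e = 1")
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical reconstruction of the 100 rating pairs from the example's cell
# counts: (first doctor's label, second doctor's label) -> number of people.
cells = {
    ("abnormal", "abnormal"): 40,
    ("abnormal", "normal"): 10,
    ("normal", "abnormal"): 20,
    ("normal", "normal"): 30,
}
doctor1, doctor2 = [], []
for (label1, label2), count in cells.items():
    doctor1 += [label1] * count
    doctor2 += [label2] * count

p_o, p_e, kappa = kappa_from_ratings(doctor1, doctor2)
print(p_o, p_e, kappa)  # 7/10 1/2 2/5
```

Because the categories are collected from the data, the same function also covers more than two categories, with p_o as the sum of the diagonal proportions and p_e as the sum of products of matching marginal proportions. Swapping the two doctors or relabeling the categories leaves κ unchanged in this example.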

However, kappa should not be used as a measure of agreement when the raters or devices cannot be treated symmetrically. When one source of ratings may be viewed as superior or as a standard, e.g., one rater is more senior than the other or one medical device is a more precise measuring instrument than the other, kappa may no longer be appropriate.

Table 1

                        Second doctor
First doctor      Abnormal    Normal    Total
Abnormal              40         10        50
Normal                20         30        50
Total                 60         40       100