# Comments on Statistical Issues in May 2014

## Article information

In this section, we explain Cohen's kappa coefficient, a measure of agreement, which appeared in the article titled 'Impact of Clinical Performance Examination on Incoming Interns' Clinical Competency in Differential Diagnosis of Headache' by Park et al.1) published in March 2014.

## KAPPA, A MEASURE OF AGREEMENT

Cohen's kappa coefficient is a statistical measure of interrater or inter-instrument agreement for qualitative (categorical) items. It is generally considered more robust than a simple percent agreement calculation because kappa takes into account agreement occurring by chance. A chance-corrected measure originally introduced by Scott2) was extended by Cohen3) and has come to be known as Cohen's kappa. It rests on the notion that the observed cases of agreement include some cases in which the agreement arose by chance alone.

Let us assume that there are two raters who independently classify each of *n* subjects into one of two mutually exclusive and exhaustive nominal categories. Let *p*_{ij} be the proportion of subjects placed in the (*i*, *j*)th cell, i.e., assigned to the *i*th category by the first rater and to the *j*th category by the second rater (*i*, *j* = 1, 2). Also, let *p*_{i+} = *p*_{i1} + *p*_{i2} denote the proportion of subjects placed in the *i*th row (i.e., the *i*th category by the first rater), and let *p*_{+j} = *p*_{1j} + *p*_{2j} denote the proportion of subjects placed in the *j*th column (i.e., the *j*th category by the second rater). Then the kappa coefficient is

κ = (*p*_{o} − *p*_{e}) / (1 − *p*_{e}),

where *p*_{o} = *p*_{11} + *p*_{22} is the observed proportion of agreement and *p*_{e} = *p*_{1+}*p*_{+1} + *p*_{2+}*p*_{+2} is the proportion of agreement expected by chance.

If there is complete agreement, κ = 1. If observed agreement is greater than or equal to chance agreement, κ ≥ 0, and if observed agreement is less than chance agreement, κ < 0. The minimum value of κ depends on the marginal proportions. If they are such that *p*_{e} = 0.5, then the minimum equals -1. Otherwise, the minimum is between -1 and 0.
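The definition and the boundary cases above can be checked numerically. The following is a minimal Python sketch, not part of the original article; the function name `cohen_kappa_2x2` and the three illustrative tables are assumptions chosen only to show κ = 1, κ = 0, and κ = -1.

```python
def cohen_kappa_2x2(table):
    """Cohen's kappa for a 2 x 2 table of counts (hypothetical helper).

    table[i][j] is the number of subjects assigned to category i by the
    first rater and to category j by the second rater.
    Note: the result is undefined (division by zero) when p_e equals 1.
    """
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]  # cell proportions p_ij

    row = [p[0][0] + p[0][1], p[1][0] + p[1][1]]  # p_1+, p_2+
    col = [p[0][0] + p[1][0], p[0][1] + p[1][1]]  # p_+1, p_+2

    p_o = p[0][0] + p[1][1]                # observed agreement
    p_e = row[0] * col[0] + row[1] * col[1]  # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)


# Illustrative tables (made up for this sketch, not data from the article).
perfect = cohen_kappa_2x2([[50, 0], [0, 50]])     # complete agreement
chance = cohen_kappa_2x2([[25, 25], [25, 25]])    # observed equals chance
opposite = cohen_kappa_2x2([[0, 50], [50, 0]])    # no agreement, p_e = 0.5

print(perfect, chance, opposite)
```

In the third table every subject is rated differently by the two raters and both sets of marginal proportions are 0.5, so *p*_{e} = 0.5 and κ reaches its lower bound.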

Example. Two doctors independently classified 100 people into one of two diagnostic categories, abnormal or normal, as follows (Table 1).

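Observed proportion of agreement: *p*_{o} = (40 + 30) / 100 = 0.7

Proportion of agreement expected by chance: *p*_{e} = (40 + 10) / 100 × (40 + 20) / 100 + (20 + 30) / 100 × (10 + 30) / 100 = 0.5 × 0.6 + 0.5 × 0.4 = 0.3 + 0.2 = 0.5

Kappa: κ = (*p*_{o} − *p*_{e}) / (1 − *p*_{e}) = (0.7 − 0.5) / (1 − 0.5) = 0.4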
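The arithmetic of this example can be reproduced from paired ratings. The sketch below is illustrative only: the cell counts (40, 10, 20, 30) are read off the calculation above, while the assignment of the labels "abnormal" and "normal" to particular cells is an assumption (kappa does not depend on which label is which).

```python
from collections import Counter

# Cell counts taken from the worked example; the label placement is assumed.
# Keys are (rating by first doctor, rating by second doctor).
counts = {
    ("abnormal", "abnormal"): 40,
    ("abnormal", "normal"): 10,
    ("normal", "abnormal"): 20,
    ("normal", "normal"): 30,
}

# Expand the table into one (first, second) pair per person.
pairs = [pair for pair, c in counts.items() for _ in range(c)]
n = len(pairs)

# Marginal counts for each doctor.
first = Counter(a for a, _ in pairs)
second = Counter(b for _, b in pairs)

# Observed agreement: share of people given the same label by both doctors.
p_o = sum(a == b for a, b in pairs) / n

# Chance agreement: sum over categories of the product of marginal proportions.
p_e = sum((first[c] / n) * (second[c] / n) for c in ("abnormal", "normal"))

kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 3), round(p_e, 3), round(kappa, 3))
```

Working from paired ratings rather than a prebuilt table mirrors how such data are usually recorded, one row per subject with one column per rater.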

However, we should not use kappa as a measure of agreement when the raters or devices cannot all be treated symmetrically. When one of the sources of ratings may be viewed as superior or as a standard, e.g., one rater is senior to the other or one medical device is a more precise measuring instrument than the other, kappa may no longer be appropriate.

## Notes

No potential conflict of interest relevant to this article was reported.