Published October 2000

Observer agreement

Kappa

Evaluating a new technology or test raises the question of whether differences in results are due to the technology or to the interpreters. Kappa is widely used to measure interobserver variability, that is, how often 2 or more observers agree in their interpretations. Simple agreement, the proportion of cases on which the observers agree (both yes or both no), is a poor measure because it does not correct for agreement expected by chance alone. Kappa is the preferred statistic because it does: it expresses the agreement achieved beyond chance as a fraction of the maximum agreement possible beyond chance.
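For a 2 × 2 table of counts, kappa can be written as below. Here a is the number of cases both observers call yes, d the number both call no, b and c are the two kinds of disagreement, and n = a + b + c + d. The simple (observed) agreement is p_o, and p_e is the agreement expected by chance from each observer's totals of yes and no ratings. This is the standard Cohen's kappa for two observers; the letters a to d are notation chosen for this sketch.

```latex
p_o = \frac{a + d}{n}, \qquad
p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{n^2}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}
```

With hypothetical counts a = 40, b = 10, c = 5, d = 45 (n = 100), p_o = 0.85 and p_e = (50 × 45 + 50 × 55)/100² = 0.50, so κ = (0.85 − 0.50)/(1 − 0.50) = 0.70. Simple agreement alone would report 85%, even though half of that level is expected by chance.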

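The correlation coefficient is widely, but inappropriately, used as a measure of agreement in many radiologic studies. Correlation measures how closely two sets of ratings follow a linear relationship, not whether they are the same, so two observers may have good (even perfect) correlation yet never agree. For example, one observer may consistently describe hearts as mildly enlarged when the other describes them as severely enlarged.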
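The heart-size example can be checked numerically. The sketch below assumes a hypothetical 0 to 3 grading (0 normal, 1 mildly, 2 moderately, 3 severely enlarged) and made-up ratings in which Observer 2 always grades two steps higher than Observer 1, so every "mildly enlarged" from one observer is "severely enlarged" from the other. The Pearson correlation is computed directly from its definition; the function names are illustrative, not from any library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of ratings is constant")
    return sxy / math.sqrt(sxx * syy)


def exact_agreement(x, y):
    """Proportion of cases on which both observers give the same grade."""
    return sum(xi == yi for xi, yi in zip(x, y)) / len(x)


# Hypothetical grades: 0 normal, 1 mildly, 2 moderately, 3 severely enlarged.
obs1 = [0, 1, 1, 0, 1, 0]
obs2 = [g + 2 for g in obs1]  # Observer 2 always grades two steps higher

print("correlation:", pearson_r(obs1, obs2))
print("agreement:  ", exact_agreement(obs1, obs2))
```

The correlation comes out as 1.0 while exact agreement is 0: correlation rewards a consistent relationship between the two sets of ratings, not identical ratings.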

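Agreement table (Observer 1 in columns, Observer 2 in rows):

Observer 2 \ Observer 1: Yes | No | Totals
Yes: a | b | a + b
No: c | d | c + d
Totals: a + c | b + d | n

Calculated from the table: proportion agreement, kappa, bias index, prevalence index.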
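The four statistics reported for the agreement table can be reproduced from the cell counts. This is a minimal sketch assuming the definitions of Byrt, Bishop and Carlin (1993): bias index = (b − c)/n and prevalence index = (a − d)/n, where a counts cases both observers call Yes, d cases both call No, b cases where Observer 2 says Yes and Observer 1 says No, and c the reverse. The function name and cell lettering are hypothetical, and some authors report the two indices as absolute values.

```python
def agreement_table_stats(a, b, c, d):
    """Proportion agreement, kappa, bias index and prevalence index for a 2 x 2 table.

    a: both Yes    b: Observer 2 Yes, Observer 1 No
    c: Observer 2 No, Observer 1 Yes    d: both No
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    p_o = (a + d) / n  # proportion agreement (simple agreement)
    # Chance agreement from the row (Observer 2) and column (Observer 1) totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return {
        "proportion_agreement": p_o,
        "kappa": (p_o - p_e) / (1 - p_e),
        # b - c = (a + b) - (a + c): difference in how often each observer says Yes
        "bias_index": (b - c) / n,
        # imbalance between agreement on Yes and agreement on No
        "prevalence_index": (a - d) / n,
    }


# Two hypothetical tables with the same proportion agreement (0.85).
for table in [(40, 10, 5, 45), (80, 10, 5, 5)]:
    stats = agreement_table_stats(*table)
    print(table, {name: round(value, 2) for name, value in stats.items()})
```

Both example tables have a proportion agreement of 0.85, yet the second, where most agreement is on Yes (prevalence index 0.75), gives a kappa of about 0.32 against 0.70 for the first. This is the "high agreement but low kappa" behaviour that the prevalence index is meant to flag.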