Evaluating new technologies or tests raises the question of whether observed differences are due to the technology or to the interpreters. Kappa is widely used to measure interobserver variability, that is, how often 2 or more observers agree in their interpretations. Simple agreement, the proportion of cases in which the observers give the same yes or no answer, is a poor measure of agreement because it does not correct for chance. Kappa is the preferred statistic because it accounts for chance.
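The difference between simple agreement and kappa can be seen in a small calculation. The sketch below assumes the usual two-observer (Cohen) form of kappa, (observed agreement - chance agreement) / (1 - chance agreement), where chance agreement is estimated from how often each observer uses each category. The function names and the ten yes/no readings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def simple_agreement(a, b):
    """Proportion of cases on which the two observers give the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two observers (Cohen's kappa)."""
    n = len(a)
    p_observed = simple_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each observer's category frequencies.
    p_chance = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_chance == 1:
        # Both observers used one identical category throughout;
        # kappa is undefined in that case.
        return float("nan")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical readings of 10 studies: both observers call most of them "yes".
observer_a = ["yes"] * 9 + ["no"]
observer_b = ["yes"] * 8 + ["no", "yes"]

agreement = simple_agreement(observer_a, observer_b)
kappa = cohen_kappa(observer_a, observer_b)
print(f"simple agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up data the observers agree on 8 of 10 studies, yet kappa is slightly below zero: because both say "yes" about 90% of the time, the agreement expected by chance alone is 0.82, which is more than the observers actually achieved.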
The correlation coefficient is widely but inappropriately used as a measure of agreement in many radiologic studies. Two observers may have good (even perfect) correlation but never agree. For example, one may consistently describe hearts as mildly enlarged where the other describes them as severely enlarged.
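A short numeric illustration of this point follows. It assumes a hypothetical ordinal coding of heart size (0 = normal, 1 = mild, 2 = moderate, 3 = severe enlargement) and an invented second observer who always rates one grade higher than the first. The Pearson correlation is computed by hand so that the sketch does not depend on any particular library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def simple_agreement(a, b):
    """Proportion of cases on which the two observers give the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding: 0 = normal, 1 = mild, 2 = moderate, 3 = severe.
reader_1 = [0, 1, 1, 2, 0, 2]
# The second reader always rates one grade higher than the first.
reader_2 = [grade + 1 for grade in reader_1]

r = pearson(reader_1, reader_2)
exact_matches = simple_agreement(reader_1, reader_2)
print(f"correlation = {r:.2f}, proportion of exact agreement = {exact_matches:.2f}")
```

Here the two sets of ratings rise and fall together, so the correlation is perfect, yet the readers never assign the same grade to any case. Correlation measures whether ratings move together, not whether they match.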