Testing for Inter-rater reliability
I haven’t used a stats package in a long time. I remember back in grad school how interesting my Biostats and BioInformatics courses were, especially Janet Hughes’ classes. But in my professional life, I hardly ventured beyond the first five tasks in the function button on the standard Excel toolbar.
My QI (Quality Improvement) team has been pre-testing some clinical assessment questionnaires for Health Centers. Each questionnaire consists of a checklist and a rating, for two assessors to fill out while observing a specific consultation, e.g. an Antenatal Care visit. The checklist scores are weighted with qualitative ratings. For example, for the items the midwife carried out during the Physical Examination, the assessors rate the quality with which those items were performed.
There’s a lot of room for bias in any data collection method, but at the very least I want to know that my instrument is reliable. So I want to test the inter-rater agreement (e.g. did the assessors see the same things? did they rate the items similarly?). You’d think this would be a simple enough calculation to carry out in Excel, in a neat and tidy point-and-click experience. But no. Apparently this is an obscure corner of statistics that requires macros even in SAS and SPSS.
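For two raters scoring the same items, the usual agreement statistic is Cohen’s kappa, which corrects raw percent agreement for the agreement you’d expect by chance. If you do have a scripting language handy, the calculation is short enough to do yourself. Here’s a minimal sketch in pure Python (the rating labels and example data are made up, just to mimic two assessors rating checklist items):

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    assert len(rater1) == len(rater2) and len(rater1) > 0
    n = len(rater1)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: from each rater's marginal category frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    # Kappa = how far observed agreement exceeds chance, scaled to [.., 1].
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two assessors rating 10 checklist items.
r1 = ["good", "good", "fair", "poor", "good", "fair", "good", "poor", "fair", "good"]
r2 = ["good", "fair", "fair", "poor", "good", "fair", "good", "fair", "fair", "good"]
print(round(cohen_kappa(r1, r2), 3))  # → 0.683
```

A kappa of 1.0 means perfect agreement, 0 means no better than chance; by common rules of thumb, values above roughly 0.6 are read as “substantial” agreement. (Note this simple form treats all disagreements equally; for ordinal ratings like good/fair/poor, a weighted kappa would penalize good-vs-poor disagreements more than good-vs-fair.)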
I ask for the billionth time in my short life: what did people do before the internet? I’m a bit less stumped now that I’ve found the handy sources below, which provide free or low-cost stats tools for content analysis. Whew!
- kappa2 – free downloadable automated Excel workbook (although I’ve had some problems using this one)
- ReCal – free to use, online interface to enter your data
- AgreeStat – $45 downloadable automated Excel workbook