Journal: Practical Assessment, Research and Evaluation
Print ISSN: 1531-7714
Electronic ISSN: 1531-7714
Year: 2018
Volume: 23
Publisher: ERIC Clearinghouse on Assessment and Evaluation
Abstract: Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data and results. We reviewed articles in two major journals in the fields of reading and mathematics to understand how researchers have measured and reported inter-rater reliability in a recent decade. We found that researchers have tended to report measures of inter-rater agreement above the .80 threshold with little attention to the magnitude of score differences between raters. Then, we conducted simulations to understand both how different indices for classroom observation reliability are related to each other and the impact of reliability decisions on study results. Results from the simulation studies suggest that mean correlations with an outcome are slightly lower at lower levels of percentage of exact agreement, but that the magnitude of score differences has a more dramatic effect on correlations. Therefore, adhering to strict thresholds for inter-rater agreement is less helpful than reporting exact point estimates and also examining measures of rater consistency.
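
As a minimal illustration of the distinction the abstract draws between inter-rater agreement and rater consistency, the Python sketch below (hypothetical, not the authors' simulation code) computes the percentage of exact agreement and a Pearson correlation for two simulated raters. The 1-7 rubric range, the number of lessons, and the disagreement rate are all assumptions chosen for illustration.

    import numpy as np

    # Hypothetical sketch, not the authors' simulation code: two raters
    # score 200 lessons on an assumed 1-7 observation rubric.
    rng = np.random.default_rng(0)
    rater_a = rng.integers(1, 8, size=200)

    # Rater B disagrees on roughly 30% of lessons, but only ever by one
    # scale point, so disagreements are small in magnitude.
    noise = rng.choice([-1, 0, 1], size=200, p=[0.15, 0.70, 0.15])
    rater_b = np.clip(rater_a + noise, 1, 7)

    # Inter-rater agreement: percentage of exact agreement.
    exact_agreement = np.mean(rater_a == rater_b)

    # Rater consistency: Pearson correlation between the score vectors.
    consistency = np.corrcoef(rater_a, rater_b)[0, 1]

    print(f"Exact agreement: {exact_agreement:.2f}")
    print(f"Consistency (Pearson r): {consistency:.2f}")

Under these assumptions, exact agreement tends to land below the .80 threshold while the correlation stays high, since every disagreement is only one scale point; this mirrors the abstract's point that the magnitude of score differences, rather than exact agreement alone, drives relationships with outcomes.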