Article information

Title: Analytic assessment of multiple-choice tests
Authors: Maryamsadat kaveh tabatabaee; Mohammad Hossein Bahreyni Toosi; Akbar Derakhshan et al.
Journal: Journal of Medical Education
Print ISSN: 1735-3998
Publication year: 2009
Volume: 2
Issue: 2
Language: English
Publisher: Journal of Medical Education
Other abstract: Background: Multiple-choice tests (MCTs) are widely known and applied as useful evaluation tests in the field of education, especially in medical science. Items on a multiple-choice test consist of a stem, which is followed by a correct answer as well as three to four distracters. Items on a well-written multiple-choice test have stems that are precise and clear, one answer that is clearly correct or best, and distracters that are plausible. Purpose: The purpose of the present study is to conduct item and test analysis of 24 MCTs given in the first semester of the 2000-2001 educational year in the medical faculty of Mashad University of Medical Science. Methods: The data of this descriptive study comprised 1496 MCQs gathered from 2092 answer sheets of 24 MCTs obtained from the educational department of the medical faculty. A split-half method was employed to calculate the reliability coefficient of the MCTs. The item difficulty and discrimination indices were also calculated for the questions. Further studies should be undertaken to develop methods for evaluating validity, assessing distracters, and examining structural principles in MCTs. Results: The mean reliability coefficient of the exams was 0.72±0.13, and in more than 50% of cases the reliability coefficient was greater than 0.7. There was a significant difference in reliability coefficient between basic science exams and clinical clerkship exams (P=0.001). The mean standard error of measurement (SEM) was 3.51±1.11. In 52.2% of cases, the difficulty of the MCQs was inappropriate, and 49.3% of questions had inadequate discriminative power to distinguish between poor students and good students. Conclusion: Our findings indicate that only 33% of the studied MCQs have both desirable or acceptable item difficulty and discrimination indices, and 34.9% have neither acceptable item difficulty nor an acceptable discrimination index.
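The quantities named in the Methods (item difficulty, discrimination index, split-half reliability, and SEM) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how the halves were formed, whether a Spearman-Brown correction was applied, or how the upper and lower groups were chosen. The odd/even split, the Spearman-Brown step, the 27% group fraction, and all function names below are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

# responses: one row per examinee, one 0/1 entry per item (1 = correct).

def item_difficulty(responses, item):
    """Proportion of examinees who answered the item correctly."""
    return mean(row[item] for row in responses)

def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower index: proportion correct in the top-scoring group
    minus proportion correct in the bottom-scoring group.
    The 27% group size is an assumed convention, not taken from the abstract."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return mean(r[item] for r in upper) - mean(r[item] for r in lower)

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("half-test scores have zero variance")
    return cov / sqrt(vx * vy)

def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores (assumed split),
    then apply the Spearman-Brown correction (assumed step)."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r = pearson(odd, even)
    return 2 * r / (1 + r)

def standard_error_of_measurement(responses, reliability):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in responses]
    return pstdev(totals) * sqrt(1 - reliability)

# Hypothetical toy data: 5 examinees, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = split_half_reliability(responses)
print(item_difficulty(responses, 0))
print(item_discrimination(responses, 0))
print(reliability)
print(standard_error_of_measurement(responses, reliability))
```

With real answer sheets, each row would be one examinee's scored responses. The cut-offs for "acceptable" difficulty or discrimination are not stated in the abstract and would have to be chosen separately.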
Having subjects respond reliably on a measure is a great start, but there is another concept that needs to be understood really well: validity.