Article Information

Title: Applying Psychometric Modeling to aid Feature Engineering in Predictive Log-Data Analytics: The NAEP EDM Competition
Authors: Fabian Zehner; Beate Eichmann; Tobias Deribo et al.
Journal: Journal of Educational Data Mining
Electronic ISSN: 2157-2100
Publication year: 2021
Volume: 13
Issue: 2
Pages: 80-107
DOI: 10.5281/zenodo.5275316
Language: English
Publisher: International EDM Society
Abstract: The NAEP EDM Competition required participants to predict efficient test-taking behavior based on log data. This paper describes our top-down approach for engineering features by means of psychometric modeling, aiming at machine learning for the predictive classification task. For feature engineering, we employed, among others, the Log-Normal Response Time Model for estimating latent person speed, and the Generalized Partial Credit Model for estimating latent person ability. Additionally, we adopted an n-gram feature approach for event sequences. Furthermore, instead of using the provided binary target label, we distinguished inefficient test takers who were going too fast and those who were going too slow for training a multi-label classifier. Our best-performing ensemble classifier comprised three sets of low-dimensional classifiers, dominated by test-taker speed. While our classifier reached moderate performance, relative to the competition leaderboard, our approach makes two important contributions. First, we show how classifiers that contain features engineered through literature-derived domain knowledge can provide meaningful predictions if results can be contextualized to test administrators who wish to intervene or take action. Second, our re-engineering of test scores enabled us to incorporate person ability into the models. However, ability was hardly predictive of efficient behavior, leading to the conclusion that the target label's validity needs to be questioned. Beyond competition-related findings, we furthermore report a state sequence analysis for demonstrating the viability of the employed tools. The latter yielded four different test-taking types that described distinctive differences between test takers, providing relevant implications for assessment practice.
Keywords: log files; psychometric models; domain knowledge–based feature engineering; process data; state sequence analysis; clustering; latent state; ensemble