
Article Information

  • Title: Mapping Python Programs to Vectors using Recursive Neural Encodings
  • Authors: Benjamin Paassen; Jessica McBroom; Bryn Jeffries
  • Journal: Journal of Educational Data Mining
  • Electronic ISSN: 2157-2100
  • Publication year: 2021
  • Volume: 13
  • Issue: 3
  • Pages: 1-35
  • DOI: 10.5281/zenodo.5634224
  • Language: English
  • Publisher: International EDM Society
  • Abstract: Educational data mining involves the application of data mining techniques to student activity. However, in the context of computer programming, many data mining techniques cannot be applied because they require vector-shaped input, whereas computer programs have the form of syntax trees. In this paper, we present ast2vec, a neural network that maps Python syntax trees to vectors and back, thereby enabling about a hundred data mining techniques that were previously not applicable. Ast2vec has been trained on almost half a million programs of novice programmers and is designed to be applied across learning tasks without re-training, meaning that users can apply it without any need for deep learning. We demonstrate the generality of ast2vec in three settings. First, we provide example analyses using ast2vec on a classroom-sized dataset, involving two novel techniques, namely progress-variance projection for visualization and a dynamical systems analysis for prediction. In these examples, we also explain how ast2vec can be utilized for educational decisions. Second, we consider the ability of ast2vec to recover the original syntax tree from its vector representation on the training data and two other large-scale programming datasets. Finally, we evaluate the predictive capability of a linear dynamical system on top of ast2vec, obtaining similar results to techniques that work directly on syntax trees while being much faster (constant- instead of linear-time processing). We hope ast2vec can augment the educational data mining toolkit by making analyses of computer programs easier, richer, and more efficient.
  • Keywords: computer science education; computer programs; word embeddings; representation learning; neural networks; visualization; program vectors
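
The abstract mentions a linear dynamical system fitted on top of program vectors. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes the simple form x_next ≈ A x and fits A by ridge-regularized least squares. The function names (fit_linear_dynamics, predict_next), the ridge value, and the random vectors standing in for ast2vec encodings are all hypothetical; no ast2vec API is used.

```python
import numpy as np


def fit_linear_dynamics(X_now, X_next, ridge=1e-3):
    """Fit a matrix A with X_next ≈ X_now @ A.T (ridge-regularized least squares).

    X_now, X_next: arrays of shape (n_samples, n_dim), where row i of X_next
    is the vector observed one step after row i of X_now.
    """
    n_dim = X_now.shape[1]
    gram = X_now.T @ X_now + ridge * np.eye(n_dim)
    # Normal equations: gram @ A.T = X_now.T @ X_next
    A_T = np.linalg.solve(gram, X_now.T @ X_next)
    return A_T.T


def predict_next(A, x):
    """Predict the next vector with a single matrix-vector product."""
    return A @ x


# Synthetic stand-in data (assumption): random vectors instead of real
# program encodings, generated from a known hypothetical matrix A_true.
rng = np.random.default_rng(0)
n_dim = 8
A_true = 0.9 * np.eye(n_dim)
X_now = rng.normal(size=(200, n_dim))
X_next = X_now @ A_true.T + 0.01 * rng.normal(size=(200, n_dim))

A_hat = fit_linear_dynamics(X_now, X_next)
x_pred = predict_next(A_hat, X_now[0])
```

In this sketch, a prediction costs one matrix-vector product whose size depends only on the vector dimension, not on the size of the syntax tree. That is one plausible reading of the "constant- instead of linear-time" remark in the abstract.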