Basic article information

Title: A Word Extraction Method from Newspaper Articles Based on Time Information for Event Sequence Mining
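Authors: Tomomichi Tada; Koji Iwanuma; Hidetomo Nabeshima et al.
Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
Print ISSN: 1346-0714
Online ISSN: 1346-8030
Publication year: 2009
Volume: 24
Issue: 6
Pages: 488-493
DOI: 10.1527/tjsai.24.488
Publisher: The Japanese Society for Artificial Intelligence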
Abstract: This paper presents a new method for extracting important words from newspaper articles based on time-sequence information. Such word extraction plays an important role in event sequence mining. TF-IDF is a well-known method for ranking the importance of words in a document. However, TF-IDF does not consider the time information embedded in sequential textual data, which is characteristic of newspapers. In this research, we propose a new word-extraction method, called the TF-IDayF method, which takes time-sequence information into account and can extract important or characteristic words that express sequential events. The TF-IDayF method does not use the so-called burst phenomenon of topic word occurrences, which has been studied by many researchers. The TF-IDayF method is quite simple, yet effective and easy to compute in sequential text mining. We evaluate the proposed method through several experiments from three points of view: a semantic viewpoint, a statistical viewpoint, and a data mining viewpoint.
Keywords: word extraction; event sequence mining; TF-IDF; newspaper article