Publisher: The Japanese Society for Artificial Intelligence
Abstract: Several previous studies have measured the semantic similarity between words by representing word concepts as points in a multi-dimensional vector space acquired from text data such as electronic dictionaries or text corpora. A central problem in these studies is how to select the orthonormal basis vectors of the space, which represent attributes of the words. We propose a method of building the space by combining two representative methods, one using singular value decomposition and the other using the contents of a thesaurus. The proposed method was evaluated on two tasks: similar-word retrieval and document retrieval. The evaluations showed that the proposed combination is more effective than either of the original methods alone on both tasks.
Keywords: word space; attribute; similarity; thesaurus; singular value decomposition
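
The abstract does not say how the SVD-based and thesaurus-based spaces are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a toy co-occurrence matrix, a toy thesaurus category matrix, and a weighted concatenation of the two row-normalized spaces. The data, the `weight` parameter, and the function names `svd_space`, `combined_space`, and `cosine` are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are words, columns are context features.
words = ["car", "automobile", "banana", "apple"]
cooc = np.array([
    [4.0, 3.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 3.0],
    [0.0, 1.0, 3.0, 4.0],
])

# Hypothetical thesaurus membership: one column per category
# (column 0 = vehicle, column 1 = fruit).
thesaurus = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])


def svd_space(matrix, k):
    """Project each row onto the top-k singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def normalize_rows(m):
    """Scale each row to unit length, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return m / norms


def combined_space(cooc, thesaurus, k, weight=0.5):
    """Assumed combination: weighted concatenation of the two spaces."""
    a = normalize_rows(svd_space(cooc, k))
    b = normalize_rows(thesaurus)
    return np.hstack([np.sqrt(1.0 - weight) * a, np.sqrt(weight) * b])


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


space = combined_space(cooc, thesaurus, k=2)
sim_close = cosine(space[0], space[1])  # car vs. automobile
sim_far = cosine(space[0], space[2])    # car vs. banana
print(f"car/automobile: {sim_close:.3f}, car/banana: {sim_far:.3f}")
```

With the square-root weighting, the cosine similarity in the combined space is a weighted average of the cosine similarities in the two component spaces, which makes the influence of `weight` easy to interpret.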