文章基本信息

标题：Automatic F-term Classification of Japanese Patent Documents Using the k-Nearest Neighborhood Method and the SMART Weighting
本地全文：下载
作者：Masaki Murata ; Toshiyuki Kanamaru ; Tamotsu Shirado 等
期刊名称：Information and Media Technologies
电子版ISSN：1881-0896
出版年度：2007
卷号：2
期号：1
页码：252-278
DOI：10.11185/imt.2.252
出版社：Information and Media Technologies Editorial Board
摘要：Patent processing is important in various fields such as industry, business, and law. We used F-terms (Schellner 2002) to classify patent documents using the k-nearest neighborhood method. Because the F-term categories are fine-grained, they are useful when we classify patent documents. We clarified the following three points using experiments: i) which variations of the k-nearest neighborhood method are the best for patent classification, ii) which methods of calculating similarity are the best for patent classification, and iii) from which regions of a patent terms should be extracted. In our experiments, we used the patent data used in the F-term categorization task in the NTCIR-5 Patent Workshop (NTCIR committee 2005; Iwayama, Fujii, and Kando 2005). We found that the method of adding the scores of k extracted documents to classify patent documents was the most effective among the variations of the k-nearest neighborhood method used in this study. We also found that SMART (Singhal, Buckley, and Mitra 1996; Singhal, Choi, Hindle, and Pereira 1997), which is known to be effective in information retrieval, was the most effective method of calculating similarity. Finally, when extracting terms, we found that using the abstract and claim regions together was the best method among all the combinations of using abstract, claim, and description regions. The results were confirmed using a statistical test. Moreover, we experimented with changing the amount of training data and found that we obtained better performance when we used more data, which was limited to that provided in the NTCIR-5 Patent Workshop.
关键词：F-term;Patent Classification;k-Nearest Neighborhood Method;SMART