Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: In this paper, we implement a document retrieval system using Lucene and conduct experiments to compare the efficiency of two weighting schemes: the well-known TF-IDF and BM25. We then expand queries using a comparable corpus (Wikipedia) and word embeddings. The results show that the latter method (word embeddings) is a good way to achieve higher precision and retrieve more accurate documents.
Keywords: Internet and Web Applications; Data and Knowledge Representation; Document Retrieval.