Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2005
Volume: 28
Issue: 04
Publisher: IEEE Computer Society
Abstract: Efficiently finding the most relevant publications in large corpora is an important research
topic in information retrieval. The volume of biological literature in publication databases is growing
exponentially. The objective of this study is to quickly locate useful publications in large biomedical
document collections based on users' preferences.
This paper introduces a new iterative search paradigm that integrates biological background knowledge
into the organization of results returned by search engines and uses user feedback to filter out
irrelevant documents. A term weighting scheme based on the Gene Ontology is introduced to improve the
measurement of document similarity in the biomedical domain. A prototype text retrieval system has been
built on this iterative search approach. Experimental results show that the system can filter out a large
number of irrelevant documents while keeping most of the relevant ones with limited user interaction.
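
The abstract does not specify the weighting formula or the filtering rule, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a plain term-frequency vector in which terms found in a small hypothetical vocabulary (`ONTOLOGY_TERMS`, standing in for the Gene Ontology) are multiplied by an assumed factor (`ONTOLOGY_BOOST`), and one feedback round that drops documents whose cosine similarity to a user-rejected document reaches an assumed `threshold`. All names and values are illustrative.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Gene Ontology vocabulary; the abstract does
# not give the actual terms or weights used by the system.
ONTOLOGY_TERMS = {"apoptosis", "kinase", "transcription"}
ONTOLOGY_BOOST = 2.0  # assumed boost factor, chosen only for illustration


def weighted_vector(text):
    """Term-frequency vector with ontology terms scaled by ONTOLOGY_BOOST."""
    counts = Counter(text.lower().split())
    return {
        term: freq * (ONTOLOGY_BOOST if term in ONTOLOGY_TERMS else 1.0)
        for term, freq in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_round(docs, rejected, threshold=0.5):
    """One feedback round: drop the rejected documents and any document
    whose similarity to a rejected one is at or above the threshold."""
    rejected_vecs = [weighted_vector(docs[i]) for i in rejected]
    kept = []
    for i, doc in enumerate(docs):
        if i in rejected:
            continue
        vec = weighted_vector(doc)
        if all(cosine(vec, r) < threshold for r in rejected_vecs):
            kept.append(doc)
    return kept
```

Calling `filter_round` repeatedly, each time with the indices the user has newly marked as irrelevant, gives a simple iterative loop. Filtering against rejected documents is one possible reading of "user feedback"; ranking by similarity to accepted documents would be an equally plausible variant.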