Publisher: University of Sheffield, Department of Information Studies
Abstract: In the big data age, we must deal with a tremendous amount of information collected from many types of sources. For information search systems such as Web search engines and online digital libraries, document collections keep growing. For some queries, an information search system needs to retrieve a large number of documents as the result. On the other hand, people are very often willing to visit only a few top-ranked documents. How to develop an information search system that is both efficient and effective is therefore a research problem. In this paper, we focus on the data fusion approach to information search, in which each component search model contributes a result and all the results are combined by a fusion algorithm. Through an empirical study, we identify a feasible combination method that balances effectiveness and efficiency in the context of data fusion. This is a multi-objective optimisation problem, and addressing it requires understanding how the two factors affect each other and to what extent. Using several groups of historical runs from TREC in our experiments, we find that by using much less information (e.g., less than 10% of the documents in the experiment), good efficiency is achievable with only a marginal loss in effectiveness. We consider these findings informative, and they can serve as a guideline for providing more efficient search services in the big data environment.
Keywords: search engines; efficiency; effectiveness; information search; data fusion; linear combination
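
The abstract describes combining the results of several component search models with a fusion algorithm, and the keywords name linear combination. The following is a minimal sketch of that general idea, not the paper's own implementation: each run is truncated to its top `depth` documents (to mimic fusing with less information), its scores are min-max normalised, and the normalised scores are summed with per-run weights. The function names, the `depth` parameter, the choice of min-max normalisation, and the example weights are all assumptions made for illustration.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale the scores of one run to [0, 1].

    `run` is a list of (doc_id, score) pairs. Min-max scaling is an
    assumed normalisation; the abstract does not say which one is used.
    """
    if not run:
        return []
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return [(doc, 1.0) for doc, _ in run]
    return [(doc, (score - lo) / (hi - lo)) for doc, score in run]


def linear_fusion(runs, weights, depth=None):
    """Fuse several runs by a weighted sum of normalised scores.

    `depth` (hypothetical parameter) keeps only the top-`depth` documents
    of each run before fusion, trading effectiveness for efficiency.
    """
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        ranked = sorted(run, key=lambda pair: pair[1], reverse=True)
        if depth is not None:
            ranked = ranked[:depth]
        # Normalise after truncation: the fusion step only sees these docs.
        for doc, score in min_max_normalize(ranked):
            fused[doc] += weight * score
    # Highest fused score first; ties broken by document id for stability.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy data: two component runs with scores on different scales.
run_a = [("d1", 0.9), ("d2", 0.5), ("d3", 0.1)]
run_b = [("d2", 8.0), ("d4", 6.0), ("d1", 2.0)]

full = linear_fusion([run_a, run_b], weights=[0.6, 0.4])
shallow = linear_fusion([run_a, run_b], weights=[0.6, 0.4], depth=2)
```

Comparing `full` with `shallow` on real runs, using an effectiveness metric and a timing measurement, is one way to examine the trade-off the abstract refers to; in practice the weights would be learned from training data rather than set by hand.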