Journal title: International Journal of New Computer Architectures and their Applications
Print ISSN: 2220-9085
Publication year: 2018
Volume: 8
Issue: 2
Pages: 65-77
Publisher: Society of Digital Information and Wireless Communications
Abstract: Web search queries are short and ambiguous, and tend to have multiple underlying interpretations. Query expansion is a prominent method for reformulating such queries so that a set of relevant documents is retrieved. In this paper, we propose an aspect-based query expansion technique for diversified document retrieval. First, query suggestions and completions are retrieved from major commercial search engines. A frequent phrase-based soft clustering algorithm is then applied to group similar retrieved candidates into clusters. Each cluster represents a different query aspect. The expansion terms are selected from the label generated for each cluster. To estimate the relevance between the expanded query and the documents, multiple new lexical and semantic features are introduced using the content information and a word-embedding model, respectively. Finally, a linear ranking approach is employed to re-rank the documents retrieved for the original query using the extracted features. We conduct experiments on the ClueWeb09 document collection using TREC 2012 Web Track queries. The experimental results demonstrate that the proposed aspect-based query expansion method is effective at diversifying the retrieved documents and outperforms the baseline and some known related methods in terms of the diversity metrics ERR-IA, α-nDCG and NRBP at a cutoff of 20.
Keywords: Query Ambiguity; Query Expansion; Diversified Search; Query Aspect; Word Embedding
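
The final step described in the abstract, linearly re-ranking documents with lexical and semantic features, can be illustrated with a minimal sketch. The abstract does not define the features or weights, so everything here is an assumption made for illustration: the toy 2-d vectors in `EMB` stand in for a trained word-embedding model, `lexical_overlap` and the averaged-vector cosine are hypothetical feature choices, and the equal weights `w_lex` and `w_sem` are arbitrary. It is not the authors' implementation.

```python
import math

# Toy 2-d word vectors: hypothetical stand-ins for a trained word-embedding model.
EMB = {
    "jaguar": (0.9, 0.1),
    "car": (0.8, 0.3),
    "price": (0.7, 0.4),
    "cat": (0.1, 0.9),
    "habitat": (0.2, 0.8),
}


def mean_vector(tokens):
    """Average the vectors of the tokens that have an embedding."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_overlap(query_tokens, doc_tokens):
    """Assumed lexical feature: fraction of distinct query terms found in the document."""
    query_terms = set(query_tokens)
    if not query_terms:
        return 0.0
    return len(query_terms & set(doc_tokens)) / len(query_terms)


def rerank(expanded_query, docs, w_lex=0.5, w_sem=0.5):
    """Score each document with a weighted sum of the two features, best first."""
    query_vec = mean_vector(expanded_query)
    scored = []
    for doc_id, tokens in docs.items():
        score = (w_lex * lexical_overlap(expanded_query, tokens)
                 + w_sem * cosine(query_vec, mean_vector(tokens)))
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

Calling `rerank(["jaguar", "car", "price"], docs)` with `docs` mapping document ids to token lists returns the ids ordered by the combined score. In an aspect-based setting, one such ranking could be produced per expanded query (one per cluster label) and the results merged, although the abstract does not state how the authors combine them.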