
Article information

  • Title: A MapReduce Based Distributed LSI for Scalable Information Retrieval
  • Authors: Liu, Yang; Li, Maozhen; Khan, Mukhtaj
  • Journal: COMPUTING AND INFORMATICS
  • Print ISSN: 1335-9150
  • Publication year: 2014
  • Volume: 33
  • Issue: 2
  • Pages: 259-280
  • Language: English
  • Publisher: COMPUTING AND INFORMATICS
  • Abstract: Latent Semantic Indexing (LSI) has been widely used in information retrieval due to its efficiency in solving the problems of polysemy and synonymy. However, LSI is notably a computationally intensive process because of the computing complexities of singular value decomposition and filtering operations involved in the process. This paper presents MR-LSI, a MapReduce based distributed LSI algorithm for scalable information retrieval. The performance of MR-LSI is first evaluated in a small scale experimental cluster environment, and subsequently evaluated in large scale simulation environments. By partitioning the dataset into smaller subsets and optimizing the partitioned subsets across a cluster of computing nodes, the overhead of the MR-LSI algorithm is reduced significantly while maintaining a high level of accuracy in retrieving documents of user interest. A genetic algorithm based load balancing scheme is designed to optimize the performance of MR-LSI in heterogeneous computing environments in which the computing nodes have varied resources.
  • Keywords: Information retrieval, latent semantic indexing, MapReduce, load balancing, genetic algorithms
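
The abstract mentions a genetic algorithm based load balancing scheme for heterogeneous nodes but this record does not describe how it works. The following is only a minimal illustrative sketch of the general idea, not the authors' method: it assumes each data partition has a known size, each node has a known relative speed, and the goal is to minimise the finish time of the slowest node. All names (`makespan`, `ga_balance`, `sizes`, `speeds`) and all parameter values are hypothetical.

```python
import random


def makespan(assignment, sizes, speeds):
    """Finish time of the slowest node; this is what the sketch minimises."""
    load = [0.0] * len(speeds)
    for part, node in enumerate(assignment):
        load[node] += sizes[part] / speeds[node]
    return max(load)


def ga_balance(sizes, speeds, pop_size=30, generations=200,
               mutation_rate=0.1, seed=0):
    """Assign partitions to nodes with a simple genetic algorithm.

    A chromosome is a list whose i-th gene is the node index that
    processes partition i. Assumed, not taken from the paper.
    """
    rng = random.Random(seed)
    n_parts, n_nodes = len(sizes), len(speeds)

    def cost(assignment):
        return makespan(assignment, sizes, speeds)

    # Seed the population with a round-robin assignment so the result
    # can never be worse than that naive baseline.
    round_robin = [p % n_nodes for p in range(n_parts)]
    population = [round_robin] + [
        [rng.randrange(n_nodes) for _ in range(n_parts)]
        for _ in range(pop_size - 1)
    ]

    for _ in range(generations):
        population.sort(key=cost)
        next_gen = population[:2]  # elitism: carry over the two best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            # One-point crossover.
            cut = rng.randrange(1, n_parts) if n_parts > 1 else 0
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally move a partition to a random node.
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate
                     else gene for gene in child]
            next_gen.append(child)
        population = next_gen

    return min(population, key=cost)


if __name__ == "__main__":
    sizes = [8, 3, 5, 2, 7, 4, 6, 1]   # hypothetical partition sizes
    speeds = [1.0, 2.0, 4.0]           # hypothetical relative node speeds
    best = ga_balance(sizes, speeds)
    print(best, makespan(best, sizes, speeds))
```

Seeding with the round-robin assignment and keeping the best individuals each generation means the returned assignment is at least as good as the naive baseline under this cost model; a real deployment would need a cost model that also reflects network and I/O overheads.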