Journal: International Journal of Grid and Distributed Computing
Print ISSN: 2005-4262
Publication year: 2015
Volume: 8
Issue: 2
Pages: 213-234
DOI: 10.14257/ijgdc.2015.8.2.20
Publisher: SERSC
Abstract: Although a great deal of research over the past few decades has addressed searching documents in textual data, and many commercial systems exist for retrieving data on the Web, quickly retrieving the desired documents from massive textual corpora remains challenging. Indexing is the most commonly used way to improve retrieval performance; in particular, an effective index structure helps retrieve data more quickly and effectively. This work proposes a novel index structure, named L-INDEX, to conceptually index massive sets of textual data. L-INDEX employs Formal Concept Analysis to discover the relationships between documents and expresses the discovered relationships as a lattice. To make the approach applicable to massive data sets, the structure is implemented in a MapReduce environment, which can carry out computing tasks in an efficient, distributed way. A set of algorithms is developed for creating, maintaining, and storing L-INDEX and for searching documents through it. A series of experiments is conducted to verify the performance of the approach by comparing it with other popular indexing structures. The experimental results demonstrate that the proposed approach can index massive sets of textual data with an effective structure that supports efficient querying.
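The abstract does not give the L-INDEX algorithms, so the following is only a minimal sketch of the general idea behind Formal Concept Analysis over a document-term context, not the authors' method. It runs serially (no MapReduce) and enumerates document subsets exhaustively, which is only practical for toy inputs. All names (`context`, `formal_concepts`, `search`) and the sample documents are hypothetical.

```python
from itertools import combinations


def common_terms(docs, context, all_terms):
    """Terms shared by every document in `docs` (all terms if `docs` is empty)."""
    result = set(all_terms)
    for d in docs:
        result &= context[d]
    return frozenset(result)


def matching_docs(terms, context):
    """Documents whose term set contains every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every document subset.

    Naive and exponential in the number of documents; illustration only.
    """
    all_terms = frozenset().union(*context.values())
    doc_ids = list(context)
    concepts = set()
    for r in range(len(doc_ids) + 1):
        for subset in combinations(doc_ids, r):
            intent = common_terms(subset, context, all_terms)
            extent = matching_docs(intent, context)
            concepts.add((extent, intent))
    return concepts


def search(query_terms, context):
    """Return the concept generated by a query: matching docs and their shared terms."""
    all_terms = frozenset().union(*context.values())
    extent = matching_docs(frozenset(query_terms), context)
    intent = common_terms(extent, context, all_terms)
    return extent, intent


# Hypothetical toy context: document id -> set of index terms.
context = {
    "d1": {"grid", "index"},
    "d2": {"index", "lattice"},
    "d3": {"grid", "index", "lattice"},
}

concepts = formal_concepts(context)
extent, intent = search({"grid"}, context)
```

Ordering the resulting concepts by inclusion of their extents yields the lattice; in this toy context, a query for `grid` lands on the concept whose documents also all share `index`.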