Basic Article Information

Title: A New Approach that improves TF-IDF Weighting Measure
Authors: Reddahi Nabil ; Labriji Amine ; Abdelbaki Issam et al.
Journal: International Journal of Information and Communication Technology Research
Electronic ISSN: 2223-4985
Publication year: 2015
Volume: 5
Issue: 10
Publisher: IRPN Publishers
Abstract: Information retrieval (IR) systems are designed to retrieve information from a set of documents called a corpus. Searching iteratively through every document can be slow and costly in terms of performance. Indexing is the mechanism that extracts descriptor terms from documents so that the system works on a smaller body of data than the original set. However, this form of indexing ignores the meaning of words: two synonymous words are treated as different terms. The semantics of words is a crucial factor for effective search. It is in this sense that semantic indexing makes information retrieval systems powerful and efficient. However, semantic indexing alone does not value one term over another; some terms carry more information than others, and it is sensible to assign them more weight. Term (or concept) weighting is a technique that assigns a specific weight to a term according to its importance in the corpus. The best-known weighting measure is TF-IDF, which assigns more weight to the least frequent words in the corpus. However, this measure, like most others, does not take into account the semantic relationships between terms. In this paper we propose a weighting method based on the semantic relationships between terms.
Keywords: semantic indexing; information retrieval; weighting; controlled indexing languages; similarity; vector space model
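
Note: the abstract refers to the classic TF-IDF measure and to its blindness to synonyms. The following is a minimal sketch of that baseline only, not of the authors' proposed method, which this record does not describe. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); the function name `tf_idf` and the three sample documents are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    corpus is a list of documents, each given as a list of tokens.
    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: "car" and "automobile" are synonyms,
# but plain TF-IDF indexes them as two unrelated terms.
docs = [
    ["car", "engine", "road"],
    ["automobile", "engine", "road"],
    ["road", "map"],
]
weights = tf_idf(docs)
```

In this toy corpus, "road" occurs in every document and therefore receives weight zero, while the rarer "car" outweighs "engine". The term "car" has no weight at all in the second document even though that document contains "automobile", which is the limitation the abstract points out.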