
Article Information

  • Title: A New Approach that improves TF-IDF Weighting Measure
  • Authors: Reddahi Nabil; Labriji Amine; Abdelbaki Issam
  • Journal: International Journal of Information and Communication Technology Research
  • Electronic ISSN: 2223-4985
  • Publication year: 2015
  • Volume: 5
  • Issue: 10
  • Publisher: IRPN Publishers
  • Abstract: Information retrieval (IR) systems are designed to retrieve information from a set of documents called a corpus. Scanning every document for each query is slow and costly. Indexing is the mechanism that extracts descriptor terms from documents so that the system works on a smaller body of data than the original set. However, this form of indexing ignores the meaning of words; two synonymous words are treated as different terms. Word semantics is a crucial factor for effective search, and it is in this sense that semantic indexing makes information retrieval systems powerful and efficient. Semantic indexing alone, however, does not value one term over another; some terms carry more information than others, and it would be wise to assign them more weight. Term (or concept) weighting is a technique that assigns a specific weight to a term according to its importance in the corpus. The best-known weighting measure is TF-IDF, which assigns more weight to the least frequent words in the corpus. However, this measure, like most others, does not take into account the semantic relationships between terms. In this paper we propose a weighting method based on the semantic relationships between terms.
  • Keywords: semantic indexing; information retrieval; weighting; controlled indexing languages; similarity; vector space model
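
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classic TF-IDF weighting, assuming the common formulation tf(t, d) × log(N / df(t)). It is not the semantic weighting method the authors propose, which this record does not describe. The function name `tf_idf` and the three sample documents are hypothetical, chosen only to illustrate the synonym problem the abstract mentions.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulation (one of several common variants):
    weight = (count of term in doc / doc length) * log(N / df(term)).
    """
    n_docs = len(corpus)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in corpus:
        df.update(set(doc))

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: "car" and "automobile" are synonyms.
docs = [
    "the car is fast".split(),
    "the automobile is fast".split(),
    "the cat sleeps".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, "the" occurs in every document and so receives weight zero, while "car" and "automobile" are weighted as two unrelated terms even though they mean the same thing. That is the limitation the abstract attributes to purely statistical weighting.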