Article Information

  • Title: Analysis of Data Persistence in Collaborative Content Creation Systems: The Wikipedia Case
  • Authors: Lorenzo Bracciale, Pierpaolo Loreti, Andrea Detti, Nicola Blefari Melazzi
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Publication year: 2019
  • Volume: 10
  • Issue: 11
  • Pages: 1-11
  • DOI: 10.3390/info10110330
  • Publisher: MDPI Publishing
  • Abstract: A very common problem in designing caching/prefetching systems, distribution networks, search engines, and web-crawlers is determining how long a given content lasts before being updated, i.e., its update frequency. Indeed, while some content is not frequently updated (e.g., videos), in other cases revisions periodically invalidate contents. In this work, we present an analysis of Wikipedia, currently the 5th most visited website in the world, evaluating the statistics of updates of its pages and their relationship with page view statistics. We discovered that the number of updates of a page follows a lognormal distribution. We provide fitting parameters as well as a goodness of fit analysis, showing the statistical significance of the model to describe the empirical data. We perform an analysis of the views–updates relationship, showing that in a time period of a month, there is a lack of evident correlation between the most updated pages and the most viewed pages. However, observing specific pages, we show that there is a strong correlation between the peaks of views and updates, and we find that in more than 50% of cases, the time difference between the two peaks is less than a week. This reflects the underlying process whereby an event causes both an update and a visit peak that occurs with different time delays. This behavior can pave the way for predictive traffic analysis applications based on content update statistics. Finally, we show how the model can be used to evaluate the performance of an in-network caching scenario.
  • Keywords: Wikipedia; real-data statistics; update statistics; popularity; caching; content revisions