
Article information

  • Title: Visualizing bivariate long-tailed data
  • Authors: Justin S. Dyer; Art B. Owen
  • Journal: Electronic Journal of Statistics
  • Print ISSN: 1935-7524
  • Publication year: 2011
  • Volume: 5
  • Pages: 642-668
  • DOI: 10.1214/11-EJS622
  • Language: English
  • Publisher: Institute of Mathematical Statistics
  • Abstract: Variables in large data sets in biology or e-commerce often have a head, made up of very frequent values and a long tail of ever rarer values. Models such as the Zipf or Zipf–Mandelbrot provide a good description. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf–Mandelbrot–Poisson model. We often see an association between entities at the head of one variable with those from the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior and we characterize the power law behavior of the marginal distributions in these models.
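
The abstract's idea of ordering entities empirically and plotting both variables on a copula scale can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the Zipf–Mandelbrot parameters `s` and `q`, the uniform jitter inside each entity's interval, and the synthetic independent data are all assumptions made for illustration. Each entity is sorted by descending observed count and given a slice of [0, 1] as wide as its share of the data, so both margins come out roughly uniform.

```python
import random
from collections import Counter


def zipf_mandelbrot_weights(n_items, s=1.2, q=2.0):
    """Unnormalized Zipf-Mandelbrot weights (i + q)^(-s) for ranks 1..n_items.

    The values of s and q are arbitrary illustrative choices.
    """
    return [(i + q) ** (-s) for i in range(1, n_items + 1)]


def empirical_intervals(values):
    """Order entities by descending observed count and slice up [0, 1].

    Returns {entity: (lo, hi)}, where hi - lo is the entity's share of the data.
    Ties are broken by entity label so the ordering is deterministic.
    """
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    intervals = {}
    seen = 0
    for entity, count in ordered:
        intervals[entity] = (seen / n, (seen + count) / n)
        seen += count
    return intervals


def copula_coordinates(pairs, rng):
    """Map each (x, y) observation to a jittered point in the unit square."""
    x_iv = empirical_intervals([x for x, _ in pairs])
    y_iv = empirical_intervals([y for _, y in pairs])
    coords = []
    for x, y in pairs:
        (x_lo, x_hi), (y_lo, y_hi) = x_iv[x], y_iv[y]
        coords.append((rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)))
    return coords


def grid_counts(coords, bins=10):
    """Bin unit-square points into a bins x bins table of counts."""
    grid = [[0] * bins for _ in range(bins)]
    for u, v in coords:
        i = min(int(u * bins), bins - 1)
        j = min(int(v * bins), bins - 1)
        grid[i][j] += 1
    return grid


# Synthetic example: two independent long-tailed variables (an assumption;
# real data would typically show dependence between head and tail).
rng = random.Random(0)
weights = zipf_mandelbrot_weights(200)
xs = rng.choices(range(200), weights=weights, k=5000)
ys = rng.choices(range(200), weights=weights, k=5000)
pairs = list(zip(xs, ys))
coords = copula_coordinates(pairs, rng)
grid = grid_counts(coords)
```

Because the two synthetic variables are drawn independently, the binned grid should be close to flat. Head-to-tail association of the kind the abstract mentions would show up as excess counts in the off-diagonal corners of such a grid.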