首页    期刊浏览 2025年02月27日 星期四
登录注册

文章基本信息

  • 标题:Building LinkedIn's Real-time Activity Data Pipeline
  • 本地全文:下载
  • 作者:Ken Goodhope ; Joel Koshy ; Jay Kreps
  • 期刊名称:Bulletin of the Technical Committee on Data Engineering
  • 出版年度:2012
  • 卷号:35
  • 期号:02
  • 出版社:IEEE Computer Society
  • 摘要:One trend in the implementation of modern web systems is the use of activity data in the form of log or event messages that capture user and server activity. This data is at the heart of many internet systems in the domains of advertising, relevance, search, recommendation systems, and security, as well as contin- uing to fulfill its traditional role in analytics and reporting. Many of these uses place real-time demands on data feeds. Activity data is extremely high volume and real-time pipelines present new design chal- lenges. This paper discusses the design and engineering problems we encountered in moving LinkedIn¡¯s data pipeline from a batch-oriented file aggregation mechanism to a real-time publish-subscribe system called Kafka. This pipeline currently runs in production at LinkedIn and handles more than 10 billion message writes each day with a sustained peak of over 172,000 messages per second. Kafka supports dozens of subscribing systems and delivers more than 55 billion messages to these consumer processing each day. We discuss the origins of this systems, missteps on the path to real-time, and the design and engineering problems we encountered along the way.
国家哲学社会科学文献中心版权所有