A first look at inter-data center traffic characteristics via Yahoo! datasets

Yingying Chen, Sourabh Jain, Vijay Kumar Adhikari, Zhi Li Zhang, Kuai Xu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

143 Citations (Scopus)

Abstract

Effectively managing multiple data centers and their traffic dynamics poses many challenges to their operators, as little is known about the characteristics of inter-data center (D2D) traffic. In this paper, we present a first study of D2D traffic characteristics using the anonymized NetFlow datasets collected at the border routers of five major Yahoo! data centers. Our contributions are mainly two-fold: i) we develop novel heuristics to infer the Yahoo! IP addresses and their locations from the anonymized NetFlow datasets, and ii) we study and analyze both D2D and client traffic characteristics and the correlations between these two types of traffic. Our study reveals that Yahoo! deploys its data centers hierarchically, with several satellite data centers distributed in other countries and backbone data centers distributed in US locations. For Yahoo! US data centers, we separate the client-triggered D2D traffic and background D2D traffic from the aggregate D2D traffic using port-based correlation, and study their respective characteristics. Our findings shed light on the interplay of multiple data centers and their traffic dynamics within a large content provider, and provide insights to data center designers and operators as well as researchers.
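
The abstract names port-based correlation as the way client-triggered and background D2D traffic are separated, but it does not describe the heuristic itself. The following is only a minimal sketch of one possible reading, not the authors' method: it assumes that a D2D flow counts as client-triggered when its service port also appears in client traffic, and as background otherwise. The `Flow` record, its field names, and the `split_d2d` function are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (not the NetFlow schema)."""
    src: str        # anonymized source address
    dst: str        # anonymized destination address
    dst_port: int   # service port of the flow
    nbytes: int     # bytes carried by the flow
    is_d2d: bool    # True if both endpoints are data center hosts


def split_d2d(flows):
    """Sum D2D bytes per class, using an assumed port-overlap rule."""
    # Ports observed in client (non-D2D) traffic.
    client_ports = {f.dst_port for f in flows if not f.is_d2d}
    totals = defaultdict(int)
    for f in flows:
        if not f.is_d2d:
            continue  # only D2D flows are classified
        # Assumption: a shared service port suggests a client-triggered flow.
        if f.dst_port in client_ports:
            totals["client_triggered"] += f.nbytes
        else:
            totals["background"] += f.nbytes
    return dict(totals)


flows = [
    Flow("c1", "dc1", 80, 500, False),    # client request to a data center
    Flow("dc1", "dc2", 80, 300, True),    # D2D flow on the same service port
    Flow("dc1", "dc2", 873, 1000, True),  # D2D flow on a port clients never use
]
result = split_d2d(flows)
```

With the three sample flows, `result` attributes 300 bytes to client-triggered D2D traffic and 1000 bytes to background D2D traffic. A real analysis would also need time alignment between client and D2D flows, which this sketch leaves out.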

Original language: English (US)
Title of host publication: Proceedings - IEEE INFOCOM
Pages: 1620-1628
Number of pages: 9
DOIs: https://doi.org/10.1109/INFCOM.2011.5934955
State: Published - 2011
Event: IEEE INFOCOM 2011 - Shanghai, China
Duration: Apr 10 2011 to Apr 15 2011

Other

Other: IEEE INFOCOM 2011
Country: China
City: Shanghai
Period: 4/10/11 to 4/15/11

Keywords

  • Anonymization
  • Content provider
  • Inter-data center
  • NetFlow

ASJC Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this

Chen, Y., Jain, S., Adhikari, V. K., Zhang, Z. L., & Xu, K. (2011). A first look at inter-data center traffic characteristics via Yahoo! datasets. In Proceedings - IEEE INFOCOM (pp. 1620-1628). [5934955] https://doi.org/10.1109/INFCOM.2011.5934955

@inproceedings{828335cc4607477bbf8e85ebbd3b7b19,
title = "A first look at inter-data center traffic characteristics via Yahoo! datasets",
abstract = "Effectively managing multiple data centers and their traffic dynamics poses many challenges to their operators, as little is known about the characteristics of inter-data center (D2D) traffic. In this paper, we present a first study of D2D traffic characteristics using the anonymized NetFlow datasets collected at the border routers of five major Yahoo! data centers. Our contributions are mainly two-fold: i) we develop novel heuristics to infer the Yahoo! IP addresses and their locations from the anonymized NetFlow datasets, and ii) we study and analyze both D2D and client traffic characteristics and the correlations between these two types of traffic. Our study reveals that Yahoo! deploys its data centers hierarchically, with several satellite data centers distributed in other countries and backbone data centers distributed in US locations. For Yahoo! US data centers, we separate the client-triggered D2D traffic and background D2D traffic from the aggregate D2D traffic using port-based correlation, and study their respective characteristics. Our findings shed light on the interplay of multiple data centers and their traffic dynamics within a large content provider, and provide insights to data center designers and operators as well as researchers.",
keywords = "Anonymization, Content provider, Inter-data center, NetFlow",
author = "Yingying Chen and Sourabh Jain and Adhikari, {Vijay Kumar} and Zhang, {Zhi Li} and Kuai Xu",
year = "2011",
doi = "10.1109/INFCOM.2011.5934955",
language = "English (US)",
isbn = "9781424499212",
pages = "1620--1628",
booktitle = "Proceedings - IEEE INFOCOM",

}

TY - GEN
T1 - A first look at inter-data center traffic characteristics via Yahoo! datasets
AU - Chen, Yingying
AU - Jain, Sourabh
AU - Adhikari, Vijay Kumar
AU - Zhang, Zhi Li
AU - Xu, Kuai
PY - 2011
Y1 - 2011
AB - Effectively managing multiple data centers and their traffic dynamics poses many challenges to their operators, as little is known about the characteristics of inter-data center (D2D) traffic. In this paper, we present a first study of D2D traffic characteristics using the anonymized NetFlow datasets collected at the border routers of five major Yahoo! data centers. Our contributions are mainly two-fold: i) we develop novel heuristics to infer the Yahoo! IP addresses and their locations from the anonymized NetFlow datasets, and ii) we study and analyze both D2D and client traffic characteristics and the correlations between these two types of traffic. Our study reveals that Yahoo! deploys its data centers hierarchically, with several satellite data centers distributed in other countries and backbone data centers distributed in US locations. For Yahoo! US data centers, we separate the client-triggered D2D traffic and background D2D traffic from the aggregate D2D traffic using port-based correlation, and study their respective characteristics. Our findings shed light on the interplay of multiple data centers and their traffic dynamics within a large content provider, and provide insights to data center designers and operators as well as researchers.
KW - Anonymization
KW - Content provider
KW - Inter-data center
KW - NetFlow
UR - http://www.scopus.com/inward/record.url?scp=79960870812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960870812&partnerID=8YFLogxK
U2 - 10.1109/INFCOM.2011.5934955
DO - 10.1109/INFCOM.2011.5934955
M3 - Conference contribution
SN - 9781424499212
SP - 1620
EP - 1628
BT - Proceedings - IEEE INFOCOM
ER -