TY - GEN
T1 - A measurement study of congestion in an InfiniBand network
AU - Alali, Fatma
AU - Mizero, Fabrice
AU - Veeraraghavan, Malathi
AU - Dennis, John M.
N1 - Publisher Copyright: © 2017 International Federation for Information Processing - IFIP.
PY - 2017/8/4
Y1 - 2017/8/4
AB - This paper presents a measurement study of congestion on Yellowstone, a highly utilized, 72K-core production InfiniBand cluster. The study consists of a 23-day data collection phase in which port counters of the Yellowstone switches were read multiple times every hour to check for stalls, during which a port is unable to send data due to a lack of flow-control credits. A total of 30M data records were obtained and analyzed. Results showed that transmission stalls occurred in a significant number of the 100-ms intervals over which a port counter was observed. For example, out of 6M observations of Top-of-Rack (ToR) switch uplink ports, we found that the port was forced to wait for credits in 60% of these 100-ms intervals. Such transmission stalls could increase application execution time and decrease cluster utilization. The latter occurs when Message Passing Interface (MPI) Barrier calls are issued for synchronization and communication delays cause one or more MPI ranks to be slower than others.
KW - congestion
KW - Fat-tree
KW - InfiniBand
UR - https://www.scopus.com/pages/publications/85030239972
U2 - 10.23919/TMA.2017.8002911
DO - 10.23919/TMA.2017.8002911
M3 - Conference contribution
AN - SCOPUS:85030239972
T3 - TMA 2017 - Proceedings of the 1st Network Traffic Measurement and Analysis Conference
BT - TMA 2017 - Proceedings of the 1st Network Traffic Measurement and Analysis Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st Network Traffic Measurement and Analysis Conference, TMA 2017
Y2 - 21 June 2017 through 23 June 2017
ER -