Analyzing and Mitigating Congestion on High Performance Networks

Date and time: Tuesday, February 20, 2018 - 15:30
220 Deschutes
Abhinav Bhatele
Lawrence Livermore National Laboratory
  • Al Malony

High-performance networks are a critical component of clusters and supercomputers, enabling fast communication between compute nodes. On many platforms, the performance of parallel codes is increasingly communication-bound because compute capacity per node has grown disproportionately while network bandwidth has increased only modestly. Hence, it is extremely important to optimize communication on the network. On most architectures, communication performance may be degraded by network congestion, which arises when message flows from one or more jobs share the same network resources. For the past several years, I have been studying congestion on high-performance networks and developing strategies to mitigate it. In this talk, I will present studies that analyze network congestion on two different network topologies, a dragonfly and a five-dimensional torus, using analytical modeling, visualization, and machine learning.


Abhinav Bhatele is a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory. His research interests include performance optimizations through analysis and visualization, task mapping and load balancing, network design and simulation, parallel runtimes and interoperation, and HPC data analytics. Abhinav received a B.Tech. degree in Computer Science and Engineering from I.I.T. Kanpur, India, in May 2005 and M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign in 2007 and 2010, respectively. Abhinav was a recipient of the ACM/IEEE-CS George Michael Memorial HPC Fellowship in 2009 and the IEEE TCSC Young Achievers in Scalable Computing award in 2014. He has received best paper awards at Euro-Par 2009, IPDPS 2013, and IPDPS 2016.