About me
My research interests are broadly in distributed systems. I'm currently a Research Scientist on ByteDance's AI Networking team, working on Systems for ML with a major focus on LLM reliability. My experience spans distributed systems, cloud systems, and AI infrastructure.
- Reliability and observability: distributed tracing, programming languages (PL) for systems aimed at root cause analysis
- Distributed caching: heterogeneous memory management, CDN caching, performance quantification
- Systems for ML: reliability, observability, performance analysis, RDMA-based AI infra for LLM, Collective Communication Library
My research has received an ACM SIGMETRICS Kenneth C. Sevcik Outstanding Student Paper Award. I was previously a postdoctoral researcher at Princeton University, working with Prof. Ravi Netravali. Before that, I received my Ph.D. from Emory University in 2021, working with Prof. Ymir Vigfusson; my Master's from Georgia Tech in 2017, where I was in the Ph.D. program and worked with the late Prof. Karsten Schwan; and my Bachelor's from Tsinghua University in 2015. I transferred to Emory in 2018 as a post-qualified Ph.D. student.
Publications
Emory University, 2021
  Thesis
Toward Bandwidth-adaptive Fully-Immersive Volumetric Video Conferencing
Rajrup Ghosh, Christina Suyong Shin, Lei Zhang, Muyang Ye, Tao Jin, Harsha V. Madhyastha, Ravi Netravali, Antonio Ortega, Sanjay Rao, Anthony Rowe, Ramesh Govindan
In ACM CoNEXT 2025
Yangtao Deng*, Lei Zhang*, Qinlong Wang, Xiaoyun Zhi, Xinlei Zhang, Zhuo Jiang, Haohan Xu, Lei Wang, Zuquan Song, Gaohong Liu, Yang Bai, Shuguang Wang, W. Xiao, Jianxi Ye, Minlan Yu, Hong Xu
In ACM SOSP 2025
  Paper
 
  Slides
Jingyuan Chen, Lei Zhang, Gongqi Huang, Ravi Netravali, Amit Levy
In USENIX OSDI 2025 (Poster)
Yangtao Deng, Xiang Shi, Zhuo Jiang, Xingjian Zhang, Lei Zhang, Zhang Zhang, Bo Li, Zuquan Song, Hang Zhu, Gaohong Liu, Fuliang Li, Shuguang Wang, Haibin Lin, Jianxi Ye, Minlan Yu
In USENIX NSDI 2025
  Paper
 
  Slides
 
  Video
Yazhuo Zhang, Rebecca Isaacs, Yao Yue, Juncheng Yang, Lei Zhang, Ymir Vigfusson
In ACM SoCC 2023
  Paper
 
  Code
Lei Zhang, Zhiqiang Xie, Vaastav Anand, Ymir Vigfusson, Jonathan Mace
In USENIX NSDI 2023
  Paper
 
  Slides
 
  Code
 
  Benchmark Code
 
  Video
Lei Zhang, Juncheng Yang, Anna Blasiak, Mike McCall, Ymir Vigfusson
In USENIX HotCloud 2020
  Paper
 
  Slides
 
  Video
Lei Zhang, Reza Karimi, Irfan Ahmad, Ymir Vigfusson
In ACM SIGMETRICS 2020
  Paper
 
  Video
  Kenneth C. Sevcik Outstanding Student Paper Award
Lei Zhang, Douglas Blough
In IEEE International Conference on Dependable Systems and Networks (DSN) 2018
  Paper
Maomeng Su, Lei Zhang, Yongwei Wu, Kang Chen, Keqin Li
In IEEE Transactions on Computers 2016, 65(6): 1964-1977.
  Paper
Under Review:
A Lightweight Telemetry System with Service Tracing for Locating Network Slowdowns
Automatic Instrumentation for Fine-grained Observability in Distributed Systems
