[2023.9] I'm joining ByteDance as a research scientist.

About me

I'm a Research Scientist at ByteDance, working on monitoring and troubleshooting large-scale RDMA and GPU datacenter networks, especially for LLMs. My research interests span distributed systems, network systems, cloud systems, and datacenter management. I've been working on specific areas like:

My research has received an ACM SIGMETRICS Kenneth C. Sevcik Outstanding Student Paper Award. I was a postdoctoral researcher at Princeton University, working with Prof. Ravi Netravali. Before that, I received my Ph.D. from Emory University in 2021 (working with Prof. Ymir Vigfusson), my Master's from Georgia Tech in 2017 (while in the Ph.D. program, working with Prof. Karsten Schwan; memorial page), and my Bachelor's from Tsinghua University in 2015. I transferred to Emory in 2018 as a post-qualified Ph.D. student.

Publication

Thesis:

Measurement and Analysis Methods of Performance Problems in Distributed Systems
Emory University
  Thesis

Papers:

LatenSeer: Causal Modeling of End-to-End Latency Distributions by Harnessing Distributed Tracing

Yazhuo Zhang, Rebecca Isaacs, Yao Yue, Juncheng Yang, Lei Zhang, Ymir Vigfusson
In ACM SoCC 2023
  Paper

The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems

Lei Zhang, Vaastav Anand, Zhiqiang Xie, Ymir Vigfusson, Jonathan Mace
In USENIX NSDI 2023
  Paper     Slides     Code     Benchmark Code

When is the Cache Warm? Manufacturing a Rule of Thumb

Lei Zhang, Juncheng Yang, Anna Blasiak, Mike McCall, Ymir Vigfusson
In USENIX HotCloud 2020
  Paper     Slides     Video

Optimal Data Placement for Heterogeneous Cache, Memory, and Storage Systems

Lei Zhang, Reza Karimi, Irfan Ahmad, Ymir Vigfusson
In ACM SIGMETRICS 2020
  Paper     Video
  Kenneth C. Sevcik Outstanding Student Paper Award

Deceptive Secret Sharing

Lei Zhang, Douglas Blough
In IEEE International Conference on Dependable Systems and Networks (DSN) 2018
  Paper

Systematic Data Placement Optimization in Multi-Cloud Storage for Complex Requirements

Maomeng Su, Lei Zhang, Yongwei Wu, Kang Chen, Keqin Li
In IEEE Transactions on Computers 2016, 65(6): 1964-1977.
  Paper

In preparation:

Towards Bandwidth-adaptive Live Volumetric Video Streaming

Automatic Instrumentation for Fine-grained Observability in Distributed Systems

Experience

Postdoc, Princeton University
2022-2023

Research Assistant, Emory University
2018-2021

Teaching Assistant, Emory University
Fall 2020

CS 377: Database Systems

Ph.D. Intern, Facebook Inc.
Summer 2018

Video cache infra team

Research Assistant, Georgia Tech
2015-2018

Teaching Assistant, Georgia Tech
Fall 2016

CS 3210: Design of Operating Systems

Service & Award

Invited Program Committee
ACM SIGMETRICS'23

Program Committee
ACM SoCC'22, SoCC'23

Best Student Paper Award
ACM SIGMETRICS'20

Bronze medal
24th, 25th China Mathematical Olympiad