Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services
Published in KDD 2025 (CCF-A)
Weiyan Wang, Yilun Jin, Yiming Zhang, Victor Junqiu Wei, Han Tian, Li Chen, Jinbao Xue, Yangyu Tao, Di Wang, Kai Chen
Recommended citation: Weiyan Wang, Yilun Jin, Yiming Zhang, Victor Junqiu Wei, Han Tian, Li Chen, Jinbao Xue, Yangyu Tao, Di Wang, Kai Chen. "Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services." KDD 2025.
