Architecture-Tailored Parallelization for Accessible Large Model Era

Talk
Xupeng Miao
Time: 04.01.2024, 13:00 to 14:00

In this talk, I will introduce my work on machine learning (ML) parallelization, a critical endeavor to bridge the significant gap between diverse ML programs and multi-tiered computing architectures. Specifically, I will explore ML parallelization at three distinct yet interconnected levels. First, I will show that by exploiting the previously unexplored space of model partitioning strategies, distributed ML training can be made up to 20x faster than existing systems through improved communication efficiency. I will highlight two distributed ML systems that embody this idea: HET for sparse embedding models and Galvatron for dense Transformer models. Second, I will discuss how ML parallelization can improve GPU utilization. I will present SpecInfer, a system that reduces large language model (LLM) serving latency by 1.5-3.5x compared to existing systems by leveraging a novel tree-based speculative inference and verification mechanism. Third, I will demonstrate how ML parallelization broadens access to LLMs by extending its reach to inter-cloud environments. I will describe SpotServe, the first LLM serving system on spot instances, which handles preemptions through dynamic reparallelization, maintains relatively low tail latency, and reduces monetary cost by 54%. Finally, I will conclude with a discussion of how I plan to push this research toward a holistic, unified infrastructure for democratizing ML.
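
For readers unfamiliar with tree-based speculative inference, the sketch below illustrates the general idea in a deliberately simplified form: small draft models propose a tree of candidate token sequences, and the large target model verifies them, accepting the longest branch that agrees with its own predictions so that several tokens can be committed per verification step. This is only a greedy-matching toy with a stand-in target model; the names (TreeNode, verify_token_tree, toy_target) are hypothetical and it is not SpecInfer's actual tree-parallel verification algorithm.

# Minimal, self-contained sketch of tree-based speculative verification.
# Illustrative simplification, not the SpecInfer implementation: the
# "target model" is a stand-in callable, and verification uses simple
# greedy matching over a token tree proposed by small draft models.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TreeNode:
    token: int                      # speculated token at this node
    children: List["TreeNode"] = field(default_factory=list)

def verify_token_tree(
    prefix: List[int],
    roots: List[TreeNode],
    target_next_token: Callable[[List[int]], int],
) -> List[int]:
    """Walk the speculated token tree, keeping the longest branch whose
    tokens agree with the target model's greedy predictions. Returns the
    accepted tokens plus one token produced by the target model itself,
    so at least one new token is generated per verification step."""
    accepted: List[int] = []
    frontier = roots
    while True:
        expected = target_next_token(prefix + accepted)
        match = next((n for n in frontier if n.token == expected), None)
        if match is None:
            # No speculated branch matches; emit the target model's token.
            accepted.append(expected)
            return accepted
        accepted.append(match.token)
        frontier = match.children
        if not frontier:
            # Ran out of speculation; append one more target-model token.
            accepted.append(target_next_token(prefix + accepted))
            return accepted

# Toy usage with a deterministic stand-in for the target model.
if __name__ == "__main__":
    def toy_target(tokens: List[int]) -> int:
        return (sum(tokens) + 1) % 7   # hypothetical "model"

    tree = [TreeNode(1, [TreeNode(3), TreeNode(2, [TreeNode(5)])]),
            TreeNode(4)]
    print(verify_token_tree([0], tree, toy_target))

In the real system, verifying a whole token tree against the target model in a single batched pass is what lets the LLM amortize one expensive forward pass over many speculated tokens, which is the source of the reported latency reduction.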