UMD Team Wins DOE Award to Advance AI Using Supercomputers

The award provides significant run-time on supercomputers managed by the U.S. government.

University of Maryland researchers have won a competitive award from the U.S. Department of Energy (DOE) that will provide them access to some of the world’s most powerful computational platforms.

The award, from the DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, will enable UMD experts in high-performance computing and machine learning to scale distributed AI training and develop new AI vision and language models of the kind that power popular applications like DALL-E and ChatGPT.

The researchers will use federal facilities to develop and test their novel methods, with the INCITE award granting them significant run-time on several government supercomputers. The allocation includes 600,000 node-hours on the Frontier supercomputer at the Oak Ridge National Laboratory in Tennessee; 100,000 node-hours on the Polaris supercomputer at the Argonne National Laboratory in Illinois; and 50,000 node-hours on the recently unveiled Aurora supercomputer, which is also located at Argonne.

Training AI models that do not fit in the memory of a single graphics processing unit (GPU) requires large-scale distributed training on GPU-based supercomputers, said Abhinav Bhatele, an associate professor of computer science who is the principal investigator of the project.

Large language models (LLMs) and vision-based diffusion models are now used for journalism, understanding protein structures, writing software code and much more. Training the largest of these AI models, which can reach hundreds of billions of parameters, can overwhelm the computing capabilities of many academic labs working in these areas.
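A back-of-the-envelope calculation helps show why models of this size cannot be trained on a single GPU. The sketch below is illustrative only: the 16 bytes per parameter (a common rule of thumb for mixed-precision training with the Adam optimizer) and the 80 GB of memory per GPU are assumptions, not figures from the UMD team, and the estimate ignores the memory needed for activations.

```python
import math

# Assumed cost per parameter for mixed-precision training with Adam:
# 2 bytes (fp16 weights) + 2 bytes (fp16 gradients)
# + 12 bytes (fp32 master weights, momentum and variance).
BYTES_PER_PARAM = 2 + 2 + 12

# Assumed memory capacity of a single high-end GPU, in gigabytes.
GPU_MEMORY_GB = 80


def training_memory_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory for weights, gradients and optimizer state (activations ignored)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: int, gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Smallest number of GPUs whose combined memory holds the training state."""
    return math.ceil(training_memory_gb(num_params) / gpu_memory_gb)


if __name__ == "__main__":
    for n in (1_000_000_000, 100_000_000_000):
        print(f"{n:,} parameters: about {training_memory_gb(n):,.0f} GB, "
              f"at least {min_gpus(n)} GPU(s)")
```

Under these assumptions, a 100-billion-parameter model needs roughly 1,600 GB for its training state alone, which is at least 20 such GPUs before any activations are stored.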

The UMD team seeks to “democratize” this process by using innovative parallel computing algorithms and other methods that enable platforms powered by GPUs from NVIDIA and Advanced Micro Devices (AMD) to efficiently train extremely large models, explained Bhatele. The researchers will also release their work as open source, Bhatele added, allowing input and feedback from a diverse group of users.

Bhatele is joined on the project by Tom Goldstein, a professor of computer science and director of the University of Maryland Center for Machine Learning, and Harshitha Menon, a computer scientist at the Lawrence Livermore National Laboratory in California.

Both Bhatele and Goldstein have joint appointments in the University of Maryland Institute for Advanced Computer Studies.

The UMD project was one of only 75 proposals nationwide chosen to receive a 2024 INCITE award. The program, established in 2003, incentivizes researchers from academia, government laboratories and industry to accelerate scientific discoveries and technological innovations involving large-scale, computationally intensive projects that address “grand challenges” in science and engineering.

“We are grateful to receive the INCITE award, both to advance the science and to allow our graduate students access to computational power not readily available at most universities,” said Bhatele.

In identifying the scope of work involved, the UMD researchers said that the “scaling” of large neural networks—allowing them to be processed on large-scale computing platforms—is not a trivial matter. It requires parallelizing and optimizing different computational and communication motifs such as dense and sparse tensor computations, irregular communication patterns, load imbalance issues, and fast filesystem access.

They plan to use a framework called AxoNN, developed in Bhatele’s Parallel Software and Systems Group, to analyze and optimize the performance and portability of training large models. This includes fine-tuning the trained models and greatly improving the system’s “inference,” the process in which a trained model applies its learned knowledge to new, unseen data.

They also plan to explore efficient alternatives to transformer models for language modeling. Many modern LLMs use transformer models to perform a variety of natural language processing (NLP) tasks, such as identifying context, and thus meaning, by tracking relationships in sequential data like the words in this sentence.

“We intend to train variants of modern language model architectures that are directly aimed at usability constraints in smaller academic laboratories,” said Goldstein. “We want to focus on variants with smaller memory footprints and adaptive compute capabilities at deployment—traits that will enable more robust outcomes in the fields of machine learning and NLP.”

The researchers will also work on fine-tuning trained models for downstream tasks that have important real-world applications, Goldstein said.

One example is improving “explainability,” he said, which refers to how well people can understand and make use of the output from a machine learning model. This could be useful in instances where a healthcare model is predicting whether a patient is suffering from a particular disease.

Ultimately, the UMD team hopes their research will help scientists, physicians and policymakers better utilize the power of AI systems, whether that power comes from one of the industry giants like Microsoft or NVIDIA, or from a three- or four-person laboratory at a college or university.

“We hope our work will lead to new discoveries from many of the smaller academic labs that don’t have 24/7 access to supercomputers—this includes our own labs on the University of Maryland campus that are filled to the brim with outstanding talent,” said Bhatele.

—Story by UMIACS communications group
