Without inference, an artificial intelligence (AI) model is just math: it does not execute anything or produce any predictions.
To date, AI inference engines have been largely tethered to the specific hardware for which they are designed. That degree of hardware lock-in means that developers need to build separate software for each hardware target, and it could also slow the pace of innovation across the industry.
The challenge of managing inference hardware has not been lost on social media giant Meta (formerly Facebook). Meta uses a lot of different hardware across its infrastructure and has had its fair share of difficulty implementing inference solutions. To address that, Meta has been working on a technology it calls AITemplate (AIT), which it defines as a unified inference system that will initially support both Nvidia TensorCore and AMD MatrixCore inference hardware. Meta announced yesterday that it is open-sourcing AITemplate under an Apache 2.0 license.
“Our current version of AIT is focused on support for Nvidia and AMD GPUs, but the platform is scalable and could support Intel GPUs in future if demand was there,” Ajit Matthews, director of engineering at Meta, told VentureBeat. “Now that we have open-sourced AIT, we welcome any silicon providers interested to contribute to it.”
The need for GPU and inference engine abstraction
Concern about lock-in for AI hardware is not limited to inference engines; others in the industry, including Intel, have the same concern about GPUs for accelerated computing.
Intel is among the leading backers of the open-source SYCL specification, which seeks to help create a unified programming layer for GPUs. The Meta-led AIT effort is similar in concept, though different in what it enables. Matthews explained that SYCL is closer to the GPU programming level, while AITemplate is focusing on high-performance TensorCore/MatrixCore AI primitives.
“AIT is an alternative to TensorRT which is the Inference engine from Nvidia,” Matthews said. “Unlike TensorRT, it is an open-source solution which supports both Nvidia and AMD GPU backends.”
Matthews noted that AIT first characterizes the model architecture, and then works on fusing and optimizing layers and operations specific to that architecture.
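To make the idea of fusing operations concrete, here is a minimal sketch in plain Python. It is not AITemplate code and does not use AIT's API; the function names and the choice of a linear layer followed by ReLU are assumptions made purely for illustration. The unfused version makes three passes and builds two intermediate lists, while the fused version computes each output in a single pass, which is the kind of saving a fused GPU kernel aims for.

```python
# Illustrative only: hypothetical function names, not part of AITemplate.

def unfused_linear_relu(x, weights, bias):
    """Three separate operations, each producing a full intermediate result."""
    # Pass 1: matrix-vector product (one dot product per output row).
    product = [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]
    # Pass 2: add the bias to every output.
    shifted = [p + b for p, b in zip(product, bias)]
    # Pass 3: apply ReLU to every output.
    return [max(0.0, s) for s in shifted]


def fused_linear_relu(x, weights, bias):
    """Same math in one pass, with no intermediate lists."""
    out = []
    for row, b in zip(weights, bias):
        acc = 0.0
        for xi, wi in zip(x, row):
            acc += xi * wi          # dot product
        acc += b                    # bias, applied while the value is still "in hand"
        out.append(acc if acc > 0.0 else 0.0)  # ReLU, applied immediately
    return out


x = [1.0, 2.0]
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.5, 1.0]

# Both versions give the same result: [0.0, 2.5]
print(unfused_linear_relu(x, weights, bias))
print(fused_linear_relu(x, weights, bias))
```

On a GPU, the benefit comes less from the arithmetic than from avoiding extra trips to memory for the intermediate results between operations.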
It’s not about competition
AIT isn’t just about creating a common software layer for inference; it’s also about performance. In early tests, Meta says it is already seeing performance improvements over models that run inference without AIT, on both Nvidia and AMD GPUs.
“For AIT the goal is to bring flexible, open, more energy-efficient AI inference for GPU users,” Matthews said.
Meta isn’t building AIT just to serve the greater good, but also to meet its own AI needs. Matthews said that Meta’s workloads are evolving and that, to keep up with those changing needs, the company requires solutions that are open and performant. He also noted that Meta tends to want the upper layers of its technology stacks to be hardware-agnostic. AIT does that today with AMD and Nvidia GPUs.
“We see opportunities with many of our current and future Inference workloads to benefit from AIT,” he said. “We think AIT has the potential for broad adoption as the most performant unified inference engine.”