Ayar Labs, which specializes in something relatively new in processor-making – chip-to-chip optical connectivity that operates at the speed of light – revealed today that it is co-developing with Nvidia a new artificial intelligence (AI) processor infrastructure based on optical I/O technology to meet the heavy future demands of AI and high-performance computing workloads.
The collaboration with Nvidia will focus on integrating Ayar Labs’ IP to develop scale-out architectures enabled by high-bandwidth, low-latency, ultra-low-power optical interconnects for future Nvidia products. The two companies said they plan to accelerate the development and adoption of optical I/O technology to support the explosive growth of AI and machine learning (ML) applications and data volumes.
The need for speed-of-light connectivity
Nvidia is the leading manufacturer and provider of graphics processing units (GPUs) – with an 83% market share – the most powerful processors in the IT business. But as powerful as these GPUs are, none of them yet use optical connectivity. Chip-to-chip optical connectivity uses light itself to move data within a chip and between chips, a distinct advancement in processor manufacturing.
Last June, 7-year-old Ayar Labs successfully demonstrated the optical I/O process using its own hardware: the industry’s first terabit-per-second wavelength division multiplexing (WDM) optical link, a TeraPHY optical I/O chiplet (a tiny integrated circuit that contains a well-defined subset of functionality) and a SuperNova multiwavelength optical source. The demonstration showed a fully functional chiplet with 8 optical ports running error-free without forward error correction (FEC), delivering a total bandwidth of 1.024 Tbps at less than 5 pJ/bit energy efficiency – speeds that were unheard of in the industry.
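To put those figures in perspective, link power scales as energy-per-bit times bit rate. A minimal back-of-the-envelope sketch using the cited numbers (the variable names are illustrative, not from Ayar Labs):

```python
# Back-of-the-envelope check (not an Ayar Labs figure): at 5 pJ per bit,
# what would a 1.024 Tbps link dissipate in I/O alone?
bandwidth_bps = 1.024e12      # 1.024 Tbps total link bandwidth
energy_per_bit_j = 5e-12      # 5 picojoules per bit (the cited upper bound)

power_watts = bandwidth_bps * energy_per_bit_j
print(f"I/O power at full rate: {power_watts:.2f} W")  # ~5.12 W
```

In other words, the demonstrated link would burn only a few watts moving a terabit per second, which is the kind of budget that makes dense chip-to-chip optical fabrics plausible.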
This was a major milestone in providing optical connectivity, Santa Clara, Calif.-based Ayar said at the time. The concept is to help server makers and data centers meet the ever-growing bandwidth demands of data-intensive applications with low-power interconnects, and to enable new heterogeneous and disaggregated system architectures.
Processes like this will be needed for the super-powerful applications being developed for new Web3 and metaverse use cases around blockchain security, AI/ML and other data-heavy interactions, the company said.
At least initially, the new optical chips will go to CPU and GPU makers such as Intel, Nvidia, Arm and AMD, as well as to hyperscalers – including the major cloud-service providers – many of which are now making their own chips, the company said.
Meeting future performance, power requirements with optical I/O
Optical I/O changes the performance and power trajectories of system designs by enabling compute, memory, and networking ASICs (application-specific integrated circuits) to communicate with increased bandwidth, at lower latency, over longer distances and at a fraction of the power of existing electrical I/O solutions, the company said. The technology is also foundational to enabling emerging heterogeneous compute systems, disaggregated/pooled designs and unified memory architectures that are critical to accelerating future data center innovation.
“Today’s state-of-the-art AI/ML training architectures are limited by current copper-based compute-to-compute interconnects to build scale-out systems for tomorrow’s requirements,” CEO Charles Wuischpard said in a media advisory. “Our work with Nvidia to develop next-generation solutions based on optical I/O provides the foundation for the next leap in AI capabilities to address the world’s most sophisticated problems.”
Ayar Labs’ solution stands to transform industries across cloud, AI and high-performance computing, telecommunications, and aerospace, and “AI is the tip of the spear with the greatest need,” Mark Wade, Ayar Labs cofounder, CTO and SVP of engineering, told VentureBeat.
How does optical I/O solve connectivity issues?
“The main constraint to achieving required performance for ever-growing AI/ML workloads [is] bottlenecks in moving the data over distances,” Wade said. “Optical interconnects fundamentally break the traditional bandwidth-distance tradeoff and unlock new system architectures.”
We are in a world of ever-increasing bandwidth demands with the explosion of computational data, the growing complexity of neural networks, and the emergence of new AI and graph workloads and workflows alongside traditional scientific simulations, Wade said.
“At the same time, the pace of processing capability improvements has slowed down considerably in recent years,” Wade said. “Moreover, electrical SerDes, the most common form of electrical I/O, is hitting a wall and creating an additional bottleneck that could significantly decrease the performance capabilities of future systems.
“A paradigm shift is needed to move to optical I/O, a new solution that replaces traditional electrical I/O and enables chips to communicate with each other from millimeters to kilometers, to deliver orders of magnitude improvements in latency, bandwidth density and power consumption,” Wade said.
A new ‘million-X’ speedup for AI with optical interconnect?
“Over the past decade, Nvidia-accelerated computing has delivered a million-X speedup in AI,” Rob Ober, Nvidia chief platform architect for data center products, said in a media advisory. “The next million-X will require new, advanced technologies like optical I/O to support the bandwidth, power and scale requirements of future AI and ML workloads and system architectures.” As AI model sizes continue to grow, Ober said Nvidia believes that by 2023 models will have 100 trillion or more connections – a 600-fold increase from 2021 – exceeding the technical capabilities of existing platforms. Traditional electrical-based copper interconnects will reach their bandwidth limits, driving lower application performance, higher latency and increased power consumption, Ober said.
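Ober’s figures also imply a rough 2021 baseline. A quick sanity check of the arithmetic, using only the numbers cited above (variable names are illustrative):

```python
# Sanity check of the cited growth figures (illustrative only):
connections_2023 = 100e12   # "100 trillion or more connections" by 2023
growth_factor = 600         # "a 600-fold increase from 2021"

connections_2021 = connections_2023 / growth_factor
print(f"Implied 2021 model scale: {connections_2021:.3g} connections")  # ~1.67e11
```

That implied baseline – on the order of 100-plus billion connections – is roughly the scale of the largest models in production at the time, so the quoted numbers are at least internally consistent.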