Nvidia’s DGX Cloud on OCI now available for generative AI training


Nvidia today announced the general availability of its cloud-based AI supercomputing service, DGX Cloud. The service gives users access to thousands of virtual Nvidia GPUs on Oracle Cloud Infrastructure (OCI), as well as infrastructure located in the U.S. and U.K.

DGX Cloud was announced at Nvidia's GTC conference in March. It promises to provide enterprises with the infrastructure and software needed to train advanced generative AI models and other AI workloads.

Nvidia said the purpose-built infrastructure is designed to meet generative AI's demand for massive supercomputing capacity to train large, complex models such as LLMs.




“Similar to how many businesses have deployed DGX SuperPODs on-premises, DGX Cloud leverages best-of-breed computing architecture, with large clusters of dedicated DGX Cloud instances interconnected over an ultra-high bandwidth, low latency Nvidia network fabric,” Tony Paikeday, senior director, DGX Platforms at Nvidia, told VentureBeat. 

Paikeday said that DGX Cloud simplifies the management of complex infrastructure, providing a user-friendly “serverless AI” experience. This allows developers to concentrate on running experiments, building prototypes and achieving viable models faster without the burden of infrastructure concerns.

“Organizations that needed to develop generative AI models before the advent of DGX Cloud would have only had on-premises data center infrastructure as a viable option to tackle these large-scale workloads,” Paikeday told VentureBeat. “With DGX Cloud, now any organization can remotely access their own AI supercomputer for training large complex LLM and other generative AI models from the convenience of their browser, without having to operate a supercomputing data center.”


Nvidia claims that the offering lets generative AI developers distribute hefty workloads across multiple compute nodes in parallel, leading to training speedups of two to three times compared to traditional cloud computing.
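The claimed speedup rests on data parallelism: each training batch is split across nodes so gradient computation runs concurrently, at the cost of some synchronization overhead. A toy model in pure Python (illustrative only — the function, throughput figures, and overhead factor are assumptions, not Nvidia's software or benchmarks) shows why scaling is sublinear but still substantial:

```python
# Toy model of data-parallel training scaling (illustrative assumptions only).
def epoch_time(total_samples, samples_per_sec_per_node, nodes, sync_overhead=0.05):
    """Ideal wall time for one epoch when the workload is split across `nodes`,
    inflated by a fixed fractional cost per extra node for gradient sync."""
    compute = total_samples / (samples_per_sec_per_node * nodes)
    return compute * (1 + sync_overhead * (nodes - 1))

single = epoch_time(1_000_000, 2_000, 1)   # 500.0 seconds on one node
multi = epoch_time(1_000_000, 2_000, 4)    # 143.75 seconds on four nodes
print(f"1 node: {single:.0f}s, 4 nodes: {multi:.0f}s, speedup: {single/multi:.1f}x")
```

Under these assumed numbers, four nodes yield roughly a 3.5x speedup rather than a perfect 4x — consistent in spirit with the two-to-three-times gains Nvidia cites against conventional cloud setups, where interconnect overhead is higher.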

The company also asserts that DGX Cloud enables businesses to establish their own “AI center of excellence,” supporting large developer teams concurrently working on numerous AI projects. These projects can benefit from a pool of supercomputing capacity that automatically caters to AI workloads as needed.

Easing enterprise generative AI workloads through DGX Cloud

According to McKinsey, generative AI could contribute over $4 trillion annually to the global economy by transforming proprietary business knowledge into next-generation AI applications. 

Generative AI’s exponential growth has compelled leading companies across various industries to adopt AI as a business imperative, propelling the demand for accelerated computing infrastructure. Nvidia said it has optimized the architecture of DGX Cloud to meet these growing computational demands.

Nvidia’s Paikeday said developers often face challenges in data preparation, building initial prototypes and efficiently using GPU infrastructure. DGX Cloud, powered by Nvidia Base Command Platform and Nvidia AI Enterprise, aims to address these issues.

“Through Nvidia Base Command Platform and Nvidia AI Enterprise, DGX Cloud lets developers get to production-ready models sooner and with less effort expended, thanks to accelerated data science libraries, optimized AI frameworks, a suite of pre-training AI models, and workflow management software to speed model creation,” Paikeday told VentureBeat. 

Biotechnology firm Amgen is using DGX Cloud to expedite drug discovery. Nvidia said the company employs DGX Cloud in combination with Nvidia BioNeMo large language model (LLM) software and Nvidia AI Enterprise software, including Nvidia RAPIDS data science acceleration libraries.

“With Nvidia DGX Cloud and Nvidia BioNeMo, our researchers can focus on deeper biology instead of having to deal with AI infrastructure and set up ML engineering,” said Peter Grandsard, executive director of research, biologics therapeutic discovery, Center for Research Acceleration by Digital Innovation at Amgen, in a written statement.

A healthy case study

Amgen claims it can now rapidly analyze trillions of antibody sequences through DGX Cloud, enabling the swift development of synthetic proteins. The company reported that DGX Cloud’s computing and multi-node capabilities have helped it achieve three times faster training of protein LLMs with BioNeMo and up to 100 times faster post-training analysis with Nvidia RAPIDS compared to alternative platforms.

Nvidia will offer DGX Cloud instances on a monthly rental basis. Each instance features eight Nvidia 80GB Tensor Core GPUs, for a total of 640GB of GPU memory per node.

The system uses a high-performance, low-latency fabric that enables workload scaling across interconnected clusters, effectively turning multiple instances into a unified massive GPU. Additionally, DGX Cloud is equipped with high-performance storage, providing a comprehensive solution.

The offering will also include Nvidia AI Enterprise, a software layer featuring over 100 end-to-end AI frameworks and pretrained models. The software aims to facilitate accelerated data science pipelines and expedite the development and deployment of production AI.

“Not only does DGX Cloud provide large computational resources, but it also enables data scientists to be more productive and efficiently utilize their resources,” said Paikeday. “They can get started immediately, launch several jobs concurrently with great visibility, and run multiple generative AI programs in parallel, with support from Nvidia’s AI experts who help optimize the customer’s code and workloads.”

Originally appeared on: TheSpuzz