Following Meta’s (formerly Facebook) October announcement that it’s pushing to stake its claim on the metaverse, the company today announced that it has developed the AI Research SuperCluster (RSC), which it claims is among the fastest AI supercomputers running today. Meta says that once RSC is fully built, which the company aims to achieve by the middle of this year, it will be the fastest AI supercomputer in the world.
CEO Mark Zuckerberg noted that the experiences the company is building for the metaverse require enormous compute power — reaching into quintillions of operations per second. The RSC will enable new AI models to learn from trillions of examples, understand hundreds of languages, and more.
Data storage company Pure Storage and chipmaker Nvidia supplied parts of the supercluster that Meta has built. Nvidia in particular has been a key player supporting the metaverse, with its Omniverse product billed as a “metaverse for engineers.”
After full deployment, Meta’s RSC will be the largest customer installation of Nvidia DGX A100 systems, said Nvidia in its press release today.
Rob Lee, CTO at Pure Storage, told VentureBeat via email that the RSC is significant to companies outside Meta because the technologies powering the metaverse (such as AI and AR/VR) are more broadly applicable and in demand across industries.
According to Lee, technical decision makers are always looking to learn from bleeding-edge practitioners, and the RSC provides great validation of the core components that are powering the world’s largest AI supercomputer.
“Meta’s world-class team saw the value of pairing the performance, density and simplicity of Pure Storage products to power Nvidia GPUs created for this groundbreaking work pushing the boundaries of performance and scale,” said Lee. He added that enterprises of all sizes will be able to benefit from Meta’s work, expertise, and learnings in advancing how they pursue their data, analytics, and AI strategies.
Scale is becoming a big deal
In a blog post published today, Meta claims that AI supercomputing is needed at scale. According to Meta, realizing the benefits of self-supervised learning and transformer-based models will require training increasingly large and complex models across various domains, whether vision, speech, language, or critical applications like identifying harmful content.
AI at Meta’s scale will require massively powerful computing solutions capable of instantly analyzing ever-increasing amounts of data. Meta’s RSC is a breakthrough in supercomputing that will lead to new technologies and customer experiences enabled by AI, said Lee.
“Scale is important here in multiple ways,” said Lee. He noted that, firstly, Meta processes a tremendous amount of information on a continual basis, which requires a certain scale of data processing performance and capacity.
“Secondly, AI projects depend on large volumes of data — with more varied and complete data sets providing better results. Thirdly, all of this infrastructure has to be managed at the end of the day, and so space and power efficiency and simplicity of management at scale is critical as well. Each of these elements is equally important, whether in a more traditional enterprise project or operating at Meta’s scale,” Lee said.
Tackling the security and privacy issues that come with supercomputing
Over the past few years, Meta has faced repeated backlash over its privacy and data policies; in 2018, the Federal Trade Commission (FTC) announced it was investigating substantial concerns about Facebook’s privacy practices. Meta wants to tackle security and privacy issues from the get-go, stating that it safeguards data by designing RSC from the ground up with privacy and security in mind.
Meta claims this will enable its researchers to safely train models using encrypted user-generated data that is not decrypted until right before training.
“For example, RSC is isolated from the larger internet, with no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers. To meet our privacy and security requirements, the entire data path from our storage systems to the GPUs is end-to-end encrypted and has the necessary tools and processes to verify that these requirements are met at all times,” said the company blog.
Meta explains that data must go through a privacy review process to confirm it has been correctly anonymized before it is imported into the RSC. The company also claims that the data is encrypted before it can be used to train AI models, and that decryption keys are deleted regularly to ensure old data is no longer accessible.
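To illustrate why deleting decryption keys puts old data out of reach, here is a minimal Python sketch of the general idea, sometimes called crypto-shredding. It is not Meta's implementation: the `KeyStore` class, the key IDs, and the helper functions are all hypothetical, and the cipher is a toy keystream built from SHA-256 that stands in for a real, vetted encryption scheme.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only, not vetted crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class KeyStore:
    """Hypothetical in-memory key store; deleting a key strands its ciphertext."""

    def __init__(self):
        self._keys = {}

    def create(self, key_id: str) -> None:
        self._keys[key_id] = secrets.token_bytes(32)

    def get(self, key_id: str) -> bytes:
        return self._keys[key_id]  # raises KeyError once the key is deleted

    def delete(self, key_id: str) -> None:
        self._keys.pop(key_id, None)


def encrypt(store: KeyStore, key_id: str, plaintext: bytes):
    """Return (nonce, ciphertext) for plaintext under the named key."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(store.get(key_id), nonce, len(plaintext))
    return nonce, bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(store: KeyStore, key_id: str, nonce: bytes, ciphertext: bytes) -> bytes:
    """Recover the plaintext; fails with KeyError if the key no longer exists."""
    stream = _keystream(store.get(key_id), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

In this sketch, data encrypted under a key such as `"batch-1"` can be decrypted only while that key exists. After `store.delete("batch-1")`, the ciphertext may still sit on disk, but any attempt to decrypt it fails, which is the property the article describes.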
To build this supercomputer, Nvidia provided the compute layer, including the Nvidia DGX A100 systems that serve as its compute nodes. The GPUs communicate via an Nvidia Quantum 200 Gbps InfiniBand two-level Clos fabric. Lee noted that Penguin Computing’s hardware and software contributions are “the glue” that unites Penguin, Nvidia, and Pure Storage. Together, these three partners were crucial to providing Meta with a massive supercomputing solution.
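For readers unfamiliar with the term, a two-level Clos fabric connects endpoints to "leaf" switches, which in turn connect to "spine" switches. The Python sketch below works out how many endpoints a generic non-blocking two-level design can serve when every switch has the same port count. The function name is hypothetical and the port count in the usage note is an illustrative assumption; the article does not state the switch models or port counts used in RSC.

```python
def two_level_clos_capacity(radix: int) -> dict:
    """Size a non-blocking two-level (leaf-spine) Clos built from identical switches.

    radix is the number of ports per switch (an assumed, illustrative input).
    Each leaf splits its ports evenly: half face endpoints, half face spines.
    """
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be a positive even number")
    down_ports = radix // 2   # leaf ports facing endpoints
    up_ports = radix // 2     # leaf ports facing spines
    spines = up_ports         # each leaf has one uplink to every spine
    leaves = radix            # each spine port connects to a distinct leaf
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "endpoints": leaves * down_ports,  # equals radix**2 / 2
    }
```

With a hypothetical 40-port switch, `two_level_clos_capacity(40)` gives 40 leaf switches, 20 spine switches, and 800 endpoints. Capacity grows with the square of the port count, which is one reason high-port-count switches matter for clusters of this size.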