At NVIDIA’s annual GPU Technology Conference (GTC) today, CEO Jensen Huang took to the stage in his usual passionate, leather-clad fashion. However, this GTC keynote was actually a live demo of sorts, delivered from the simulated constructs of the metaverse. In recent years, NVIDIA has not only made great strides in AI (artificial intelligence) processing engines, but has also quickly emerged as one of the chief inventors and innovators of platforms for building the metaverse. The metaverse, a tech term that is quickly getting worn out, is currently an almost mystical future destination where humankind will interact, collaborate and evolve together in experiences that exist beyond our physical world. AI and the metaverse seemingly go hand-in-hand, and NVIDIA is clearly all-in on both emerging markets and their enabling technologies. So, let’s unpack a few of NVIDIA’s top-line announcements today, as well as their impact on these burgeoning markets. There’s much more to digest among NVIDIA’s many GTC announcements beyond this high-level synopsis, I assure you.
NVIDIA’s Powerful New Hopper GPU Architecture Takes A Cue From Its Namesake
First, let’s start with NVIDIA’s bread-and-butter technology, and the GPU giant is definitely taking things up a notch again. Composed of some 80 billion transistors and built on TSMC’s bleeding-edge 4N (roughly 4 nanometer) chip fab process, NVIDIA’s new H100 “Hopper” GPU (loosely, “Graphics Processing Unit,” though these chips do much more these days with many more resources on board) is named after the pioneering computer scientist and US Navy rear admiral Grace Hopper, and is the successor to the company’s now two-year-old Ampere GPU architecture. It’s also strapped with the latest memory and IO innovations, with a PCI Express Gen 5 interface and up to 96GB of new HBM3 memory connected over a mile-wide interface (a dozen 512-bit memory controllers in the full GPU) for gobs of memory bandwidth, up to 3TB/s.
The GPU itself is also massive, sporting 18,432 CUDA cores in its full implementation, which is over 2.5X more than the company’s previous generation Ampere A100 AI and data science GPU. A full GH100 Hopper GPU also sports 576 4th Generation Tensor cores (specialized cores for machine learning calculation) versus 512 3rd Gen Tensor cores in Ampere A100. Here’s a quick specs bullet list rundown from NVIDIA…
The full implementation of the GH100 GPU includes the following:
- 8 GPCs, 72 TPCs (9 TPCs/GPC), 2 SMs/TPC, 144 SMs per full GPU
- 128 FP32 CUDA Cores per SM, 18432 FP32 CUDA Cores per full GPU
- 4 Fourth-Generation Tensor Cores per SM, 576 per full GPU
- 6 HBM3 or HBM2e stacks, 12 512-bit Memory Controllers
- 60 MB L2 Cache
- Fourth-Generation NVLink and PCIe Gen 5
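Do a little math on that bullet list and the headline figures fall out directly. Here’s a quick sanity-check sketch in Python; the per-SM and per-controller counts come straight from NVIDIA’s list above, while the HBM3 per-pin data rate used for the bandwidth estimate is purely an illustrative assumption, not a published figure.

```python
# Quick sanity math on the full GH100 configuration listed above.
gpcs = 8
tpcs_per_gpc = 9
sms_per_tpc = 2

sms = gpcs * tpcs_per_gpc * sms_per_tpc        # 144 SMs in the full GPU
cuda_cores = sms * 128                         # 18,432 FP32 CUDA cores
tensor_cores = sms * 4                         # 576 4th-gen Tensor cores
bus_width_bits = 12 * 512                      # 6,144-bit aggregate memory interface

assumed_gbps_per_pin = 4.0                     # assumption for illustration only
bandwidth_gbs = bus_width_bits / 8 * assumed_gbps_per_pin

print(sms, cuda_cores, tensor_cores, bus_width_bits,
      f"{bandwidth_gbs / 1000:.1f} TB/s")      # lands right around the ~3TB/s claim
```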
NVIDIA will offer H100 in SXM5 module and PCIe Gen 5 add-in card form factors as well, and these shipping products enable somewhat fewer resources than the full GH100 implementation, with 16,896 CUDA cores and 80GB of HBM3 in the SXM5 version, for example. The new GPU will also support new DPX instructions that offer up to a 7X performance lift over the previous-gen architecture for dynamic programming algorithms, accelerating lots of different applications, from autonomous machines to genomics workloads like DNA and protein classification and folding.
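To make the dynamic programming angle concrete, workloads like genomic sequence alignment boil down to filling a score matrix where each cell is a max over a handful of neighbors. The sketch below is a plain-Python Smith-Waterman local alignment of that general shape; it’s purely illustrative of the class of workload DPX instructions target and doesn’t touch the DPX instruction path itself.

```python
# Illustrative dynamic-programming inner loop: Smith-Waterman local alignment.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell is a max over a few neighbors -- the pattern DPX-style
            # instructions are designed to accelerate in hardware.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))   # best local alignment score
```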
Of course, Hopper will also offer a powerful inference and training performance uplift, with optimizations specifically targeting the explosive growth in transformer machine learning models built around self-attention (weighing which parts of the input matter most relative to the others), such as Megatron 530B, NVIDIA’s large natural language model; NVIDIA claims up to 30X higher throughput for real-time conversational AI chatbots and similar workloads. The new GPU will also support NVIDIA’s second-gen Multi-Instance GPU (MIG) technology, which allows up to seven secure compute instances to run on a single GPU, with fully encrypted confidential computing support.
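For readers who haven’t peeked inside a transformer, self-attention is conceptually simple. The minimal NumPy sketch below shows single-head scaled dot-product attention with no learned projection weights (real models like Megatron 530B learn separate query, key and value projections and run many heads in parallel); it’s only meant to illustrate the core operation these transformer optimizations target.

```python
# Minimal single-head scaled dot-product self-attention (no learned weights).
import numpy as np

def self_attention(x):
    """x: (sequence_length, model_dim) token embeddings."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # how strongly each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ x                               # each output is a weighted mix of all tokens

tokens = np.random.randn(8, 16)                      # 8 tokens, 16-dim embeddings
print(self_attention(tokens).shape)                  # -> (8, 16)
```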
Needless to say, NVIDIA’s Hopper GH100 appears to have the compute resources, memory bandwidth and key optimizations to make it an absolute crusher of an accelerator for virtually any type of machine learning or HPC (High Performance Computing) workload, from data center AI to healthcare research and big data analytics. The new GPU will be shipping in the 3rd quarter of this year.
NVIDIA Grace CPU Superchip And NVLink C2C Complete A Powerful AI And HPC Platform
A high-performance GPU also needs to pair with a powerful host CPU, and last year at GTC 2021 NVIDIA announced Grace, a new 72-core CPU based on the Arm Neoverse (Armv9) architecture. This year, NVIDIA is quite literally doubling down on Grace, combining two of the chips in a single MCM (multi-chip module) package for 144 cores total. Currently, NVIDIA’s DGX AI supercomputers ship with x86 CPUs from Intel and AMD, but longer term NVIDIA aims to offer Grace-powered versions as well. The Grace CPU Superchip will support an LPDDR5x memory interface, and NVIDIA claims its new 144-core Grace CPU Superchip will deliver 1.5X higher performance compared to the x86 processors shipping in its DGX A100 (Ampere-based) systems now.
NVIDIA notes the chip will run all of its software stack, including NVIDIA RTX, NVIDIA HPC, NVIDIA AI and Omniverse, and run in standalone servers as well as GPU-accelerated servers with up to 8 Hopper GPUs. However, how these many-core CPUs are linked together is a new announcement in and of itself for NVIDIA, known as NVLink C2C (chip-to-chip).
NVLink is NVIDIA’s high-speed serial interconnect that has historically connected GPU boards together on a server blade or inside a rack. Now in its 4th generation, NVLink is going chip-to-chip as well, and NVIDIA claims it’s 25x more power efficient and 90x more area efficient than PCIe Gen 5. The interface will be capable of supporting 900GB/sec of bi-directional bandwidth and is slated to allow not only CPUs and GPUs to talk, but also DPUs (Data Processing Units), NICs (Network Interface Cards) and even external custom chips to be integrated into custom chiplet solutions. If this sounds similar to the recently announced UCIe interconnect standard (backed by Intel, AMD, Arm and others), it definitely is, and NVIDIA is opening the technology to customers to implement for semi-custom, silicon-level integrations with its various technologies. Incidentally, NVIDIA also intends to support UCIe, and notes custom integrations can use either UCIe or NVLink, the latter for higher bandwidth and better power efficiency.
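For a rough sense of scale, here’s some back-of-the-envelope math comparing that claimed NVLink-C2C figure with a standard PCIe Gen 5 x16 link, using raw signaling rates and ignoring encoding overhead; this is illustrative only and isn’t how NVIDIA derived its efficiency claims.

```python
# Rough, illustrative bandwidth comparison: NVLink-C2C claim vs PCIe Gen 5 x16.
nvlink_c2c_bidir_gbs = 900                                  # NVIDIA's claimed bi-directional bandwidth, GB/s

pcie5_gts_per_lane = 32                                     # PCIe Gen 5 signaling rate, GT/s per lane
lanes = 16
pcie5_per_direction_gbs = pcie5_gts_per_lane * lanes / 8    # ~64 GB/s each direction
pcie5_bidir_gbs = pcie5_per_direction_gbs * 2               # ~128 GB/s total, both directions

print(f"NVLink-C2C vs PCIe Gen 5 x16: ~{nvlink_c2c_bidir_gbs / pcie5_bidir_gbs:.1f}x the raw bandwidth")
```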
As I mentioned, NVLink C2C is also the interconnect used to connect the two CPU complexes in NVIDIA’s Grace CPU Superchip, but it will also connect a Grace CPU and a Hopper GPU in a single MCM package, known as, you guessed it, the Grace Hopper Superchip, combining the best of both worlds on a single module. NVIDIA notes both the Grace CPU Superchip and the Grace Hopper Superchip will arrive in the first half of 2023.
AI Powered Platforms Bring Tools And Infrastructure For Cloud AI And The Metaverse
In addition, new DGX H100 enterprise AI servers and SuperPOD data center supercomputers based on Hopper (and, currently, x86 CPUs) will arrive in market this year, and NVIDIA has multiple software and services platforms in support of them as well, including AI Enterprise 2.0, which brings support for on-prem and public cloud GPU-accelerated services to a myriad of platform ecosystems, from VMware and Red Hat containers to AWS, Google Cloud and Microsoft Azure.
Another major announcement in the cloud for NVIDIA was the unveiling of Omniverse Cloud. You can think of NVIDIA Omniverse as the company’s building blocks and computing platform for the metaverse. Operating on the USD (Universal Scene Description) framework, an open standard developed by Pixar, Omniverse allows creators, developers, engineers and designers from virtually any discipline to collaborate in virtual worlds, and with digital twins of projects and even themselves, using their design tools of choice. So, whether your jam is rendering in Blender or designing in SketchUp or FreeCAD, it can all be simulated in Omniverse in collaboration with teams in different locations.
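Because Omniverse sits on USD, getting content into it ultimately means authoring USD layers. Here’s a minimal sketch using the open-source USD Python bindings (the pxr module, assuming the usd-core package is installed); it simply writes a tiny scene to a .usda file that USD-aware tools, including Omniverse apps, can open.

```python
# Minimal USD authoring example with the open-source pxr bindings.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("hello_omniverse.usda")   # a plain-text USD layer on disk
world = UsdGeom.Xform.Define(stage, "/World")         # a transform prim to group geometry
cube = UsdGeom.Cube.Define(stage, "/World/Cube")      # a unit cube authored under /World
cube.GetSizeAttr().Set(2.0)                           # override the cube's size attribute

stage.SetDefaultPrim(world.GetPrim())
stage.GetRootLayer().Save()                           # collaborators' tools can now open this layer
```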
Omniverse Cloud, however, is a new service that NVIDIA will stream on the same infrastructure and in the same data centers as its GeForce NOW game streaming service. So, like GeForce NOW gaming, Omniverse Cloud is a software- and infrastructure-as-a-service play for NVIDIA, aimed at teams that don’t have NVIDIA RTX systems or the resources to stand up their own infrastructure for Omniverse. In addition, like GeForce NOW, Omniverse Cloud will run on literally any device, like a laptop or even a tablet, since workload processing is offloaded to the cloud and streamed to the device. NVIDIA hasn’t noted when Omniverse Cloud will officially be available or its cost structure, but to be sure, this will afford the company a significant time-to-market edge versus future competitive platforms, as well as foster early adoption of its hardware, tools and other resources for building the AI-infused metaverse.
Let’s Not Forget Our Robot Overlords – Enter Jetson AGX Orin
Of course, NVIDIA is also vigorously developing technologies for autonomous machines, from self-driving cars and robotaxis to all forms of robotics. And what enables autonomous machines to work their magic is a series of sensors and cameras for machine vision, backed up by a powerful AI processor that can process what its eyes (the cameras and sensors) are pulling in and then infer what to do with all that information so it can efficiently and safely navigate. Further, training and learning need to take place for continuous improvements in efficiency and safety. To that end, NVIDIA has announced a new powerful entry in its AI-infused robotics platform, known as Jetson AGX Orin.
Jetson AGX Orin is the follow-on to NVIDIA’s Jetson AGX Xavier, and the company claims it’s the most powerful AI-accelerated robotics solution on the market currently, with over 8X the processing power of its predecessor. Jetson AGX Orin combines a 12-core Arm Cortex-A78AE CPU complex with 2,048 NVIDIA Ampere GPU cores, 64 Tensor cores (think GeForce RTX 3050-class GPU) and 64GB of LPDDR5 memory in its top-end configuration. With a 32GB memory-equipped dev kit priced at $1,999, developers can work with Jetson AGX Orin with full backwards compatibility with previous-generation platforms and support for Ubuntu 20.04 LTS. With pre-trained AI models and support for NVIDIA’s various frameworks in robotics, sensor analytics, computer vision and natural language processing, NVIDIA’s JetPack SDK is a complete platform enablement solution for cutting-edge robotics engineering.
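As a hedged example of what day one on the dev kit can look like, the snippet below assumes a CUDA-enabled PyTorch build is installed on top of JetPack (an assumption, not something bundled by default); it simply confirms the Orin GPU is visible and runs a small half-precision matrix multiply on it.

```python
# Sanity-check sketch: confirm the onboard GPU is usable from PyTorch.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU:", torch.cuda.get_device_name(device))        # the module's integrated Ampere GPU
    x = torch.randn(1024, 1024, device=device, dtype=torch.float16)
    y = x @ x                                                 # small FP16 matmul exercising the Tensor cores
    print("Result checksum:", y.float().sum().item())
else:
    print("No CUDA device visible -- check the JetPack / driver installation.")
```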
Once in production, customers will be able to choose from various NVIDIA Jetson Orin modules, priced from as low as $399 up to $1,599 for the top-end module with 64GB of RAM, which is capable of pushing 275 TOPS (Tera Operations Per Second) of performance.
“We are extending the powerful Microsoft Azure platform to the intelligent edge. Combining Azure’s advanced capabilities with performance and software development tools such as NVIDIA Jetson AGX Orin helps give developers a seamless experience to easily build, deploy and operate production-ready AI applications,” notes Roanne Sones, corporate vice president, Microsoft Azure Edge + Platforms. NVIDIA notes Jetson AGX Orin is available for order now.
In summary, there’s just far too much to unpack and uncover at NVIDIA GTC this week to feature in just one article, so be sure to head over to the GTC event site and catch Jensen’s keynote — registration is free.
I personally can’t think of another major chip player in the market that is doing so much and executing so well to advance machine learning AI, high performance computing and autonomous machines. NVIDIA CEO Jensen Huang and his team continue to deliver an innovation pipeline of technologies and platforms like no other in these fast-paced and relentlessly evolving markets.