Supermicro and Nvidia Launch A100 HGX Systems, Featuring Up To 8 A100 GPUs and 5 Petaflops
Today, Nvidia is finally hosting GTC, which was canceled in March and rescheduled as an online-only event. Nvidia initially teased GTC with the tagline “get amped,” a nod to the name of its upcoming architecture: Ampere. Thanks to Supermicro’s Product Manager of GPU Server Systems, Carlos Weissenberg (who I really enjoyed talking to), we were able to get some details on Ampere and the upcoming HGX GPU board (which Nvidia’s CEO showed off on May 12), which will be used in upcoming Supermicro systems.
We’re not entirely sure what exactly Nvidia is going to unveil today; when I spoke to Carlos, he couldn’t provide a ton of information, since Nvidia has been a little secretive about Ampere. What he could tell me for sure is that Ampere focuses on performance, inter-GPU and inter-server coherency, and machine learning/AI. I wasn’t given specific performance figures with respect to CUDA cores, clock speeds, etc., but Ampere does support PCIe 4.0, NVLink, NVSwitch, and GPUDirect RDMA. Supermicro’s systems will, of course, use Nvidia’s next-generation HGX boards, of which there are two models: Redstone, with 4 A100 GPUs, and Delta, with 8. These A100 GPUs are presumably based on the highest-end Ampere die, which should be codenamed GA100. Though Supermicro advertises CUDA and Tensor cores, there is no mention of ray tracing cores whatsoever, which I found curious.
So, on to the feature I got the most info on: GPU coherency. It might not be the most important thing about Ampere, but it is definitely something Nvidia cares about deeply. One of the most crucial ingredients in making several GPUs behave as one coherent whole is high bandwidth, which is why PCIe 4.0 support is a requirement. NVLink and NVSwitch will be leveraged to tie together up to 16 GPUs within a single server. For server-to-server coherency, Nvidia is using GPUDirect RDMA, a Mellanox technology; for those wondering why Nvidia bought a networking company, here’s the answer. This technology bypasses the CPU entirely and allows GPUs to access one another’s memory directly. This is especially important for machine learning, which depends on high bandwidth and fast memory. Unfortunately for Nvidia, AMD and Intel will not permit GPU-to-CPU coherency on their platforms, which is unsurprising since Nvidia competes with both. I think this is actually a really interesting technology, and while I’m sure many would have preferred to know just the core count, one has to recognize the value of being able to move so much data between GPUs and servers (up to 600 gigabytes, not gigabits, per second).
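To put that 600 GB/s figure in perspective, here is a quick back-of-the-envelope comparison against a PCIe 4.0 x16 link. The PCIe numbers are my own (16 GT/s per lane, 128b/130b encoding), not something Supermicro provided:

```python
# Rough bandwidth comparison (PCIe figures are my own assumptions,
# not from Supermicro or Nvidia).
# PCIe 4.0 runs at 16 GT/s per lane; after 128b/130b encoding,
# an x16 link moves roughly 31.5 GB/s in each direction.
pcie4_x16_gbps = 16 * 16 * (128 / 130) / 8  # GB/s, one direction, x16
nvlink_gbps = 600  # the inter-GPU figure quoted above, in GB/s

ratio = nvlink_gbps / pcie4_x16_gbps
print(f"PCIe 4.0 x16: ~{pcie4_x16_gbps:.1f} GB/s")
print(f"Quoted GPU interconnect: {nvlink_gbps} GB/s (~{ratio:.0f}x)")
```

In other words, the quoted inter-GPU bandwidth is roughly an order of magnitude beyond what the host PCIe link could sustain, which is why the CPU-bypassing fabric matters.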
Like I said before, Nvidia has been tight-lipped about Ampere, and so Carlos and Supermicro weren’t able to answer many of my questions. They could not tell me whether A100, or Ampere in general, has ray tracing, what node A100 is on, how many cores it has, or how Ampere in this context differs from Ampere in another context (like gaming or desktop workstation). Ampere is shrouded in so much mystery, which makes it that much more interesting.
Carlos was, however, able to tell me some more general things about Ampere. Firstly, Ampere is focused on performance, evidenced by the 5 petaflops of FP16 deep learning compute that the 8-GPU 4U Delta system delivers. He wasn’t able to elaborate too much on this point, but my impression is that Nvidia is attempting to exploit all of the advantages of the new node(s) by increasing clock speeds and creating large dies that are, of course, denser than the 16/12nm TSMC Pascal and Turing GPUs. This is in contrast to AMD’s approach to 7nm, which has not been exactly aggressive on performance, as AMD’s largest 7nm GPU (which is also the largest 7nm GPU in the world) is only 330 mm², a size that Nvidia is absolutely going to exceed. There will probably be optimizations to the cores themselves (especially ray tracing cores, should they be present on Ampere), but when it comes to a new node, GPUs benefit the most from increased density and increased clock speeds. Since the A100 GPU is so high end and so focused on the data center, I imagine Nvidia won’t be shy about making another large GPU, perhaps even as large as Turing or Volta. Of course, this is my own personal speculation, so don’t take it as fact (yet).
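The headline pairs 5 petaflops with 8 GPUs; if that is the right pairing, simple division gives a ballpark per-GPU number. This is purely my own arithmetic, not a confirmed spec:

```python
# Back-of-the-envelope per-GPU throughput.
# My assumption: 8 GPUs share the full 5 PFLOPS of FP16 deep learning compute.
total_pflops = 5
gpus = 8
per_gpu_tflops = total_pflops * 1000 / gpus  # 1 PFLOPS = 1000 TFLOPS
print(f"~{per_gpu_tflops:.0f} TFLOPS of FP16 deep learning compute per GPU")
```

That would put each A100 several times beyond Volta’s FP16 Tensor throughput, though whether that figure reflects raw math rate or some deep-learning-specific optimization is exactly the kind of detail Nvidia hasn’t confirmed yet.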
The second thing about Ampere is specialization. Again, Carlos was not able to elaborate much on this point, but he was told by Nvidia that there is a degree of specialization it is able to leverage this generation. This could mean Nvidia can tweak Ampere’s design to fit more use cases (such as data center, workstation, gaming, etc.), or perhaps that Ampere is not Nvidia’s only arch this time around. It is clear at the moment that Ampere is replacing Volta, but whether or not it replaces Turing is unknown, especially since I couldn’t get confirmation on whether Ampere has ray tracing. I shared some concerns with Carlos about Nvidia designing one arch that is supposed to cover all of its markets, but the existence of these specializations is good and indicates that Nvidia is aware that one arch for everything is difficult to pull off (see AMD’s Vega). Whether it’s as simple as removing or adding ray tracing cores, or as complex as making two architectures or making Ampere’s design more modular, I do not know, but it is certainly something to keep an eye on.
Returning to Supermicro, its HGX-based servers come in 2U and 4U sizes. These systems are focused on the data center, high-performance computing, and machine learning; in the latter case, Supermicro says these systems provide “unprecedented speed,” an indication that Nvidia has increased the number of Tensor cores each GPU provides. These systems support both Xeon and Epyc CPUs, and I was told specifically that they go best with Epyc. It will certainly be interesting to see how Ampere compares against the last-generation Volta and Turing architectures, as well as AMD’s upcoming CDNA architecture, which is entirely compute-focused. Hopefully Nvidia reveals those kinds of details today.
Liked it? Take a second to support Matthew Connatser on Patreon!