When AMD launched its brand-new Zen microarchitecture just two years ago, it attacked the consumer desktop space with fervor, handily outclassing the existing offerings from competitor Intel. Zen essentially doubled the number of cores offered to the mainstream desktop sector while simultaneously improving IPC dramatically, to within striking distance of Intel's. AMD's real killer design decision, though, was that this desktop CPU was actually a server CPU cut down to desktop size. When AMD's EPYC Naples platform launched shortly thereafter, we saw just how much of an impact Zen would have on the datacenter, offering up to 32 cores at a price significantly lower than Intel's Xeons. However, Naples' penetration into the datacenter was severely hampered by OEM support: it took nearly a year after launch for major OEMs such as HPE and Dell to join the EPYC movement and have Naples servers available for purchase, leaving that first year to Tyan and SuperMicro.
We’ve seen a lot of leaked and speculated information regarding AMD’s Zen2 cores, including our own AdoredTV’s chiplets theory, building up a lot of anticipation for EPYC Rome, and AMD did not disappoint. Once again, AMD redefines the top end of server CPU performance, offering up to 64 cores with an IPC that actually exceeds Intel Xeon’s, and does it at half the power thanks to TSMC’s new 7nm node and the efficiency of the design. If that weren’t enough, AMD also doubled the L2 and L3 caches, now giving 256MB of cache where Xeons have only 38.5MB on their top-of-stack Platinum 8280M to accompany its 28 cores.
The EPYC Rome Lineup
The EPYC lineup has been leaked for a little while now. We covered it in our Zen2 Chiplet Quality article. However, here it is with official numbers, which surprisingly are almost exactly what we saw in the leak, except about 6% less expensive than we expected, with an intriguing minor downshift in caches and core count in a couple SKUs.
| CPU Model # | 1Ku Pricing | Cores | Threads | Base Freq | Max Boost | Default TDP | L3 Cache |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EPYC 7742 | $6950 | 64 | 128 | 2.25 GHz | 3.40 GHz | 225 W | 256 MB |
| EPYC 7702 | $6450 | 64 | 128 | 2.00 GHz | 3.35 GHz | 200 W | 256 MB |
| EPYC 7702P | $4425 | 64 | 128 | 2.00 GHz | 3.35 GHz | 200 W | 256 MB |
| EPYC 7642 | $4775 | 48 | 96 | 2.30 GHz | 3.30 GHz | 225 W | 256 MB |
| EPYC 7552 | $4025 | 48 | 96 | 2.20 GHz | 3.30 GHz | 200 W | 192 MB |
| EPYC 7542 | $3400 | 32 | 64 | 2.90 GHz | 3.40 GHz | 225 W | 128 MB |
| EPYC 7502 | $2600 | 32 | 64 | 2.50 GHz | 3.35 GHz | 180 W | 128 MB |
| EPYC 7502P | $2300 | 32 | 64 | 2.50 GHz | 3.35 GHz | 180 W | 128 MB |
| EPYC 7452 | $2025 | 32 | 64 | 2.35 GHz | 3.35 GHz | 155 W | 128 MB |
| EPYC 7402 | $1783 | 24 | 48 | 2.80 GHz | 3.35 GHz | 180 W | 128 MB |
| EPYC 7402P | $1250 | 24 | 48 | 2.80 GHz | 3.35 GHz | 180 W | 128 MB |
| EPYC 7352 | $1350 | 24 | 48 | 2.30 GHz | 3.20 GHz | 155 W | 128 MB |
| EPYC 7302 | $978 | 16 | 32 | 3.00 GHz | 3.30 GHz | 155 W | 128 MB |
| EPYC 7302P | $825 | 16 | 32 | 3.00 GHz | 3.30 GHz | 155 W | 128 MB |
| EPYC 7282 | $650 | 16 | 32 | 2.80 GHz | 3.20 GHz | 120 W | 64 MB |
| EPYC 7272 | $625 | 12 | 24 | 2.90 GHz | 3.20 GHz | 120 W | 64 MB |
| EPYC 7262 | $575 | 8 | 16 | 3.20 GHz | 3.40 GHz | 155 W | 128 MB |
| EPYC 7252 | $475 | 8 | 16 | 3.10 GHz | 3.20 GHz | 120 W | 64 MB |
| EPYC 7232P | $450 | 8 | 16 | 3.10 GHz | 3.20 GHz | 120 W | 32 MB |
The 7002-series naming convention is straightforward. The leading “7” denotes a server platform, as opposed to a “3” for the Embedded line; the middle two digits are the model; and the trailing “2” denotes the generation. A “P” appended to some models designates them as single-socket processors.
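The naming scheme above can be expressed as a small decoder. This is an illustrative sketch based only on the fields the article describes; the function name and the returned dictionary layout are my own invention, not an AMD convention.

```python
# Decode an EPYC 7002-series model number per the naming scheme described above:
# leading digit = platform (7 server, 3 embedded), middle two digits = model,
# trailing digit = generation, optional "P" suffix = single-socket part.

def decode_epyc_sku(sku: str) -> dict:
    """Decode an EPYC model number like '7742' or '7302P'."""
    single_socket = sku.endswith("P")
    digits = sku.rstrip("P")
    return {
        "platform": "server" if digits[0] == "7" else "embedded",
        "model": digits[1:3],        # performance tier within the line
        "generation": digits[3],     # "2" = second generation (Rome)
        "single_socket": single_socket,
    }

print(decode_epyc_sku("7702P"))
# {'platform': 'server', 'model': '70', 'generation': '2', 'single_socket': True}
```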
AMD EPYC 7002 Series Architecture Overview
Without going too deep into the architecture, let’s look at the specifications surrounding the new 7002-series EPYC processors. The full 8-CCD EPYC has 32 billion transistors; in contrast, Intel’s Xeon Platinum 8280 has only 8 billion. EPYC is available in 1-socket and 2-socket designs, supports up to 4TB of RAM per socket across 8 channels at up to 3200MHz, and comes packed with 128 PCIe 4.0 lanes. TDPs range from 120 W to 225 W, closely matching Intel’s offerings and allowing a straight replacement of existing Xeon systems: double the cores, and thus double the performance, at the same power usage in the rack. The doubling of CPU cache gives a massive 576MB of combined L1, L2, and L3 cache in a 2-socket system.
Day 1 Partners
Perhaps the single most important difference between the launch of Rome and Naples is the support in the partner ecosystem. Rome can be the most performant platform on the market, but if it’s not sold by the big OEMs that everyone from SMBs up to enterprises regularly purchases from, it doesn’t do AMD a bit of good. That’s why it’s so significant that this time HPE, SuperMicro, Lenovo, Gigabyte, and Tyan all have systems ready to purchase at launch. DellEMC isn’t on that list because its PowerEdge and other offerings are slated for “this fall” rather than launch day. Also, speaking with my sales representative at Lenovo about the SR655, he said he didn’t see the 48-core parts as an option just yet, so we may have a limited subset of the full lineup available immediately, at least at Lenovo.
AMD has also worked with several partners across the ecosystem to deliver a mature product at launch, with the appropriate performance patches in place. Remember how long it took for NUMA modes to be properly recognized on Naples, or the thread-adjacency issues with the CCXs newly introduced with Zen? The breadth of development around the new EPYC Rome processors makes it considerably less likely we’ll see the industry playing catch-up as it did with Naples and first-generation Zen.
The Vision of EPYC on 7nm
Another large leap was Zen2’s jump to 7nm. Rome is the first 7nm server chip and AMD had several options with how they were going to harness the benefits of 7nm. However, Mark Papermaster summed it up quite well during his presentation at the launch event:
“A server is power constrained. And so what you want is the most computing that you can deliver at the power that you’ve provisioned for that rack. And so it was obvious to us: take advantage of the density that 7nm provided to double the cores in roughly the same power envelope which we had delivered in the previous EPYC generation… It’s right smack at the heart of what server customers need: more performance per rack in the datacenter.”Mark Papermaster, AMD EPYC Rome Launch Event, August 7, 2019
AMD followed through on this ideal by doubling the cores from Naples to Rome while keeping power usage the same. They also managed to increase the IPC of the cores themselves, no doubt helped in part by the mammoth doubling of cache. Of course, what server runs just single-threaded workloads? It’s far more important to see how well the CPU can feed all of its cores, and what that IPC is when the processor is fully loaded. EPYC Rome claims a 23% mean IPC uplift over Naples in a fully-loaded benchmark.
AMD did a lot more than just a 7nm process shrink to the Zen2 core to make such huge gains in IPC. Most of the benefit is coming from doubling the whole floating point path and working with instruction and data caches to keep the cores well-fed. The TAGE branch predictor is also brand-new, providing better accuracy. The doubling of the L3 cache is helping hide the latency of the chiplet design of Rome by bringing data closer to the execution units.
One of the problems that Naples faced was its number of NUMA domains. Fortunately, server workloads are usually designed to take NUMA into account, as servers have been multi-socketed for decades, so the concept is not new. However, it does place limitations on workloads, especially those that have to straddle more than one NUMA domain. An EPYC Rome 2-socket server is once again back to 2 NUMA domains, while Intel is actually regressing to 2 NUMA domains per socket (by gluing two Cascade Lake Xeons together to make its Cooper Lake CPUs) just to avoid being left in the dust by AMD’s sheer per-socket performance.
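The practical effect of fewer NUMA domains can be sketched with a toy core-to-node mapping. The simple `core // cores_per_node` model below is an assumption for illustration, not read from real hardware, and the Naples-style layout shown (small 8-core domains) is a simplified stand-in for its multi-die topology.

```python
# Toy model: how many NUMA domains a job of N threads touches, assuming
# cores are numbered contiguously within each domain.

def numa_node(core: int, cores_per_node: int) -> int:
    return core // cores_per_node

# A 48-thread job on a Rome 2P box (two 64-core domains) stays in one domain;
# the same job on a Naples-style layout of 8-core domains straddles six.
rome_domains   = {numa_node(c, 64) for c in range(48)}
naples_domains = {numa_node(c, 8) for c in range(48)}

print(len(rome_domains), len(naples_domains))  # 1 6
```

Fewer domains touched means less cross-domain memory traffic for the same thread count, which is exactly why the 2-domain Rome layout is friendlier to large workloads.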
EPYC Rome, just like its desktop counterpart Ryzen 3000, features PCIe 4.0, doubling the bandwidth to critical server components such as storage controllers, locally attached storage (particularly NVMe), and networking like the latest InfiniBand and 100Gbps cards. Increasing RAM speeds to 3200MHz enables the 204GB/s DRAM bandwidth shown in the slide above. Memory is also decoupled from the Infinity Fabric to help with latency and performance.
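That 204GB/s figure falls straight out of standard DDR transfer math for one socket of 8-channel DDR4-3200. This is a back-of-envelope check, not AMD-published internals:

```python
# Per-socket DRAM bandwidth for 8-channel DDR4-3200:
# transfers/sec x bytes per transfer x channels.

transfers_per_sec = 3200e6   # DDR4-3200: 3200 mega-transfers/sec per channel
bytes_per_transfer = 8       # each channel is 64 bits wide
channels = 8

bandwidth_gb_s = transfers_per_sec * bytes_per_transfer * channels / 1e9
print(f"{bandwidth_gb_s:.1f} GB/s")  # 204.8 GB/s
```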
First on the list of security topics is, of course, Meltdown and Spectre. Most of the Spectre-class vulnerabilities disclosed since have not been applicable to AMD hardware, and AMD was able to provide “robust” mitigations when Spectre first came out without losing nearly as much performance as Intel’s product stack. Mark Papermaster even mentioned the new SWAPGS vulnerability that we covered here, pointing out that AMD was not vulnerable at all to two of its components, and the third is fixed by an OS-level patch that Microsoft had already issued. AMD learned from Spectre and hardened Zen2 against speculative execution attacks.
Of course, there are the staple security features EPYC has offered since Naples. Secure Memory Encryption (SME) is the foundation: it allows individual memory pages to be encrypted, each with its own key. Secure Encrypted Virtualization (SEV) builds on SME, giving AMD-V the ability to assign each VM its own key, so VMs are isolated in encrypted segments of RAM and unable to read from other segments; even if an exploit allowed such access, the data would be encrypted under a different domain’s key.
At the launch event, VMware was demoing an SEV2 implementation live on the show floor, so we should be seeing production code “soon.”
AMD in High-Performance Computing (HPC)
Cray, the big name in supercomputers, was a guest on stage during the launch. Its new fabric for Rome is called “Slingshot,” engineered for flexibility, congestion management, scaling, and real-time speeds, and wrapped up in the new Shasta supercomputers. Since introducing EPYC to the Cray lineup, the company is approaching $1 billion in new EPYC bookings, demonstrating the momentum EPYC has in the space.
Frontier, the new supercomputer for Oak Ridge National Laboratory and the Department of Energy, was discussed as well. It will perform more than 1.5 quintillion operations per second (1.5 exaflops). That’s faster than the current top 100 supercomputers combined.
Cray announced a new supercomputer for the US Air Force weather agency. They’ll be installing a new Shasta system with EPYC Rome CPUs.
Not having a new EPYC Rome processor to benchmark ourselves, we’re reliant on other trusted third-party reviews to see the actual performance of this latest EPYC lineup, and the results do not disappoint. There are benchmarks (specifically a Linux kernel compile) where a dual-socket EPYC 7742 system performs so well that the reviewer felt compelled to include a quad-socket Xeon platform’s result just so the rest of the dual-socket systems weren’t left so far behind. The 7742 consistently topped the charts, frequently doubling the score of the Xeon 8280M. These reviews certainly back up the performance numbers AMD is showing on its slides.
EPYC Total Cost of Ownership
A big business concern is datacenter costs. Adding computing power has been a delicate balance of rack power budget, cooling, floorspace, and systems cost. However, EPYC Rome upends that equation by improving power consumption with 7nm and providing double the cores in the same rackspace and power footprint. Datacenters looking to replace aging equipment now have the opportunity to get huge compute gains, reduce their datacenter rack power usage, lessen their cooling costs, reduce their datacenter footprint, or some combination of all of these. The actual cost of the system is usually immaterial next to these other concerns, but with EPYC Rome-based systems being less expensive than an Intel-based alternative, the incentives keep mounting.
Another thing to consider is software licensing costs. Windows Server, since the 2016 release, has shifted to per-core licensing, making the reduction in sockets that EPYC Rome provides seem less important. However, the ability to have 128 cores and 256 threads in a single 2-socket system makes a Windows Server Datacenter license much more cost-effective, allowing significantly more instances of Windows Server on a single host and better saturating that “unlimited OSE” licensing. This higher density reduces the total number of servers, or, from another perspective, reduces the socket count in an environment, providing strong savings under the more common licensing model: per-socket. VMware, the most notable software driving countless datacenters, licenses everything from the base hypervisor to its add-on packages per socket, with ongoing support priced the same way, and the software used to back up these systems is usually also licensed per socket or per system. Slashing OpEx in half, and halving CapEx licensing costs on new acquisitions, is tantalizing to say the least.
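The per-socket savings can be made concrete with some hypothetical consolidation math. Every number below (VM densities, the license price, the VM count) is a made-up placeholder for illustration, not a vendor quote; only the "double the density, half the sockets" shape follows from the article.

```python
import math

# Hypothetical per-socket licensing cost for a fleet sized to host a fixed
# number of VMs. Doubling per-host VM density halves both host count and
# per-socket license spend.

LICENSE_PER_SOCKET = 4000    # placeholder per-socket license cost (USD)

def license_cost(vms: int, vms_per_host: int, sockets_per_host: int = 2):
    """Return (hosts needed, total per-socket license cost)."""
    hosts = math.ceil(vms / vms_per_host)
    return hosts, hosts * sockets_per_host * LICENSE_PER_SOCKET

vms_needed = 400
print(license_cost(vms_needed, 50))   # assumed dual-Xeon density  -> (8, 64000)
print(license_cost(vms_needed, 100))  # assumed ~2x Rome density   -> (4, 32000)
```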
The Future is EPYC
Of course, the question is: can AMD keep releasing performance CPUs or will Zen turn into an Opteron? Mark Papermaster dispels any notion of Zen being a short-lived leader by stating:
“We can’t let up; we won’t let up. And I couldn’t be more pleased to share that, as I committed, we always have been working on our next designs while we’re doing our current design. We’re well along, in fact we’ve completed, the design phase of Zen3, so Zen3 is right on track. And we don’t stop there. We already have our engineers in the works on Zen4.”Mark Papermaster, AMD EPYC Rome Launch Event, August 7, 2019
AMD has done its best to demonstrate to the industry, particularly server partners, that it has a clear roadmap and can consistently deliver on it. As Zen3 and Zen4 come out, and we get a glimpse of what comes after, I think we shall see an EPYC future indeed.