AMD redefined the high-end desktop (HEDT) space when it first introduced the Threadripper series of CPUs. Gen1 brought us 16 cores with the 1950X, and Gen2 doubled that to 32 cores with the 2990WX. Now with Gen3, AMD has redefined HEDT once again, introducing three (yes, three) new Threadrippers: the 3960X, the 3970X, and the 64-core 3990X, which was unannounced until today. The 3960X and 3970X are available today, and we have a Threadripper 3960X in hand to test.
Let’s start with the specifications for the new Threadrippers:
|Specification||Threadripper 3960X||Threadripper 3970X||Threadripper 3990X|
|Cores/Threads||24C / 48T||32C / 64T||64C / 128T|
|Base/Boost||3.8GHz / 4.5GHz||3.7GHz / 4.5GHz||2.9GHz / 4.3GHz|
|Cache Qty||140MB (L2+L3)||144MB (L2+L3)||288MB (L2+L3)|
|CCD Configuration||4x 6-core CCDs||4x 8-core CCDs||8x 8-core CCDs|
|Transistor Counts||~3.9 billion per CCD; ~8.34 billion for IOD|
|Die Size(s)||74mm² per CCD; 416mm² for IOD|
|Lithography||7nm for CCDs (TSMC); 12nm for IOD (GloFo)|
|Socket Compatibility||sTRX4 (LGA 4094)||sTRX4 (LGA 4094)||??|
|Required Chipset||AMD TRX40 (15W Peak TDP / 14nm GloFo)|
|CPPC2 Fastest Cores||2x in CCD4||2x in CCD4||??|
February 7th, 2020
With the Threadripper 3000-series, we see the benefit of AMD’s chiplet design. The 3960X has the same base clock as the 3900X and gives up just 100MHz on the boost clock. TDP is 30W higher than on the Threadripper 2000-series, but that buys a significant increase in base clock, from the previous 3.0GHz to 3.8GHz and 3.7GHz. Also of note, the “CPPC2 Fastest Cores” row shows that AMD binned CCD4 to carry the two “fastest cores”, which improves performance by keeping low thread-count workloads on a single CCD.
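The table's cache and silicon figures can be cross-checked with a little arithmetic. The sketch below is only an illustration: the per-core L2 size (512KB) and per-CCD L3 size (32MB) are assumptions taken from Zen 2's published cache layout rather than from the table, and `package_totals` is a hypothetical helper name.

```python
# Per-die figures come from the specification table above.
# The L2-per-core and L3-per-CCD sizes are ASSUMPTIONS (Zen 2 cache layout),
# not values stated in the article.
L2_PER_CORE_MB = 0.5      # 512KB L2 per core (assumed)
L3_PER_CCD_MB = 32        # 32MB L3 per CCD (assumed)
CCD_TRANSISTORS_B = 3.9   # billions, per CCD
IOD_TRANSISTORS_B = 8.34  # billions, IO die
CCD_AREA_MM2 = 74
IOD_AREA_MM2 = 416

PARTS = {
    "3960X": {"cores": 24, "ccds": 4},
    "3970X": {"cores": 32, "ccds": 4},
    "3990X": {"cores": 64, "ccds": 8},
}


def package_totals(cores, ccds):
    """Return (cache in MB, transistors in billions, silicon area in mm2)."""
    cache_mb = cores * L2_PER_CORE_MB + ccds * L3_PER_CCD_MB
    transistors_b = ccds * CCD_TRANSISTORS_B + IOD_TRANSISTORS_B
    area_mm2 = ccds * CCD_AREA_MM2 + IOD_AREA_MM2
    return cache_mb, transistors_b, area_mm2


for name, cfg in PARTS.items():
    cache, transistors, area = package_totals(**cfg)
    print(f"{name}: {cache:.0f}MB cache, ~{transistors:.2f}B transistors, {area}mm2")
```

Under those assumptions the cache totals come out to 140MB, 144MB, and 288MB, matching the “Cache Qty” row, and the 3960X and 3970X each carry roughly 24 billion transistors across about 712mm² of silicon.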
Intel recently announced its 10th Gen HEDT CPUs, and the closest direct comparison they offer is arguably to the Threadripper 2000-series: the coming-soon Core i9-10980XE has a 3.0GHz base clock and only 18 cores. Intel has also decided to slash the prices of its HEDT CPUs, so this once-$2500 CPU now carries a suggested price of only $979, positioning it against the 24-core 2970WX rather than against these new Threadrippers.
Memory configuration support is very important as well. The Threadripper 3000-series is a quad-channel platform and, as expected, the officially supported memory speed scales down as more DIMMs are populated.
|Memory Configuration||Maximum Official Speed Supported|
|4x8GB Single Rank (32GB)||DDR4-3200|
|8x8GB Single Rank (64GB)||DDR4-2933|
|4x16GB Dual Rank (64GB)||DDR4-3200|
|8x16GB Dual Rank (128GB)||DDR4-2667|
|4x32GB Dual Rank (128GB)||DDR4-3200|
|8x32GB Dual Rank (256GB)||DDR4-2667|
I’ve been running up against this downscaling on my 3900X, since I’ve been maxing out my motherboard with 4x dual rank 16GB DIMMs. The four channels in Threadripper allow the same kit to run at much higher clockspeeds than on dual-channel mainstream CPUs, because four DIMMs fill only one slot per channel instead of two, which means much faster RAM at higher densities. You’ll notice that support for 32GB UDIMMs was added, which brings Threadripper’s maximum supported RAM to 256GB, a move Intel made with its 10th Gen HEDT CPUs as well. Another differentiating factor is that select TRX40 motherboards have ECC support, which is a critical requirement for some workstation uses.
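To put the speed tiers in perspective, here is a rough sketch of the theoretical peak bandwidth each official speed implies. It assumes the standard 64-bit (8-byte) DDR4 channel width and ignores real-world efficiency, so treat the numbers as upper bounds; `peak_bandwidth_gbs` is a hypothetical helper name.

```python
CHANNELS = 4            # quad-channel, per the article
BYTES_PER_TRANSFER = 8  # 64-bit DDR4 channel (assumed standard width)

# Official speeds (MT/s) from the memory configuration table above.
OFFICIAL_SPEEDS = {
    "4 DIMMs, single or dual rank": 3200,
    "8 DIMMs, single rank": 2933,
    "8 DIMMs, dual rank": 2667,
}


def peak_bandwidth_gbs(mt_per_s, channels=CHANNELS):
    """Theoretical peak bandwidth in GB/s for a given DDR4 transfer rate."""
    return mt_per_s * BYTES_PER_TRANSFER * channels / 1000


for config, speed in OFFICIAL_SPEEDS.items():
    print(f"{config}: DDR4-{speed} -> {peak_bandwidth_gbs(speed):.1f} GB/s peak")

# The same DDR4-3200 kit on a dual-channel mainstream platform, for comparison.
print(f"Dual-channel DDR4-3200: {peak_bandwidth_gbs(3200, channels=2):.1f} GB/s peak")
```

By this estimate, dropping from DDR4-3200 to DDR4-2667 when filling all eight slots with dual rank DIMMs costs roughly 17% of peak bandwidth, yet still leaves well above what a dual-channel platform can reach at DDR4-3200.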
3rd Gen Threadripper is built on the same Zen2 cores as desktop Ryzen 3000 and uses the same packaging as EPYC Rome server CPUs; functionally, it is closest to an EPYC Rome with four memory channels instead of eight. We’ve already covered most of the architecture enhancements in our EPYC Rome coverage, including TSMC’s 7nm lithography, the doubling of the FPU, support for AVX2, and up to eight CCDs (chiplet dies) using the same chiplet packaging as EPYC Rome.
The biggest change from 2nd Gen Threadripper is the move to a single IO die (IOD) that gives all cores uniform access to RAM and system IO. This gives consistent and deterministic performance across workloads. It contrasts with a very important shortfall of the Threadripper 2990WX in particular, where two CCDs did not have an on-board memory controller and had to traverse the Infinity Fabric (IF) to access RAM. This IO die is a cut-down version of EPYC’s IOD, so it has the same benefits we see in EPYC Rome.
Another improvement on the new Threadrippers is a 27% reduction in Infinity Fabric power, which allows for more socket power to be used powering cores.
In the specifications section, we noted “CPPC2 Fastest Cores” as 2x in CCD4, which means CPPC2 will get the best performance from a rotating pair of cores in CCD4. This rotation helps ensure the longevity and stability of the cores while extracting the best performance for lightly threaded workloads. We’ve seen this same kind of fast CCD/slow CCD combo in the 3900X and 3950X. It is a solid strategy for offering us consumers performance while also allowing AMD to strategically allocate these highly-binned CCDs to optimize their entire product portfolio; this was the big benefit of chiplet architecture that Jim here at AdoredTV pointed out, after all.
TRX40 offers 72 PCIe 4.0 lanes, which is effectively 4X the bandwidth that 2nd Gen Threadripper has. Four USB 3.2 Gen 2 ports come straight from the processor, as do 8 flexible PCIe lanes that can be used as general PCIe lanes, for NVMe, or as groups of 4 SATA ports. There are 48 lanes for general PCIe, plus another 8 PCIe 4.0 lanes that are used as a downlink to the TRX40 chipset.
The TRX40 chipset provides 4 SATA 6Gbps ports, 8 USB 3.2 Gen 2 ports, and 4 USB 2.0 ports. The chipset can provide another 8 general PCIe 4.0 lanes, plus a further 8 PCIe lanes that can be split up, much like the flexible lanes from the CPU, to provide SATA, NVMe, or more PCIe lane connectivity.
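The lane counts above can be tallied to see where the 72-lane figure comes from. This is a minimal sketch of that bookkeeping: the grouping into “general”, “flexible”, and “downlink” is my own labeling of the lanes described in the text, and the per-lane bandwidth is an assumption based on PCIe 4.0’s 16 GT/s signaling with 128b/130b encoding.

```python
# Lane counts as described in the article; the role names are illustrative labels.
CPU_LANES = {"general": 48, "flexible": 8, "chipset_downlink": 8}
CHIPSET_LANES = {"general": 8, "flexible": 8}

# PCIe 4.0: 16 GT/s per lane with 128b/130b encoding (assumed from the PCIe spec).
GBS_PER_LANE = 16 * (128 / 130) / 8  # GB/s per lane, per direction


def usable_lanes(cpu, chipset):
    """Lanes available to devices: everything except the CPU-to-chipset link."""
    cpu_usable = sum(n for role, n in cpu.items() if role != "chipset_downlink")
    return cpu_usable + sum(chipset.values())


def link_bandwidth_gbs(lanes):
    """Theoretical one-direction bandwidth of a PCIe 4.0 link of the given width."""
    return lanes * GBS_PER_LANE


total = usable_lanes(CPU_LANES, CHIPSET_LANES)
downlink = link_bandwidth_gbs(CPU_LANES["chipset_downlink"])
print(f"Usable PCIe 4.0 lanes: {total}")
print(f"Chipset downlink (x8): ~{downlink:.2f} GB/s per direction")
```

Counted this way, 56 usable lanes from the CPU plus 16 from the chipset give the 72 lanes quoted, with everything hanging off the chipset sharing the x8 downlink.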
All of this connectivity, running on PCIe 4.0, provides up to 133GB/s of concurrent device bandwidth, raising the bottleneck ceiling for IO and giving a huge boost to storage and connected device throughput. The socket pin-assignment was also altered to help with scalability goals. All of this necessitated the new sTRX4 socket, which upset some owners of the X399 sTR4 platform, since that platform lasted only two generations. AMD can’t tell us more about the future roadmap to reassure us on the expected longevity of the socket (yes, I point-blank asked them this very question, and that was their response), but they have at least shown a willingness to retain socket compatibility until a clean break becomes absolutely necessary. With DDR5 coming in the near future, and possibly PCIe 5.0 arriving in the server world, we’ll see just how capable (and “scalable”) sTRX4 is soon enough. Changing motherboards with CPU upgrades has been par for the course for Intel-based workstations, so the potential to skip that motherboard upgrade for a couple of generations is still welcome.