Good Enough to be True

When headlines called last year’s Ryzen 3000 leaks “too good to be true,” it was a telling sign that the tech world has not only grown accustomed to incremental updates but has come to expect them instead of massive overhauls. It’s not hard to see why. Since Sandy Bridge launched in 2011, Intel has delivered increasingly incremental improvements to both IPC and clock speed (though non-overclockers may have noticed more, since stock clock speeds have risen steadily). AMD’s Bulldozer architecture, also released in 2011, was infamous for initially being slower in some respects than the Phenom II processors it replaced, and it too saw only incremental improvements over the years.

Ryzen 1000, however, was definitely not an incremental improvement. The Zen architecture boasts a 50% IPC improvement over Bulldozer and offers 8 real cores and 16 threads (with AMD’s Simultaneous Multi-Threading arguably better than Intel’s Hyper-Threading). Still, rumors of 5 GHz, 16-core Ryzen CPUs selling for no more than $600, or even $500, have been met with much skepticism.

Perhaps it’s really down to the fact that while Ryzen was much faster than Bulldozer, it couldn’t beat Intel in single-threaded performance and struggled against Intel’s high clock speeds, which let CPUs like the 8700K tie the 1800X in multi-threaded workloads despite having two fewer cores. The Ryzen 2000 series also improved on the original only slightly, which has probably tempered performance expectations as well.

However, we should expect Ryzen 3000, Threadripper 3000, and the new Epyc series to deliver a boost much like Ryzen 1000 did. People seem to think the new generation of AMD CPUs is just about the 7nm node from TSMC, which it isn’t. To be fair, AMD is really focusing on 7nm, so it’s understandable where the confusion is coming from, but it’s not just about the node. Chiplets and the underlying Zen 2 architecture are crucial as well. AMD is hitting Intel with everything it has this year, and it will hurt a ton.

The 7nm node from TSMC

Originally, AMD was supposed to source CPUs from both TSMC and GlobalFoundries, AMD’s long-time foundry partner. However, last August GlobalFoundries announced it had scrapped its 7nm node, leaving AMD with only TSMC for CPU manufacturing. This is likely a good thing for AMD, since GlobalFoundries has a history of underdelivering and TSMC has a great reputation for its manufacturing technology; for example, Nvidia uses TSMC’s 16nm and 12nm nodes for its Pascal and Turing GPUs, and TSMC’s 7nm node has already seen use in Apple’s most recent processors.

According to AMD, the 7nm node allows for 50% lower power at the same clock speeds or 25% higher clock speeds at the same power. AMD can, of course, blend the two and take, for instance, ~13% higher clock speeds and ~25% lower power consumption at the same time. These two options will do wonders for Epyc and Ryzen/Threadripper respectively: server CPUs depend on efficient operation, while desktop CPUs depend on high performance to meet the needs of the end user. We could even see the Ryzen 3000 series reach 5 GHz, even if that requires some overclocking.
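The ~13%/~25% blend above is simply the halfway point between AMD’s two quoted endpoints. The sketch below assumes clock and power trade off linearly between those endpoints, which real voltage/frequency curves do not do, so treat it as a ballpark illustration only. The function name `blend` and the blend factor `t` are hypothetical.

```python
# Rough sketch, assuming a LINEAR trade-off between AMD's two quoted 7nm
# endpoints. Real voltage/frequency curves are nonlinear; ballpark only.

def blend(t: float) -> tuple[float, float]:
    """Return (clock multiplier, power multiplier) for a blend factor t.

    t = 0.0 -> all-in on efficiency: same clocks, 50% lower power.
    t = 1.0 -> all-in on speed: 25% higher clocks, same power.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    clock = 1.0 + 0.25 * t   # clock rises from 1.00x to 1.25x
    power = 0.5 + 0.5 * t    # power rises from 0.50x to 1.00x
    return clock, power

for t in (0.0, 0.5, 1.0):
    clock, power = blend(t)
    print(f"t={t:.1f}: clock {clock - 1:+.1%}, power {power - 1:+.1%}")
```

At `t = 0.5` this gives a 12.5% clock increase at 25% lower power, which is where the rounded ~13%/~25% figures come from.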

The 7nm node also makes chips smaller and can help yields for larger processors, but initially 7nm will be expensive to produce on, as all new nodes are. The largest 7nm processor in existence right now is the Vega 20 GPU used in the Radeon VII and Instinct MI60. At 331 mm² it isn’t very big and could be considered mid-range, yet the least expensive card using it costs $700; however, that card also carries 16 GB of HBM2, which contributes quite a bit to the high price. Nevertheless, don’t expect anything larger for a little while. Defect density on new nodes is very high initially, and that affects large chips the most.

The Zen 2 Architecture

Perhaps more important than the new node is the Zen 2 architecture. Its name betrays what it really is: an overhaul of the original Zen architecture. It’s not what Kaby Lake is to Skylake or what Zen+ is to Zen. Zen 2 brings extremely important improvements to the table, including much larger caches, far better FPUs (floating-point units) with doubled widths in many parts, Infinity Fabric 2 at more than double the speed, PCIe 4.0, and security features against Spectre. In a workload that depends entirely on floating-point operations (FLOPs), you could see double the performance per watt compared to the original Zen, and that’s before counting the 7nm node. However, workloads rarely consist only of FLOPs, so don’t expect this to be the norm.

AMD hasn’t given us typical or specific IPC figures, but we can probably expect a decent uptick in many kinds of applications. Much of what AMD has said about Zen 2 has been in conjunction with the 7nm node; when AMD said it could double core counts without using more power than Zen 1, it was referring to both the architectural improvements and the new node. Still, it seems that 7nm and the Zen 2 architecture will contribute roughly equally.

Chiplets

And of course, we have chiplets. For AMD, this is the most important technology arriving with Zen 2 in the long term. Chiplets have come at a very convenient time for AMD, given the advent of the 7nm node. New nodes are getting progressively more expensive to develop and manufacture on, and chiplets move CPU features that don’t need the latest node off the 7nm die, which increases yields by shrinking die sizes. All of the core CPU features, such as the cores themselves and their cache, will stay on the 7nm chiplets, while the IO features and perhaps more cache will move to a separate die manufactured by GlobalFoundries on the older 14nm process.
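Why smaller dies yield better can be illustrated with the simple Poisson yield model, where the fraction of defect-free dies is exp(-area × defect density). This is a textbook approximation, not AMD’s or TSMC’s actual yield data, and the defect density `D0` and the 75 mm² chiplet size below are hypothetical values picked only to show the trend.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# HYPOTHETICAL defect density for an immature node, in defects per cm^2.
D0 = 0.5

dies = [
    ("Vega 20-sized die", 331.0),           # area quoted in the article
    ("hypothetical small chiplet", 75.0),   # assumed size, for illustration
]
for name, area in dies:
    print(f"{name:28s} {area:5.0f} mm^2 -> {poisson_yield(area, D0):.0%} defect-free")
```

With these assumed numbers the large die comes out at roughly 19% defect-free versus roughly 69% for the small one. The exact values depend entirely on the assumed defect density, but the gap between large and small dies widens as defect density rises, which is why a young node punishes big monolithic chips.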

While Zen 2 theoretically allows AMD to double core counts, chiplets make that possibility a reality. AMD could not manufacture a 64-core CPU like Rome without them. Furthermore, AMD won’t have to worry about bandwidth constraints thanks to the new Infinity Fabric 2, which touts more than double the speed. Chiplets usher in a new age of high-density CPUs and are a massive victory for AMD over Intel wherever high density is needed.

Chiplets also allow AMD to be extremely flexible in the market. The biggest problem with monolithic chips is that they require many designs to cover every use case across desktop, laptop, and server, and you can only really reuse a monolithic chip by cutting it down, which doesn’t help if you need more of the higher core count CPUs. If there’s a sudden increase in demand for server CPUs, AMD can shift chiplets to Epyc without changing its orders from TSMC, since all the chiplets are identical.

Rome will be a beast and Matisse will at least be powerful

Let’s put all of this together. When it comes to performance and power, the new Rome server CPUs could deliver up to 4 times the performance per watt thanks to 7nm and improved floating-point performance, though the real figure will likely sit somewhere between 2 and 4 times, since not all workloads use FLOPs exclusively. Also, since AMD is delivering 64 cores per socket, double Naples’ 32, Rome could be four times as fast as Naples (double the cores, double the FLOPS per core) in a best-case scenario.
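The 2x-to-4x range can be made concrete with an Amdahl-style blend: only the floating-point-bound share of each core’s work gets the doubled FP throughput, while the doubled core count applies to everything. This is a back-of-the-envelope sketch, not AMD’s methodology; `fp_share` and `clock_ratio` are hypothetical knobs, and the model ignores chiplet latency, memory bandwidth, and every other bottleneck.

```python
# Back-of-the-envelope Rome vs Naples per-socket scaling (sketch only).
NAPLES_CORES = 32
ROME_CORES = 64
FP_SPEEDUP = 2.0  # best-case doubled FP throughput per core

def per_socket_speedup(fp_share: float, clock_ratio: float = 1.0) -> float:
    """Estimated Rome/Naples speedup for a perfectly parallel workload.

    fp_share: HYPOTHETICAL fraction of run time that is FP-bound (0..1).
    clock_ratio: Rome clock / Naples clock (1.0 = same clocks).
    """
    if not 0.0 <= fp_share <= 1.0:
        raise ValueError("fp_share must be between 0 and 1")
    core_ratio = ROME_CORES / NAPLES_CORES  # 2x cores
    # Amdahl's law: only the FP-bound part of each core's work speeds up.
    per_core = 1.0 / ((1.0 - fp_share) + fp_share / FP_SPEEDUP)
    return core_ratio * per_core * clock_ratio

for share in (0.0, 0.5, 1.0):
    print(f"FP share {share:.0%}: ~{per_socket_speedup(share):.2f}x Naples")
```

At the same clocks, a workload with no FP-bound work lands at 2x (cores alone) and a fully FP-bound one at 4x, matching the best case described above.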

Of course, this best-case scenario would require fully utilizing Rome’s FLOPs without choking on other bottlenecks such as latency between the chiplets, a natural consequence of using chiplets. But most workloads use FLOPs at least to some degree, so we could see well above twice the per-socket performance of Naples, and no less than twice as long as clock speeds remain about the same.

Talking about desktop chips is a little harder. Much of the information we have about Zen 2 comes from the Next Horizon presentation which focused on Zen 2 in relation specifically to Epyc, and that’s not exactly applicable to Ryzen 3000. When AMD said, “half the energy per operation”, what that really means is Rome will be using half the energy at the same core count compared to Naples. That is considering the 7nm node as well. What clock speeds does “half the energy” even allow for? It must be higher than Naples since the Zen 2 architecture brings some efficiency improvements and the 7nm node already halves power at the same clock speed, but we don’t know how high exactly.

It’s already complicated to determine what “half the energy” really means for Rome, let alone Matisse (the Ryzen 3000 series CPU). The vast majority of the power consumed on the Matisse chip will be by the 7nm chiplets, but if that IO die even needs 10 watts to perform well most of the time, it could bump up a CPU from the lower 65-watt TDP to the higher 95-watt TDP or cause AMD to scale clock speed back in order to fit it better within the 65-watt class.

Just a quick word on TDPs, by the way. AMD defines TDP in relation to heat and not power, and while more power creates more heat, it’s not 1:1. Ryzen processors generally have a 20-30% higher power draw than the TDP implies, so when you see a 2700X consuming 140 watts, that’s why. That is normal operation and you should generally expect that to happen especially with the higher end CPUs.
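The 20-30% rule of thumb above translates into an expected package-power range for a given TDP. The helper below is a hypothetical sketch that applies that rule and nothing more; actual draw varies by chip, cooling, and workload. The 2700X’s 105-watt TDP is its official rating.

```python
def expected_package_power(tdp_w: float, low: float = 0.20,
                           high: float = 0.30) -> tuple[float, float]:
    """Rule-of-thumb package power range for a Ryzen CPU given its TDP.

    Applies the 20-30% over-TDP figure discussed above; approximate only.
    """
    return tdp_w * (1 + low), tdp_w * (1 + high)

lo, hi = expected_package_power(105)  # Ryzen 7 2700X: 105 W TDP
print(f"2700X: roughly {lo:.0f}-{hi:.0f} W; "
      f"140 W is {140 / 105 - 1:.0%} over TDP")
```

The 140-watt figure works out to about 33% over TDP, a little above the top of the range, which is a reminder that the 20-30% figure is a loose guide. For a 65-watt part the same rule gives roughly 78 to 85 watts.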

Too good to be true or not?

Whether you think it’s true or not, let’s focus on the only Ryzen 3000 leak that I think is significant to the series as a whole: Jim’s (or Adored’s) spec sheet leak. Although I am a writer here, I do not and cannot know exactly what his source has told him, whether this source is truly trustworthy, how much of this leak is informed by Jim’s own opinions and predictions, and so on. There has been much controversy around this alleged leak, but it’s a good jumping-off point, and it’s as much speculation as it is a leak. I find Jim to be right more often than wrong, so I feel comfortable discussing his opinion. I disagree with some of his theory, but I find this spec sheet to be the most reliable information we have.

According to Jim, Ryzen 3000 will feature 16 and 12 core CPUs (these core counts are now largely confirmed), increased clock speeds to the mid to high 4 GHz range, and increased core counts without raising TDP with the exception of the 16 core CPUs. Many publications took issue with the last part here because raising core count and clock speed without raising power draw is pretty hard to do, even when you can halve the power draw on other parts. Of course, the biggest problem we have right now is that we don’t know what “half the energy” entails, if it’s only some kind of average figure, if it’s including a bump in clock speeds, and so forth.

Let’s try to figure this out, and please note this is very rough and shaky math since we know little. Our best indication of Ryzen 3000’s performance and power consumption is the demo AMD showed at CES. AnandTech estimates the Ryzen CPU consumed about 75 watts and the 9900K tested alongside it consumed 125. If we further compare this to the 2700, which seems to consume around 85 watts in Cinebench, the engineering sample is ~30% faster and consumes ~12% less power, making it ~50% more efficient, which is simply incredible. Even at the same power, it would still be 30% more efficient thanks to its higher performance.
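The efficiency estimate can be reproduced from the article’s own approximate figures: the 2700 at about 1550 Cinebench points and 85 watts, and the engineering sample at about 2050 points and 75 watts. Efficiency here simply means Cinebench points per watt; the variable names are illustrative.

```python
# Approximate figures from the article (not independent measurements).
R2700_SCORE, R2700_WATTS = 1550, 85  # Ryzen 7 2700 in Cinebench
ES_SCORE, ES_WATTS = 2050, 75        # Ryzen 3000 engineering sample at CES

speedup = ES_SCORE / R2700_SCORE         # how much faster the sample is
power_ratio = ES_WATTS / R2700_WATTS     # sample power as a fraction of the 2700's
efficiency_gain = speedup / power_ratio  # relative points per watt

print(f"~{speedup - 1:.0%} faster, ~{1 - power_ratio:.0%} less power, "
      f"~{efficiency_gain - 1:.0%} more points per watt")
```

Dividing the speedup by the power ratio, rather than adding the two percentages, is what turns "~30% faster" and "~12% less power" into roughly 50% better efficiency.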

At this stage we should think about IPC improvements that would uplift AMD’s score here, since the higher the IPC gain, the lower the clock speed this CPU needed to reach its score. I think 15% is a good estimate, since AMD has made significant architectural improvements; you could assume 10% if you want to be more conservative, but it won’t affect our conclusions much. This is, of course, a wild guess, but we have to guess at IPC since we have little other reference. AMD did give a figure of 29% in the footnotes of an article, but that was for a very specific workload, and AMD later said it wasn’t indicative of a general performance improvement. In my opinion, 29% is far too high, especially for Cinebench.

Let’s apply this assumed IPC improvement to the 2700, which typically scores about 1550 in Cinebench: 1550 multiplied by 1.15 is 1783. That still falls short of the roughly 2050 points the Ryzen 3000 CPU scored in the demo, because the 2700’s all-core clock speed is only around 3.5 GHz. 2050 divided by 1783 is about 1.15, or 15%, so we multiply 3.5 GHz by 1.15 and get about 4 GHz, our estimated all-core clock speed for the sample. If you think a 10% IPC gain is more realistic, you’ll get about 4.2 GHz instead.
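The same steps can be written as a small function: scale the 2700’s score by the assumed IPC gain, then attribute the remaining gap to clock speed. This is a sketch under the article’s assumptions, chiefly that the multi-threaded Cinebench score scales linearly with clock speed at equal core count, which is only approximately true. `estimate_all_core_clock` is a hypothetical name.

```python
def estimate_all_core_clock(ipc_gain: float,
                            base_score: float = 1550,     # Ryzen 7 2700, Cinebench
                            base_clock_ghz: float = 3.5,  # 2700 all-core clock
                            target_score: float = 2050,   # CES engineering sample
                            ) -> float:
    """Solve score ~ IPC x clock for the sample's all-core clock (GHz).

    Assumes the score scales linearly with clock at equal core count.
    """
    ipc_adjusted = base_score * (1 + ipc_gain)   # the 2700 with Zen 2 IPC
    clock_ratio = target_score / ipc_adjusted    # remaining gap = clock speed
    return base_clock_ghz * clock_ratio

for gain in (0.10, 0.15):
    print(f"IPC +{gain:.0%}: ~{estimate_all_core_clock(gain):.1f} GHz")
```

Without rounding the intermediate ratio, the 15% case lands at about 4.03 GHz and the 10% case at about 4.21 GHz, consistent with the 4 to 4.2 GHz range.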

So, I think it’s reasonable to assume this CPU was running within the 4-4.2 GHz range. That lines up very well with Jim’s speculated Ryzen 5 3600, which has a clock range of 3.6 to 4.4 GHz and a TDP of 65 watts; the 2700, another CPU rated at 65 watts, consumes more than the 75 watts the engineering sample did. That also makes sense based on what AMD says about the 7nm node, which allows 25% higher clock speeds at the same power. It’d be hard for the engineering sample AMD showed us to be 33% faster just based on clock speed alone while also consuming less power, so a mix of IPC and clock speeds is most likely.

12 Cores at 5 GHz?

So, Jim also speculated on two 12-core parts: the 3700 at 3.8 to 4.6 GHz and the 3700X at 4.2 to 5 GHz, the former with a 95-watt TDP and the latter with a slightly higher 105. It seems incredible, but remember, AMD is packing a ton of new things into Ryzen 3000, and not for nothing. Keep in mind that these higher clock speeds apply only to a few cores in lightly threaded workloads. This spec sheet is not saying we’re getting all 12 cores at 5 GHz at 105 watts, or even at a more realistic 130.

AMD hasn’t even acknowledged the existence of 12-core Ryzen 3000 CPUs, so we obviously have no demos of them, but we can extrapolate from the 8-core chip they tested. At 75 watts, each core is consuming about 9.5 watts, so multiplying by 12 cores gives about 115 watts, similar to the 1800X even though the 1800X has four fewer cores and much lower clock speeds. 115 or even 125 watts makes good sense for a CPU with a 95-watt TDP, such as the speculated 3700, so these 12-core CPUs seem reasonable.
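The per-core extrapolation above is a straight linear scaling, sketched below. It deliberately ignores the IO die, differences in clock speed, and any extra cost of a second die, so it is a floor-style estimate rather than a prediction; `naive_package_power` is a hypothetical helper.

```python
ES_WATTS = 75  # estimated package power of the 8-core CES sample
ES_CORES = 8

watts_per_core = ES_WATTS / ES_CORES  # 9.375 W; rounded to ~9.5 W in the text

def naive_package_power(cores: int) -> float:
    """Linear per-core extrapolation; ignores IO die and clock differences."""
    return watts_per_core * cores

print(f"{watts_per_core:.2f} W/core -> 12 cores: ~{naive_package_power(12):.1f} W")
```

Unrounded, this gives 112.5 watts for 12 cores; rounding up to 9.5 watts per core gives 114, hence the ~115-watt figure.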

Now, it’s not quite this simple, of course. Take Threadripper, for example, where the 16-core 2950X consumes about 180 watts, almost triple that of the 2700, yet provides only double the cores at roughly the same all-core clock speed. Since Threadripper uses two dies, it’s actually very similar to our 12-core Ryzen 3000 CPUs, and perhaps to the 8-core models as well if they use two 4-core dies. The power (in)efficiency of Threadripper poses a problem for this spec sheet.

However, AMD has optimized the multi-die design of products like Threadripper: the dies sit much closer together this time and lack features such as quad-channel memory, and Infinity Fabric, which consumes quite a bit of power, has been updated, so perhaps Ryzen 3000 won’t suffer from the problems Threadripper suffers from today. Also consider that while AMD showed us a bare CPU package with just one chiplet on it at CES, the actual CPU in the system could have been using two, which would mean power consumption is just fine. Just something to consider.

Overall, I think this napkin math shows that this spec sheet is not impossible, or even improbable. While it may fail the common-sense test, it took only a little critical thinking to see that it is at least possible. It has to be said, though, that this is definitely one of the better-case scenarios.

The IO Die Problem and Latency

As mentioned before, not all of Ryzen 3000 is moving to 7nm, only the CPU chiplets. The IO die is still on 14nm, so it is not nearly as efficient as it would be on 7nm. This poses a large problem for the veracity of the spec sheet. We don’t know which features live on the IO die and which on the CPU chiplets, but if a typical workload causes the IO die to consume 10 watts, for instance, that would be bad for Ryzen 3000 and would cast serious doubt on the CPUs I discussed.

On the other hand, 14nm yields are very good, so AMD could, if it chose to, undervolt and underclock these IO dies to make them very efficient, even if not 7nm-efficient. AMD would have much more leeway if the IO dies consumed just a watt or two.

One might wonder why this would be a problem, since AMD’s CES demo would have included the power consumption of the IO die. However, Cinebench barely accesses system memory or requires cores to communicate with each other, so the IO die likely used little if any power there. Theoretically, something like Prime95’s blend or large FFTs test would use IO much more heavily and could push that 75-watt figure to 80 or higher.

At my request, Jim tested his 2700X both at stock and locked at 4050 MHz (its typical all-core boost, locked to control for clock variation) and 1.25 volts in Cinebench R15, Prime95 blend, Prime95 large FFTs, and Prime95 small FFTs. Cinebench and small FFTs use little IO, while blend uses the most and large FFTs use some. I also wanted a locked clock speed so I could tell whether power differences between tests came down to clocks and voltage. The purpose of these benchmarks was to compare workloads that are mostly compute with almost no IO against workloads that use more IO than Cinebench but less compute.

We measured power using HWiNFO’s CPU package power reading, and this is what we got:

Test                  2700X power (stock)   2700X power (4050 MHz, 1.25 V)
Cinebench R15         125 W                 107 W
Prime95 Blend         99 W                  84 W
Prime95 Large FFTs    110 W                 92 W
Prime95 Small FFTs    140 W                 120 W

As we can see, it’s actually the compute-intensive workloads, Cinebench and small FFTs, that use the most power. Although IO usage should have risen significantly in the blend test, overall power usage dropped because of the drop in compute usage. It was expected that compute would use far more power than IO, though it might not have been expected that Cinebench, which uses almost no IO whatsoever, would be the second most power-hungry test on this chart. These results are a good sign that the IO die may not weigh much on how much power Ryzen 3000 CPUs consume.
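Ranking the tests by their locked-clock power makes the point easy to check: with clocks and voltage fixed, boost behaviour cannot explain the ordering, so it reflects how much compute each test does. The readings below are the ones from the table; the dictionary layout is just an illustrative choice.

```python
# Jim's HWiNFO package-power readings for the 2700X, in watts:
# test name -> (stock, locked at 4050 MHz and 1.25 V)
readings = {
    "Cinebench R15":      (125, 107),
    "Prime95 Blend":      (99, 84),
    "Prime95 Large FFTs": (110, 92),
    "Prime95 Small FFTs": (140, 120),
}

# Sort by locked-clock power, highest first, so clock/voltage differences
# between tests at stock cannot explain the ordering.
ranked = sorted(readings.items(), key=lambda kv: kv[1][1], reverse=True)
for name, (stock, locked) in ranked:
    print(f"{name:20s} stock {stock:3d} W  locked {locked:3d} W  "
          f"({locked / stock:.0%} of stock)")
```

The order is the same at stock and locked: small FFTs, then Cinebench, then large FFTs, then blend, so the two low-IO, compute-heavy tests sit at the top either way.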

But that’s only half of the problem with the IO die. What if latency is really poor because of it? That could affect gaming performance in particular, which would be disastrous for AMD’s gaming-oriented processors. Well, we can’t really answer that. AMD has only shown us scenarios with practically no core-to-core communication, so if latency is poor, we won’t know until AMD finally tells us.

There are signs that latency will be fine, though; Mark Papermaster, AMD’s CTO, has said Ryzen 3000 will have “outstanding gaming performance” and that the IO die will act as a sort of central hub. Gaming performance usually depends heavily on good latency, so Papermaster would probably not be saying this unless latency was at least good enough not to offset the IPC and clock speed gains. Also, in theory, routing traffic through the IO die as a hub would increase latency, not reduce it, so there is probably something in the IO die that helps reduce latency, such as additional cache shared between the two CPU dies, a logical controller, or something else.

Still, without the IO die, power consumption and latency would be better. If Ryzen 3000 and the Zen 2 architecture as a whole fail at anything, it will be die-to-die communication. It will almost certainly be better than Threadripper, but AMD needs to at least put latency on par with Ryzen 2000, and ideally improve on it rather than just match the last generation.

Conclusion

So, although much was covered in this article, it’s safe to say there’s not enough information to draw hard conclusions. But I think we have identified where Zen 2 may falter and where it may succeed. Zen 2 likely has weaknesses in latency between dies, and AMD must know this, since it has not demoed Zen 2 in a test that relies heavily on core-to-core communication.

On the other hand, Zen 2 will surely have decently high clock speeds, decent IPC improvements for some workloads and large gains for others, and chiplets will definitely help yields and maximum clock speeds all across the board, from Epyc to Ryzen. All of these things might offset the penalties from bad latency if Ryzen 3000 suffers from that.

Jim’s spec sheet, which I’ve at least shown to be possible, isn’t the full story. Its specs aren’t the things that will make Ryzen 3000 great, only things that could. If latency is a major issue, a 5 GHz clock speed won’t necessarily help. If Ryzen 3000 doesn’t have good enough latency, it will struggle in workloads like gaming, and Ryzen 3000 will be billed as a gaming CPU.

I wish I could say with absolute confidence at the end of this that Ryzen 3000 will be great, but as I said, there’s just not enough information. Knowing that Jim’s proposed CPUs are within the realm of possibility is nice, though that’s not confirmation that they actually exist or that they will be available at the price points Jim said they would. However, I believe we can expect Zen 2 to be very competitive, and we can expect AMD to relentlessly find ways to patch any shortcomings it may have.

P.S. – While I was writing this article (and this will reveal how long ago I started it), Jim was independently working on a video on a similar topic where he examined the possibility of Ryzen 3000 being worse than we have been expecting. You can check that out here.
