Alright guys, how’s it going?
As we reported yesterday in both video and article formats, we do not believe the rumors of 50% additional floating-point performance in AMD’s Zen 3 architecture to be true. Simply put, had we thought about it harder beforehand, we would have known that uplifts this large have only come alongside a large advancement in process node.
Floating-point operations are often even more power-hungry than integer operations. Run Cinebench on your CPU and it’ll likely consume just as much power as, or even more than, running Handbrake or any other program which fully utilizes a CPU’s integer resources. This is also why AVX instructions force most Intel CPUs to run at lower clock speeds.
AMD already made a huge leap in floating-point with Zen 2 over Zen. Performance per socket was almost quadrupled due to a doubling of cores (from 32 max to 64) and of FPU width (from 128-bit to 256-bit). In an interview with AnandTech, Mark Papermaster was keen to point out that AMD’s 256-bit floating-point instructions would not require a drop in CPU clock speeds.
IC: With the FP units now capable of doing 256-bit on their own, is there a frequency drop when 256-bit code is run, similar to when Intel runs AVX2?
MP: No, we don’t anticipate any frequency decrease. We leveraged 7nm. One of the things that 7nm enables us is scale in terms of cores and FP execution. It is a true doubling because we didn’t only double the pipeline width, but we also doubled the load-store and the data pipe into it.
That AMD were able to increase floating-point performance to such a level without also requiring a drop in clocks was already a surprise. For them to do it again on a much smaller node progression would be fairly miraculous.
Anyway, thanks to Twitter yesterday, an obscure article from June was brought to our attention. In it we learned that Purdue University would receive a $10 million subsidy for their “Anvil” supercomputer, but the real news was hidden deeper in the text.
Anvil will be built in partnership with Dell and AMD and will consist of 1,000 nodes with two 64-core AMD Epyc “Milan” processors each, and will deliver over 1 billion CPU core hours to XSEDE each year, with a peak performance of 5.3 petaflops.
This one paragraph told us a lot about what to expect from Milan. Supercomputers are rated in petaflops, which is simply a measure of how many quadrillion FP64 (double-precision) operations the machine is capable of per second; in the case of Anvil, that will be 5.3 petaflops. Zen 2, with its dual 256-bit FPU, is capable of 16 flops (2×8) per core per clock cycle, and the article tells us that there will be 1,000 nodes with two 64-core Milan chips each.
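Feel free to look up how to do the calculation, or just use a peak-flops calculator and enter all those numbers as written. With 1,000 nodes, 2 sockets per node, 64 cores per socket and 16 flops per cycle, a peak of 5.3 petaflops works out to a clock speed of roughly 2.6GHz.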
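For anyone who would rather check the arithmetic than trust a calculator, here is a minimal Python sketch of it. The node, socket, core and petaflops figures come straight from the Purdue announcement; the 16 flops per cycle is our assumption that Milan keeps the same FPU throughput as Zen 2, and the function name is just something we made up for illustration.

```python
# Figures quoted in the Purdue announcement.
NODES = 1_000
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64
PEAK_FLOPS = 5.3e15  # 5.3 petaflops

# Assumption: Milan keeps Zen 2's 16 FP64 flops per core per clock.
FLOPS_PER_CYCLE = 16


def implied_clock_hz(peak_flops, nodes, sockets, cores, flops_per_cycle):
    """Clock speed (Hz) the whole machine needs to reach peak_flops."""
    total_cores = nodes * sockets * cores
    return peak_flops / (total_cores * flops_per_cycle)


clock = implied_clock_hz(
    PEAK_FLOPS, NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET, FLOPS_PER_CYCLE
)
print(f"Implied clock: {clock / 1e9:.2f} GHz")  # Implied clock: 2.59 GHz
```

The peak figure is theoretical, so this is the clock the rating assumes rather than a measured speed under load.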
There are other ways to reach a theoretical 5.3 petaflops, but none of them are even remotely realistic. We may see another large jump in floating-point performance with Genoa, but Milan will be closer to 10% than 50%, and the majority of that looks likely to come from the increase in clock speeds to 2.6GHz rather than any large architectural advance in the FPU.
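To see why the alternatives look unrealistic, the sketch below solves the same 5.3 petaflops figure for clock speed under different per-core FPU throughputs. The scenario labels are ours, the 24 flops per cycle case is simply "Zen 2 plus 50%", and the 2.25GHz Rome base clock is an assumption taken from the 64-core Epyc 7742, so treat the percentages as a rough guide only.

```python
# Machine size and peak rating from the Purdue announcement.
TOTAL_CORES = 1_000 * 2 * 64
PEAK_FLOPS = 5.3e15

# Assumption: compare against the Epyc 7742 (Rome) base clock.
ROME_BASE_GHZ = 2.25

# Hypothetical FP64 flops per core per cycle for each scenario.
scenarios = {
    "same FPU as Zen 2": 16,
    "rumored +50%": 24,
    "doubled FPU": 32,
}


def implied_clock_ghz(flops_per_cycle):
    """Clock (GHz) needed to hit PEAK_FLOPS at a given per-core throughput."""
    return PEAK_FLOPS / (TOTAL_CORES * flops_per_cycle) / 1e9


for label, fpc in scenarios.items():
    ghz = implied_clock_ghz(fpc)
    delta = ghz / ROME_BASE_GHZ - 1
    print(f"{label} ({fpc} flops/cycle): {ghz:.2f} GHz, {delta:+.0%} vs Rome base")
```

Under these assumptions, only the first scenario lands on a clock above Rome's. The other two would need Milan to run well below 2GHz, which is hard to square with everything else we have heard about Zen 3.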
We don’t know how much power each of these Milan chips will use yet, but it’s unlikely to be far above the norm. Milan could have base clocks 10% or more higher than Rome at similar power levels, which bodes very well for Zen 3 in every capacity, especially with the rumored IPC gains on top.
There’s not much else to say here except that, even though we missed this information, it’s still good to see our analysis was on point. Probably the better news is that our leaks on Zen 3 look even more solid, as they never made any claims of 50% improved floating-point performance.