Aurora Supercomputer

Aurora Supercomputer Delayed, Department of Energy Considering “Different Options”

It shouldn’t be too surprising that the Aurora supercomputer, built by Intel, has been officially delayed. Intel’s Ponte Vecchio GPUs, which were to be fabbed at least partially on Intel’s 7nm process, were originally planned as the company’s first 7nm product but no longer hold that position (Intel’s first 7nm product is now planned to be a client CPU). Ponte Vecchio was delayed because Intel’s 7nm node was delayed, and a knock-on delay for Aurora was inevitable, since Aurora was supposed to be built with these GPUs in 2021. With the GPUs not coming until late 2021 at best, Aurora’s future looks uncertain.

You may recall our exclusive from August, which anticipated Aurora’s issues: our sources had informed us that there was a high probability that the Department of Energy would swap the Intel Xe GPUs for either Nvidia Ampere or AMD Radeon Instinct GPUs. A Department of Energy official, Paul Dabbar, has now made a statement on the status of Aurora’s delivery:

I can’t go through exactly all the different options that we’re looking at for the Argonne machine… But the details are still being identified about exactly what we’re going to go through with Intel and their microelectronics. But we have confidence that machine also will be delivered and will be delivered right behind Oak Ridge.

Paul Dabbar

The implication seems to be that a processor swap is at least being considered for Aurora. Currently, there are two alternatives to Ponte Vecchio for the supercomputer: Nvidia’s A100 and AMD’s MI100.

Nvidia’s A100 is an absolute monster for datacenter and AI work and was responsible for transforming Nvidia from a company dependent on gaming revenue to a company dependent on datacenter revenue. If the Department of Energy wants the best for Aurora, it’s the A100 at the moment. The succeeding Hopper architecture is not expected until 2022, so it’s highly unlikely that GPU could be considered for Aurora.

The MI100, on the other hand, is AMD’s top-end datacenter GPU, but it’s not faster than the A100. In our exclusive, we detailed its technical specifications and came to the conclusion that it was only particularly competitive in single-precision workloads. There is also a rumored MI200, but this is most likely just two MI100s on the same board or even the same package. By late 2021, it’s possible AMD might be able to provide the successor to the CDNA-based MI100, but it’s unlikely.

It is likely at this point that Aurora will ditch Ponte Vecchio due to Intel’s mismanagement of its own 7nm node. Even if Ponte Vecchio debuts as a TSMC- or Samsung-fabbed GPU, it will probably not be finished in time for Aurora. We’ll have to see whether Intel can keep Ponte Vecchio in Aurora, but it’s not looking good for the company at the moment.

Liked it? Take a second to support Matthew Connatser on Patreon!