Opinion: We Should Stop Using Teraflops to Measure GPU Performance

As someone who’s been writing about computer components for half a decade, with a specific interest in graphics cards, I’ve grown sort of numb to the marketing I hear from these companies every day. Every manufacturer touts the metrics that favor its products, and some even invent new metrics to gain a marketing edge over their competitors. This isn’t unique to graphics cards, or even to computer components in general: anyone who’s ever bought a set of sheets knows how deceptive the dreaded thread count can be.

However, as egregious as the IOPS, Giga Rays and TDPs of the world can be, perhaps none is more egregious than FLOPS, or floating-point operations per second. This metric is most often used to measure the performance of graphics cards. In my opinion, it is the absolute worst of the bunch, and we should stop using it for that purpose immediately.

The TeraFLOP Trap

Let me be clear: FLOPS are useful for gauging peak compute potential in scientific workloads that rely heavily on floating-point arithmetic. The problem is that graphics is not quite that simple, even though computer graphics workloads do make heavy use of floating-point instructions. Video games are a complex beast. They use many different types of instructions and exercise other parts of the GPU, such as texture units, rasterizers and memory. That can make it very difficult, if not impossible, for a GPU to max out its peak FP32 compute potential, which is exactly what those TFLOPs figures measure.
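The headline TFLOPs figure is a theoretical peak, not a measurement. It is usually derived by assuming that every shader completes one fused multiply-add (counted as two floating-point operations) on every clock cycle at the rated boost clock. The minimal sketch below shows that arithmetic. The GTX 1080 Ti's 3584 shaders come from this article. The Vega 64's 4096 shaders and both boost clocks (1582 MHz and 1546 MHz) are reference specifications assumed here for illustration.

```python
def peak_fp32_tflops(shaders: int, boost_clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes every shader retires one fused multiply-add (counted as two
    floating-point operations) on every cycle at the rated boost clock,
    a best case that real games essentially never sustain.
    """
    flops_per_clock = shaders * 2  # one FMA = one multiply + one add
    clock_hz = boost_clock_mhz * 1e6
    return flops_per_clock * clock_hz / 1e12


# Assumed reference boost clocks; the 1080 Ti shader count is from the article.
print(f"GTX 1080 Ti: {peak_fp32_tflops(3584, 1582):.2f} TFLOPs")
print(f"Vega 64:     {peak_fp32_tflops(4096, 1546):.2f} TFLOPs")
```

Nothing in this formula accounts for memory bandwidth, texture or geometry throughput, or how well a game keeps the shaders busy. That is why the result says so little about frame rates.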

What’s worse, this figure is completely unreliable for comparing competing graphics cards, because rival GPUs are designed from the ground up with very different approaches. Heck, even comparing GPUs within the same family can be difficult with TFLOPs alone, because performance doesn’t scale perfectly linearly with compute in every architecture.

Take, for example, the legendary GTX 1080 Ti. Its GPU, packed with 3584 shaders, is capable of up to 11.34 TFLOPs. That sounds impressive, but next to the Radeon Vega 64’s 12.66 TFLOPs, you’d be forgiven for thinking the Radeon was the more powerful of the two gaming graphics cards. Yet the Radeon offers roughly 12% more peak compute, and the GeForce card can still be 30% faster in games at 1440p on average.

For a more recent example, NVIDIA’s latest RTX 3080 has 29.77 TFLOPs. That gives it roughly 3x the peak compute potential of its predecessor, the RTX 2080, with 10.07 TFLOPs, yet it is only 67% faster on average at 4K. If that’s not enough, compare the RTX 3080 to the RTX 3090 and its 35.58 TFLOPs. Since both cards are based on the same silicon, you’d expect their TFLOPs figures to predict their relative performance, right? Except that, despite offering nearly 20% more compute potential, the RTX 3090 is only 10% faster at 4K.
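To make the mismatch concrete, the sketch below divides the peak TFLOPs figures quoted above and places the result beside the average gaming gaps the article cites. Every number comes from the article. The `tflops_ratio` helper is a hypothetical illustration, not a tool anyone actually uses.

```python
# Figures quoted in the article: peak FP32 TFLOPs for each card, and how much
# faster the first card of each pair is in games on average.
PAIRS = [
    # (card A, TFLOPs A, card B, TFLOPs B, observed gaming speedup of A over B)
    ("GTX 1080 Ti", 11.34, "Vega 64", 12.66, 1.30),  # 1440p
    ("RTX 3080", 29.77, "RTX 2080", 10.07, 1.67),    # 4K
    ("RTX 3090", 35.58, "RTX 3080", 29.77, 1.10),    # 4K
]


def tflops_ratio(tflops_a: float, tflops_b: float) -> float:
    """Speedup of card A over card B that peak TFLOPs alone would predict."""
    return tflops_a / tflops_b


for card_a, tf_a, card_b, tf_b, observed in PAIRS:
    predicted = tflops_ratio(tf_a, tf_b)
    print(f"{card_a} vs {card_b}: TFLOPs predict {predicted:.2f}x, "
          f"games show {observed:.2f}x")
```

In every pair, the prediction misses in a different direction and by a different amount. A single correction factor can’t rescue the metric.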

The FLOP is dead, bury it.

So, as we’ve illustrated above, the TeraFLOP is a terribly unreliable metric for comparing two GPUs, regardless of manufacturer, architecture, etc. Perhaps it wasn’t always so, but with modern games and graphics cards, the FLOP has become an archaic holdover that does little but confuse consumers. It is arguably no more useful than Bananas per Rotation or Triangles per Nanosecond.

Rather than TFLOPs, we should focus on useful metrics such as frame rates, frame times and latency. These are admittedly more difficult to measure, but they are far more accurate and useful ways of judging a graphics card’s performance. We don’t rely on FLOPS, MIPS or anything similar to measure CPU performance, and we should stop expecting we can with GPUs. In this regard, they are no different.
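As a rough illustration of why frame times say more than a single average, the sketch below summarises a capture of per-frame render times into an average FPS and a 99th-percentile frame time. Both captures are invented data. The nearest-rank percentile is only one common convention, and benchmarking tools differ in how they define figures like the "1% low".

```python
import math


def frame_time_summary(frame_times_ms: list[float]) -> dict[str, float]:
    """Summarise per-frame render times (milliseconds) from one capture."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    n = len(frame_times_ms)
    avg_ms = sum(frame_times_ms) / n
    # Nearest-rank 99th percentile: 99% of frames took this long or less.
    ordered = sorted(frame_times_ms)
    rank = math.ceil(n * 99 / 100)
    p99_ms = ordered[rank - 1]
    return {
        "avg_fps": 1000.0 / avg_ms,
        "p99_frame_time_ms": p99_ms,
        "p99_fps": 1000.0 / p99_ms,
    }


# Hypothetical 100-frame captures with the same average frame time (16.7 ms).
smooth = [16.7] * 100
stuttery = [15.0] * 95 + [49.0] * 5

for name, capture in (("smooth", smooth), ("stuttery", stuttery)):
    s = frame_time_summary(capture)
    print(f"{name}: {s['avg_fps']:.1f} FPS average, "
          f"99th-percentile frame time {s['p99_frame_time_ms']:.1f} ms")
```

Both hypothetical captures average the same frame rate. The second capture’s occasional 49 ms frames show up clearly in its 99th-percentile frame time, and that stutter is exactly what a single throughput number hides.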

Liked it? Take a second to support Donny Stanley on Patreon!