Today, we’re taking a look at a unique product that unfortunately will never be released: Centaur’s CHA SoC with eight x86 CNS cores. Many folks might not be familiar with Centaur, so here’s a quick background:
Centaur was founded in 1995 and created many x86 CPUs, such as the WinChip processors in the 90s and many of VIA’s chips in the 2000s, including the C3, C7, and Nano CPUs. They were one of the first companies to debut hardware encryption acceleration, with the VIA C7 CPU, 4 years before Intel & AMD implemented AES-NI. Additionally, Centaur’s CPUs are the foundation for ZhaoXin’s x86 CPU designs.
Architecture Overview
The design team that created Centaur’s CNS CPU comprised only about 100 people based in Austin, Texas. The CPU is manufactured on TSMC’s 16nm process and measures 194 mm². It has eight x86 CPU cores, quad-channel DDR4 memory, 44 PCI-e lanes, and limited AVX-512 support. Clock speeds on the pre-release CPUs range from 2.0 to 2.5 GHz. This CPU was initially planned for release in 2020.
If you look at other websites discussing Centaur’s CPU, you might see it referred to as “CHA” instead of “CNS”. That’s because “CHA” is the name of the SoC, whereas the x86 cores are called “CNS”. The internal codename for this project was simply “NCORE”.
The overall IPC of the CNS core is similar to Intel’s “Haswell” architecture, but it achieves this without a micro-op cache. Core-to-core latency is fairly good, though a bit worse than the P-core-to-P-core latency of Intel’s i5-12600K.
The “killer feature” of this CPU is the NCore, an AI co-processor with a peak performance of 20 TOPS at INT8. For those interested, Centaur answered a few questions about their CPU in an AMA on Reddit.
There are also two publicly available videos in which Centaur’s Chief AI Architect, Glenn Henry, gives an overview of the chip’s features.
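To put the 20 TOPS figure in perspective, the short Python sketch below works backward from peak throughput to the number of INT8 multiply-accumulate (MAC) operations that would need to complete every clock cycle. The two-operations-per-MAC convention and the 2.5 GHz clock are assumptions for illustration, not published NCORE specifications.

```python
# Work backward from a peak TOPS figure to the MACs needed per clock cycle.
# Assumptions (not published NCORE specs): one INT8 multiply-accumulate (MAC)
# counts as two operations, and the NCORE runs at the 2.5 GHz core clock.

def implied_macs_per_cycle(peak_tops: float, clock_ghz: float, ops_per_mac: int = 2) -> float:
    """Return how many MACs must complete every cycle to reach peak_tops."""
    ops_per_second = peak_tops * 1e12
    cycles_per_second = clock_ghz * 1e9
    return ops_per_second / (cycles_per_second * ops_per_mac)


macs = implied_macs_per_cycle(peak_tops=20, clock_ghz=2.5)
print(f"{macs:.0f} INT8 MACs per cycle")  # 4000 INT8 MACs per cycle
```

If the NCORE actually runs at a different clock, the implied number of MACs per cycle scales inversely with that clock.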
The video below talks about the CHA SoC as a whole:
And this video specifically covers the NCORE AI co-processor:
Chips and Cheese
Chips and Cheese has also covered the CHA SoC in three articles; check them out!
Examining Centaur CHA’s Die and Implementation Goals
Centaur CHA’s Probably Unfinished Dual Socket Implementation
VIA Part 4 – A Deep Dive into Centaur’s Last CPU Core: CNS
CHA001 Motherboard
This is a pre-release motherboard, and as such it has some quirks and unusual features not found on typical consumer motherboards.
On the left side of the motherboard, there are two debugging connections above the motherboard power connector. The first is labeled “JTAG_VIA_CONN”; I’m assuming this is a VIA-specific debugging connection. The port above it is a standard XDP (Intel® eXtended Debug Port) connection.
On the right side is the I/O panel, which supports audio, Ethernet, 7x USB ports, PS/2 connections, and a serial port. If you somehow need two serial ports, there is a header for a second port directly behind the I/O panel. Looking further up, we have a classic CD Audio header.
Moving to the top of the motherboard, we have a DIP switch, a CPLD connection for direct programming via JTAG, 3x 2-pole DIP switches, a TPM connection, and, oddly, a USB-C connection that I assume was used for debugging purposes.
While the M.2 slot next to the SATA port supports PCI-e 3.0, the M.2 slot next to the PCI-e slots only supports PCI-e 2.0 speeds.
BIOS & Overclocking
The BIOS for the Centaur system is from AMI. It is jam-packed with ways to tweak the system, and I couldn’t possibly cover them all if I wanted to.
From the menu shown below, I learned the CPU didn’t have a set bin.
However, the CPU settings let you set the bin manually. The highest setting was 2.5 GHz at 1.1 V; however, my chip was not stable at those settings, and I had to increase the voltage to run 2.5 GHz without instability.
I attempted to run the CPU at 2.6 GHz, but no matter how much voltage I applied, it remained unstable.
The BIOS also has an interesting option called “The Concluder”.
This option lets the motherboard disguise the CPU architecture from programs, which is useful if a program refuses to work because it doesn’t recognize the CPU, or because the program uses an unfair CPU dispatcher.
To my surprise, some parts of this motherboard use ZhaoXin technology. SATA & USB are handled by the ZhaoXin ZX-200 chipset, and the on-board audio is also ZhaoXin-made.
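To illustrate what an “unfair” CPU dispatcher means, here is a simplified Python model contrasting a dispatcher that checks the vendor before trusting feature flags with one that checks the feature flags alone. “GenuineIntel” and “CentaurHauls” are real CPUID vendor ID strings, but the function names, the feature set, and the idea that The Concluder works by changing the reported vendor are assumptions made purely for illustration; real software reads these values through the CPUID instruction.

```python
# Hypothetical model of two dispatch strategies; real programs query CPUID.
FAST_PATH = "avx512"
FALLBACK = "generic"


def vendor_dispatch(vendor: str, features: set[str]) -> str:
    """'Unfair' dispatcher: only uses AVX-512 on a whitelisted vendor."""
    if vendor == "GenuineIntel" and "avx512f" in features:
        return FAST_PATH
    return FALLBACK


def feature_dispatch(vendor: str, features: set[str]) -> str:
    """Fair dispatcher: ignores the vendor and checks the feature flag only."""
    if "avx512f" in features:
        return FAST_PATH
    return FALLBACK


# Hypothetical feature set for an AVX-512-capable CNS core.
cns_features = {"sse4_2", "avx2", "avx512f"}

print(vendor_dispatch("CentaurHauls", cns_features))   # generic
print(feature_dispatch("CentaurHauls", cns_features))  # avx512
# Assumption: if the CPU reported a different vendor string, the
# vendor-based dispatcher would pick the fast path as well.
print(vendor_dispatch("GenuineIntel", cns_features))   # avx512
```

The feature-based approach is the more robust design, since it rewards any CPU that actually implements the instructions rather than only the vendors a developer happened to test.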
Quirks, Bugs, & Oddities
Being pre-release hardware, I encountered a few quirks.
1) When I loaded Crysis Remastered, it warned that the CPU was unsupported – but otherwise had no issues.
2) I tried to run Cinebench R15, since many R15 results are available for older CPUs with single-core performance similar to Centaur’s CNS, but the program would not load. I tried the version available for download from Guru3D and the version included with HWBot’s BenchMate, but neither would launch.
3) When downloading games via Steam, opening another program (such as Mozilla Firefox) would severely throttle download speeds for a few moments until that program had fully loaded. This only occurred in Steam; I was unable to reproduce it with Ubisoft Connect, GOG, Origin, or Epic’s game launcher.
4) I had originally intended to use RPCS3 to test emulation performance with AVX-512 enabled vs. disabled, but the program always hung while compiling the SPU cache for Demon’s Souls.
5) HWBOT’s BenchMate did not recognize the CPU, and because of that I was unable to submit scores via their benchmarking program. I was able to submit scores manually – you can view them at https://hwbot.org/user/albert.thomas/#My_Submissions
System Configuration
CPU | Centaur CNS, 8 cores @ 2.5 GHz |
Motherboard | CHA0001 |
CPU Cooler | MSI MEG CoreLiquid S280 Liquid Cooler |
RAM | G.Skill Trident DDR4-3466, set to DDR4-3200 JEDEC speeds |
GPU | Zotac Gaming Nvidia RTX 3060 Ti |
SSDs | 500 GB Sabrent Rocket 4, 500 GB Western Digital SN550 |
Computer Case | DeepCool CK560WH |
PSU | DeepCool PQ1000M |
Benchmarks
For this review, I can only benchmark the CNS x86 cores. I would love to test the NCore, but its drivers are not publicly available.
One thing should be noted: while the Centaur CNS was designed with quad-channel memory in mind, I only have two DDR4 sticks available, which limits the system to a dual-channel configuration. This matters because memory-bound workloads will perform worse than they would with all four channels populated.
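A quick way to see the size of this handicap is to compare theoretical peak bandwidth. The Python sketch below assumes the standard 64-bit (8-byte) DDR4 channel width and the DDR4-3200 speed used in this review; these are theoretical peaks, and measured bandwidth will be lower on any platform.

```python
# Theoretical peak DRAM bandwidth = transfers/s x bytes per transfer x channels.
BYTES_PER_TRANSFER = 8  # one 64-bit DDR4 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int) -> float:
    """Return theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


dual = peak_bandwidth_gbs(3200, channels=2)
quad = peak_bandwidth_gbs(3200, channels=4)
print(f"dual-channel: {dual:.1f} GB/s")  # dual-channel: 51.2 GB/s
print(f"quad-channel: {quad:.1f} GB/s")  # quad-channel: 102.4 GB/s
```

In other words, running two channels instead of four halves the theoretical bandwidth available to all eight cores.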
CPU-Z
I chose Ryzen 1600 as the reference CPU in CPU-Z because an 8-core/8-thread CPU is closest in performance to a 6-core/12-thread CPU. In the multi-threaded benchmark, the R5 1600 performs ~40% faster, but it also has 44% higher clock speeds. Taking clock speeds into account, the Centaur CPU does well in this test. Intel’s i7-7700K scores 492 in this benchmark’s single-core test, almost double the score of Centaur’s CNS cores. In the multi-threaded test, the i7 still pulls ahead, but not nearly as dramatically: 2648 vs 2206.
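“Taking clock speeds into account” can be made concrete by dividing the performance ratio by the clock ratio. The Python sketch below uses the CPU-Z multi-threaded figures above and the Cinebench R23 Ryzen 1700X figures given later in this review. It assumes performance scales linearly with frequency, which ignores memory latency, boost behaviour, and SMT, so treat the output as a rough guide rather than a true IPC measurement.

```python
# Rough per-clock comparison: performance ratio divided by clock-speed ratio.
# Assumes linear scaling with frequency, which real CPUs only approximate.


def per_clock_ratio(perf_ratio: float, clock_ratio: float) -> float:
    """Return the reference CPU's work per clock relative to the CNS (>1.0 = reference ahead)."""
    return perf_ratio / clock_ratio


# CPU-Z multi-threaded: Ryzen 1600 is ~40% faster with ~44% higher clocks.
print(f"Ryzen 1600, CPU-Z MT: {per_clock_ratio(1.40, 1.44):.3f}x")       # 0.972x
# Cinebench R23 single-core: 1700X is 73% faster with 56% higher clocks.
print(f"Ryzen 1700X, R23 single: {per_clock_ratio(1.73, 1.56):.3f}x")    # 1.109x
```

A result below 1.0 means that, clock for clock, the CNS matched or slightly exceeded the Ryzen 1600 in this multi-threaded test, while the 1700X kept roughly an 11% per-clock advantage in single-threaded Cinebench.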
For those interested, here is a validated CPU-Z result: https://valid.x86.fr/910tkn
Cinebench R23
The single-core performance of the Centaur CPU is not very impressive in this result. At only 552 points, it’s second to last in this list, outperforming only a Westmere Xeon. This isn’t the fairest comparison, as the other CPUs in this chart run at different clock speeds. If we compare the CNS’s results to the Ryzen 1700X, the 1700X has a 56% higher clock speed but performs 73% better.
The multi-core score of 4141 is decent, reaching 93.8% of perfect 8x scaling from the single-core score. Ryzen 1600 CPUs typically score around 6400 in this scenario. Once you account for the clock speed difference between the Centaur CNS and the Ryzen 1600, performance is good here.
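The 93.8% scaling figure comes from comparing the multi-core score with the single-core score multiplied by the core count. The Python sketch below reproduces that arithmetic with the Cinebench R23 scores above; since the CNS has no SMT, eight cores means eight threads.

```python
# Multi-core scaling efficiency: multi-core score as a fraction of
# perfect linear scaling (single-core score x core count).


def scaling_efficiency(single_core: float, multi_core: float, cores: int) -> float:
    """Return how close the multi-core score gets to linear scaling."""
    return multi_core / (single_core * cores)


eff = scaling_efficiency(single_core=552, multi_core=4141, cores=8)
print(f"{eff:.1%}")  # 93.8%
```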
AIDA64 GPGPU
Before looking at these results, keep in mind that GPGPU results are influenced by memory bandwidth, so the Centaur CNS will underperform unless paired with quad-channel memory. My setup runs dual-channel memory, so keep that in mind.
For AIDA64 GPGPU, I found comparison results for the Ryzen 3600 & Ryzen 5600 on /r/AMD, posted by /u/coffeewithalex.
There are a few interesting results here. In all of the integer IOPS tests, the CNS CPU decently outperforms the Ryzen 3600. The Ryzen 5600 made massive improvements in these areas over the Ryzen 3600 and, as such, outperforms the CNS in all of these metrics, but the CNS still comes very close in 64-bit integer IOPS: 69.53 GIOPS on the CNS CPU vs 71.87 GIOPS on the Ryzen 5600.
Despite being handicapped by dual-channel memory, the CNS CPU outperformed both the Ryzen 3600 & Ryzen 5600 in memory write operations. This only applies to single-chiplet Ryzen CPUs; the 5900X & 5950X show roughly 2x the 5600X’s memory write performance.
Geekbench 5
The Ryzen 1600 leads the Centaur CPU by 55% in single-core performance, but that lead drops to 31.8% in multi-core results, indicating that the Centaur CPU scales better in this benchmark. Considering Ryzen’s higher clock speeds, this is decent performance.
Corona 1.3
For the comparison data here, I’ve used numbers from Hardware Unboxed/TechSpot, available at https://www.techspot.com/review/1447-amd-ryzen-3-1200-1300-performance-preview/page2.html
In this benchmark, the Centaur CNS outperformed Intel’s i5-7600K by a decent margin. However, the Ryzen 1600X obliterated the CNS, completing the task in nearly half the time! That is a bigger margin than clock speeds alone would account for.
7-zip Compression/Decompression
For these comparison results, I sourced the 1700X numbers from 7-Zip’s website and the i5-6500 results from OpenBenchmarking.org.
UserBenchmark
Generally, I am reluctant to incorporate UserBenchmark into my results because of their biases and drama – but it can be useful for certain comparisons.
With UserBenchmark, the Ryzen 1600 leads by 44.4% in single thread loads, 35.5% in quad core loads, and 54.6% in massively threaded loads – an overall performance advantage of 44%.
Y-Cruncher
Y-Cruncher is one test where we actually see the CNS CPU outperforming the Ryzen 1700X, which is pretty darned impressive. This is most likely due to Y-Cruncher’s support for AVX-512, which the Centaur CNS supports and first-generation Ryzen does not.
The Ryzen 1700x performance numbers for Y-Cruncher were sourced from TechReport: https://techreport.com/review/31366/amds-ryzen-7-1800x-ryzen-7-1700x-and-ryzen-7-1700-cpus-reviewed/
Gaming
There’s no better way to test an unreleased CPU with AVX-512 support and an AI coprocessor than gaming, right? After all, can it really run Crysis?! Joking aside, I thought it would be interesting to test a few games on the Centaur platform. In theory, it should perform similarly to a low-clocked i7-9700K.
For this gaming test, I did things a little differently than in past reviews: rather than comparing multiple CPUs against each other, I just wanted to see what sort of performance Centaur’s CNS cores could sustain in gaming. I didn’t think in-depth testing against Ryzen or Intel CPUs would be very useful; given the clock speed and IPC differences, it’s pretty much guaranteed that any first-generation Ryzen CPU would outperform this chip in gaming.
For all of the results below, I recorded gameplay using Nvidia’s Shadowplay. This had a small impact on overall performance (5-10% performance loss).
Crysis Remastered
Let’s start with the one title that someone is bound to ask. Can it run Crysis? Well, the game engine didn’t think it could.
However, it actually ran Crysis rather well considering its low 2.5 GHz clock speed, definitely what I would consider a playable experience.
DOOM (2016)
I expected DOOM (2016) to run decently on this system, but it actually performed rather well: framerates averaged around 120-130 fps, with minimums in the 80s. Its performance is still weak compared to modern CPUs, which can sustain a solid 200 fps.
Strange Brigade
In this game, the Centaur CPU performed even better than it did in DOOM, with minimum framerates of ~107 fps and averages around 130-140 fps.
Cyberpunk 2077
In many scenes, Centaur’s performance in Cyberpunk 2077 is quite acceptable. However, in driving scenes, especially during combat, framerates suffer a bit. It’s still what I would consider “playable”, but not comfortable.
Far Cry 5
I wanted to test Far Cry 5 because it is notoriously bound by single-threaded performance. How bad was the performance? Well, let’s put it this way: it met the minimum standard for what I would consider playable, and the cores were relatively evenly loaded. However, something was preventing the cores from being fully utilized, as peak single-threaded load remained around 80%.
Assassin’s Creed Odyssey
This title might look playable based on its framerates, but micro-stuttering made gameplay painful on Centaur’s CPU in AC:O. The micro-stutter is less apparent in the gameplay recordings, but it is easier to notice at the end of the video, which records the built-in benchmark.
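Average framerate can hide micro-stutter because a handful of long frames barely move the average. The Python sketch below uses two hypothetical frame-time traces (not data from these recordings): a steady ~60 fps run and a run of fast frames interrupted by periodic 50 ms hitches. The “spike” threshold of twice the median frame time is an arbitrary choice for illustration, not a standard metric.

```python
import statistics


def summarize(frame_times_ms: list[float]) -> tuple[float, float, int]:
    """Return (average FPS, worst frame time in ms, count of frames > 2x median)."""
    avg_fps = 1000 / statistics.mean(frame_times_ms)
    median = statistics.median(frame_times_ms)
    spikes = sum(1 for t in frame_times_ms if t > 2 * median)
    return avg_fps, max(frame_times_ms), spikes


# Hypothetical traces: steady 16.7 ms frames vs. 12 ms frames with a 50 ms hitch every 10th frame.
smooth = [16.7] * 60
stutter = ([12.0] * 9 + [50.0]) * 6

for name, run in (("smooth", smooth), ("stutter", stutter)):
    fps, worst, spikes = summarize(run)
    print(f"{name}: {fps:.1f} avg fps, worst frame {worst:.1f} ms, {spikes} spikes")
```

Here the stuttering trace actually has the higher average framerate, yet its 50 ms hitches are what a player feels, which is why frame-time consistency matters as much as average fps.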
Summary & Thoughts
I really enjoyed testing this pre-release system. It makes me wonder what might have been had VIA decided to invest more resources in Centaur’s Austin, Texas team. Despite its relatively low clock speeds, this CPU provides “playable” gaming performance in most games and overall is similar to an underclocked Ryzen 1600 in non-gaming benchmarks.
I wish I had been able to test the NCORE, but its drivers are essentially impossible to obtain. It appears the NCore patents are actually owned by ZhaoXin, so we may see it in their future products, and who knows: maybe the drivers for those future CPUs will work with Centaur’s CHA SoC, and we’ll be able to test how well these AI cores could have performed.
What really surprised me is that even without the ability to use the NCORE, there were a few niche scenarios where this low-clocked CPU was competitive with, or actually outperformed, higher-clocked Ryzen CPUs.
The employees who made up Centaur’s Texas team are now employed by Intel, as part of a deal in which Intel paid VIA $125 million. I can’t help but wonder what about the CHA SoC impressed Intel more: the relatively good x86 performance achieved by a team of around 100 employees, or the NCore’s allegedly higher performance compared to Intel’s own solutions.