Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST - Posted in IT Computing
The Official SPEC Numbers
SPEC FP and Int 2000 are the standard benchmarks to evaluate CPU performance. However, the benchmark numbers are highly dependent on the compiler. SPEC FP and Int show best-case performance, as the CPU runs aggressively compiled and highly optimized code. In the real world, code is compiled in a more conservative, less optimized way.
In practice this means that Intel's SPEC numbers, thanks to its highly capable compiler team, are (slightly) higher than what you will see in real applications. Nevertheless, SPEC CPU 2000 is a good starting point to understand what a CPU is capable of. As mentioned earlier, the Xeon 5100 is the Xeon Woodcrest, based on the new Core architecture.
SPECfp
CPU | Clockspeed (MHz) | SPEC fp 2000 |
POWER5+ | 2200 | 3271 |
Itanium 2 | 1666 | 2851 |
Xeon 5160 | 3000 | 2783 |
Opteron | 2800 | 2256 |
Pentium 4 E | 3733 | 2232 |
The new Woodcrest is about 20-25% faster than the fastest dual-core Opteron. Its 7% clockspeed advantage (3 GHz versus 2.8 GHz) is most likely the result of Woodcrest being built on a newer 65nm process. If AMD manages to keep up with Intel when it comes to clockspeed, the advantage of Intel's newest CPU might shrink to 15% or less. However, Intel's Woodcrest will have a much bigger advantage in all applications that make heavy use of 64 and 128-bit SSE.
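The percentages in the paragraph above can be reproduced from the SPECfp table. The sketch below is only an illustration of that arithmetic: the helper name `advantage` is made up for this example, and the equal-clock estimate assumes that the SPEC score scales linearly with clock speed, which is a simplification.

```python
# Scores and clock speeds copied from the SPECfp table.
specfp = {
    "Xeon 5160": {"mhz": 3000, "score": 2783},
    "Opteron": {"mhz": 2800, "score": 2256},
}


def advantage(a, b):
    """Return how much larger a is than b, as a percentage."""
    return (a / b - 1.0) * 100.0


xeon = specfp["Xeon 5160"]
opteron = specfp["Opteron"]

# Raw lead in SPECfp score and in clock speed.
score_lead = advantage(xeon["score"], opteron["score"])
clock_lead = advantage(xeon["mhz"], opteron["mhz"])

# Assumption: performance scales linearly with clock speed, so comparing
# score-per-MHz estimates the lead if both CPUs ran at the same clock.
equal_clock_lead = advantage(
    xeon["score"] / xeon["mhz"],
    opteron["score"] / opteron["mhz"],
)

print(f"SPECfp lead:         {score_lead:.1f}%")
print(f"Clock speed lead:    {clock_lead:.1f}%")
print(f"Lead at equal clock: {equal_clock_lead:.1f}%")
```

With the table values this works out to a lead of roughly 23% in score and 7% in clock speed, leaving about 15% at equal clocks, which matches the figures quoted in the text.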
SPECint
CPU | Clockspeed (MHz) | SPEC Int 2000 |
Xeon 5160 | 3000 | 3057 |
Pentium 4 E | 3733 | 1870 |
Opteron | 2800 | 1837 |
Pentium 4 Xeon | 3733 | 1813 |
POWER5+ | 2200 | 1705 |
Itanium 2 | 1666 | 1502 |
When it comes to integer performance, the Woodcrest numbers are simply stunning: the Xeon 5160 scores more than 60% higher than the runner-up, the Pentium 4 E, and is well ahead of every other architecture listed. Let us find out if this lead in SPEC Int 2000 pays off in server applications.
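To put a number on the integer lead, the SPECint table can be ranked and the top score compared against each of the others. This is a minimal sketch using the table values; the function name `lead_over_rest` is hypothetical and only serves this example.

```python
# Scores copied from the SPECint table.
specint = {
    "Xeon 5160": 3057,
    "Pentium 4 E": 1870,
    "Opteron": 1837,
    "Pentium 4 Xeon": 1813,
    "POWER5+": 1705,
    "Itanium 2": 1502,
}


def lead_over_rest(scores):
    """Return the top-scoring name and its percentage lead over every other entry."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_name, top_score = ranked[0]
    leads = {
        name: (top_score / score - 1.0) * 100.0
        for name, score in ranked[1:]
    }
    return top_name, leads


top, leads = lead_over_rest(specint)
print(f"Fastest: {top}")
for name, pct in leads.items():
    print(f"  ahead of {name:<15} by {pct:5.1f}%")
```

On these numbers the Xeon 5160 is a bit over 60% ahead of the Pentium 4 E and the Opteron, and roughly twice as fast as the Itanium 2.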
Latencies...
LMBench is a set of micro-benchmarks which can be helpful for determining memory latency and instruction latencies. We tested with LMBench 3.0a-5. It must be said that LMBench is usually right, but not always. If the benchmark is not aware of some of the particularities of a certain architecture, it can report wrong values. So we have to double-check whether the measured values make sense.
LMBench
CPU | Clockspeed (MHz) | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles) |
Xeon 5160 3 GHz | 3000 | 1.01 | 3 | 4.7 | 14 | 117.3 | 345 |
Pentium-M 1.6 GHz | 1593 | 2 | 3 | 6 | 10 | 92.1 | 147 |
Sun T1 1 GHz | 980 | 3 | 3 | 22.1 | 22 | 107.5 | 105 |
Opteron 275 | 2209 | 1 | 3 | 5.5 | 12 | 73 | 161 |
Xeon Irwindale 3.6 GHz | 3594 | 1 | 4 | 8 | 28 | 48.8 | 175 |
The massive 4 MB L2 cache has an amazingly low latency of 14 cycles. This seems to be the worst case, as we have measured 12 cycles with other benchmarking tools such as ScienceMark. Nevertheless, even 14 cycles at 3 GHz is pretty amazing. The Core Duo, a.k.a. Yonah, accesses a shared cache that's half as large in 14 cycles at a substantially lower 2.33 GHz.
On the other hand, the memory latency is very high; luckily the 4 MB L2 cache will minimize that effect. The problem seems to be the FB-DIMMs. The Advanced Memory Buffer introduces extra latency, and of course the registered DDR-2 533 chips with a CAS latency of 4 have a higher latency by themselves. This results in a memory subsystem with a pretty high 115 ns latency, while the Opteron can access its RAM in only 73 ns.
ScienceMark didn't agree completely and reported about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: the excellent prefetchers are hiding the true latency numbers from ScienceMark. It must also be said that the measurements on the Opteron are only for the local memory, not the remote memory.
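The nanosecond and cycle figures quoted in this section are tied together by the clock speed: cycles = latency in ns x clock in GHz. The sketch below shows that conversion with a few of the values from the text and the LMBench table; the function names are invented for this example.

```python
def ns_to_cycles(latency_ns, clock_mhz):
    """Convert a latency in nanoseconds to CPU cycles at the given clock (MHz)."""
    return latency_ns * clock_mhz / 1000.0


def cycles_to_ns(cycles, clock_mhz):
    """Convert a latency in CPU cycles to nanoseconds at the given clock (MHz)."""
    return cycles * 1000.0 / clock_mhz


# Woodcrest (3000 MHz): the 115 ns RAM latency quoted in the text.
woodcrest_ram_cycles = ns_to_cycles(115, 3000)

# Woodcrest: upper end of the 70-76 ns range reported by ScienceMark.
woodcrest_sciencemark_cycles = ns_to_cycles(76, 3000)

# Opteron 275 (2209 MHz as measured): 73 ns local memory latency.
opteron_ram_cycles = ns_to_cycles(73, 2209)

# Woodcrest L2: 14 cycles expressed in nanoseconds.
woodcrest_l2_ns = cycles_to_ns(14, 3000)

print(f"Woodcrest RAM, LMBench:     {woodcrest_ram_cycles:.0f} cycles")
print(f"Woodcrest RAM, ScienceMark: {woodcrest_sciencemark_cycles:.0f} cycles")
print(f"Opteron 275 RAM:            {opteron_ram_cycles:.0f} cycles")
print(f"Woodcrest L2:               {woodcrest_l2_ns:.1f} ns")
```

The same latency in nanoseconds costs more cycles on a faster-clocked CPU, which is why the 3 GHz Woodcrest pays roughly twice as many cycles per memory access as the 2.2 GHz Opteron even though the gap in nanoseconds is smaller.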
91 Comments
JohanAnandtech - Wednesday, June 7, 2006
Running 4 threads of SSL on 8 core Opteron gives you almost exactly the same scores as 4 threads on a 4 core Opteron. The 2.2 GHz DC Opteron runs at almost exactly the same speed as the 3 GHz woodcrest (that is why you can almost not see it), the 2.4 GHz is clearly in the lead.
Sign/s: the 2.4 GHz Opteron is keeping up.
Windows tests: see Jason Clark :-). At the end of this week, Jason will present the Windows based ones.
hondaman - Wednesday, June 7, 2006
Is gentoo using 64 bit?
JohanAnandtech - Wednesday, June 7, 2006
Yes. Added that to the configuration page. Thx.
vortmax - Wednesday, June 7, 2006
Can't keep a company the size of Intel with all it's resources down for too long. They came back and came back hard. I'm sure AMD was expecting this and I'm guessing they have some kind of contingency plan to counter. It might not be this year, but I would expect that in Q1 of 07, they will have some big announcements.
Competition is a good thing.
stmok - Wednesday, June 7, 2006
Yeah, the contingency plan is called K8L. ;-)
Questar - Wednesday, June 7, 2006
So much for the fanboys saying the FSB would cripple Woodcrest.
DigitalFreak - Wednesday, June 7, 2006
Woodcrest, maybe. It will be interesting to see how the quad processor version does.
eRacer - Wednesday, June 7, 2006
A typo in the conclusion...under AMD Opteron disadvantages:
- DDR2 offers lower latency, less power and less cost (for now)
should be an advantage
vijay333 - Wednesday, June 7, 2006
i think AT is correct - unless i'm mistaken, opterons don't support DDR2, so they don't have the benefits of DDR2 (lower latency etc)
eRacer - Wednesday, June 7, 2006
Oops, you are probably correct. I was thinking ahead to rev. F Opteron.