Sun’s T2000 “Coolthreads” Server: First Impressions and Experiences
by Johan De Gelas on March 24, 2006 12:05 AM EST. Posted in IT Computing
The T2000 as a heavy SAMP web server: the results
To interpret the graphs below precisely, you must know that the x-axis gives you the number of demanded requests, and the y-axis gives you the actual reply rate of the server. So, the first points all show the same performance for each server, as each server is capable of responding fast enough.
We tested the Opteron machines with both Linux and Solaris to get an idea of the impact of the OS.
The Sun T2000 isn't capable of beating our quad-core Opteron, but there are a few remarks that I should make.
First of all, we tested the 1 GHz T1, which is, of course, clocked about 17% lower than the best T1 at 1.2 GHz. The T2000 peaked at 950 req/s, the quad-core Opteron at 1368 (Linux) and 1244 (Solaris) req/s. However, the T2000 was capable of delivering 935 req/s without any error (request timeout), while the quad Opteron delivered 1100 (Solaris) and 1250 (Linux) req/s without any errors. So, given the criterion that there cannot be any time-outs, the difference gets a little bit smaller.
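As a rough check on how much smaller the gap gets, the short Python sketch below expresses the T2000's throughput as a fraction of the Opteron's, once at peak and once under the no-timeout criterion. It uses only the req/s figures quoted above; the dictionary keys and the function name are illustrative choices, not part of the test setup.

```python
# Throughput figures (req/s) quoted in the paragraph above.
peak = {"T2000": 950, "Opteron (Linux)": 1368, "Opteron (Solaris)": 1244}
error_free = {"T2000": 935, "Opteron (Linux)": 1250, "Opteron (Solaris)": 1100}


def relative_to(results, baseline):
    """Return the T2000's throughput as a fraction of the baseline system."""
    return results["T2000"] / results[baseline]


for opteron in ("Opteron (Linux)", "Opteron (Solaris)"):
    at_peak = relative_to(peak, opteron)
    no_errors = relative_to(error_free, opteron)
    print(f"T2000 vs {opteron}: {at_peak:.0%} at peak, {no_errors:.0%} error-free")
```

Against the Solaris Opteron, the T2000 moves from roughly three quarters of the peak throughput to 85% of the error-free throughput; against Linux the shift is smaller.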
In defense of the Opteron and Xeon: the average response time for one particular request was (most of the time) between 5 and 15 ms. Once the server came close to its saturation point, we noted a maximum of 70 ms. With the T2000, the response time was at least 20 ms, typically 40 ms, with peaks of up to 220 ms when we came close to the peak throughput.
Of course, this is the disadvantage of the lower single-threaded performance of this server CPU: the individual response times are higher. In the case of OLTP and web serving, this is hardly a concern; in the case of a decision support system, it might be one.
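One way to see why higher response times need not hurt web-serving throughput is Little's law: the average number of requests in flight equals throughput multiplied by response time. The sketch below applies it to the figures quoted above. It is only a rough estimate, because it assumes the "typical" response times also hold at the peak request rate, which the measurements do not state; the function name is illustrative.

```python
def requests_in_flight(throughput_rps, response_time_ms):
    """Little's law: average concurrency = arrival rate x time in system."""
    return throughput_rps * response_time_ms / 1000


# T2000: 950 req/s peak, 40 ms typical response time (assumed to hold at peak).
t2000 = requests_in_flight(950, 40)

# Opteron on Solaris: 1244 req/s peak, response times of 5 to 15 ms.
opteron_low = requests_in_flight(1244, 5)
opteron_high = requests_in_flight(1244, 15)

print(f"T2000:   about {t2000:.0f} requests in flight")
print(f"Opteron: about {opteron_low:.0f} to {opteron_high:.0f} requests in flight")
```

Under these assumptions the T2000 keeps a few dozen requests in flight at once, which is in line with a chip that offers 32 hardware threads: each request takes longer, but many more are processed in parallel.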
There is a better way to do this test, of course: enable the mod_deflate module and get some decent gzip compression. Once we enabled compression, our network I/O, which peaked at up to 12 MB/s, came down to a peak network I/O of 1.8 MB/s. Let us see the next benchmark, where we measured with gzip compression on.
The Sun T1 starts to show what it can do: performance has hardly decreased. Gzip compression is almost free on the T1; compression lowers performance by only 2%. The Opteron sees its performance lowered by 21% (977 vs 1244), and the Xeon by 19% (730 vs 899).
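The percentages above follow directly from the peak throughput figures. As a minimal sketch, the Python snippet below recomputes the cost of compression for each system; the Opteron and Xeon pairs are the ones quoted in the paragraph, the T1 pair (949 and 933 req/s) is taken from the article's results table, and the function name is illustrative.

```python
# Peak throughput in req/s: (without gzip, with gzip).
# The Opteron and Xeon pairs are quoted in the paragraph above;
# the T1 pair comes from the article's results table.
results = {
    "Sun T1": (949, 933),
    "Opteron": (1244, 977),
    "Xeon": (899, 730),
}


def gzip_cost(no_gzip, gzip):
    """Fraction of throughput lost when compression is enabled."""
    return 1 - gzip / no_gzip


for system, (no_gzip, gzip) in results.items():
    print(f"{system}: {gzip_cost(no_gzip, gzip):.1%} slower with gzip")
```

Rounded to whole percentages, this reproduces the 2%, 21% and 19% quoted above.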
On Solaris, the T1 performs like a quad Opteron. Linux, which probably has slightly better drivers for our Opteron server, gives the quad Opteron the edge.
Let us analyse this a little further.
PHP/MySQL, no Gzip (req/s)
Single Opteron 275 | 665 | 4-core T1 | 535
Dual Opteron 275 | 1244 | 8-core T1 | 949
Scaling, 2 to 4 Opteron cores | 87% | Scaling, 4 to 8 T1 cores | 77%
PHP/MySQL, Gzip (req/s)
Single Opteron 275 | 538 | 4-core T1 | 477
Dual Opteron 275 | 977 | 8-core T1 | 933
Scaling, 2 to 4 Opteron cores | 82% | Scaling, 4 to 8 T1 cores | 96%
Gzip performance relative to no Gzip
Opteron 275 | 79% | Sparc T1 | 98%
As you can see, our application should be a prime example of an application where a multi-core server CPU feels at home. With Gzip compression enabled, scaling is almost perfect: going from 4 to 8 T1 cores raises throughput by 96%.
So, why aren't we seeing the performance that Sun claims for, for example, SPECweb2005, where the T1 has no problem outperforming quad-core x86 CPUs? We are not sure, as we measured that 97% of the processing was done in OS code (97% "system") and only 2-3% of the CPU load came from the actual application ("user"). We suspect that the relatively light load of FP operations might have lowered the T1's performance. Depending on the tool that we used, we saw 0.66% to 1% FP operations in our instruction mix, with peaks of up to 2.4%. Those FP operations are most likely a result of our script calculating averages.
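The scaling percentages in the table are the throughput gain from doubling the number of cores, where 100% would be perfect scaling. A minimal Python sketch that recomputes them from the req/s figures in the table follows; the dictionary layout and the function name are illustrative.

```python
# req/s from the table: (half configuration, full configuration).
# Opteron: single vs dual Opteron 275 (2 vs 4 cores); T1: 4 vs 8 cores.
table = {
    ("Opteron 275", "no gzip"): (665, 1244),
    ("Opteron 275", "gzip"): (538, 977),
    ("T1", "no gzip"): (535, 949),
    ("T1", "gzip"): (477, 933),
}


def scaling_gain(half, full):
    """Throughput gain from doubling the core count (1.0 = perfect scaling)."""
    return full / half - 1


for (cpu, mode), (half, full) in table.items():
    print(f"{cpu}, {mode}: {scaling_gain(half, full):.0%} gain from doubling cores")
```

This reproduces the 87%, 82%, 77% and 96% figures from the table.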
26 Comments
drw - Friday, March 24, 2006 - link
Based on the kernel versions listed, I assume that a 32-bit distro was used? If so, I am curious how a 64-bit distro would compare, as both Apache and MySQL benefit greatly from 64-bit.
JohanAnandtech - Friday, March 24, 2006 - link
Fully 64 bit. uname -a clearly indicates 64 bit
defter - Friday, March 24, 2006 - link
Dual Opteron 275HE had 5% higher power consumption (198W vs 188W), but it was 5-30% faster (depending on whether or not gzip was used). These results would suggest that the dual Opteron has won the performance/watt battle in this benchmark.
Pricing is also quite important. What's the price for dual Opteron 275HE server with 8GB of memory? About $5000-7000?
PeterMobile - Friday, March 24, 2006 - link
Definitely interesting to see a third-party review of the T2000. I think it could also be interesting to compare both the Sun machine and the x86 servers to an IBM p5 510Q. That's a 4-way 1.5 GHz Power5+, which including 4 GB RAM and 2 Ultra320 disks lists for $8,536.
Calin - Friday, March 24, 2006 - link
I saw there is almost no loss of performance for compressing data... how about encrypting it?
cxl - Friday, March 24, 2006 - link
Actually, the MOD operation can be very important for servers, as it is the basis for any hashing operation, commonly used in many server applications. E.g., to identify a variable in a script, interpreters routinely use hashtables.
114 cycles per MOD operation is a performance disaster.
Calin - Friday, March 24, 2006 - link
The performance in the tested configuration was quite good - I wonder how other benchmarks and maybe other "twists" of the benchmark tested would look.
cosmotic - Friday, March 24, 2006 - link
Did you mean certainly NOT least?
JohanAnandtech - Friday, March 24, 2006 - link
definitely ... Fixed. Just checking if you read it carefully :-)
cosmotic - Friday, March 24, 2006 - link
Why no graphs? It makes reading benchmarks SO much easier.