IBM has consistently attracted the brightest minds, the kind of engineers who deserve the moniker "computer scientist." In the 1970s, these scientists cooked up a processor architecture that was built for performance: the IBM 801, the original RISC processor. The 801's legacy lives on in the IBM Power series of enterprise-class processors.
The major difference between a RISC processor and a CISC processor, such as Intel's x86, can be viewed as a tug-of-war between programmers and chip designers. CISC processors are designed to make application developers' lives easier by reducing common operations to single, long-executing native instructions, giving CISC a reputation as a slow but friendly design. Compared in that light, RISC is fast and unfriendly. Each of its simple instructions serves a very narrow purpose, executes quickly, and parallelizes exceptionally well. RISC requires patient, gifted programmers and meticulously optimized compilers; RISC's success attests to an abundance of both.
Multicore a partial story
The best-known Power5 attribute is its integration of two discrete RISC cores on a single chip. Announcements from Advanced Micro Devices, Intel, and Sun regarding upcoming multicore processors focused attention on this aspect of Power5, but multicore was also a feature of its predecessors, Power4 and Power4+.
According to IBM, Power5 is fully compatible with Power4 executables. The wonder of multicore is that it delivers the pipe dream of more speed in less space without a marked increase in heat. But multicore is not simply SMP (Symmetric Multiprocessing) on a chip.
No catch in the cache
The Power5's cores share a very fast Level 2 cache. The speed and quantity of cache are factors in the performance of all microprocessors. (The evolution of the x86 shows Intel to be utterly cache-obsessed.) With simple instructions flying through a RISC processor so rapidly, the cache's efficiency in reducing the number of trips to RAM becomes the key to the whole design.
The Power5's Level 2 cache totals just less than 2MB. With a shared cache, data fetched by one core is immediately available to the other, increasing the likelihood that fetching the next program instruction or block of data won't require a trip to performance-killing RAM. But the shared cache also makes it more likely that the cores will try to access the cache at the same time, which they cannot do.
IBM implemented a cache-contention stopgap, splitting the Level 2 cache into three segments. This design permits quasi-simultaneous access to cache as long as both cores are hitting different cache segments. IBM has another creative solution to the Level 2 cache-contention issue: a ponderous 36MB external Level 3 cache. Each core owns its Level 3 cache exclusively, so there's no possibility for conflict between cores. Although Level 3 cache isn't nearly as fast as Level 2, Level 3 is much faster than main memory, and Power5's design makes the connection between its core and its associated Level 3 cache a direct link.
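The segment arrangement can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a 128-byte cache line maps to one of three segments by taking the line index modulo three. The article does not say how Power5 actually picks a segment, and the names `LINE_BYTES`, `segment_of`, and `contend` are hypothetical.

```python
# Toy model of a segmented shared Level 2 cache. The mapping rule and the
# line size are assumptions for illustration, not IBM's documented design.
LINE_BYTES = 128   # assumed cache-line size
SEGMENTS = 3       # the three Level 2 segments described in the article


def segment_of(address: int) -> int:
    """Return the segment (0, 1, or 2) that an address maps to."""
    return (address // LINE_BYTES) % SEGMENTS


def contend(addr_core0: int, addr_core1: int) -> bool:
    """True when both cores target the same segment at the same time."""
    return segment_of(addr_core0) == segment_of(addr_core1)
```

Under this assumed mapping, two accesses 128 bytes apart land in different segments and can proceed together, while accesses 384 bytes apart hit the same segment and one core has to wait.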
InfoWorld considers IBM's reworking of the Level 3 cache design to be one of the top design wins in Power5.
Another substantial Power5 gain is its on-chip memory controllers. Each Power5 core has its own controller and is capable of managing a dedicated block of main memory. This has a huge impact on overall performance, as we've seen in comparing the memory throughput of Opteron and Xeon, for example. And in Power5's case, the design fits with IBM's strategy of multilevel parallelization.
Two is not enough
Power5 isn't just dual-core; it implements SMT (Simultaneous Multi-Threading), which gives each core the capability of executing instructions from two threads simultaneously, under certain conditions. SMT is similar to Intel's HTT (Hyper-Threading Technology) but with distinct advantages that make "certain conditions" broader and that dynamically optimize parallelization by analyzing and prioritizing threads to make parallel execution more efficient (much more efficient, InfoWorld thinks). Although it's difficult to isolate in testing, Power5's implementation should outgun the maximum 30 per cent boost that Intel projects for HTT.
Power5 adds two basic, but much-needed, thread-prioritization schemes. Dynamic Resource Balancing attempts to keep instruction streams flowing smoothly by analyzing the behaviour of threads and by sidelining code that could slow down an SMT stream. For example, instructions that must be executed in sequence to derive an accurate result can lock that thread in the processor for a time. Power5 tries to predict this and run simpler instructions until there's room to execute the sequence without clogging SMT.
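A rough sketch of the balancing idea follows. It assumes a hypothetical shared queue of in-flight instructions and an invented threshold; the article does not describe the hardware's actual counters or limits, so `QUEUE_SLOTS`, `HOG_LIMIT`, and `pick_thread_to_throttle` are illustrative names only.

```python
# Toy illustration of resource balancing between two SMT threads.
# The queue size and threshold are invented; the real hardware differs.
from typing import Dict, Optional

QUEUE_SLOTS = 20   # hypothetical capacity of a shared in-flight queue
HOG_LIMIT = 0.75   # hypothetical share above which a thread is sidelined


def pick_thread_to_throttle(in_flight: Dict[int, int]) -> Optional[int]:
    """Return the id of a thread hogging the shared queue, or None.

    in_flight maps a thread id to the number of queue slots it occupies.
    """
    for thread_id, slots in in_flight.items():
        if slots / QUEUE_SLOTS > HOG_LIMIT:
            return thread_id
    return None
```

In this model a thread holding 16 of 20 slots would be sidelined until the queue drains, which is the spirit of "running simpler instructions until there's room."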
In another awesome design gain, Power5's adjustable thread priority gives OSes, drivers, and applications the capability of assigning an arbitrary priority level to each thread. This application-defined thread priority is factored into Dynamic Resource Balancing calculations and is used more broadly to determine the length of time a thread remains active in the CPU. It also gives operating systems an easy way to control power conservation.
If you've got a lot of high-priority threads running, the box will run hot. But as the OS knocks thread priorities down, the CPU will run more idle cycles and therefore run cooler. If you knock all thread priorities down to their lowest level, the CPU goes into a sleeplike low-power mode. That's the simplest approach to power management we can imagine.
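The effect of adjustable priority can be sketched with a small model. The priority range (1 to 7) and the rule that the favoured thread's share grows with each step of priority difference are assumptions made for this illustration; the article gives neither. `decode_shares` is a hypothetical helper, not an IBM interface.

```python
# Toy model of software-set thread priority on a two-thread SMT core.
# The priority range and the sharing rule are illustrative assumptions.
LOWEST, HIGHEST = 1, 7   # assumed priority range


def decode_shares(prio_a: int, prio_b: int):
    """Return (share_a, share_b, low_power) for two thread priorities.

    Shares are fractions of decode cycles. low_power is True when both
    threads sit at the lowest priority, mirroring the article's
    "sleeplike low-power mode".
    """
    for p in (prio_a, prio_b):
        if not LOWEST <= p <= HIGHEST:
            raise ValueError(f"priority {p} outside {LOWEST}..{HIGHEST}")
    if prio_a == prio_b:
        return 0.5, 0.5, prio_a == LOWEST
    # Assumed rule: the lower-priority thread gets 1 cycle in 2**(diff + 1).
    minor = 1 / 2 ** (abs(prio_a - prio_b) + 1)
    major = 1 - minor
    return (major, minor, False) if prio_a > prio_b else (minor, major, False)
```

With these assumptions, `decode_shares(6, 4)` gives the first thread seven-eighths of the decode cycles, and `decode_shares(1, 1)` flags the low-power state.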
Finally, Power5 uses what it knows about the facilities needed by each RISC instruction to, in essence, power down portions of the chip that aren't needed at that moment. This potentially puts a new spin on Power's infamous power and heat problems. It certainly seems simpler than OS-driven power management schemes such as those employed by x86 processors.
You might never notice
On technology alone, Power5 is positioned to rule. But unbelievable as it might seem to the many Itanium 2 sceptics who share their opinions with InfoWorld, the majority of observers have already called the Itanium 2/Power5 contest in Intel's favor.
That's an odd assessment because, in this case, IBM is pulling an Intel on Intel. RISC owns the Unix market, Unix owns the midrange to high-end market, and Intel doesn't do RISC. It's out in the cold on those multimillion-dollar, big-iron purchase orders. Intel is effectively locked out unless it can convince buyers that Itanium 2 obsoletes RISC. Will Intel be able to break in? We think it'll take years for the Itanium to push RISC aside, and while it's breaking in, Power and Sparc will continue to evolve.
What makes this hard to call is that IBM wants Intel's market as much as Intel wants IBM's. IBM is selling Power5 servers for $5,000 with Linux preinstalled. Go back up and scan the specs to understand why a $5,000 Power5 server might be nice to have around.
Power future profits
Analysts etching headstones for Power note that IBM's chip business isn't making money. But its systems business is, and now those two units are one. That's a smart move: Make chips for systems you sell; build systems around the chips you're making. Releasing the design and tools to the public is smart, too. Every open licensee is a potential manufacturing customer, and unencumbered intellectual property is going to flow in from geniuses not on IBM's payroll.
Where is IBM's marketing genius?
These are good strategies for cozying up to the entry market. If only IBM didn't have to deal with customers. Big Blue has never been able to bring to the low end of its catalogue the brand polish and customer trust that Dell and HP enjoy in spades. The great work IBM's engineers have done is gated by the company's poor marketing. In all likelihood, if you're not running IBM gear now, you'll never look at a Power5 server regardless of the price.
IBM has intentionally hitched Power5's success to Linux at the entry level. But it's hard to extract added value from software that the public believes it can download for free, and Linux is an OS that buyers don't tend to purchase new hardware to run. In other words, Linux won't sell Power5 entry servers. At $5,000 to $6,000, IBM's least expensive Power5 server isn't cheap enough compared with a dirt-cheap Opteron or Xeon EM64T (Extended Memory 64 Technology) server running Linux.
On the other hand, big Unix iron sells itself, and customers will always buy more of what they're already using. They'll buy what their solution consultants advise. IBM exceeds all others in its ability to fawn over major accounts. You cannot pry a customer loose from IBM hardware at the midrange and up. So the overall message on Power5 will be garbled to the press and the public at large, but the suits in the field bypass IBM's marketing. In IBM-to-customer relationships, you can't beat IBM.
Power5's got just about everything: speed, simplicity, innovation, seamless backward compatibility, a mature development toolset, and the backing of a technological giant. It's an unrivaled engineering achievement, created by what may be the world's smartest engineers. If IBM's marketing ever matches the intelligence of its engineering, watch out, Intel.
Tom Yager is technical director of the InfoWorld Test Centre.