CPUs, GPUs, APUs and You
March 28, 2014
The marketing war going on between the purveyors of high-performance computing (HPC) hardware may seem much like the cola wars of the 1980s, when selecting a flavored, carbonated, sugar-infused can of water became a matter of marketing prowess over substance.
However, choosing the proper processor for advanced simulations, 3D modeling and complex designs should not be based on a vendor's marketing budget. There must be substance behind the manufacturers' claims, substance that can be translated into a price vs. performance argument and that ensures a CPU, GPU or APU delivers both productivity and value.
With that in mind, understanding how today's HPC solutions operate and how they differ has become the primary prerequisite for selecting the appropriate high-performance processor, especially when thousands of dollars are on the line and failure is not an option.
CAD, CAM, Design and Simulation Today
For the most part, engineers, scientists and designers have come to rely on the tried-and-true workstation to process their workloads, and a multitude of vendors have met the call to supply those pricey and powerful machines. Vendors such as HP, Dell, BOXX, Xi, Thinkmate and Lenovo have strived to build the most powerful workstations, consistently outpacing one another to claim the gold crown of performance.
[Image: AMD's take on how its processor reduces latency.]
However, performance is a relative term. Engineers today have to decide what level of performance they need to guarantee positive results in a timely and cost-effective fashion. Some engineers focus on simulation, others on animation, and the rest on whatever is critical to their operation, be it 3D modeling, CAD, CAM or big-data analytics.
Yet those performance identifiers all have something in common: They rely on the throughput of a processing unit, which has to crunch the numbers, run the algorithms and digest the data into usable output—and that processing hardware is not created by the workstation vendors. Simply put, CPUs, GPUs and APUs come from chip manufacturers such as Intel, AMD and NVIDIA. Each company focuses on what it defines as the ultimate in performance, whether by incorporating multiple cores, emphasizing parallel processing or even powering entire compute farms.
In the past, choosing a processing platform was mostly driven by the software in use, with software vendors such as Autodesk, Dassault Systemes, Siemens PLM Software and PTC calling the shots. However, things are starting to change. As hardware platforms evolve, software vendors are playing a strategic game of catch-up to make sure their products perform across multiple choices—and are not tied to any one specific chip manufacturer. For example, the latest version of SolidWorks from Dassault Systemes works with both Intel and AMD CPUs, yet can leverage the additional processing power offered by NVIDIA Quadro FX GPUs.
While that may sound like it covers the gamut of chip manufacturers, there is another factor to consider: How efficiently are those processing components being used?
For Dassault and the majority of software vendors, designing software to fully leverage the processing power available has become a challenge, simply because of the architectures involved. Take, for example, today's architectures, which require data to be copied from the CPU to the GPU before the GPU can contribute to processing. Applications copy data from system RAM to GPU memory over the PCIe bus, do some computational work, then send the results back over the PCIe bus when the computation is complete. That back-and-forth adds significant overhead to the parallel processing chores handed to otherwise-efficient GPUs.
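To make that overhead concrete, the following minimal sketch (not drawn from any vendor's code; the kernel, array size and scaling factor are hypothetical placeholders) shows the copy-based pattern in CUDA: inputs are staged into GPU memory over PCIe, the GPU does its work, and the results are copied back before the CPU can use them.

```
// Copy-based model: data crosses the PCIe bus twice per computation.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;          // the actual GPU work
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, bytes);

    // 1. Copy inputs from system RAM to GPU memory over PCIe.
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // 2. Run the parallel computation on the GPU.
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    // 3. Copy results back over PCIe before the CPU can touch them.
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);
    cudaFree(dev);
    free(host);
    return 0;
}
```

Each of those cudaMemcpy calls is pure transfer time; none of it advances the computation.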
Major software vendors can attempt to “code” around those limitations, but that requires subverting the initial hardware architecture and accessing the hardware directly—which may not even be possible based upon the design. In other words, eliminating those engineered “waits” may be all but impossible, unless something changes with architecture design.
Are HSA, hUMA and hQ the Future?
It has become obvious that maximizing performance on workstations is going to require some sort of architectural design change. Software alone is not going to overcome the architectural handicaps placed on CPU-to-GPU communications.
However, there are some new acronyms on the horizon that may spell relief for software vendors looking to maximize performance: Heterogeneous System Architecture (HSA), Heterogeneous Unified Memory Architecture (hUMA) and Heterogeneous Queuing (hQ) are technologies that are coming together to remove processing bottlenecks and move simulation, design and modeling into the future. While HSA, hUMA and hQ are associated with chipmaker AMD, the ideology behind what the technologies have to offer can be found on the drawing boards of other chip manufacturers.
AMD is hoping that HSA takes hold, and has made the industry take notice by forming the HSA Foundation. At press time, the foundation had garnered the backing of chip and system makers—save for AMD's main rivals, NVIDIA and Intel, which apparently are taking a wait-and-see approach to what the organization hopes to accomplish.
AMD claims HSA means faster and more power-efficient personal computers, tablets, smartphones and cloud servers. What's more, HSA works with hUMA, the latest way for processors to access the memory inside an APU. HSA allows developers to take control of the GPU and make it an equal partner with the CPU and other processors. It does this by incorporating hQ, which lets software communicate with the GPU directly, eliminating the need to wait for the CPU to orchestrate communications to the GPU.
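For a rough feel of what a unified address space buys developers, the sketch below uses CUDA managed memory purely as an analogy (it is not AMD's HSA, hUMA or hQ stack): with one allocation visible to both processors, the explicit PCIe copies from the earlier example disappear from the application code.

```
// Unified-memory analogy: one pointer, valid on both CPU and GPU.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;
    float *data;

    // One allocation visible to both processors; no cudaMemcpy calls.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU writes

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU computes
    cudaDeviceSynchronize();                         // wait for the GPU

    printf("data[0] = %f\n", data[0]);               // CPU reads the result
    cudaFree(data);
    return 0;
}
```

Hardware and drivers still move data behind the scenes, but the programmer no longer has to orchestrate it, which is the spirit of what hUMA and hQ promise.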
Vendor partners include Imagination Technologies, ARM, Samsung, MediaTek, Qualcomm and Texas Instruments.
Nevertheless, AMD’s move to HSA lacks one significant element: mainstream simulation, modeling and CAD/CAM software designed to leverage the company’s APU architecture. That said, there is still growing interest in the HSA Foundation, with national laboratories (Lawrence Livermore, Oak Ridge, Argonne) and tech industry giants (Oracle, Huawei, Broadcom, Canonical) joining the organization last fall. That is sure to lend credibility to HSA, and it may not be long before CAD/CAM software joins the development fray.
Intel’s Dash to the Future
While much of the industry is taking note of what may come of HSA, Intel is not one to sit on its hands. It has long been a favorite among workstation manufacturers with its continually evolving Xeon processor, and the company promises to move ahead with innovations that will keep the Xeon at the top of the heap of high-performance CPUs.
Case in point is the company's plan to bring to market a standalone Xeon Phi CPU that can replace the combination of Xeon CPU and Xeon Phi coprocessor widely used in HPC systems today. The company did not say when to expect the product, which will be built using a 14nm process technology. Raj Hazra, VP of Intel's data center group and general manager of its technical computing group, offered some details at the SC13 supercomputing conference in Denver. Hazra said that making the Phi a host processor will do away with the need to off-load code across PCIe or some other limited-capacity connection. He added that one of the CPU's most important features will be in-package memory, meaning memory will be part of the CPU package rather than a separate module on the motherboard.
Applications will be able to use the memory resource as part of the overall memory space, as cache or as a hybrid of the two, Hazra explained. It will use a familiar programming model for using processor cores and the memory, which will be connected with a high-bandwidth link. “This is a fundamental advancement on the path to many-core and exascale,” he said.
The Xeon Phi CPU is not the only thing on Intel’s road map. The company just introduced the Xeon E7 v2 processors. According to the company, the E7 v2 family has triple the memory capacity of the previous generation processor family. That capacity enables in-memory analytics, which places and analyzes an entire data set in the system memory rather than on traditional disk drives. The E7 v2 family is built for up to 32-socket servers, with configurations supporting up to 15 processing cores and up to 1.5TB of memory per socket. Intel says it achieves twice the average performance of the previous generation.
The company also expects to release the Xeon E5-4600 v2 series processors later this year. Last but not least, Intel plans to announce a 15-core Xeon processor running at 2.8 GHz with a 155W power envelope, which should help give Intel the HPC crown with existing software for some time.
NVIDIA Not to be Left Out
Arguably, NVIDIA reigns supreme in the GPU market. The company offers many different solutions for those looking to improve workstation performance—mostly in the form of graphics cards. However, as software developers discovered the power of the GPU for more than just processing graphics, software evolved to take advantage of what GPUs bring to the table, such as parallel processing and low power consumption.
With the company's Tesla GPUs and Compute Unified Device Architecture (CUDA) parallel computing platform, NVIDIA has made GPU-based HPC available to almost anyone. It aims to increase market share with its Tesla GPU Accelerators, which enable the use of GPUs and CPUs together.
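As a small illustration of how approachable that platform is, the hedged sketch below simply queries the CUDA runtime for each GPU's parallel resources; the figures reported vary by card (a Tesla K40, for example, exposes 15 multiprocessors).

```
// Enumerate CUDA-capable devices and report their parallel resources.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  Multiprocessors  : %d\n", prop.multiProcessorCount);
        printf("  Global memory    : %zu MB\n", prop.totalGlobalMem >> 20);
        printf("  Max threads/block: %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```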
However, NVIDIA's future lies with its Kepler architecture, which the company claims is the world's fastest and most efficient for HPC. For the workstation market, it recently launched the Kepler-based Tesla K40 GPU Accelerator, which brings cluster-level performance to individual workstations.
“GPU accelerators have gone mainstream in the HPC and supercomputing industries, enabling engineers and researchers to consistently drive innovation and scientific discovery,” says Sumit Gupta, general manager of Tesla Accelerated Computing products at NVIDIA. “With the breakthrough performance and higher memory capacity of the Tesla K40 GPU, enterprise customers can quickly crunch through massive volumes of data generated by their big-data analytics applications.”
Even more performance is expected to arrive when NVIDIA moves forward with its Maxwell GPU architecture that will replace Kepler sometime in 2014, according to the company’s GPU road map. The Volta architecture will follow Maxwell sometime beyond 2014, bringing even more HPC capabilities to workstations.
What Does It All Mean?
Engineers working with simulation, modeling, CAD/CAM and other design tools are going to have to keep a close eye on the HPC/workstation battles of 2014. From a value standpoint, AMD shows a lot of promise, if—and only if—major software vendors fully support HSA. Meanwhile, both Intel's and NVIDIA's viability in the workstation market seems assured; they remain safe choices for the masses looking for speed, reliability and productivity.
Ultimately, choosing a platform still comes down to what software is being used and which hardware is best optimized for the primary tasks faced by each engineer.
More Info
- Advanced Micro Devices
- Argonne National Laboratory
- ARM
- Autodesk
- BOXX Technologies
- Broadcom
- Canonical
- Dassault Systemes
- Dell
- HP
- HSA Foundation
- Huawei
- Imagination Technologies
- Intel
- Lawrence Livermore National Laboratory
- Lenovo
- MediaTek
- NVIDIA
- Oak Ridge National Laboratory
- Oracle
- PTC
- Qualcomm
- Samsung
- Siemens PLM Software
- Texas Instruments
- Thinkmate
- Xi Computer Corp.
About the Author
Frank Ohlhorst is chief analyst and freelance writer at Ohlhorst.net. Send e-mail about this article to DE-Editors@deskeng.com.