The LOEWE-CSC Supercomputer

LOEWE-CSC is a heterogeneous supercomputer at the Goethe University Frankfurt. It was installed at the Industriepark Hoechst in the fourth quarter of 2010. It has 768 compute nodes, each with two 12-core AMD CPUs and one GPU, plus login nodes and other infrastructure nodes. For highly parallel, CPU-intensive applications, 40 nodes with four of the same CPUs each are available.


The design of the LOEWE-CSC targets a broad spectrum of applications. Many scientific areas are covered, such as lattice QCD, hydrodynamics, UrQMD, and data reconstruction in high-energy physics (e.g. track reconstruction).

Processing Power

The CPUs are clocked at 2.1 GHz, giving an overall CPU performance of 162.9 TFlop/s in double precision. The graphics cards (Cypress architecture) provide an additional 417.8 TFlop/s. In single precision, the CPU performance doubles and the GPU performance is five times higher. Many algorithms require double precision only for a small critical part, while the remaining calculations can be executed in single precision. To fully exploit the available computing power, software must be optimized through parallelization, vectorization, and GPU programming. This is why the users of the computer work closely with specialists in these areas.
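The figures in the paragraph above can be combined with a few lines of arithmetic. The following Python sketch uses only the numbers quoted there (162.9 and 417.8 TFlop/s in double precision, with factors of 2 and 5 for single precision); the constant and function names are illustrative assumptions, not part of any LOEWE-CSC software.

```python
# Double-precision figures quoted in the text, in TFlop/s.
CPU_DP_TFLOPS = 162.9
GPU_DP_TFLOPS = 417.8

# Single-precision speed-ups stated in the text.
CPU_SP_FACTOR = 2
GPU_SP_FACTOR = 5


def combined_performance():
    """Return the (double, single) precision totals in TFlop/s."""
    dp_total = CPU_DP_TFLOPS + GPU_DP_TFLOPS
    sp_total = CPU_DP_TFLOPS * CPU_SP_FACTOR + GPU_DP_TFLOPS * GPU_SP_FACTOR
    return dp_total, sp_total


if __name__ == "__main__":
    dp, sp = combined_performance()
    print(f"double precision: {dp:.1f} TFlop/s")
    print(f"single precision: {sp:.1f} TFlop/s")
```

With these inputs the combined totals come to about 580.7 TFlop/s in double precision and about 2414.8 TFlop/s in single precision, which shows why moving non-critical calculations to single precision pays off mainly on the GPU side.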

A DGEMM (double-precision general matrix multiplication) was implemented for this computer to demonstrate that the compute power of this heterogeneous architecture can actually be utilized. The DGEMM can fully load both the GPU and the CPU. The GPU kernel by itself reaches 90% of the theoretical peak performance. At system level, well over 80% of the combined theoretical peak performance of GPU and CPU is achieved; this includes the DMA transfer. (CALDGEMM source code and documentation)
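To make the operation concrete, the sketch below shows what a DGEMM computes (C ← αAB + βC) and one simple way the work could be divided into two independent row blocks, as a stand-in for a GPU part and a CPU part. It is a plain-Python illustration under stated assumptions, not the CALDGEMM implementation: the function names and the `gpu_share` parameter are hypothetical, and both blocks run on the CPU here.

```python
def dgemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    n_inner = len(b)
    n_cols = len(b[0])
    result = []
    for row_a, row_c in zip(a, c):
        new_row = []
        for j in range(n_cols):
            acc = 0.0
            for k in range(n_inner):
                acc += row_a[k] * b[k][j]
            new_row.append(alpha * acc + beta * row_c[j])
        result.append(new_row)
    return result


def split_dgemm(alpha, a, b, beta, c, gpu_share=0.75):
    """Compute the same result as dgemm() in two independent row blocks.

    gpu_share is a hypothetical tuning parameter: the fraction of rows
    assigned to the first ("GPU") block; the remaining rows form the
    second ("CPU") block. Only the partitioning is illustrated.
    """
    split = round(len(a) * gpu_share)
    top = dgemm(alpha, a[:split], b, beta, c[:split])
    bottom = dgemm(alpha, a[split:], b, beta, c[split:])
    return top + bottom
```

Because the rows of the result are independent of each other, the two blocks can be computed concurrently on different devices. In a real heterogeneous implementation the split ratio would be chosen so that both devices finish at about the same time.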

The HPL benchmark was adapted to build upon this DGEMM. HPL is a Linpack implementation that serves as the standard benchmark for rating the performance of supercomputers. A special method that almost completely hides the transfer times was developed so that the GPU can be used without interruption. A parallel Linpack run on several hundred GPU nodes achieves approx. 70% of the theoretical peak performance. This far outperforms traditional heterogeneous supercomputers, which often reach only up to 50%. The LOEWE-CSC took 22nd place on the Top500 list of supercomputers in November 2010. (GPU-HPL source code and documentation)

Energy Efficiency

Conventional data centers often require up to 50% of their energy consumption for cooling. A novel cooling concept enables the LOEWE-CSC to require less than 10% of its energy for cooling (PUE < 1.1). A water-cooled heat exchanger is located in the back doors of the racks. There is no need for additional fans. Instead, the computers themselves create enough air pressure to drive sufficient air flow through the heat exchangers. Two large chillers, located outside, keep the water cool. The second chiller is only required for operation in the summer.
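The relation between the PUE value and the cooling share quoted above can be checked with a short calculation. The sketch below uses the standard definition of PUE (total facility energy divided by the energy used by the IT equipment) and assumes, for simplicity, that cooling is the only overhead; the function names are illustrative.

```python
def pue(it_energy, overhead_energy):
    """Power usage effectiveness: total facility energy / IT energy."""
    if it_energy <= 0:
        raise ValueError("it_energy must be positive")
    return (it_energy + overhead_energy) / it_energy


def overhead_share_of_total(pue_value):
    """Fraction of the total energy that is not consumed by IT equipment."""
    return 1.0 - 1.0 / pue_value


if __name__ == "__main__":
    # Overhead equal to 10% of the IT energy gives a PUE of 1.1.
    print(f"PUE: {pue(100.0, 10.0):.2f}")
    # Share of the total energy spent on overhead at PUE 1.1 and PUE 2.0.
    print(f"share at PUE 1.1: {overhead_share_of_total(1.1):.3f}")
    print(f"share at PUE 2.0: {overhead_share_of_total(2.0):.3f}")
```

Under this assumption, a PUE of 2.0 corresponds to half of the total energy going into cooling, while a PUE below 1.1 keeps the cooling share below 10% of the total.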

The required cooling duty is further reduced by using relatively warm cooling water, which considerably increases the efficiency of the chillers. The computers were designed to cope with an increased ambient temperature.

Additionally, the computers provide extremely high computing power per unit of energy consumed. This efficiency is primarily due to the use of graphics cards. Using the previously described Linpack benchmark, the LOEWE-CSC achieved 8th place on the Green500 list in November 2010.

