Gross cluster

See Gross documentation for usage instructions.

The compute nodes of this cluster have a total of one gross (a dozen dozen, 12 × 12 = 144) of cores. The cluster was built by Aeon Computing in Spring 2010.

Purpose

The primary purpose of the compute nodes is to serve wildfire simulations for the NSF CDI wildfires project, in particular as a back-end for web-initiated computations. The remaining capacity is available only for academic research (including externally funded research) and educational uses. The project was funded by NSF grant 0835579, Principal Investigator Jan Mandel, with contributions from the Department of Mathematical and Statistical Sciences and the Center for Computational Mathematics.

Configuration

  • 12 compute nodes, each with 2 Intel X5670 (Westmere) CPUs of 6 cores each; each node is therefore a 12-core SMP. Each compute node has 24 GB of memory.
  • Front end with 2 Intel X5670 CPUs (12 cores total), 144 GB of memory, and an NVIDIA Tesla C1070 supercomputing system for high-end virtual graphics rendering and GPU computing.
  • Storage server with 20 × 2 TB disks, configured as a RAID 1+0 array for 20 TB of effective capacity (the totals are recomputed in the sketch after this list).
  • QDR InfiniBand, connecting the above components at 40 Gbit/s.
  • Offsite 20 TB backup storage server.
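
As a quick sanity check of the figures above, the aggregate core count, compute-node memory, and effective RAID 1+0 capacity follow from the per-node and per-disk numbers. The following is an illustrative Python sketch only, not part of the cluster software; the total memory figure is derived here, not stated elsewhere on this page.

  # Recompute the aggregate figures from the listed hardware.
  compute_nodes = 12
  cores_per_node = 2 * 6                  # 2 Intel X5670 CPUs x 6 cores each
  memory_per_node_gb = 24
  disks, disk_tb = 20, 2

  total_cores = compute_nodes * cores_per_node           # 144 cores (one gross)
  total_memory_gb = compute_nodes * memory_per_node_gb   # 288 GB across compute nodes (derived)
  raw_storage_tb = disks * disk_tb                       # 40 TB raw
  effective_storage_tb = raw_storage_tb // 2             # RAID 1+0 mirrors half: 20 TB usable

  print(total_cores, total_memory_gb, raw_storage_tb, effective_storage_tb)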

Documentation

Access

  • Use of all CCM computing equipment, including the Gross cluster, is subject to U.S. Government export controls.
  • Any math user can get a Gross cluster account on request. If the cluster becomes so overloaded that the primary purpose (wildfire simulation) cannot be met, the situation will be dealt with at that time.
  • Every Gross cluster user must belong to at least one project for which use of the cluster is requested. Typically, user accounts are requested by a permanent faculty member who acts as the project leader.
  • Every project leader must maintain an up-to-date wiki page for every project on this cluster as a condition of granting and continuing access. The project page needs to include the list of users, funding sources, a list of publications resulting from the project with full-text links, and a summary of major results. Use of images is encouraged. This information is important for reporting to funding agencies and to UCD, as well as for documenting compliance with the export controls.
  • To access the cluster from the command line, ssh to math.ucdenver.edu, then run ssh gross from there.
  • See Gross documentation for information on using the cluster.

Status

The cluster is available on request to users with existing shell accounts on math.ucdenver.edu.

Performance

See Gross cluster performance and Gross cluster HPL benchmark for performance data.
  • The maximum theoretical performance of one X5670 processor core is 2.93 GHz × 4 double-precision operations per cycle = 11.72 Gflops/core. The compute nodes total 144 × 11.72 = 1687.7 Gflops; including the head node, 156 × 11.72 = 1828.3 Gflops (recomputed in the sketch after this list).
  • HPL benchmark with ATLAS BLAS: 677 Gflops.
  • Sustained writes from a compute node to the storage server (100 GB file) over InfiniBand: 557 MB/s.
  • MPI latency: 1.85 µs, and 4.8 µs for 1 kB messages.
  • MPI node-to-node bandwidth: 3.4 GB/s; bi-directional: 6.5 GB/s.
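
For reference, a minimal Python sketch of the peak-performance arithmetic quoted above (clock rate times double-precision operations per cycle, times core count); this is an illustration only, using the figures from this page.

  # Theoretical peak = clock (GHz) * double-precision flops per cycle * number of cores.
  clock_ghz = 2.93
  dp_flops_per_cycle = 4
  per_core_gflops = clock_ghz * dp_flops_per_cycle   # 11.72 Gflops/core

  compute_gflops = 144 * per_core_gflops             # ~1687.7 Gflops, 12 nodes x 12 cores
  total_gflops = 156 * per_core_gflops               # ~1828.3 Gflops, including the 12-core front end

  print(per_core_gflops, compute_gflops, total_gflops)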

Funding

The acquisition was funded primarily from Jan Mandel's grants, with a smaller part from other sources.

The cluster is operated by the Center for Computational Mathematics, which also covers the operating costs.

Projects

Real-time monitoring