Gross cluster

See Gross documentation for usage instructions.

The compute nodes of this cluster have a total of one gross (a dozen dozen, 12 × 12 = 144) of cores. The cluster was built by Aeon Computing in Spring 2010.

Purpose

The primary purpose of the compute nodes is to serve wildfire simulations for the NSF CDI wildfires project, in particular as a back-end for web-initiated computations. The remaining capacity is available only for academic research (including externally funded research) and educational uses. The project was funded by NSF grant 0835579, Principal Investigator Jan Mandel, with contributions from the Department of Mathematical and Statistical Sciences and the Center for Computational Mathematics.

Configuration

  • 12 compute nodes, each with 2 Intel X5670 (Westmere) CPUs of 6 cores each; each node is therefore a 12-core SMP. Each compute node has 24 GB of memory.
  • Front end with 2 Intel X5670 CPUs (12 cores total), 144 GB of memory, and an NVIDIA Tesla C1070 supercomputing system for high-end virtual graphics rendering and GPU computing.
  • Storage server with 20 × 2 TB disks, configured as a RAID 1+0 array for 20 TB of effective capacity (the totals are recomputed in the sketch after this list).
  • QDR InfiniBand, connecting the above components at 40 Gbit/s.
  • Offsite 20 TB backup storage server.
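
As a quick sanity check of the figures above, the aggregate core count, compute-node memory, and effective RAID 1+0 capacity follow from the per-node and per-disk numbers. The following is an illustrative Python sketch only, not part of the cluster software; the total memory figure is derived here, not stated elsewhere on this page.

  # Recompute the aggregate figures from the listed hardware.
  compute_nodes = 12
  cores_per_node = 2 * 6                  # 2 Intel X5670 CPUs x 6 cores each
  memory_per_node_gb = 24
  disks, disk_tb = 20, 2

  total_cores = compute_nodes * cores_per_node           # 144 cores (one gross)
  total_memory_gb = compute_nodes * memory_per_node_gb   # 288 GB across compute nodes (derived)
  raw_storage_tb = disks * disk_tb                       # 40 TB raw
  effective_storage_tb = raw_storage_tb // 2             # RAID 1+0 mirrors half: 20 TB usable

  print(total_cores, total_memory_gb, raw_storage_tb, effective_storage_tb)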

Documentation

Access

  • Use of all CCM computing equipment, including the Gross cluster, is subject to U.S. Government export controls.
  • Any math user can get a Gross cluster account on request. If the cluster becomes so overloaded that the primary purpose (wildfire simulation) cannot be met, the situation will be dealt with at that time.
  • Every Gross cluster user must belong to at least one project for which use of the cluster is requested. Typically, user accounts are requested by a permanent faculty member who acts as the project leader.
  • Every project leader must maintain an up-to-date wiki page for every project on this cluster as a condition of granting and continuing access. The project page needs to include the list of users, funding sources, a list of publications resulting from the project with full-text links, and a summary of major results. Use of images is encouraged. This information is important for reporting to funding agencies and to UCD, as well as for documenting compliance with the export controls.
  • To access the cluster from the command line, ssh to math.ucdenver.edu, then run ssh gross from there.
  • See Gross documentation for information on using the cluster.

Status

The cluster is available on request to users with existing shell accounts on math.ucdenver.edu.

Performance

See Gross cluster performance and Gross cluster HPL benchmark for performance data.
  • The maximum theoretical performance of one X5670 processor core is 2.93 GHz × 4 double-precision operations per cycle = 11.72 Gflops/core. The compute nodes total 144 × 11.72 = 1687.7 Gflops; including the head node, 156 × 11.72 = 1828.3 Gflops (recomputed in the sketch after this list).
  • HPL benchmark with ATLAS BLAS: 677 Gflops.
  • Sustained writes from a compute node to the storage server (100 GB file) over InfiniBand: 557 MB/s.
  • MPI latency: 1.85 µs, and 4.8 µs for 1 kB messages.
  • MPI node-to-node bandwidth: 3.4 GB/s; bi-directional: 6.5 GB/s.
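
For reference, a minimal Python sketch of the peak-performance arithmetic quoted above (clock rate times double-precision operations per cycle, times core count); this is an illustration only, using the figures from this page.

  # Theoretical peak = clock (GHz) * double-precision flops per cycle * number of cores.
  clock_ghz = 2.93
  dp_flops_per_cycle = 4
  per_core_gflops = clock_ghz * dp_flops_per_cycle   # 11.72 Gflops/core

  compute_gflops = 144 * per_core_gflops             # ~1687.7 Gflops, 12 nodes x 12 cores
  total_gflops = 156 * per_core_gflops               # ~1828.3 Gflops, including the 12-core front end

  print(per_core_gflops, compute_gflops, total_gflops)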

Funding

The acquisition was funded primarily from Jan Mandel's grants, with a smaller part from other sources.

The cluster is operated by the Center for Computational Mathematics, which also covers the operating costs.

Projects

Real-time monitoring