HPCC users,

 

Early this morning, a cooling failure in the KEC datacenter allowed temperatures to climb to unsafe levels, which triggered the automatic shutdown of all DGX-2 nodes and terminated all jobs running on them. Cooling has been restored, temperatures are back at safe levels, and all of the DGX-2 nodes are back online. I know many of you have deadlines coming up, so you may want to check the status of your jobs and resubmit as needed.

 

Rob Yelle

HPC Manager