
10 Nov
2022
9:42 a.m.
HPCC users,

Early this morning, a cooling failure in the KEC datacenter allowed temperatures to climb to unsafe levels, resulting in the automatic shutdown of all DGX-2 nodes and the termination of all jobs running on them. Cooling has been restored, temperatures are back at safe levels, and all of the DGX-2 nodes are back online.

I know many of you have deadlines coming up, so you may want to check the status of your jobs and resubmit as needed.

Rob Yelle
HPC Manager