CoE HPC News, November 10 2022 Edition: Cooling failure

10 Nov 2022, 5:42 p.m.
HPCC users,

Early this morning there was a cooling failure in the KEC datacenter that allowed temperatures to climb to unsafe levels. This triggered the automatic shutdown of all DGX-2 nodes and terminated all jobs running on them.

Cooling has been restored and temperatures are back within safe limits, and all of the DGX-2 nodes are back online. I know many of you have deadlines coming up, so you may want to check the status of your jobs and resubmit as needed.

Rob Yelle
HPC Manager
Yelle, Robert Brian