CoE HPC News, August 8 2022 Edition: Cooling outage, HPC portal apps, DGX queue reminders

HPCC users,

Please see the latest cluster news below.

Datacenter cooling outage

Most of the HPC cluster underwent an emergency shutdown last night due to a cooling failure in the KEC datacenter, after temperatures reached critical levels. Unfortunately, many jobs and interactive sessions were terminated as a result. Most of the cooling has been restored, and HPC resources are slowly being brought back online to a level that can be accommodated by the available cooling.

New HPC portal apps

New interactive apps have been added to the HPC portal:

  Matlab
  Mathematica
  R Studio
  Ansys Workbench (for approved Ansys users only)

If you use any of these applications, check them out and let me know if you have any trouble using them. You can check out the HPC portal here:

DGX queue change reminder

This is a reminder that the DGX partitions have been redefined as follows:

  If you need 4 GPUs or fewer, please use the “dgx” partition.
  If you need 4 GPUs or more, please use the “dgx2” partition.

If your jobs to the dgx/dgx2 partitions are pending with “QOSMinGRES” or “QOSMaxGRES”, or if your jobs are rejected for those reasons, that means you need to change the partition as noted above.

For the latest cluster news and status updates, check out the link below:<>

Cheers,
Rob Yelle
HPC Manager
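
The DGX partition rule in this message can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function names are hypothetical, a request for exactly 4 GPUs is treated as eligible for either partition because both rules as written include it, and requesting GPUs with sbatch's --gres=gpu:N option is assumed for this cluster rather than confirmed by this message.

```python
from typing import List, Optional


def eligible_dgx_partitions(num_gpus: int) -> List[str]:
    """Return the DGX partitions that match the rule as stated in this message.

    Hypothetical helper: "4 GPUs or less" maps to dgx and "4 GPUs or more"
    maps to dgx2, so exactly 4 GPUs matches both.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    partitions = []
    if num_gpus <= 4:
        partitions.append("dgx")
    if num_gpus >= 4:
        partitions.append("dgx2")
    return partitions


def sbatch_command(num_gpus: int, script: str,
                   partition: Optional[str] = None) -> List[str]:
    """Build an sbatch argument list for a DGX job (assumed flag usage).

    If no partition is given, the first eligible one is used. A partition
    that does not match the GPU count is rejected up front, which is the
    situation the QOSMinGRES / QOSMaxGRES reasons point to.
    """
    choices = eligible_dgx_partitions(num_gpus)
    if partition is None:
        partition = choices[0]
    elif partition not in choices:
        raise ValueError(
            f"partition {partition!r} does not fit {num_gpus} GPU(s); "
            f"use one of {choices}"
        )
    return ["sbatch", f"--partition={partition}",
            f"--gres=gpu:{num_gpus}", script]
```

For example, sbatch_command(8, "job.sh") builds a command targeting the dgx2 partition, while sbatch_command(8, "job.sh", partition="dgx") raises an error instead of submitting a job that would pend or be rejected. The script name job.sh is a placeholder.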