CoE HPC Newsletter, Nov 21 Edition: DGX queue change

Cluster users,

Please take note of the DGX queue change and the upcoming cluster maintenance below.

DGX queue change

The DGX systems have sustained 90-100% utilization for the last several weeks, making it difficult for many users to obtain resources from the dgx and dgx2 partitions. To help reduce the load and make GPU resources more readily available on the DGX nodes without further reducing the overall GPU limits, the maximum number of GPUs per job on the dgx partition has been reduced from 4 to 3. If you need 4 or more GPUs per job, please use the dgx2 partition instead. Jobs requesting 4 GPUs from the dgx partition will result in the “QOSMaxGRESPerJob” error message.

Winter maintenance

This is an early heads-up that the cluster will undergo its regular quarterly maintenance after finals week, December 12-16.

To see the latest updates on the cluster, check out the link below:
https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Cheers,
Rob Yelle
HPC Manager
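
As an illustration of the dgx limit described above, here is a minimal sketch of how a submission wrapper might pick a partition from the number of GPUs a job needs. The helper names (choose_partition, sbatch_args) and the script name train.sh are hypothetical, and the plain "gpu:N" GRES string is an assumption about how GPUs are requested on this cluster; only the 3-GPU limit on dgx and the dgx/dgx2 partition names come from the announcement. Any per-job limit on dgx2 is not stated here, so the sketch does not check one.

```python
# Limit taken from the announcement: dgx allows at most 3 GPUs per job.
DGX_MAX_GPUS_PER_JOB = 3


def choose_partition(gpus: int) -> str:
    """Return the partition to request for a job needing `gpus` GPUs.

    Jobs within the dgx limit go to dgx; larger jobs go to dgx2.
    (Smaller jobs may also be allowed on dgx2; this sketch simply prefers dgx.)
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return "dgx" if gpus <= DGX_MAX_GPUS_PER_JOB else "dgx2"


def sbatch_args(gpus: int, script: str) -> list[str]:
    """Build an sbatch command line as a list of arguments (hypothetical helper)."""
    # --partition and --gres are standard Slurm options; the "gpu:N" GRES
    # name is an assumption about this cluster's configuration.
    return [
        "sbatch",
        f"--partition={choose_partition(gpus)}",
        f"--gres=gpu:{gpus}",
        script,
    ]


if __name__ == "__main__":
    # Print the command a 4-GPU job would use instead of running it.
    print(" ".join(sbatch_args(4, "train.sh")))
```

Building the argument list without executing it keeps the sketch safe to run anywhere; on the cluster the list could be passed to subprocess.run to actually submit the job.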