Cluster users,
Please take note of the DGX queue change and the upcoming cluster maintenance described below.
DGX queue change
The DGX systems have sustained 90-100% utilization for the last several weeks, making it difficult for many users to obtain resources from the dgx and dgx2 partitions. To help reduce the load and make GPU resources more readily available on the DGX nodes without further reducing the overall GPU limits, the maximum number of GPUs per job on the dgx partition has been reduced from 4 to 3. If you need 4 or more GPUs per job, please use the dgx2 partition instead. Jobs requesting 4 or more GPUs from the dgx partition will result in the “QOSMaxGRESPerJob” error message.
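For reference, a 4-GPU job aimed at the dgx2 partition might look roughly like the batch script below. This is only a sketch: it assumes GPUs are requested with the standard Slurm --gres=gpu:N syntax, and the job name is a made-up placeholder. Adjust it to match your own submission scripts.

```bash
#!/bin/bash
# Sketch of a Slurm batch script for a job that needs 4 GPUs.
# Assumption: GPUs are requested via --gres; the job name is hypothetical.
#SBATCH --job-name=my_gpu_job   # placeholder name, change as needed
#SBATCH --partition=dgx2        # use dgx2 for 4 or more GPUs per job
#SBATCH --gres=gpu:4            # on the dgx partition, keep this at 3 or fewer

# Report which GPUs Slurm assigned to the job (prints "none" outside a GPU job).
echo "GPUs visible: ${CUDA_VISIBLE_DEVICES:-none}"
```

Jobs that need 3 GPUs or fewer can keep using the dgx partition by changing the two lines marked above.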
Winter maintenance
This is an early heads-up that the cluster will undergo its regular quarterly maintenance after finals week, December 12-16.
To see the latest updates on the cluster, check out the link below:
https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news
Cheers,
Rob Yelle
HPC Manager