HPCC users,
Please check out the latest HPC cluster news below.
DGX partition changes
Two changes are being introduced to the DGX systems this month to improve job scheduling and flexibility. First, the “dgx” and “dgx2” partitions are being redefined.
Starting tomorrow morning, the “dgx” partition will be used for smaller GPU workloads, i.e. 4 GPUs or fewer and 12 CPUs or fewer, whereas the “dgx2” partition will be for larger GPU workloads, i.e. more than 4 GPUs or more than 12 CPUs. If you normally request fewer than 4 GPUs at a time under the “dgx2” partition, please switch to the “dgx” partition. As some of you are aware, the larger the resource request, the longer the wait; this change was recommended by the vendor as a way to improve the scheduling of larger workloads on the DGX systems.
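For reference, here is a minimal sketch of what a smaller-workload job script could look like under the redefined “dgx” partition. The job name, GPU/CPU counts, and the command at the end are placeholders based on a typical Slurm setup, so adapt them to your own scripts:

    #!/bin/bash
    #SBATCH --job-name=my-gpu-job      # placeholder job name
    #SBATCH --partition=dgx            # smaller workloads: up to 4 GPUs and up to 12 CPUs
    #SBATCH --gres=gpu:2               # request 2 GPUs, within the dgx range
    #SBATCH --cpus-per-task=8          # request 8 CPUs, within the dgx range
    #SBATCH --time=12:00:00            # wall-clock limit for the job
    srun python my_script.py           # placeholder command; replace with your own workload

A job needing more than 4 GPUs or more than 12 CPUs would use --partition=dgx2 instead.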
New DGX limits
The second change involves the DGX resource limits. The GPU and CPU limits for the DGX systems sometimes change based on overall load and resource availability. Lately we have settled on a limit of 8 GPUs and 32 CPUs in use at a time per user. This limit will temporarily be replaced by new limits based on cumulative GPU and CPU running times. This means that if GPUs are available, each user can use more GPUs than the normal limit at a time, though for a shorter period (e.g. 16 GPUs for one day). The goal is to improve job flexibility while maximizing use of resources, allowing users to run more jobs at a time or to run larger single calculations or experiments.
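As a rough way to gauge your own cumulative usage, a query along the following lines lists your recent jobs with their elapsed times and allocated resources; the exact fields available depend on how accounting is configured on the cluster:

    # jobs since the first of the current month, one line per job; GPUs appear under AllocTRES
    sacct -u $USER -S $(date +%Y-%m-01) -X --format=JobID,Partition,Elapsed,AllocTRES

Multiplying the number of GPUs by the elapsed time gives the cumulative GPU-time the new limits are based on; for example, 16 GPUs for one day and 8 GPUs for two days both amount to 16 GPU-days.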
The new limits will be activated starting this week and may require considerable adjustment at first to optimize the load on the dgx partitions. These limits will be posted on the HPC status page once activated. If you have any questions about, or experience issues due to, the new limits, let me know.
Jupyter Notebook app
The Jupyter Notebook app on the HPC Portal is being replaced by the new Jupyter Server app; the Notebook app will no longer appear in the list of interactive apps.
To follow status updates on the news mentioned above, check out the link below:
https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news
Cheers,
Rob Yelle
HPC Manager