
HPCC users,

Please check out the latest news below regarding the COE HPC cluster:

New Nvidia DGX H100 systems are online!

Many of you may have received the announcement last week that three new Nvidia DGX H100 systems have been added to the COE HPC cluster. Each DGX H100 system has 112 CPUs, 8 Nvidia H100 GPUs, and 2 TB of RAM! These systems are now available in a new partition called “dgxh”. This partition is in a testing period for at least this week to resolve existing issues and smooth out any other kinks that crop up. During this testing period, the resource limits are 2 GPUs and 32 CPUs per user, and the time limit is 24 hours. Be advised that the DGX H100 systems are running RHEL9-based Linux, which is different from the RHEL7-based systems currently used by the rest of the cluster. Also, these systems are not yet available through the HPC portal or through ssh. Give them a try and let me know of any issues you encounter.

DGX partition change

We currently have separate “dgx” and “dgx2” partitions for our DGX-2 systems, depending on how many GPUs are needed. For various reasons that have come up over time, it is no longer advantageous to have separate partitions for these systems, so they will again be merged into a single “dgx2” partition. The “dgx” partition will be phased out or possibly re-purposed, so please use “dgx2” instead. The resource limits will remain at 4 GPUs and 32 CPUs on the dgx2 partition for now, but may increase again in the future.

Submit-a offline this week

Just a reminder that Submit-a will be offline until next Monday the 30th for maintenance. Until then, please use submit-b or submit-c.

If you have any questions or concerns, let me know.
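
As a quick sanity check before submitting, the limits quoted above can be written down as a small lookup table. The following is only a sketch in Python: the function name `check_request` and the dictionary layout are made up for illustration and are not part of any cluster tooling. The numbers are the ones stated in this message and may change after the testing period. No time limit is given here for dgx2, so none is checked.

```python
# Limits as stated in this announcement (subject to change).
# The table layout and function name are illustrative, not cluster tooling.
LIMITS = {
    "dgxh": {"gpus": 2, "cpus": 32, "hours": 24},
    "dgx2": {"gpus": 4, "cpus": 32, "hours": None},  # no time limit stated
}


def check_request(partition, gpus, cpus, hours=None):
    """Return a list of problems with a request; an empty list means it fits."""
    if partition not in LIMITS:
        return [f"unknown partition: {partition}"]
    limit = LIMITS[partition]
    problems = []
    if gpus > limit["gpus"]:
        problems.append(f"{gpus} GPUs exceeds the limit of {limit['gpus']}")
    if cpus > limit["cpus"]:
        problems.append(f"{cpus} CPUs exceeds the limit of {limit['cpus']}")
    # Only compare run time when both the request and the table specify one.
    if hours is not None and limit["hours"] is not None and hours > limit["hours"]:
        problems.append(f"{hours} hours exceeds the limit of {limit['hours']}")
    return problems


if __name__ == "__main__":
    print(check_request("dgxh", gpus=2, cpus=32, hours=24))  # fits the limits
    print(check_request("dgxh", gpus=4, cpus=16, hours=12))  # too many GPUs
```

The “dgx” partition is deliberately left out of the table, since it is being phased out, so a request for it is reported as unknown.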

For up-to-date status on the cluster, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Cheers,

Rob Yelle
HPC Manager