CoE HPC News, February 2 2024: Updates on EL8/EL9 migration, EL9 fixes, VSCode

HPCC users,

See below the latest news on the CoE HPC cluster.

Busy queues

The demand for resources, especially GPUs, has been very high lately due to upcoming deadlines. This is resulting in much longer wait times than usual, so please be patient. If you have been waiting more than a day for your resources, let me know, and leave your job in the queue, as I may be able to work it in or at least determine what is holding it up.

EL8/EL9 upgrades

The cluster upgrade to EL8 and EL9 has been progressing slowly as various bugs are being worked through. Progress is expected to pick up the week of February 12, when submit nodes and more compute nodes from the “share”, “dgx2” and “dgxs” partitions will be migrated. These partitions will not be migrated all at once; instead, one or more nodes from each partition will be upgraded each week until the migration is complete. The migration is currently expected to be finished by the end of March.

At present the DGX nodes “dgx2-4” and “dgxs-3” have been upgraded to EL9 with support for CUDA 12.2, and are now available through the dgx2 and dgxs partitions.

If you do not wish to land on an EL8 or EL9 node at this time, you may request the “el7” feature, either through the HPC portal or by adding the “--constraint=el7” option to your srun or sbatch options. If you would prefer to land on an EL9 node, for instance because of updated dependencies and software support or the availability of more recent versions of CUDA, you may request the “el9” feature through the portal or use the “--constraint=el9” option in srun or sbatch.

Please continue to report any issues that you encounter with the cluster as the upgrade progresses.

Portal and ssh fixes on Nvidia DGX H100 and other EL9 systems

The HPC portal and ssh are now working properly on the dgxh and other EL9 systems. The dgxh partition is currently available on the Advanced Desktop (Xfce desktop only!) and Jupyter Server apps.
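
As an illustration of the “--constraint” option described under EL8/EL9 upgrades, a batch script requesting an EL9 node might look roughly like the sketch below. The job name is made up for the example, the choice of the dgx2 partition is only an assumption, and your own jobs will likely need additional resource options.

```shell
#!/bin/bash
#SBATCH --job-name=el9-check   # hypothetical job name, for illustration only
#SBATCH --partition=dgx2       # assumed partition; dgx2 now includes an EL9 node
#SBATCH --constraint=el9       # request the el9 feature (use el7 to stay on EL7)

# Print the kernel release of the node the job landed on,
# which gives a quick hint about which OS release it is running.
uname -r
```

The same option can be passed on the command line, for example “sbatch --constraint=el7 myjob.sh” (where myjob.sh stands in for your own script) or “srun --constraint=el9 --pty bash” for an interactive shell.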
Note that with the Advanced Desktop on EL9 systems, you should disable the screen lock, or you risk being locked out of your session and having to start over. To do this, click on “Applications” in the upper left-hand corner, then select “Settings”, then “Xfce screensaver”, then the “Lock Screen” tab, and turn off “Enable Lock Screen”.

The dgxh partition will remain in testing phase until further notice while other issues are being addressed.

VSCode issues

Some users have recently reported a problem using VSCode on the HPC submit nodes. It appears that the latest version or update of VSCode no longer supports EL7, so you may not be able to run it on the submit nodes. The submit nodes will be upgraded to EL8 between later this month and early March. Until then, my recommendation is to use an older version of VSCode if possible, or to use the Flip servers until the submit nodes are upgraded.

Spring Break Maintenance

The next cluster maintenance is scheduled for Spring Break, March 25-29.

If you have any questions or concerns, let me know. For up-to-date status on the cluster, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Have a nice weekend,

Rob Yelle
HPC Manager