CoE HPC News, October 4 2023: Updates, upgrades and trainings

HPCC users,

Here is the latest news regarding the COE HPC cluster:

OS and GPU driver updates

The cluster is currently running at reduced capacity while OS updates are being rolled out. In addition, some users have reported that they cannot run their GPU codes on some GPU nodes, which appears to be due to the older drivers on these nodes. New GPU drivers are available, so the GPU nodes are scheduled to be offline next week in a staggered fashion (some on Monday the 9th and some on Tuesday the 10th) so that the new drivers can be installed. If you need GPU resources, please schedule your jobs so that they can complete before these offline periods; otherwise they will remain pending with the message “Required Node not available” or “ReqNodeNotAvail”. If that happens, you may be able to get your job started by reducing its time requirement using the “--time” or “-t” option in srun or sbatch.

OS upgrade

Many of you are aware that many COE Linux servers have been or are being upgraded to Enterprise Linux 9 (EL9). This will also happen on the HPC cluster over the course of this Fall quarter. More details to come.

Intro to HPC workshop and other HPC training resources

I am offering my “Intro to HPC” workshops every Wednesday at 3pm and Thursday at 4pm from October 11 through November 9. This workshop is designed to help new users become acquainted with and start using the cluster. To book a workshop session, click on the “Intro to HPC training session” link on the site below:

https://it.engineering.oregonstate.edu/hpc

In addition to the workshop, other HPC resources and training offerings from Nvidia and Mark III are displayed on the right-hand side of the HPC web site. Mark III is offering a series of AI and ML trainings every Tuesday from October 17 through November 14. I encourage you to check them out and sign up for them if you are interested.

For up-to-date status on the cluster, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Cheers,

Rob Yelle
HPC Manager
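
As a rough illustration of the “--time” suggestion above, the sketch below works out the longest time limit that would still let a job finish before a node goes offline, formatted as a Slurm days-hours:minutes:seconds string. The helper name slurm_time_limit, the 30-minute safety margin, the 8:00 am offline time, and the script name myjob.sh are all assumptions made for illustration; the announcement gives only the dates, so check the cluster status page for the actual schedule.

```python
from datetime import datetime, timedelta


def slurm_time_limit(now, offline_start, margin=timedelta(minutes=30)):
    """Return the longest Slurm time limit (D-HH:MM:SS) that ends before
    offline_start minus a safety margin, or None if no time is left.

    Hypothetical helper for illustration; not part of Slurm.
    """
    available = offline_start - now - margin
    total_seconds = int(available.total_seconds())
    if total_seconds <= 0:
        return None
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Assumed example: submitting Friday Oct 6 at 3pm, with the node assumed
# to go offline Monday Oct 9 at 8am (the time of day is a guess).
limit = slurm_time_limit(datetime(2023, 10, 6, 15, 0), datetime(2023, 10, 9, 8, 0))
if limit is not None:
    # Print the command rather than running it; myjob.sh is a placeholder.
    print(f"sbatch --time={limit} myjob.sh")
```

A job submitted with a limit at or below this value can be scheduled ahead of the offline period, provided the job really can finish in that time; Slurm will terminate it when the limit is reached.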