
Cluster users,

The CoE cluster is back online and nearly restored to full capacity. If anyone experiences any issues with the cluster, let me know.

Happy Holidays!

Rob Yelle
HPC Manager

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Monday, December 2, 2024 at 12:09 PM
To: cluster-users@lists.engr.oregonstate.edu <cluster-users@lists.engr.oregonstate.edu>
Cc: Thompson, Christopher Scott via staff <staff@engr.oregonstate.edu>
Subject: CoE HPC News, December 2 2024: Christmas break maintenance

Cluster users,

The next cluster maintenance is scheduled for the week of December 16. The maintenance activities planned for this time include:

- Head node upgrade
- OnDemand HPC portal upgrade
- Slurm upgrade and configuration changes
- Operating system image updates
- Nvidia GPU and InfiniBand driver updates
- BIOS and firmware updates as needed
- Miscellaneous hardware maintenance as needed

The entire cluster will be offline starting Monday, December 16 at 1pm, and will remain offline until approximately Wednesday the 18th at 4pm. Jobs scheduled to run into that offline period will remain pending with the message “ReqNodeNotAvail, Reserved for maintenance”. If you wish for your Slurm job to start and finish before the offline period begins, you will need to adjust your time limit accordingly, e.g. to change to 2 days do:

    scontrol update job {jobid} TimeLimit=2-00:00:00

Alternatively, you can cancel your pending job, adjust your walltime (using the --time option) and resubmit.

If you have any questions or concerns, let me know.
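As a rough illustration of choosing a time limit that fits before the offline period, here is a minimal Python sketch. It assumes the job would start at the moment given as `now`, and the helper name `slurm_time_limit`, the 30-minute safety margin, and the example timestamp are hypothetical choices, not part of the cluster tooling. Only the maintenance start (Monday, December 16, 2024 at 1pm) comes from the announcement.

```python
from datetime import datetime, timedelta

# Maintenance start taken from the announcement: Monday, December 16, 2024 at 1pm.
MAINTENANCE_START = datetime(2024, 12, 16, 13, 0)


def slurm_time_limit(now, window_start, margin=timedelta(minutes=30)):
    """Return the longest D-HH:MM:SS limit that ends before window_start.

    Assumes the job starts at `now`; `margin` is a hypothetical safety buffer.
    """
    available = window_start - now - margin
    if available <= timedelta(0):
        raise ValueError("no time left before the maintenance window")
    total_seconds = int(available.total_seconds())
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical example: a job that would start Friday, December 13 at 9am.
example_now = datetime(2024, 12, 13, 9, 0)
limit = slurm_time_limit(example_now, MAINTENANCE_START)
print(f"scontrol update job {{jobid}} TimeLimit={limit}")
```

The resulting value can be used with the scontrol command above or with the --time option when resubmitting. A job that is still pending may start later than assumed, so the real margin can be smaller than the computed one.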

For up-to-date status on the cluster, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Rob Yelle
HPC Manager