CoE HPC News, June 5 2023: Summer break maintenance

HPCC users,

Please check out the latest HPC cluster news regarding upcoming maintenance below.

Summer break maintenance

The cluster will undergo its regularly scheduled quarterly maintenance during summer break, June 19-21. The following activities will be performed:

- Operating system updates
- Slurm upgrade and configuration changes
- BIOS and firmware updates as needed
- Nvidia/CUDA driver updates
- InfiniBand updates

The entire cluster will be offline from Monday the 19th at 1pm until approximately Wednesday the 21st at 3pm. Jobs scheduled to run into this offline period will remain pending with the message “ReqNodeNotAvail, Reserved for maintenance”. If you wish for your Slurm job to start and finish before the offline period begins, you will need to adjust your time limit accordingly.

For the latest cluster news and status updates, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Robert Yelle
HPC Manager
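As a rough illustration of adjusting a time limit so that a job can finish before the offline period, here is a minimal Python sketch. It is not part of the announcement: the function name `max_time_limit` and the 10-minute safety margin are assumptions made for this example, and it assumes the job starts running immediately, so any time spent waiting in the queue has to be subtracted as well.

```python
from datetime import datetime, timedelta

# Start of the offline period as stated in the announcement
# (Monday, June 19 2023 at 1pm, cluster local time).
MAINTENANCE_START = datetime(2023, 6, 19, 13, 0)


def max_time_limit(now, maintenance_start=MAINTENANCE_START,
                   margin=timedelta(minutes=10)):
    """Return the longest time limit, formatted as D-HH:MM:SS, that still
    ends before maintenance_start, or None if no time is left.

    The margin is a hypothetical safety buffer, not a cluster policy.
    """
    remaining = maintenance_start - now - margin
    if remaining <= timedelta(0):
        return None
    total_seconds = int(remaining.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    limit = max_time_limit(datetime.now())
    if limit is None:
        print("The offline period has started or is too close to fit a job.")
    else:
        # job.sh is a placeholder script name.
        print(f"sbatch --time={limit} job.sh")
```

The printed value uses Slurm's days-hours:minutes:seconds form for `--time`. Any limit at or below it should let a job that starts right away complete before the reservation begins.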

Cluster users,

This is your friendly reminder that next week is maintenance week (see message below), so if you submit long jobs that will not complete by next Monday at 1pm, they will remain pending until after the offline period ends (est. next Wednesday at 3pm). If you want your job to start before then, please modify your time limit so that it can complete before next Monday at 1pm (e.g. to 4 days or less).

Cheers,
Rob

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Monday, June 5, 2023 at 4:39 PM
To: cluster-users@engr.orst.edu <cluster-users@engr.orst.edu>
Cc: Thompson, Christopher Scott via staff <staff@engr.oregonstate.edu>
Subject: CoE HPC News, June 5 2023: Summer break maintenance

Cluster users,

The cluster is back online but at limited capacity while maintenance continues. The cluster should return to full capacity by tomorrow.

Cheers,
Rob

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Wednesday, June 14, 2023 at 3:18 PM
To: cluster-users@engr.orst.edu <cluster-users@engr.orst.edu>
Cc: Thompson, Christopher Scott via staff <staff@engr.oregonstate.edu>
Subject: Re: CoE HPC News, June 5 2023: Summer break maintenance