
HPCC users,

The HPC cluster is back online with limited availability. Various portions of the cluster are still undergoing maintenance, and additional compute resources will become available as that work progresses. If you encounter any problems, let me know.

Rob

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Friday, June 11, 2021 at 9:52 AM
To: cluster-users@engr.orst.edu <cluster-users@engr.orst.edu>
Subject: Re: CoE HPC News, June 2 edition: Summer break maintenance

HPCC users,

This is a friendly reminder that cluster maintenance will take place next week. Details are described in the message below and at this link: https://it.engineering.oregonstate.edu/hpc-cluster-status-and-news

Please note that jobs that would run into the offline window will remain pending with the reason "ReqNodeNotAvail". If you need your job to run before the offline period, please see the instructions provided below or at the link above.

Rob

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Wednesday, June 2, 2021 at 12:00 PM
To: cluster-users@engr.orst.edu <cluster-users@engr.orst.edu>
Subject: CoE HPC News, June 2 edition: Summer break maintenance

HPCC users,

The HPC cluster will undergo its regularly scheduled quarterly maintenance after Finals Week, during summer break (June 14-18). The following maintenance activities will be performed:

- OS updates
- BIOS and firmware updates
- InfiniBand updates
- Slurm scheduler configuration changes
- Power redistribution for various compute nodes
- Miscellaneous hardware maintenance as needed

Maintenance activities will begin the Monday after Finals Week. The entire cluster will be offline starting Tuesday the 15th at 8am and will remain offline until Wednesday the 16th at 2pm; after that, maintenance will continue on parts of the cluster throughout the week.

Jobs found still running on the Univa scheduler by Tuesday morning will need to be terminated, and Slurm jobs scheduled to run into that offline period will remain pending with the reason "ReqNodeNotAvail, Reserved for maintenance". If you want your Slurm job to start and finish before the maintenance period begins, you will need to adjust its time limit accordingly; for example, to change it to 2 days:

  scontrol update job {jobid} TimeLimit=2-00:00:00

Alternatively, you can cancel your pending job, adjust your walltime using the --time option, and resubmit (a short example sketch follows at the end of this message).

If you have any questions or concerns, let me know.

Rob Yelle
HPC Manager
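
For anyone taking the cancel-and-resubmit route described above, here is a minimal sketch. The job ID (123456) and script name (my_job.sh) are placeholders for illustration only; substitute your own, and pick a --time value that lets the job finish before the Tuesday 8am cutoff.

  # List your pending jobs with their current time limits and pending reasons
  squeue -u $USER -o "%.10i %.9P %.20j %.8T %.11l %.20R"

  # Cancel the pending job (placeholder job ID)
  scancel 123456

  # Resubmit with a shorter walltime (here, 2 days) so it can complete before the outage
  sbatch --time=2-00:00:00 my_job.sh

Note that "scontrol update job {jobid} TimeLimit=..." shown above avoids losing your place in the queue, whereas cancelling and resubmitting starts the job's queue wait over.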