Cluster users,
The next cluster maintenance is scheduled for the week of September 23. The maintenance activities planned for this time include:
Operating system updates
OnDemand HPC portal upgrade
Slurm update and configuration changes
Nvidia driver updates
BIOS and firmware updates as needed
Miscellaneous hardware maintenance as needed
The entire cluster will be offline starting Monday, September 23 at 8am, and will remain offline until approximately 4pm on Tuesday, September 24. Jobs scheduled to run into that offline period will remain pending with the message “ReqNodeNotAvail, Reserved for maintenance”. If you want your Slurm job to start and finish before the offline period begins, you will need to reduce its time limit accordingly. For example, to change the time limit to 2 days, run:
scontrol update JobId={jobid} TimeLimit=2-00:00:00
Alternatively, you can cancel your pending job, adjust your walltime (using the --time option) and resubmit.
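If you are unsure what time limit will fit before the offline period, the arithmetic can be sketched in shell as below. This is only an illustration: the helper names slurm_time_from_seconds and seconds_until are hypothetical and not part of Slurm, and you would supply the maintenance start and current time yourself as epoch seconds.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Slurm) for choosing a time limit
# that ends before the maintenance window begins.

# Format a duration in seconds as a Slurm time limit (D-HH:MM:SS).
slurm_time_from_seconds() {
    total=$1
    days=$((total / 86400))
    hours=$(( (total % 86400) / 3600 ))
    minutes=$(( (total % 3600) / 60 ))
    seconds=$((total % 60))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# Seconds remaining between "now" and the maintenance start.
# Both arguments are epoch seconds; the result is never negative.
seconds_until() {
    start=$1
    now=$2
    remaining=$((start - now))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    echo "$remaining"
}

# Example use (assumes GNU date, which reads a date without a year
# as the current year):
#   start=$(date -d 'Sep 23 08:00' +%s)
#   now=$(date +%s)
#   slurm_time_from_seconds "$(seconds_until "$start" "$now")"
```

The value printed is the longest time limit that still ends before the offline period. It can be passed to TimeLimit= or --time, ideally rounded down a little, since a pending job will not start the instant you change it.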
If you have any questions or concerns, let me know. For up-to-date status on the cluster, check out the link below:
https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news
Rob Yelle
HPC Manager