
Cluster users,

Cluster online

The cluster is back online, but currently at limited capacity while maintenance continues on the rest of the cluster. If you try to access a resource and get either of the messages below:

(partitiondown)

or

(ReqNodeNotAvail)

this means that the resource you are requesting is not available yet. Keep checking, though: additional resources will be brought back online as the maintenance progresses.

Submit-a is still undergoing maintenance until next week, so in the meantime please use Submit-b, Submit-c, or just Submit instead.

When you access the upgraded nodes, you may be met with the following message, or something similar:

"host key for submit-b.hpc.engr.oregonstate.edu has changed and you have requested strict checking. Host key verification failed."

To address this, please remove your old host keys as follows:

ssh-keygen -R submit-a.hpc.engr.oregonstate.edu
ssh-keygen -R submit-b.hpc.engr.oregonstate.edu
ssh-keygen -R submit-c.hpc.engr.oregonstate.edu
ssh-keygen -R submit.hpc.engr.oregonstate.edu

After that, try connecting again and accept the new host keys, and you should be set.

New HPC storage

All HPC share data has been migrated to our new DDN storage appliance, still located on /nfs/hpc/share, and all upgraded nodes are now using this storage. Everyone should check their HPC share directories to make sure their data is there as expected. If you think something is missing or doesn’t look right, let me know.

Survey

If you haven’t had a chance yet, please take a few minutes to complete the survey below. Your feedback is important.
https://oregonstate.qualtrics.com/jfe/form/SV_290Wnkkv7IFqSW2

If anyone has any questions or problems related to the maintenance or upgrade, let me know.

Rob Yelle
HPC Manager

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Monday, June 17, 2024 at 3:45 PM
To: cluster-users@engr.orst.edu <cluster-users@engr.orst.edu>
Cc: Thompson, Christopher Scott via staff <staff@engr.oregonstate.edu>
Subject: CoE HPC News, June 17 2024: HPC survey

Cluster users,

The cluster is currently offline for maintenance. While you are waiting for the cluster to come back online, please take a few minutes to fill out the short survey below to provide feedback on how the cluster has contributed to your research and how we can better support the cluster going forward.

https://oregonstate.qualtrics.com/jfe/form/SV_290Wnkkv7IFqSW2

We appreciate your feedback!
Rob Yelle

From: Yelle, Robert Brian <robert.yelle@oregonstate.edu>
Date: Monday, June 10, 2024 at 3:50 PM
To: cluster-users@engr.orst.edu <cluster-users@engr.orst.edu>
Cc: Thompson, Christopher Scott via staff <staff@engr.oregonstate.edu>
Subject: CoE HPC News, June 10 2024: Summer maintenance reminder

Cluster users,

This is a reminder that Summer maintenance is scheduled to start next week (after Finals week), and will go on until completed. During this time I plan to complete the migration of the remaining cluster nodes to the EL8 and EL9 based operating systems. In addition to the OS upgrades, regular maintenance activities will be performed during this time, e.g.:

- OnDemand HPC portal upgrade
- Slurm upgrade and configuration changes
- Nvidia driver updates
- BIOS and firmware updates as needed
- Miscellaneous hardware maintenance as needed

The entire cluster will be offline starting Monday, June 17 at 1pm, and will remain offline until approximately Wednesday the 19th at 4pm. Jobs scheduled to run into that offline period will remain pending with the message “ReqNodeNotAvail, Reserved for maintenance”. If you wish for your Slurm job to start and finish before the offline period begins, you will need to adjust your time limit accordingly, e.g. to change to 2 days do:

scontrol update job {jobid} TimeLimit=2-00:00:00

Alternatively, you can cancel your pending job, adjust your walltime (using the --time option) and resubmit.

If you have any questions or concerns, let me know.
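
As a rough illustration of the time-limit adjustment described above, the sketch below works out how much time remains before the maintenance window and formats it the way Slurm expects (D-HH:MM:SS). It assumes bash and GNU date; the function names to_slurm_time and seconds_until and the job ID are hypothetical, and the example uses the JobId= form of scontrol update. Treat it as a starting point rather than a supported tool.

```shell
#!/usr/bin/env bash

# Convert a number of seconds into Slurm's D-HH:MM:SS time-limit format.
# (Hypothetical helper, not part of Slurm.)
to_slurm_time() {
    local total=$1
    if [ "$total" -lt 0 ]; then
        echo "target time is already in the past" >&2
        return 1
    fi
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local mins=$(( (total % 3600) / 60 ))
    local secs=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
}

# Seconds from now until a target date/time.
# Assumes GNU date (the -d option is not available in BSD date).
seconds_until() {
    local target=$1
    echo $(( $(date -d "$target" +%s) - $(date +%s) ))
}

# Example usage (commented out; it only prints the command, it does not run it):
# jobid=123456   # hypothetical job ID
# limit=$(to_slurm_time "$(seconds_until '2024-06-17 13:00')")
# echo scontrol update JobId="$jobid" TimeLimit="$limit"
```

A limit computed this way leaves no slack, so in practice you would likely pick something shorter to allow for time spent waiting in the queue.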
For up-to-date status on the cluster, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Rob Yelle
HPC Manager