CoE HPC News, June 27 2024: Current status and various reminders

Cluster users,

Survey

If you haven’t had a chance yet, please take a few minutes to complete the survey below. Your feedback is important.

https://oregonstate.qualtrics.com/jfe/form/SV_290Wnkkv7IFqSW2

Cluster status

Most of the cluster has been upgraded and is back online. The following hosts remain offline until further notice:

submit-a (please use submit-b or submit-c instead)
dgx2-[3,4]
dgxs-[1-3]
cn-h-[5-8]

The HPC portal is back to a “mostly working” state, but some interactive applications do not yet work well on it, and work on the portal is continuing. In the meantime, if your desired application does not work well, try using the Advanced COEHPC Desktop and launch the application from there.

Submit node hostkeys

When you access the upgraded nodes, you may be met with a message like the following:

"host key for submit-b.hpc.engr.oregonstate.edu has changed and you have requested strict checking. Host key verification failed."

To address this, remove your old host keys, e.g.:

    ssh-keygen -R submit-b.hpc.engr.oregonstate.edu

Repeat this for all submit nodes, and for any other HPC hosts that you need ssh access to. After that, connect again and accept the new host keys, and you should be set.

New HPC storage

All HPC share data has been migrated to our new DDN storage appliance. It is still located at /nfs/hpc/share, and all upgraded nodes are now using this storage. This directory is available on the HPC cluster only and is not visible on the Flips (access.engr.oregonstate.edu)!
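The host key cleanup described under "Submit node hostkeys" can be scripted if you connect to several nodes. The following is a minimal sketch, not an official tool: the function name remove_old_hostkeys and the SUBMIT_HOSTS variable are made up for illustration, and the full hostname for submit-c is an assumption based on the pattern of submit-b. It relies only on the standard "ssh-keygen -R host -f file" form.

```shell
# Hosts whose old keys should be removed (space-separated).
# Assumption: submit-c uses the same domain as submit-b. Add any other
# HPC hosts you ssh to, using the names you actually type when connecting.
SUBMIT_HOSTS="submit-b.hpc.engr.oregonstate.edu submit-c.hpc.engr.oregonstate.edu"

# remove_old_hostkeys [known_hosts_file]
# Runs "ssh-keygen -R" for each host in SUBMIT_HOSTS against the given
# known_hosts file (default: ~/.ssh/known_hosts).
remove_old_hostkeys() {
  known_hosts="${1:-$HOME/.ssh/known_hosts}"
  if [ ! -f "$known_hosts" ]; then
    echo "No known_hosts file at $known_hosts; nothing to remove."
    return 0
  fi
  for host in $SUBMIT_HOSTS; do
    # -R removes all keys recorded for the host; -f selects the file.
    ssh-keygen -R "$host" -f "$known_hosts"
  done
}
```

After defining the function, run remove_old_hostkeys with no arguments to clean your default known_hosts file. ssh-keygen keeps a copy of the previous file with a .old suffix, so the change can be undone if needed.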
I will mount the old share on /mnt/share on the submit nodes for a week or so to give people a chance to check the current HPC shares against the old one, and to copy files or directories that may be missing.

If anyone has any questions or problems, let me know. For up-to-date status on the cluster, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

Rob Yelle
HPC Manager
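One possible way to check a directory on the old share against the new one while /mnt/share is mounted is sketched below. This is an illustration under stated assumptions, not a supported utility: the function name list_missing is hypothetical, and the layout of your directories under the two shares is not specified here, so substitute your own paths. It lists entries that exist under the old directory but not under the new one, and it does not compare file contents.

```shell
# list_missing OLD_DIR NEW_DIR
# Prints paths (relative to OLD_DIR) that exist under OLD_DIR but are
# absent under NEW_DIR. Nothing is copied or modified.
# Limitation: file names containing newlines are not handled.
list_missing() {
  old_dir="$1"
  new_dir="$2"
  # The cd happens in a subshell, so a relative NEW_DIR still resolves
  # against the directory the function was called from.
  ( cd "$old_dir" && find . ) | while IFS= read -r rel; do
    if [ ! -e "$new_dir/$rel" ] && [ ! -L "$new_dir/$rel" ]; then
      printf '%s\n' "${rel#./}"
    fi
  done
}
```

For example, with a hypothetical directory name "yourdir", the call would be: list_missing /mnt/share/yourdir /nfs/hpc/share/yourdir. Anything it prints can then be copied from the old share to the new one.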