CoE HPC News, April 9 2024: DGX-2 update, GPU Day and AI Training Series Reminders

HPCC users,

Update on DGX-2 nodes

Last week a number of users reported a problem with the Nvidia GPU driver on the DGX-2 systems after the upgrade to EL9, so the newly upgraded DGX-2 systems had to be taken offline. A ticket was opened with Nvidia last week to troubleshoot the issue. Earlier this morning Nvidia spotted a potential problem, and a fix is currently being rolled out to the DGX-2 nodes. Most of the DGX-2 systems should be back online by this evening. For up-to-date status on the cluster, including the DGX nodes, check out the link below:

https://it.engineering.oregonstate.edu/hpc/hpc-cluster-status-and-news

AI Week and GPU Day Reminder

Just a reminder that this is AI Week. Tomorrow (Wednesday April 10) is GPU Day, where Nvidia and Mark III will team up to “bring you an action-packed day of learning about what GPUs are, how they can help your research, and how to optimize them for your workloads”.
To register for GPU Day and other AI Week events, please check out the link below:

https://dri.oregonstate.edu/ai-week

Mark III AI Training Series Reminder

Another reminder that Mark III is offering a seven-part series of AI and ML trainings every Tuesday at 11am, starting next week on April 16 and running through May 28. The training topics are listed below:

April 16 - Intro to Machine Learning and AI: The Basics, A Tutorial, and Lab
April 23 - Intro to Deep Learning: An Introduction to Neural Networks
April 30 - Introduction to Datasets
May 7 - Introduction to Large Language Models
May 14 - Getting Started with Containers and the software stack around AI + How to get started working with OSU HPC Services
May 21 - Intro to Omniverse & Digital Twins
May 28 - Intro to Isaac Sim and AI in Robotics

If you are interested, I encourage you to check them out and sign up using the link below:

https://trending.markiiisys.com/osu-aiseries-2024

If you have any questions or concerns, let me know.

Rob Yelle
HPC Manager