Job Summary:
NVIDIA AI is a leader in GPU computing, focused on markets such as gaming, automotive, and AI. The company is seeking a Senior Software SDET Test Development Engineer to develop and execute test plans for its platforms, ensuring high reliability and performance through automation and collaboration with various teams.
Responsibilities:
• Develop and execute NVIDIA HGX/DGX/MGX platform test plans covering servers, OS, firmware (FW), and the CUDA software (SW) stack, based on design documents.
• Install and test various operating systems, server firmware, and SW stacks.
• Drive root cause analysis of reliability and validation test failures to identify the root cause(s) and implement mitigations.
• Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.
• Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
• Work in an agile software development team with very high production quality standards.
• Manage the bug lifecycle and collaborate across groups to drive solutions.
Qualifications:
Required:
• Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
• 5+ years of proven experience, or a master’s degree.
• Proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript.
• Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
• Good knowledge of and hands-on experience with model testing, AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking.
• Experience using AI development tools for test plan creation, test case development, and test case automation.
• Strong experience with FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, I/O sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish is a huge plus.
• Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker is a huge plus.
Preferred:
• Experience with AI-related tools, LLMs, and NLP.
• Experience working with NVIDIA GPU hardware is a strong plus.
• A solid understanding of virtualization on Linux (KVM, Docker orchestrated with Kubernetes) is good to have.
• Background in parallel programming, ideally CUDA/OpenCL, is a plus.
Company:
Explore the latest breakthroughs made possible with AI. The company is headquartered in Santa Clara, CA, US, with a team of 10,001+ employees. The company is currently Late Stage.