Job Summary:
NVIDIA is the world leader in GPU computing, serving markets that include gaming, automotive, and AI. The company is seeking a Senior Software SDET Test Development Engineer to develop and execute test plans for its platforms, with a focus on reliability and automation in a high-quality production environment.
Responsibilities:
• Develop and execute NVIDIA HGX/DGX/MGX platform test plans covering servers, OS, FW, and the CUDA SW stack, based on design docs.
• Install and test various system OSes, server firmware, and SW stacks.
• Drive root cause analysis of reliability and validation test failures to identify the root cause(s) and achieve mitigation.
• Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.
• Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
• Work in an agile software development team with very high production quality standards.
• Manage the bug lifecycle and collaborate across groups to drive solutions.
Qualifications:
Required:
• Bachelor's degree (or equivalent experience) in a STEM field (Science, Technology, Engineering, Math, or Physics).
• 5+ years of proven experience, or a master's degree.
• Proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript.
• Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
• Good knowledge of and hands-on experience with model testing, AI tools and frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking.
• Experience using AI development tools for test plan creation, test case development, and test case automation.
• Strong experience in FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish is a huge plus.
• Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker is a huge plus.
Preferred:
• Experience with AI-related tools, LLMs, and NLP.
• Experience working with NVIDIA GPU hardware is a strong plus.
• A solid understanding of virtualization in Linux (KVM, Docker orchestrated with Kubernetes) is good to have.
• Background in parallel programming, ideally CUDA/OpenCL, is a plus.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, USA, with a team of 10,001+ employees. The company is currently late stage.