Job Summary:
NVIDIA is a leading technology company known for pioneering visual computing and AI technologies. It is seeking a Senior Systems Software Engineer to own full-stack OS enablement for DGX Station on both Windows and Linux, ensuring that AI applications run reliably on the platform.
Responsibilities:
• Own end-to-end Windows enablement for DGX Station—driving the platform from initial bring-up on Windows through WHQL certification to customer-ready shipping quality.
• Drive Linux bring-up and continuous enablement for DGX Station on DGX OS / Ubuntu, including kernel module integration, device tree and ACPI configuration, systemd services, initramfs, and dkms packaging.
• Enable and validate BIOS/UEFI, BMC, and system-level firmware for Windows and Linux on the Grace (Arm) + Blackwell GB300 architecture.
• Coordinate GPU driver, display driver, and compute driver bring-up and validation on Windows (WDDM, MCDM) and Linux (open-gpu-kernel-modules, DRM/KMS).
• Ensure the CUDA toolkit, cuDNN, TensorRT, NCCL, and NVIDIA’s AI SDK stack are fully functional on DGX Station on both Windows and Linux.
• Validate that NVIDIA AI applications—NIM microservices, NemoClaw, AI Workbench, and developer tools—run correctly on DGX Station across Windows and Linux.
• Drive the overall test strategy for DGX Station on Windows and Linux: functional testing, stress testing, power/thermal validation, sleep/resume and S-state cycles, Windows Update and Linux kernel-upgrade compatibility, and long-duration reliability.
• Be the primary technical interface with Microsoft (Windows on Arm, WHQL, driver signing) and ODM/OEM partners shipping DGX Station.
• Profile and optimize system performance—boot time, GPU compute throughput, NVLink-C2C and memory bandwidth utilization, power efficiency, and thermal behavior.
• Create and maintain platform documentation for DGX Station on Windows and Linux: bring-up guides, known issues, driver compatibility matrices, recovery and re-imaging procedures, and developer setup instructions.
Qualifications:
Required:
• BS or MS in Computer Science, Electrical Engineering, or a related field (or equivalent experience) and 12+ years of experience in systems software engineering, with deep expertise in Windows platform enablement, driver development, or OS integration, and proven hands-on experience bringing up Linux on new hardware platforms.
• Strong hands-on experience with Windows internals: kernel-mode drivers, ACPI, power management, Secure Boot, UEFI, WDM/WDF driver frameworks, and the WHQL certification process.
• Solid understanding of Linux platform enablement: kernel modules, device tree / ACPI on Arm, systemd, initramfs, dkms, and packaging for Ubuntu / DGX OS.
• Experience with the GPU driver stack, display drivers, or compute drivers on Windows and/or Linux. Familiarity with DirectX, WDDM, DRM/KMS, and GPU compute APIs is a strong plus.
• Experience enabling hardware platforms—bring-up, driver integration, validation, and certification for shipping products on Windows and Linux.
• Strong debugging and root-cause analysis skills across firmware, driver, and OS boundaries. Comfortable with WinDbg, kernel debugging (kd, kgdb/crash), crash dump analysis, ftrace/ETW, and performance profiling tools.
• Ability to work across organizational boundaries—coordinating with GPU driver, CUDA, firmware, BMC, and AI software teams as well as external partners (Microsoft, ODM/OEMs).
• Proficiency in C/C++ and Python. Experience with Arm architecture is a plus.
Preferred:
• Experience with Windows on Arm platforms—driver enablement, performance optimization, or application compatibility on Arm-based Windows devices.
• Hands-on experience with CUDA, TensorRT, or AI/ML frameworks on Windows and Linux—especially on Arm + NVIDIA GPU systems.
• Prior experience working with OEM/ODM partners or silicon vendors on Windows and Linux platform certification for workstation- or server-class hardware.
• Track record shipping workstation or server hardware products—from bring-up through general availability—with both Windows and Linux support.
• Experience with BMC, Redfish, out-of-band management, or platform manageability software on high-end workstations or servers.
• Experience with GPU-accelerated applications: AI training and inference, content creation tools, or scientific computing on Windows and Linux.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, California, USA, has more than 10,000 employees, and is classified as a late-stage company.