Job Summary:
NVIDIA, the platform for every new AI-powered application, is seeking a senior engineer to own and evolve the core NIM Platform SDK and microservice framework. This hands-on role involves solving deep software engineering challenges and collaborating across product teams to deliver production-grade software that supports NVIDIA and the wider AI ecosystem.
Responsibilities:
• Develop and advance the inference microservice framework: OpenAI-compatible API endpoints, inference backend integrations (vLLM, SGLang, TensorRT-LLM, Dynamo), middleware, observability instrumentation, and production hardening across cloud, on-prem, and Kubernetes environments.
• Architect significant new features in open-source codebases, shepherding them through project acceptance and into production.
• Build and optimize high-performance model download and caching pipelines across multiple cloud storage backends (NGC, HuggingFace, S3, GCS), including parallel transfers, integrity verification, and seamless multi-cloud interoperability.
• Implement the model profile and manifest system that ensures NIMs are optimized for every NVIDIA GPU platform, covering profile selection, validation, and multi-GPU configuration.
• Develop and refine cloud microservice patterns (service discovery, health checking, graceful degradation, API gateway integration, and end-to-end request lifecycle management) to ensure NIMs operate reliably at scale in diverse cloud deployment environments.
• Be a role model for high-quality code across Python, Rust, and C/C++, and exemplify best practices in test-driven development, agentic AI-assisted development, code review, and cross-team collaboration.
• Mentor teammates and establish high engineering standards for container quality, security, and operability.
Qualifications:
Required:
• BS or MS in Computer Science, Computer Engineering, or related field (or equivalent experience).
• 8+ years of demonstrated experience developing performant microservices, cloud software, and/or platform infrastructure.
• Deep technical expertise in cloud-native microservice architecture, including service mesh, API gateways, load balancing, and distributed system design patterns.
• Expertise in high-performance data pipelines with parallel I/O, caching strategies, and integrity verification across distributed storage systems.
• Solid understanding of containerized application delivery using technologies such as Docker, Kubernetes, and Helm.
• Understanding of application security principles, including secure coding practices, vulnerability mitigation, secrets management, and supply chain integrity for containerized environments.
• Strong problem-solving skills grounded in first-principles reasoning and critical analysis.
• Excellent programming skills in Python and Rust, with strong foundations in algorithms, development patterns, and software engineering principles.
Preferred:
• Direct involvement in open-source inference backends such as vLLM, TensorRT-LLM, or SGLang.
• Direct involvement in disaggregated serving frameworks like NVIDIA Dynamo.
• Experience building and operating production microservices at scale.
• Deep knowledge of multi-cloud deployment strategies across AWS, GCP, Azure, and OCI.
• Experience operating in regulated, air-gapped, or disconnected environments where strict security and compliance controls are required.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, USA, and has more than 10,000 employees. The company is currently late stage.