Senior System Software Engineer - Dynamo-Triton Inference Server
Seattle, WA · On-site
$139K - $183K/yr
... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...
Santa Clara, CA · On-site
$143K - $189K/yr
... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...
Santa Clara, CA · On-site
$142K - $188K/yr
... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...
OR · On-site
$122K - $161K/yr
... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...
$109K - $149K/yr
The ideal candidate will have strong experience with Kubernetes/OpenShift, NVIDIA TensorRT-LLM, Triton Inference Server, and scalable AI infrastructure. This role focuses on building reliable, secure ...
$105K - $137K/yr
Charlotte, NC. Duration: 9-12 months. JD: vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference ... Have you worked on NVIDIA H200? If yes, chances are you will know all of the above skills.
... NVIDIA Triton Inference Server, NVIDIA NIM, NVIDIA NeMo, TensorRT - CUDA, cuBLAS, cuDNN, NCCL (multi-GPU) - Hugging Face Transformers, LangChain, LlamaIndex - Model quantization: GGUF, AWQ, GPTQ ...
... NVIDIA Triton Inference Server • Familiarity with auction mechanics or bidding systems in an ad tech context • Experience embracing AI as a strategic advantage in engineering, following ...
Tampa, FL · On-site
NVIDIA Triton Inference Server, NVIDIA NIM, NVIDIA NeMo, TensorRT * CUDA, cuBLAS, cuDNN, NCCL (multi-GPU) * Hugging Face Transformers, LangChain, LlamaIndex * Model quantization: GGUF, AWQ, GPTQ
Laurel, MD · On-site +1
Configure and optimize containers using NVIDIA Triton Inference Server for high-performance inference * Profile, tune, and optimize solutions for production workloads * Create comprehensive user ...
Design, deploy, and optimize vLLM inference infrastructure on NVIDIA H200 GPU clusters using TensorRT-LLM, Triton Inference Server, and SGLang * Implement advanced inference optimizations: continuous ...
Santa Clara, CA · On-site
$73.50 - $96.75/hr
Experience with one of NVIDIA Dynamo, Triton Inference Server, or TensorRT-LLM for model optimization and serving. * GPU orchestration using NVIDIA GPU Operator, NIM Operator, and Multi-Instance GPU ...
Experience with NVIDIA GPUs and software libraries, such as NVIDIA NeMo Framework, NVIDIA Triton Inference Server, TensorRT, TensorRT-LLM * Excellent C/C++ programming skills, including ...
Santa Clara, CA · On-site
$74 - $97.50/hr
Experience with one of NVIDIA Dynamo, Triton Inference Server, or TensorRT-LLM for model optimization and serving. * GPU orchestration using NVIDIA GPU Operator, NIM Operator, and Multi-Instance GPU ...
$74.50 - $102.25/hr
... Triton Inference Server, TensorRT, TensorRT-LLM, NVIDIA CUDA-X * Hands-on expertise with scaled AI cloud environments (e.g., AWS, Azure, GCP) and on-premises / hybrid infrastructure, in particular ...
OR · On-site
Work directly with startup founders and engineering teams to architect and optimize AI workloads using NVIDIA technologies including CUDA-X libraries, TensorRT-LLM, Triton Inference Server, NVIDIA NeMo ...
Experience with NVIDIA GPUs and software libraries, such as NVIDIA NeMo Framework, NVIDIA Triton Inference Server, TensorRT, TensorRT-LLM * Excellent C/C++ programming skills, including debugging ...
| Hourly wage range | Share of jobs |
|---|---|
| $5.53 - $7.82 | 7% |
| $7.82 - $10.12 | 15% |
| $10.12 - $12.41 | 12% |
| $12.41 - $14.71 | 22% |
| $14.71 - $17.00 | 22% |
| $17.00 - $19.30 | 11% |
| $19.30 - $21.59 | 4% |
| $21.59 - $23.89 | 2% |
| $23.89 - $26.18 | 2% |
| $26.18 - $28.47 | 1% |
| $28.47 - $30.77 | 1% |
$10.64 is the 25th percentile. Wages below this are outliers.
The median wage is $14.05 / hr.
$16.62 is the 75th percentile. Wages above this are outliers.
| Aspect | Nvidia Triton Inference Server | Data Scientist |
|---|---|---|
| Primary Role | Deploying and managing AI inference models in production | Analyzing data to extract insights and build models |
| Required Skills | Machine learning deployment, server management, GPU utilization | Statistical analysis, programming, data visualization |
| Work Environment | Data centers, cloud platforms, AI infrastructure | Research labs, corporate offices, data analysis environments |
| Certifications | Deep learning, cloud computing, GPU certifications | Data science, analytics, programming certifications |
While Nvidia Triton Inference Server roles focus on deploying AI models efficiently in production environments, Data Scientists primarily analyze data and develop models. Both roles require technical skills but serve different stages of AI development and deployment.
$139K - $183K/yr
Full-time
Posted 23 days ago
We are looking for a Senior System Software Engineer to work on Dynamo-Triton Inference Server. NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic and commercial groups around the world are using GPUs to power a revolution in AI, enabling breakthroughs in problems from image classification to speech recognition to natural language processing. We are a fast-paced team building a highly performant AI inference platform to make the design and deployment of new AI models easier and accessible to all users.
What you'll be doing:
Develop world-class GPU-accelerated AI inference serving software.
Contribute to feature development and drive broad customer adoption.
Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform. This platform will ensure feature parity and effectively serve both Large Language Model (LLM) and non-LLM workloads.
Be an active member of the open source deep learning software engineering community.
Balance a variety of objectives such as building robust software designed to be deployed in production server or cloud environments, optimizing and balancing prediction throughput and latency, and developing and adopting the next generation of inference technologies.
What we need to see:
MS or PhD in Computer Science or relevant field (or equivalent experience).
5+ years of professional experience working on deep learning software.
Excellent Rust & C++ skills, familiarity with Python, and strong programming & software design skills including debugging, performance analysis, and test design.
Experience with high-scale distributed systems and ML systems.
Strong communication skills and ability to work in a fast-paced, agile team environment.
Ways to stand out from the crowd:
Prior experience with AI frameworks and engines, such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM.
Knowledge of GPU memory management, cache management, or high-performance networking.
Experience with distributed systems programming.
Experience contributing to a large open source project: use of GitHub, bug tracking, branching and merging code, OSS licensing issues, handling patches, etc.
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Computer and electronic product manufacturing
10,000+ Employees
Santa Clara, CA, US
1993