Nvidia Triton Inference Server Jobs (NOW HIRING)

Nvidia

Senior System Software Engineer - Dynamo-Triton Inference Server

Seattle, WA · On-site

$139K - $183K/yr

... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...

Nvidia

Senior System Software Engineer - Dynamo-Triton Inference Server

Santa Clara, CA · On-site

$143K - $189K/yr

... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...

NVIDIA

Senior System Software Engineer - Dynamo-Triton Inference Server

Santa Clara, CA · On-site

$142K - $188K/yr

... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...

Nvidia

Senior System Software Engineer - Dynamo-Triton Inference Server

OR · On-site

$122K - $161K/yr

... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...

Nvidia Corporation

Senior System Software Engineer - Dynamo-Triton Inference Server

Santa Clara, CA · On-site

$143K - $189K/yr

... Triton Inference Server ... NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic ...

ITCAPS LLC

Senior MLOps / LLMOps Engineer

Jersey City, NJ

$109K - $149K/yr

The ideal candidate will have strong experience with Kubernetes/OpenShift, NVIDIA TensorRT-LLM, Triton Inference Server, and scalable AI infrastructure. This role focuses on building reliable, secure ...

Apex 2000

LLM Inference / AI Infrastructure Engineer

Charlotte, NC

$105K - $137K/yr

Charlotte, NC. Duration: 9-12 months. JD: vLLM, TensorRT-LLM, Triton Inference Server, SGLang, Inference ... Have you worked on Nvidia H200? If yes, chances are you will know all of the above skills.

Tampa Brass & Aluminum Corp

AI Software Engineer

Tampa, FL

... NVIDIA Triton Inference Server, NVIDIA NIM, NVIDIA NeMo, TensorRT - CUDA, cuBLAS, cuDNN, NCCL (multi-GPU) - Hugging Face Transformers, LangChain, LlamaIndex - Model quantization: GGUF, AWQ, GPTQ ...

Unity South APAC (SEA, ANZ, IND Subcont.)

Staff Backend Engineer, ML Inference Systems

Mountain View, CA · On-site

... NVIDIA Triton Inference Server • Familiarity with auction mechanics or bidding systems in an ad tech context • Experience embracing AI as a strategic advantage in engineering, following ...

Tampa Brass & Aluminum Corp

AI Software Engineer

Tampa, FL · On-site

NVIDIA Triton Inference Server, NVIDIA NIM, NVIDIA NeMo, TensorRT * CUDA, cuBLAS, cuDNN, NCCL (multi-GPU) * Hugging Face Transformers, LangChain, LlamaIndex * Model quantization: GGUF, AWQ, GPTQ

Wyetech

Software Engineer 2 (Hybrid)

Laurel, MD · On-site +1

Configure and optimize containers using NVIDIA Triton Inference Server for high-performance inference * Profile, tune, and optimize solutions for production workloads * Create comprehensive user ...

Hallmark Global Solutions Ltd

AI/ML Inference Engineer

Charlotte, NC · Hybrid

Design, deploy, and optimize vLLM inference infrastructure on NVIDIA H200 GPU clusters using TensorRT-LLM, Triton Inference Server, and SGLang * Implement advanced inference optimizations: continuous ...

NVIDIA

Solutions Architect, Inference Deployments

Santa Clara, CA · On-site

$73.50 - $96.75/hr

Experience with one of NVIDIA Dynamo, Triton Inference Server, or TensorRT-LLM for model optimization and serving. * GPU orchestration using NVIDIA GPU Operator, NIM Operator, and Multi-Instance GPU ...

NVIDIA

Senior Solutions Architect, Generative AI

Santa Clara, CA · On-site

Experience with NVIDIA GPUs and software libraries, such as NVIDIA NeMo Framework, NVIDIA Triton Inference Server, TensorRT, TensorRT-LLM * Excellent C/C++ programming skills, including ...

Nvidia

Solutions Architect, Inference Deployments

Santa Clara, CA · On-site

$74 - $97.50/hr

Nvidia

Solutions Architect, Inference Deployments

Santa Clara, CA · On-site

$74 - $97.50/hr

Nvidia

Senior Solutions Architect, NVIDIA Cloud Partners

Santa Clara, CA

$74.50 - $102.25/hr

... Triton Inference Server, TensorRT, TensorRT-LLM, NVIDIA CUDA-X * Hands-on expertise with scaled AI cloud environments (e.g., AWS, Azure, GCP) and on-premises / hybrid infrastructure, in particular ...

Nvidia

Developer Relations Manager - AI Natives

OR · On-site

Work directly with startup founders and engineering teams to architect and optimize AI workloads using NVIDIA technologies including CUDA-X libraries, TensorRT-LLM, Triton Inference Server, NVIDIA NeMo ...

Nvidia

Senior Solutions Architect, Generative AI

Santa Clara, CA · On-site

Experience with NVIDIA GPUs and software libraries, such as NVIDIA NeMo Framework, NVIDIA Triton Inference Server, TensorRT, TensorRT-LLM * Excellent C/C++ programming skills, including debugging ...

Nvidia Corporation

Solutions Architect, Inference Deployments

Santa Clara, CA · On-site

$74 - $97.50/hr

Nvidia Triton Inference Server information

How much do Nvidia Triton Inference Server jobs pay per hour?

As of Jun 5, 2026, the average hourly pay for Nvidia Triton Inference Server jobs in the United States is $15.29, according to ZipRecruiter salary data. Most workers in this role earn between $10.34 and $17.31 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an Nvidia Triton Inference Server Engineer, and why are they important?

To thrive as an Nvidia Triton Inference Server Engineer, you need a strong background in software engineering, deep learning frameworks, and computer science principles, typically supported by a degree in a relevant field. Experience with Triton Inference Server, containerization technologies (like Docker), and cloud platforms, as well as proficiency in languages such as Python and C++, are highly valuable. Problem-solving, attention to detail, and effective communication are critical soft skills for collaborating with teams and troubleshooting complex deployment issues. These skills ensure efficient deployment of AI models, optimal server performance, and successful integration in production environments.

What are some typical challenges faced when deploying machine learning models with Nvidia Triton Inference Server in a production environment?

When deploying machine learning models using Nvidia Triton Inference Server, engineers often encounter challenges such as optimizing model throughput and latency to meet real-time requirements, managing GPU resources efficiently for multiple concurrent models, and ensuring compatibility with various model frameworks (like TensorFlow, PyTorch, or ONNX). Additionally, integrating Triton with existing CI/CD pipelines and monitoring inference performance in production can require close collaboration with DevOps and data engineering teams. Overcoming these challenges typically involves strong troubleshooting skills and a solid understanding of both machine learning deployment and cloud infrastructure.

What is Nvidia Triton Inference Server?

Nvidia Triton Inference Server is an open-source software platform designed to simplify the deployment of AI models at scale. It supports multiple frameworks such as TensorFlow, PyTorch, ONNX, and more, allowing organizations to serve models from different sources using a single standardized interface. Triton offers features like model versioning, concurrent model execution, GPU and CPU support, and advanced scheduling to maximize inference performance. It is commonly used in production environments to manage and optimize the deployment of machine learning and deep learning models.
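As a rough illustration of the "single standardized interface" mentioned above, the sketch below builds an HTTP inference request in the KServe v2 style that Triton's REST endpoint follows. It only constructs the URL and JSON body and does not contact a server. The model name `resnet50`, the input tensor name `input__0`, the version `"2"`, and the helper `build_infer_request` are hypothetical names chosen for illustration; a real deployment would take them from its own model configuration.

```python
import json
from typing import Optional, Sequence, Tuple


def build_infer_request(
    base_url: str,
    model_name: str,
    input_name: str,
    shape: Sequence[int],
    data: Sequence[float],
    datatype: str = "FP32",
    model_version: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (url, json_body) for a v2-style inference request.

    Hypothetical helper: it formats the request but never sends it.
    """
    path = f"/v2/models/{model_name}"
    # Pinning a version is optional; without it the server picks one
    # according to the model's version policy.
    if model_version is not None:
        path += f"/versions/{model_version}"
    path += "/infer"

    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }
    return base_url.rstrip("/") + path, json.dumps(body)


# Example with assumed names; port 8000 is Triton's default HTTP port.
url, body = build_infer_request(
    "http://localhost:8000",
    "resnet50",
    "input__0",
    shape=[1, 3],
    data=[0.1, 0.2, 0.3],
    model_version="2",
)
print(url)
print(body)
```

The same request shape applies regardless of which framework backs the model, which is the point of the standardized interface; only the tensor names, shapes, and datatypes change per model.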

What is the difference between an Nvidia Triton Inference Server Engineer and a Data Scientist?

Aspect	Nvidia Triton Inference Server Engineer	Data Scientist
Primary Role	Deploying and managing AI inference models in production	Analyzing data to extract insights and build models
Required Skills	Machine learning deployment, server management, GPU utilization	Statistical analysis, programming, data visualization
Work Environment	Data centers, cloud platforms, AI infrastructure	Research labs, corporate offices, data analysis environments
Certifications	Deep learning, cloud computing, GPU certifications	Data science, analytics, programming certifications

While Nvidia Triton Inference Server Engineers focus on deploying AI models efficiently in production environments, Data Scientists primarily analyze data and develop models. Both roles require technical skills but serve different stages of AI development and deployment.

Senior System Software Engineer - Dynamo-Triton Inference Server

Nvidia

Seattle, WA • On-site

$139K - $183K/yr

Full-time

Posted 23 days ago

Job description

We are looking for a Senior System Software Engineer to work on Dynamo-Triton Inference Server. NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. Academic and commercial groups around the world are using GPUs to power a revolution in AI, enabling breakthroughs in problems from image classification to speech recognition to natural language processing. We are a fast-paced team building a highly-performant AI inference platform to make design and deployment of new AI models easier and accessible to all users.

What you'll be doing:

Develop world-class GPU-accelerated AI inference serving software.
Contribute to feature development and drive broad customer adoption.
Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform. This platform will ensure feature parity and effectively serve both Large Language Model (LLM) and non-LLM workloads.
Be an active member of the open source deep learning software engineering community.
Balance a variety of objectives such as building robust software designed to be deployed in production server or cloud environments, optimizing and balancing prediction throughput and latency, and developing and adopting the next generation of inference technologies.

What we need to see:

MS or PhD in Computer Science or relevant field (or equivalent experience).
5+ years of professional experience working on deep learning software.
Excellent Rust & C++ skills, familiarity with Python, and strong programming & software design skills including debugging, performance analysis, and test design.
Experience with high-scale distributed systems and ML systems.
Strong communication skills and ability to work in a fast-paced, agile team environment.

Ways to stand out from the crowd:

Prior experience with AI frameworks and engines, such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM.
Knowledge of GPU memory management, cache management, or high-performance networking.
Experience with distributed systems programming.
Experience in contributing to a large open source project: use of GitHub, bug tracking, branching and merging code, OSS licensing issues, handling patches, etc.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 1, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993

Website

nvidia.com
