$122K - $161K/yr
Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Masters degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
$143K - $189K/yr
Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Masters degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
Santa Clara, CA · On-site
$143K - $189K/yr
Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Masters degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
Boston, MA · On-site +1
$174K - $287K/yr
As leading developers, maintainers of the vLLM project, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to ...
Santa Clara, CA · On-site
$143K - $189K/yr
Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion. * Implement and optimize ...
Santa Clara, CA · On-site
$158K - $212K/yr
This role involves contributing to upstream inference engines like vLLM and SGLang. You will ensure they run outstandingly on NVIDIA GPUs and systems. You will also strengthen the underlying stack ...
New York, NY · On-site
$134K - $176K/yr
Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion. * Implement and optimize ...
Bellevue, WA · On-site
$206K - $333K/yr
If MLPerf (Training & Inference), Working closely with NVIDIA (Megatron-LM, TensorRT-LLM & DGX cloud) and the open-source community (llm-d, vLLM and all popular ML frameworks) speak to you, come help ...
Santa Clara, CA · On-site
$144K - $190K/yr
Hands-on understanding of vLLM, SGLang, or similar inference stacks * Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects Deep ...
Santa Clara, CA · Hybrid
$164K/yr
We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...
Santa Clara, CA · On-site
$164K/yr
We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...
... vLLM, work closely with the community to adopt these techniques in Anyscale solutions, and also contribute improvements to open source • Follow the latest state-of-the-art in the open source and ...
Charlotte, NC · On-site
Must-Have Skills (Mandatory Keywords) LLM Inference & Optimization vLLM, TensorRT-LLM, Triton Inference Server, SGLang Inference optimization techniques: Continuous batching Speculative decoding KV ...
| Aspect | vLLM | Data Analyst |
|---|---|---|
| Required Credentials | Typically requires knowledge of machine learning, AI, and programming languages like Python or R | Requires skills in statistics, Excel, SQL, and data visualization tools |
| Work Environment | Often in tech companies, research labs, or AI-focused teams | Commonly in business, finance, healthcare, and marketing sectors |
| Industry Usage | Emerging role in AI and machine learning projects | Established role in data-driven decision making |

Common search/comparison: vLLM vs Data Analyst

The main difference between vLLM and Data Analyst roles lies in their focus and skill set. vLLM professionals specialize in AI and machine learning models, often working in tech environments, while Data Analysts focus on interpreting data to inform business decisions. Both roles require analytical skills, but vLLM roles demand programming and AI expertise, whereas Data Analyst roles emphasize statistical analysis and data visualization.

$122K - $161K/yr
Full-time
Posted 26 days ago
We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the inference systems software stack! We build innovative AI systems software to accelerate AI inference. As a member of the team, you'll develop libraries, code generators, and GPU kernel technologies for NVIDIA's hardware architecture. This means designing and building things like new abstractions, efficient attention kernel implementations, new LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other high-impact AI workloads.
What you'll be doing:
Innovating and developing new AI systems technologies for efficient inference
Designing, implementing, and optimizing kernels for high-impact AI workloads
Designing and implementing extensible abstractions for LLM serving engines
Building efficient just-in-time domain-specific compilers and runtimes
Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU arch teams
Contributing to open source communities like FlashInfer, vLLM, and SGLang
What we need to see:
Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); PhD preferred
6+ years of academic or industry experience with ML/DL systems development preferred
Strong experience in developing or using deep learning frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX) and ideally inference engines and runtimes such as vLLM, SGLang, and MLC
Strong Python and C/C++ programming skills
Strong experience in GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar)
Ways to stand out from the crowd:
Background in domain-specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, Flash Attention)
Expertise in inference engines like vLLM and SGLang
Expertise in machine learning compilers (e.g. Apache TVM, MLIR)
Open source project ownership or contributions
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Industry: Computer and electronic product manufacturing
Company size: 10,000+ employees
Headquarters: Santa Clara, CA, US
Founded: 1993