$122K - $161K/yr
Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
Santa Clara, CA · On-site
$143K - $189K/yr
Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)
Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving. * Work with the training team to identify opportunities to ...
Hands-on understanding of vLLM, SGLang, or similar inference stacks * Experience with distributed inference scaling and a proven track record of contributing to upstream open-source projects Deep ...
Charlotte, NC · Hybrid
$105K - $137K/yr
Deploy and manage inference engines including vLLM and TensorRT-LLM. Hardware Utilization: Optimize GPU throughput tuning, batching strategies, and latency optimization. Manage workload orchestration ...
$158K - $212K/yr
This role involves contributing to upstream inference engines like vLLM and SGLang. You will ensure they run outstandingly on NVIDIA GPUs and systems. You will also strengthen the underlying stack ...
Santa Clara, CA · On-site
$143K - $189K/yr
Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion. * Implement and optimize ...
Bellevue, WA · On-site
$206K - $333K/yr
If MLPerf (Training & Inference), working closely with NVIDIA (Megatron-LM, TensorRT-LLM & DGX Cloud) and the open-source community (llm-d, vLLM and all popular ML frameworks) speak to you, come help ...
Santa Clara, CA · Hybrid
$164K/yr
We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...
Upstream features and performance fixes into vLLM, SGLang, and llm-d * Enable customer PoCs and production deployments on AMD platforms * Build and maintain benchmark-grade inference pipelines ...
Santa Clara, CA · On-site
$164K/yr
We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...
Santa Clara, CA · On-site
$164K/yr
We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...
Seattle, WA · On-site
$250K - $350K/yr
You've worked with vLLM, SGLang, or similar frameworks and have opinions about where they fall short. Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal ...
Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving. * Work with the training team to identify opportunities to ...
Must-Have Skills (Mandatory Keywords) LLM Inference & Optimization vLLM, TensorRT-LLM, Triton Inference Server, SGLang Inference optimization techniques: Continuous batching Speculative decoding KV ...
... vLLM/TGI/Triton), caching, batching, and streaming. • Implement logging, tracing, observability, and offline evaluation pipelines. Qualifications: Required: • 5+ years building ML or ...
Santa Clara, CA · On-site
$143K - $189K/yr
Contribute VLM-related features to Open-Source projects like vLLM * Collaborate closely with Research and Product teams and influence our common roadmaps What we need to see: * Master's of Science in ...
Boston, MA · On-site +1
$189K - $312K/yr
As leading developers, maintainers of the vLLM project, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to ...
Santa Clara, CA · On-site
$143K - $189K/yr
Contribute features to vLLM that empower the newest models with the latest NVIDIA GPU hardware features; profile and optimize the inference framework (vLLM) with methods like speculative decoding ...
Charlotte, NC · On-site
Must-Have Skills (Mandatory Keywords) LLM Inference & Optimization vLLM, TensorRT-LLM, Triton Inference Server, SGLang Inference optimization techniques: Continuous batching Speculative decoding KV ...
| Aspect | vLLM | Data Analyst |
|---|---|---|
| Required Credentials | Typically requires knowledge of machine learning, AI, and programming languages like Python or R | Requires skills in statistics, Excel, SQL, and data visualization tools |
| Work Environment | Often in tech companies, research labs, or AI-focused teams | Commonly in business, finance, healthcare, and marketing sectors |
| Industry Usage | Emerging role in AI and machine learning projects | Established role in data-driven decision making |
The main difference between vLLM and Data Analyst roles lies in their focus and skill set. vLLM professionals specialize in AI and machine learning models, often working in tech environments, while Data Analysts focus on interpreting data to inform business decisions. Both roles require analytical skills, but vLLM roles demand programming and AI expertise, whereas Data Analyst roles emphasize statistical analysis and data visualization.
$122K - $161K/yr
Full-time
Posted 29 days ago
We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the inference systems software stack! We build innovative AI systems software to accelerate AI inference. As a member of the team, you'll develop libraries, code generators, and GPU kernel technologies for NVIDIA's hardware architecture. This means designing and building things like new abstractions, efficient attention kernel implementations, new LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other high-impact AI workloads.
What you'll be doing:
Innovating and developing new AI systems technologies for efficient inference
Designing, implementing, and optimizing kernels for high-impact AI workloads
Designing and implementing extensible abstractions for LLM serving engines
Building efficient just-in-time domain-specific compilers and runtimes
Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU arch teams
Contributing to open source communities like FlashInfer, vLLM, and SGLang
What we need to see:
Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); PhD preferred
6+ years of (academic/industry) experience with ML/DL systems development preferred
Strong experience in developing or using deep learning frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX, etc.) and ideally inference engines and runtimes such as vLLM, SGLang, and MLC.
Strong Python and C/C++ programming skills
Strong experience in GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar)
Ways to stand out from the crowd:
Background in domain-specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, Flash Attention)
Expertise in inference engines like vLLM and SGLang
Expertise in machine learning compilers (e.g. Apache TVM, MLIR)
Open source project ownership or contributions
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Computer and electronic product manufacturing
10,000+ Employees
Santa Clara, CA, US
1993