vLLM Jobs (Now Hiring)

Senior AI Software Engineer, Kernel Libraries

Santa Clara, CA · On-site

$143K - $189K/yr

Contributing to open source communities like FlashInfer, vLLM, and SGLang. What we need to see: * Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)

$122K - $161K/yr

Senior Software Engineer - AI Inference

$143K - $189K/yr

Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion. * Implement and optimize ...

Senior AI Software Engineer, Kernel Libraries

Santa Clara, CA · On-site

$143K - $189K/yr

Senior Software Engineer - AI Inference

Santa Clara, CA · On-site

$143K - $189K/yr

Senior Software Engineer - AI Inference

New York, NY

$134K - $176K/yr

Senior Software Engineer, Matrix Multiplication

$143K - $189K/yr

Senior Software Engineer, Matrix Multiplication

Santa Clara, CA · On-site

$143K - $189K/yr

XPath Solutions

Generative AI Engineer

Charlotte, NC

$60 - $72/hr

The ideal candidate will have hands-on experience with Large Language Models (LLMs), Vision Language Models (Vision LLMs/VLMs), the vLLM inference framework, prompt engineering, and modern Generative ...

Principal Engineer - Perf and Benchmarking

Bellevue, WA · On-site

$206K - $333K/yr

If MLPerf (Training & Inference), working closely with NVIDIA (Megatron-LM, TensorRT-LLM & DGX Cloud) and the open-source community (llm-d, vLLM and all popular ML frameworks) speak to you, come help ...

Senior Software Engineer, AI Inference Systems

Santa Clara, CA · On-site

$143K - $189K/yr

Contribute features to vLLM that empower the newest models with the latest NVIDIA GPU hardware features; profile and optimize the inference framework (vLLM) with methods like speculative decoding ...

AI Inference Performance Engineer - New College Grad 2026

Santa Clara, CA · On-site

$164K/yr

We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...

AI Inference Performance Engineer

Santa Clara, CA · On-site

$164K/yr

AI Inference Performance Engineer - New College Grad 2026

Santa Clara, CA · On-site

$164K/yr

Principal Engineer - Perf and Benchmarking

Sunnyvale, CA · On-site

$206K - $333K/yr

AI Inference Performance Engineer

Santa Clara, CA · Hybrid

$164K/yr

Triune Infomatics Inc

AI Inference Engineer

San Jose, CA · On-site

Responsibilities: • Build, operate, and optimize production model-serving stacks using frameworks such as vLLM, SGLang, Triton Inference Server, TensorRT-LLM, TorchServe, or KServe • Develop and ...

NVIDIA

Senior Software Engineer, Quantized Inference

Redmond, WA · On-site

$137K - $180K/yr

Responsibilities: • Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang) • Own model export pipelines (ModelOpt, Megatron-LM HuggingFace), ensuring quantized ...

Senior Software Engineer - VLM Microservices for Neural Reconstruction

$143K - $189K/yr

Contribute VLM-related features to open-source projects like vLLM * Collaborate closely with Research and Product teams and influence our common roadmaps. What we need to see: * Master's of Science in ...
