vLLM Jobs (NOW HIRING)

Senior Software Engineer, Quantized Inference

Redmond, WA · On-site

$137K - $180K/yr

Responsibilities:
• Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang)
• Own model export pipelines (ModelOpt, Megatron-LM HuggingFace), ensuring quantized ...

Senior Software Engineer - VLM Microservices for Neural Reconstruction

Santa Clara, CA

$143K - $189K/yr

* Contribute VLM-related features to open-source projects like vLLM
* Collaborate closely with Research and Product teams and influence our common roadmaps

What we need to see:
* Master's of Science in ...

Senior Software Engineer - VLM Microservices for Neural Reconstruction

Santa Clara, CA

$143K - $189K/yr

* Stay hands-on across the AMG stack (Python, C++, CUDA, vLLM, NIXL/Dynamo, Kubernetes), contributing directly to production systems while providing technical leadership to the team.
* Solve Hard ...

Senior Software Engineer - VLM Microservices for Neural Reconstruction

Santa Clara, CA · On-site

$143K - $189K/yr

Senior Software Engineer - VLM Microservices for Neural Reconstruction

Redmond, WA · On-site

$137K - $180K/yr

Member of Technical Staff - Model Optimization and Inference (New Grad)

Seattle, WA · On-site

$200K - $300K/yr

You've worked with vLLM, SGLang, or similar frameworks (through coursework, research, internships, or open-source) and have opinions about where they fall short. This posting is aimed at early-career ...

Accenture Federal Services

Tech Lead - AI Inference

Netskope

Principal / Distinguished Engineer, Machine Learning

Design, optimize, and deploy highly scalable AI/ML inference systems, leveraging the latest LLM serving technologies such as vLLM, SGLang, and advanced KV Cache optimization to maximize throughput ...

Accenture Federal Services

AI Engineer

Arlington, VA · On-site

* Leverage tools and frameworks such as LangGraph, Semantic Kernel, vLLM, Ollama, and Ray for scalable AI solutions
* Integrate with NVIDIA GPU ecosystems and vector databases to enhance AI performance ...

EnCharge AI

LLM Inference Deployment Engineer

$180K - $240K/yr

* Utilize inference runtimes such as ONNX Runtime and vLLM for efficient execution.
* Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
* Develop and ...

CUDA Libraries and Frameworks Product Marketing Manager

Santa Clara, CA · On-site

$180K/yr

Come help craft the story for CUDA, core NVIDIA acceleration libraries like cuDNN, NCCL, NIXL, and AI frameworks like PyTorch, JAX, vLLM, and SGLang. What you'll be doing: * Own positioning and ...

Member of Technical Staff - Model Optimization and Inference (Experienced)

Seattle, WA

$250K - $350K/yr

You've worked with vLLM, SGLang, or similar frameworks at scale and have strong opinions about where they fall short. This posting is aimed at experienced engineers and researchers who've operated at ...

Engineering Manager, LLM Performance

Santa Clara, CA · On-site

We are accelerating LLM inference across the stack and across all open source LLM frameworks like TensorRT LLM, vLLM and SGLang. With demand for AI exploding, particularly in the realm of large ...

OpenAI

Software Engineer, Inference - AMD GPU Enablement

San Francisco, CA · On-site

Responsibilities:
• Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware.
• Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety ...

Samsung Semiconductor

Principal Engineer, AI Serving Framework Architect (Software)

San Jose, CA · On-site

... such as vLLM

Qualifications:
Required:
• PhD in Computer Science or a related field with 10+ years of experience in AI serving frameworks for large-scale computing, with a focus on the AI ...

DigitalOcean

Senior Engineer II, AI Inference Optimization

Seattle, WA · Hybrid

$167K - $209K/yr

* Familiarity with LLM serving stacks such as vLLM, TensorRT-LLM, or similar technologies
* Experience building systems for inference optimization, rate limiting, routing, or workload orchestration ...

Senior Software Engineer - TensorRT Edge-LLM

Santa Clara, CA · On-site

$143K - $189K/yr

Preferred:
• Demonstrated development experience or open-source contributions to LLM inference frameworks and libraries, such as SGLang, vLLM, or FlashInfer.
• Proficiency with CUDA, including ...