
Vllm Jobs (NOW HIRING)

Experience working on high-performance server systems--you'd be just as comfortable with the internals of VLLM as you would with a complex PyTorch codebase. • Significant performance engineering ...

Senior Machine Learning Engineer

Boston, MA · On-site +1

$174K - $287K/yr

As leading developers, maintainers of the vLLM project, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to ...

Modify and extend LLM serving frameworks like VLLM and SGLang to take advantage of the latest techniques in high-performance model serving. * Work with the training team to identify opportunities to ...

OR

$122K - $161K/yr

Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Masters degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)

OR

$122K - $161K/yr

Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion. * Implement and optimize ...

OR

$134K - $180K/yr

This role involves contributing to upstream inference engines like vLLM and SGLang. You will ensure they run outstandingly on NVIDIA GPUs and systems. You will also strengthen the underlying stack ...

OR · On-site

Contributing to open source communities like FlashInfer, vLLM, and SGLang What we need to see: * Masters degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)


Vllm information

How does an engineer working with vLLM typically collaborate with data scientists and product teams during model deployment?

Engineers who work with vLLM collaborate closely with data scientists to understand the specific requirements and fine-tuning needs of large-scale language models. They are often responsible for integrating these models into production systems and for ensuring scalability and efficiency. Collaboration with product teams is crucial to align model capabilities with user needs and to troubleshoot real-world application challenges. Frequent communication and agile workflows are common, because updates or optimizations may be needed quickly in response to feedback from both teams.

What is vLLM and what do engineers who work with it do?

vLLM is an open-source, high-throughput inference and serving engine for large language models; the name is often read as 'virtual LLM.' In the context of AI development, engineers who work with vLLM use it to deploy AI models faster and more efficiently in production environments. Their responsibilities often include integrating LLMs into applications, optimizing model performance, and ensuring scalability for real-time use cases. They may also collaborate with data scientists and engineers to manage resources and streamline AI workflows.

What is the difference between Vllm and Data Analyst?

Aspect | Vllm | Data Analyst
Required Credentials | Typically requires knowledge of machine learning, AI, and programming languages like Python or R | Requires skills in statistics, Excel, SQL, and data visualization tools
Work Environment | Often in tech companies, research labs, or AI-focused teams | Commonly in business, finance, healthcare, and marketing sectors
Industry Usage | Emerging role in AI and machine learning projects | Established role in data-driven decision making
Common Search/Comparison | Vllm vs Data Analyst

The main difference between Vllm and Data Analyst lies in their focus and skill set. Vllm professionals specialize in AI and machine learning models, often working in tech environments, while Data Analysts focus on interpreting data to inform business decisions. Both roles require analytical skills, but Vllm roles demand programming and AI expertise, whereas Data Analysts emphasize statistical analysis and data visualization.

What are the key skills and qualifications needed to thrive as a Machine Learning Engineer working with vLLM, and why are they important?

To thrive as a Machine Learning Engineer specializing in vLLM (a high-throughput LLM inference library), you need a strong understanding of machine learning principles, deep learning frameworks, and experience with Python programming. Familiarity with tools like PyTorch, CUDA, distributed computing, and cloud platforms, as well as relevant certifications in ML or data engineering, is highly valuable. Strong problem-solving, collaboration, and communication skills are essential for optimizing model performance and integrating with cross-functional teams. These capabilities ensure effective deployment and scaling of large language models, driving innovation and efficiency in AI applications.
More about Vllm jobs
Infographic showing various Vllm job openings in the United States as of May 2026, with employment types broken down into 1% Internship, 98% Full Time, and 1% Part Time. Highlights an 81% Physical, 4% Hybrid, and 15% Remote job distribution.
Senior Software Engineer - AI Inference


NVIDIA

Santa Clara, CA

$142K - $188K/yr

Other

Posted 24 days ago


Job description

NVIDIA is the platform upon which every new AI‑powered application is built. We are seeking a Senior Software Engineer – AI Inference to advance open‑source LLM serving by contributing directly to upstream inference engines like vLLM and SGLang, ensuring they run best‑in‑class on NVIDIA GPUs and systems, and by improving the underlying stack that enables high‑throughput, low‑latency inference at scale.

This is a hands-on role for an engineer who enjoys digging into performance bottlenecks, designing pragmatic runtime improvements, and shipping high‑quality changes that are broadly useful to the community and production deployments.

What you'll be doing:

  • Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion.

  • Implement and optimize inference‑runtime capabilities: batching and scheduling policies, streaming, request lifecycle management, and KV‑cache efficiency (paging/sharding) to improve throughput and tail latency.

  • Profile and improve hot paths across layers, from Python orchestration to C++/CUDA kernels, using data to guide optimization work.

  • Improve multi‑GPU inference performance and reliability: parallelism strategies, communication patterns, and resource utilization across NVIDIA platforms.

  • Build and maintain performance and correctness regression tests to prevent slowdowns and ensure stable behavior across model and hardware configurations.

  • Collaborate with model, platform, and SRE teams to translate production requirements into upstreamable solutions with strong operability and maintainability.
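For readers unfamiliar with the "KV‑cache efficiency (paging)" item in the list above, the idea can be sketched roughly as follows. This is a toy illustration under stated assumptions, not vLLM's or SGLang's actual implementation: the class name `PagedKVCache`, its fields, and its methods are all hypothetical, and it tracks only block bookkeeping, not real tensors.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block allocator (hypothetical): maps requests to fixed-size cache blocks."""
    num_blocks: int
    block_size: int  # tokens that fit in one block
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # request id -> block ids
    lengths: dict = field(default_factory=dict)  # request id -> tokens stored

    def __post_init__(self):
        # Every block starts out in the free pool.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, request_id: str) -> bool:
        """Reserve room for one more token; return False if the pool is exhausted."""
        length = self.lengths.get(request_id, 0)
        table = self.block_tables.setdefault(request_id, [])
        if length == len(table) * self.block_size:  # no block yet, or last block full
            if not self.free_blocks:
                return False
            table.append(self.free_blocks.pop())
        self.lengths[request_id] = length + 1
        return True

    def release(self, request_id: str) -> None:
        """Return a finished request's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.lengths.pop(request_id, None)
```

Because blocks are allocated on demand rather than reserved up front for a maximum sequence length, a short request holds only the blocks it has actually filled, and a finished request's blocks become available to others immediately.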

What we need to see:

  • 5+ years building production software with solid systems engineering fundamentals and a track record of delivering performance or reliability improvements.

  • Experience with LLM inference/serving stacks (e.g., vLLM, SGLang) and an understanding of the tradeoffs that drive real production performance.

  • Strong programming skills in Python plus C++ and/or CUDA; ability to debug and optimize performance‑critical code.

  • Experience with profiling and performance investigation (microbenchmarks, flame graphs, GPU profiling) and a measurement‑driven mindset.

  • Familiarity with distributed systems concepts and concurrency (queues/schedulers, multi‑process/multi‑threading, scaling across GPUs/nodes).

  • Strong communication skills and comfort working with open‑source communities (issues, PR discussions, code review).

  • BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience).

Ways to stand out from the crowd:

  • Open‑source contributions to vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo or adjacent serving/runtime projects.

  • Shipped performance work such as improved attention/KV cache efficiency, speculative decoding, scheduler improvements, quantization-aware serving, or streaming latency reductions.

  • Experience building reproducible benchmarking and performance regression infrastructure for latency/throughput.

  • Systems performance background spanning memory bandwidth, kernel fusion, PCIe/NVLink effects, and network fabrics (e.g., InfiniBand).
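As a rough illustration of what the "performance regression infrastructure for latency/throughput" mentioned above might check, here is a minimal sketch that compares tail latency between a baseline run and a candidate run. The function names, the nearest-rank percentile method, and the 5% tolerance are assumptions chosen for the example, not anything specified in the posting.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def has_regressed(baseline_ms, candidate_ms, pct=99, tolerance=0.05):
    """True if the candidate's tail latency exceeds the baseline's by more than `tolerance`."""
    limit = percentile(baseline_ms, pct) * (1 + tolerance)
    return percentile(candidate_ms, pct) > limit
```

A real harness would also need to control for warm-up, hardware variance, and sample size before treating a single comparison as a regression.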

We are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward‑thinking and creative people in the world working for us. If you're creative and autonomous with a real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).

Applications for this job will be accepted at least until April 18, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.



About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993