Vllm Jobs (NOW HIRING)

$122K - $161K/yr

Contributing to open source communities like FlashInfer, vLLM, and SGLang. What we need to see: * Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience)

Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving. * Work with the training team to identify opportunities to ...

We work directly within TensorRT-LLM, SGLang, and vLLM, building the tools that evaluate serving performance at scale. This team sits at the intersection of GPU performance engineering and public ...

Principal Machine Learning Engineer

Boston, MA · On-site +1

$189K - $312K/yr

As leading developers, maintainers of the vLLM project, and inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to ...

Vllm information

How does an engineer working with vLLM (an open-source LLM inference and serving library) typically collaborate with data scientists and product teams during model deployment?

Engineers who work with vLLM collaborate closely with data scientists to understand the specific requirements and fine-tuning needs of large-scale language models. They are often responsible for integrating these models into production systems and for keeping them scalable and efficient. Collaboration with product teams is crucial to align model capabilities with user needs and to troubleshoot real-world application challenges. Frequent communication and agile workflows are common, as updates or optimizations may be needed rapidly based on feedback from both teams.

What is vLLM, and what do engineers who work with it do?

vLLM is an open-source, high-throughput inference and serving engine for large language models (LLMs), not a job title. In the context of AI development, engineers in vLLM-related roles work with optimized inference engines for large language models, enabling faster and more efficient deployment of AI models in production environments. Their responsibilities often include integrating LLMs into applications, optimizing model performance, and ensuring scalability for real-time use cases. They may also collaborate with data scientists and engineers to manage resources and streamline AI workflows.

What is the difference between a vLLM role and a Data Analyst role?

Aspect | vLLM role | Data Analyst
Required Credentials | Typically requires knowledge of machine learning, AI, and programming languages like Python or R | Requires skills in statistics, Excel, SQL, and data visualization tools
Work Environment | Often in tech companies, research labs, or AI-focused teams | Commonly in business, finance, healthcare, and marketing sectors
Industry Usage | Emerging role in AI and machine learning projects | Established role in data-driven decision making
Common Search/Comparison | Vllm vs Data Analyst

The main difference between a vLLM role and a Data Analyst role lies in focus and skill set. vLLM professionals specialize in AI and machine learning models, often working in tech environments, while Data Analysts focus on interpreting data to inform business decisions. Both roles require analytical skills, but vLLM roles demand programming and AI expertise, whereas Data Analyst roles emphasize statistical analysis and data visualization.

What are the key skills and qualifications needed to thrive as a Machine Learning Engineer working with vLLM, and why are they important?

To thrive as a Machine Learning Engineer specializing in vLLM (a high-throughput LLM inference library), you need a strong understanding of machine learning principles, deep learning frameworks, and experience with Python programming. Familiarity with tools like PyTorch, CUDA, distributed computing, and cloud platforms, as well as relevant certifications in ML or data engineering, is highly valuable. Strong problem-solving, collaboration, and communication skills are essential for optimizing model performance and integrating with cross-functional teams. These capabilities ensure effective deployment and scaling of large language models, driving innovation and efficiency in AI applications.
More about Vllm jobs
Infographic showing various Vllm job openings in the United States as of May 2026, with employment types broken down into 1% Internship, 98% Full Time, and 1% Part Time. Highlights an 81% Physical, 4% Hybrid, and 15% Remote job distribution.
Senior AI Software Engineer, Kernel Libraries

Nvidia Corporation

Santa Clara, CA • On-site

$143K - $189K/yr

Full-time

Posted 27 days ago


Job description

We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the inference systems software stack! We build innovative AI systems software to accelerate AI inference. As a member of the team, you'll develop libraries, code generators, and GPU kernel technologies for NVIDIA's hardware architecture. This means designing and building things like new abstractions, efficient attention kernel implementations, new LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other high-impact AI workloads.
What you'll be doing:
  • Innovating and developing new AI systems technologies for efficient inference
  • Designing, implementing, and optimizing kernels for high-impact AI workloads
  • Designing and implementing extensible abstractions for LLM serving engines
  • Building efficient just-in-time, domain-specific compilers and runtimes
  • Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU arch teams
  • Contributing to open source communities like FlashInfer, vLLM, and SGLang

What we need to see:
  • Master's degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); PhD preferred
  • 6+ years of academic or industry experience with ML/DL systems development preferred
  • Strong experience in developing or using deep learning frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX, etc.) and ideally inference engines and runtimes such as vLLM, SGLang, and MLC
  • Strong Python and C/C++ programming skills

Ways to stand out from the crowd:
  • Background in domain-specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, Flash Attention)
  • Expertise in inference engines like vLLM and SGLang
  • Expertise in machine learning compilers (e.g. Apache TVM, MLIR)
  • Strong experience in GPU kernel development and performance optimizations (especially using CUDA C/C++, cuTile, Triton, or similar)
  • Open source project ownership or contributions

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until June 6, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993