OR · On-site
$122K - $161K/yr
We are seeking a Senior Software Engineer - AI Inference to advance open-source LLM serving by ... Profile and improve hot paths across layers, from Python orchestration to C++/CUDA kernels, using ...
OR · On-site
Product, Engineering, Marketing, Applied Research, etc. * Lead strategic relationships with key ... Experience with NVIDIA products and SDKs (CUDA, CUDA-X Libraries, PhysicsNeMo, Omniverse) * Hands ...
OR · On-site
CUDA programming and optimization experience. * Experience using data science in the energy industry. * Experience using GPGPU programming and design practices. * Background with network software ...
OR · On-site
$129K - $175K/yr
Experience with CUDA programming and NVIDIA GPUs. Knowledge of high-performance networks like InfiniBand, RoCE, NVLink, etc. * Experience with Deep Learning Frameworks such as PyTorch, TensorFlow, etc.
OR · On-site
$172K - $204K/yr
Hands-on experience with multi-GPU or multi-node workloads and CUDA-aware distributed execution ... Strong Python and C/C++ programming skills. Ways to stand out from the crowd: * Hands-on experience ...
OR · On-site
CUDA programming and optimization experience. * Experience with network software development. * Experience with AI: Published record of thought leadership in a technical area or industry segment ...
As a Solutions Engineer based in our Corvallis, OR office, you will be part of our passionate and ... Basic knowledge of CUDA * Familiarity with Microsoft Visual Studio. * Knowledge of computer vision ...
OR · On-site
$63 - $83/hr
CUDA programming and optimization experience. * Background with network software development. * Experience with AI: Published record of thought leadership in a technical area or industry segment ...
OR · On-site
Rapid prototyping and development with Python, C++, CUDA or related DSLs (Triton, cuTe) * Solid ... Experience with parallel programming on at least one communication runtime (NCCL, NVSHMEM, MPI)
OR · On-site
Experience with NVIDIA technologies and platforms such as CUDA, CUDA-X libraries, NVIDIA AI ... or developer enablement programs. * Experience presenting at academic conferences, research ...
OR · On-site
Expert knowledge of the NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache) and CUDA programming models. Ways to Stand Out from the Crowd: * Framework Development: Hands-on experience developing within ...
OR · On-site
Experience with NVIDIA AI platforms, including CUDA, CUDA-X libraries, TensorRT-LLM, Triton ... or developer enablement programs. * Experience presenting at venues such as NeurIPS, ICML, ICLR ...
OR · On-site
$104K - $143K/yr
Provide advice and drive compiler and applications engineering development teams based on the ... Experience with OpenACC, OpenMP, MPI, and CUDA. * Strong skills in performance analysis and tuning ...
OR · On-site
Perform low-level CUDA optimization, including custom kernels to accelerate simulation and ... Engaging with life science executives, IT leaders, data scientists, and developers to drive ...
Systems Engineering Depth: Strong Python and C++ skills (Rust a plus), with a solid grasp of CUDA, GPU memory management, and high-performance I/O - including GPUDirect Storage (GDS), RDMA, and NVMe ...
Hillsboro, OR · On-site
$113K - $156K/yr
Familiarity with OpenACC, OpenMP, or CUDA * You have a real passion for compiler development With ... exclusive engineering teams are rapidly growing. If you're a creative and autonomous program ...
OR · On-site
$104K - $143K/yr
Familiarity with OpenACC, OpenMP, or CUDA * You have a real passion for compiler development With ... exclusive engineering teams are rapidly growing. If you're a creative and autonomous program ...
$12.71 - $18.16: 4% of jobs
$18.16 - $23.61: 9% of jobs
$23.61 - $29.07: 17% of jobs
$29.07 - $34.52: 13% of jobs
$34.52 - $39.97: 13% of jobs
$39.97 - $45.42: 10% of jobs
$45.42 - $50.88: 9% of jobs
$50.88 - $56.33: 9% of jobs
$56.33 - $61.78: 7% of jobs
$61.78 - $67.24: 6% of jobs
$67.24 - $72.69: 4% of jobs
$27.53 is the 25th percentile. Wages below this are outliers.
The median wage is $37.70 / hr.
$51.90 is the 75th percentile. Wages above this are outliers.
CUDA programmers often encounter challenges related to optimizing code performance and efficiently managing memory on GPU architectures. Debugging and profiling can be complex, as issues may arise from both the code and hardware-specific elements, requiring close attention to parallelization and bottlenecks. Collaboration is key, as you’ll typically work closely with software engineers, data scientists, or researchers to integrate and optimize code for specialized workflows. Successfully navigating these challenges helps drive significant performance improvements and innovation in high-performance computing applications.
To thrive as a CUDA programmer, you need strong programming skills in C/C++ and parallel computing, with a solid understanding of GPU architectures and CUDA development. Familiarity with CUDA libraries, performance profiling tools, and platforms like NVIDIA Nsight or Visual Studio is often required, while certifications from NVIDIA can be advantageous. Problem-solving abilities, attention to detail, and effective teamwork and communication skills help set candidates apart. These competencies ensure you can optimize complex algorithms, work efficiently on high-performance computing projects, and collaborate smoothly with multidisciplinary teams.
A CUDA Programmer develops high-performance parallel computing applications using NVIDIA's CUDA (Compute Unified Device Architecture) framework. They optimize algorithms to run efficiently on GPUs, accelerating tasks such as machine learning, scientific simulations, and real-time data processing. This role requires proficiency in C/C++, an understanding of GPU architectures, and experience with parallel computing concepts to maximize performance.

$122K - $161K/yr
Full-time
This job post has expired today. Applications are no longer accepted.
NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a Senior Software Engineer - AI Inference to advance open-source LLM serving by contributing directly to upstream inference engines like vLLM and SGLang, ensuring they run best-in-class on NVIDIA GPUs and systems, and by improving the underlying stack that enables high-throughput, low-latency inference at scale.
This is a hands-on role for an engineer who enjoys digging into performance bottlenecks, designing pragmatic runtime improvements, and shipping high-quality changes that are broadly useful to the community and production deployments.
What you'll be doing:
Contribute features, fixes, and optimizations upstream to vLLM/SGLang: author PRs, participate in reviews, write benchmarks/tests, and help drive designs to completion.
Implement and optimize inference-runtime capabilities: batching and scheduling policies, streaming, request lifecycle management, and KV-cache efficiency (paging/sharding) to improve throughput and tail latency.
Profile and improve hot paths across layers, from Python orchestration to C++/CUDA kernels, using data to guide optimization work.
Improve multi-GPU inference performance and reliability: parallelism strategies, communication patterns, and resource utilization across NVIDIA platforms.
Build and maintain performance and correctness regression tests to prevent slowdowns and ensure stable behavior across model and hardware configurations.
Collaborate with model, platform, and SRE teams to translate production requirements into upstreamable solutions with strong operability and maintainability.
What we need to see:
5+ years building production software with solid systems engineering fundamentals and a track record of delivering performance or reliability improvements.
Experience with LLM inference/serving stacks (e.g., vLLM, SGLang) and an understanding of the tradeoffs that drive real production performance.
Strong programming skills in Python plus C++ and/or CUDA; ability to debug and optimize performance-critical code.
Experience with profiling and performance investigation (microbenchmarks, flame graphs, GPU profiling) and a measurement-driven mindset.
Familiarity with distributed systems concepts and concurrency (queues/schedulers, multiprocess/multithreading, scaling across GPUs/nodes).
Strong communication skills and comfort working with open-source communities (issues, PR discussions, code review).
BS/MS in Computer Science, Computer Engineering, or related field (or equivalent experience).
Ways to stand out from the crowd:
Open-source contributions to vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo, or adjacent serving/runtime projects.
Shipped performance work such as improved attention/KV cache efficiency, speculative decoding, scheduler improvements, quantization-aware serving, or streaming latency reductions.
Experience building reproducible benchmarking and performance regression infrastructure for latency/throughput.
Systems performance background spanning memory bandwidth, kernel fusion, PCIe/NVLink effects, and network fabrics (e.g., InfiniBand).
We are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and creative people in the world working for us. If you're creative and autonomous with a real passion for technology, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Computer and electronic product manufacturing
10,000+ Employees
Santa Clara, CA, US
1993