OR · On-site
$122K - $161K/yr
Strong proficiency in C++ and Python programming. * Solid background in the fundamentals of Deep ... Hands-on experience with CUDA, communication libraries (e.g., NCCL, MPI, UCX) and distributed ...
$122K - $161K/yr
At the core of this platform are the CUDA Core Libraries. C++ and Python libraries that enable ... Strong programming skills in C++, Python, or both, with proven interest in systems-level software ...
$122K - $161K/yr
Strong programming skills in C++. * Existing knowledge of GPU hardware and/or motivated to learn to ... Expertise in CUDA kernel programming and profiling. * Outstanding interpersonal skills and the ...
$122K - $161K/yr
In this role, you will be working on CUDA Tile, a new tile-based programming model for our GPUs. CUDA Tile shipped with CUDA 13.1 and is a major addition to CUDA ( * You will design and implement ...
CUDA-Q is the open-source programming framework bridging classical accelerated computing and quantum processors, to enable fault-tolerant quantum-GPU supercomputing. This role sits where quantum ...
OR · On-site
NVIDIA seeks a Developer Relations Manager to lead our work in architecting impactful usage and adoption of our core CUDA Math Libraries. We are interested in finding a leader in high performance ...
$122K - $161K/yr
CUDA features improve both productivity and performance of AI applications. Your work in AI ... Experience with programming for compute & communication overlap in distributed runtime Your base ...
Join us in developing the CUDA-Q platform for programming powerful hybrid quantum-classical multi-processor systems. We are looking for a dedicated engineer with expertise building extensible ...
OR · On-site
$122K - $161K/yr
Excellent C/C++ programming and debugging skills, with experience in CUDA development. * Good exposure to PCIe and NVLINK. * Deep understanding of operating systems and data-center system ...
OR · On-site
$122K - $161K/yr
Exceptional CUDA programming skills. Exceptional C++ programming skills NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward ...
OR · On-site
$122K - $161K/yr
CUDA defines a unified programming model across a range of system configurations and hardware capabilities. To accomplish this, the CUDA driver interacts with GPU hardware, kernel mode drivers, user ...
$122K - $161K/yr
Work with NVIDIA GPU Architecture and CUDA Programming model teams to build abstractions to expose new GPU features in portable and performant ways in PTX ISA. PTX Compiler (PTXAS) apart from ...
OR · On-site
$122K - $161K/yr
We are looking for a seasoned software professional to work on the CUDA Driver, a core component of ... Strong C and C++ programming skills * Minimum of 7 years of related development experience ...
Strong programming skills in Python and C++! * Hands-on experience with PyTorch or a similar tensor/autograd framework. * Experience optimizing GPU-accelerated workloads using CUDA, C++/CUDA ...
OR · On-site
$104K - $143K/yr
Excellent C++, Python, and CUDA programming skills * Strong collaboration, communication, and documentation habits and ideally experience with working in a globally distributed organization Ways to ...
OR · On-site
$139K/yr
Experience with NVIDIA GPUs, CUDA Programming, and Networking * Motivated self-starter with strong problem-solving skills and customer-facing communication skills * Passion for continuous learning.
$122K - $161K/yr
We are looking for a seasoned software professional to work on the CUDA Driver, a core component of ... Strong C and C++ programming skills * Minimum of 8 years of related development experience ...
$104K - $143K/yr
Strong programming skills in Python (C++ is a plus) * Experience with CI/CD systems and automation ... Understanding of compiler internals (LLVM, MLIR, CUDA compilation flow) * Experience building ...
OR · On-site
$104K - $143K/yr
Background with NVIDIA GPUs, CUDA Programming, NCCL and MLPerf benchmarking * Experience with Machine Learning and Deep Learning concepts, algorithms and models * Familiarity with InfiniBand with ...
OR · On-site
$139K/yr
Familiarity with CUDA programming and/or GPUs. * Experience with HPC or large-scale computing environments. Widely considered to be one of the technology world's most desirable employers, NVIDIA ...
| Hourly wage | Share of jobs |
|---|---|
| $29.48 - $34.68 | 5% |
| $34.68 - $39.88 | 10% |
| $39.88 - $45.08 | 9% |
| $45.08 - $50.28 | 7% |
| $50.28 - $55.48 | 15% |
| $55.48 - $60.67 | 14% |
| $60.67 - $65.87 | 17% |
| $65.87 - $71.07 | 14% |
| $71.07 - $76.27 | 6% |
| $76.27 - $81.47 | 3% |
| $81.47 - $86.67 | 0% |

$46.19 is the 25th percentile. Wages below this are outliers.
The median wage is $57.08 / hr.
$65.39 is the 75th percentile. Wages above this are outliers.
| Aspect | CUDA Programming | GPU Developer |
|---|---|---|
| Required Credentials | Knowledge of CUDA, C/C++, parallel computing | Knowledge of GPU architecture, CUDA, OpenCL, C/C++ |
| Work Environment | High-performance computing, scientific research, AI | Graphics, gaming, scientific visualization, AI |
| Industry Usage | Tech companies, research labs, AI firms | Gaming, entertainment, tech, research |
While CUDA Programming focuses specifically on writing code using NVIDIA's CUDA platform for parallel processing, GPU Developers have a broader role that includes designing, optimizing, and implementing GPU-based solutions across various platforms and technologies. Both roles require knowledge of GPU architecture and programming languages like C/C++, but GPU Developers often work on a wider range of applications beyond CUDA-specific projects.

$122K - $161K/yr
Full-time
Posted 25 days ago
We are looking for an experienced and highly motivated software professional to work on pioneering initiatives and projects at the intersection of CUDA and Deep Learning Systems. As the complexity and scale of artificial intelligence continue to grow, the intersection of advanced deep learning architectures, massive-scale distributed computing, and low-level hardware optimization has never been more critical. Our team is dedicated to exploring and prototyping next-generation ideas that bridge the gap between deep learning algorithms and CUDA, pushing the boundaries of what is possible on modern accelerator architectures.
Join our dynamic, research-oriented team to help unlock maximum hardware performance for emerging AI workloads. You will be a crucial member of a highly technical group exploring uncharted territories in model optimization, custom kernel development, and cluster-scale AI systems design. If you are passionate about the fundamentals of deep learning and thrive on squeezing every ounce of performance out of advanced computing systems from a single GPU to supercomputer clusters, we want you on our team!
What you will be doing:
Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping.
Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster-scale supercomputing environments.
Design, implement, and optimize custom high-performance CUDA kernels tailored to emerging neural network architectures and workloads.
Analyze complex hardware-software interactions to identify and resolve performance bottlenecks in both training and inference pipelines.
Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors and CUDA driver experts to co-design systems and algorithms that improve accelerator compute utilization, memory bandwidth, cross-node network communication efficiency and programmability.
Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.
What we need to see:
BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
8+ years of relevant industry experience or equivalent academic experience after degree achievement.
Strong proficiency in C++ and Python programming.
Solid background in the fundamentals of Deep Learning with a focus on transformers.
Strong understanding of distributed computing principles, multi-node scaling, and the unique performance challenges of cluster-scale execution.
Proven experience in systems programming, computer architecture, and low-level systems performance optimization.
Familiarity with deep learning accelerator architectures such as the GPU and hands-on experience with CUDA programming and kernel optimization.
A strong analytical approach with experience using profiling tools to deeply understand software performance on hardware.
Experience profiling and optimizing innovative vision models, generative AI architectures, or diffusion models.
Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile).
Ways to stand out from the crowd:
Deep expertise in the performance internals and execution graphs of major deep learning autograd, training and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText, etc.).
Hands-on experience with CUDA, communication libraries (e.g., NCCL, MPI, UCX) and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).
Knowledge of numerical methods, low-precision arithmetic (e.g., NVFP4, MXFP4, FP8, INT8), and their implications on deep learning model accuracy and performance.
Familiarity with systems requirements for Reinforcement Learning (RL) or highly parallel simulation environments and/or research background in machine learning systems or adjacent fields.
Experience with machine learning, especially agentic systems, applied to systems problems.
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Industry: Computer and electronic product manufacturing
Company size: 10,000+ Employees
Headquarters: Santa Clara, CA, US
Founded: 1993