CUDA Engineer Jobs (NOW HIRING)

About the job: CUDA Kernel Engineer. Location: Remote US. Start date: ASAP. Languages: English (required). About the Role: Pragmatike is hiring on behalf of a fast-growing AI startup recognized as a Top 10 ...

We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale ...

Senior Software Engineer - CUDA Driver

Santa Clara, CA · On-site

$142K - $188K/yr

NVIDIA is seeking outstanding senior engineers to work on the CUDA driver, a key component of accelerated GPU computing. You will join a versatile software engineering team that delivers innovative ...

Senior Software Engineer - CUDA Driver

Santa Clara, CA · On-site

$143K - $189K/yr

NVIDIA is seeking outstanding senior engineers to work on the CUDA driver, a key component of accelerated GPU computing. You will join a versatile software engineering team that delivers innovative ...

Software Engineer, CUDA-Q. Locations: US, CA, Remote; US, WA, Remote. Time type: Full time. Posted on: Posted Yesterday. Job requisition ID: JR2011649. NVIDIA ...

CUDA Engineer information

Salary chart values: $36.5K, $107.3K (average), $137.5K

How much do CUDA engineer jobs pay per year?

As of Jun 4, 2026, the average yearly pay for a CUDA engineer in the United States is $107,282, according to ZipRecruiter salary data. Most workers in this role earn between $88,500 and $136,000 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a CUDA Engineer, and why are they important?

To thrive as a CUDA Engineer, you need strong proficiency in C/C++ programming, a solid grasp of parallel computing concepts, and deep knowledge of GPU architectures, often supported by a computer science or engineering degree. Experience with the NVIDIA CUDA Toolkit and profiling/debugging tools is highly valuable, as are certifications such as NVIDIA DLI in some cases. Strong problem-solving, attention to detail, and effective communication skills help you optimize code and collaborate across teams. These skills ensure efficient development of high-performance GPU applications and successful project delivery in compute-intensive fields.

What are some common challenges faced by CUDA Engineers when optimizing GPU-accelerated applications?

CUDA Engineers frequently encounter challenges such as managing memory effectively between the host and the device, optimizing kernel performance, and minimizing data transfer bottlenecks. Debugging parallel code can also be complex due to race conditions and the difficulty of reproducing timing-related bugs. Collaborating closely with software developers and data scientists is essential to ensure that GPU resources are leveraged efficiently and that the application's overall performance meets project goals.
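The data-transfer point can be made concrete with a back-of-the-envelope model. The sketch below is a rough illustration, not a measurement: the function name `transfer_time_s`, the 10-microsecond per-copy latency, and the 25 GB/s link bandwidth are all assumed placeholder values, chosen only to show why many small host-to-device copies tend to cost more than one batched copy of the same data.

```python
def transfer_time_s(num_bytes, bandwidth_bytes_per_s=25e9, latency_s=10e-6):
    """Estimate one host-to-device copy as fixed latency plus size / bandwidth.

    Both defaults are assumed placeholder values, not measured figures.
    """
    return latency_s + num_bytes / bandwidth_bytes_per_s


total_bytes = 64 * 1024 * 1024   # 64 MiB of input data
chunks = 1024                    # hypothetical number of small copies

# Many small copies pay the fixed latency once per copy.
many_small = chunks * transfer_time_s(total_bytes // chunks)

# One batched copy pays the fixed latency once.
one_batched = transfer_time_s(total_bytes)

print(f"{chunks} small copies: {many_small * 1e3:.2f} ms")
print(f"1 batched copy:    {one_batched * 1e3:.2f} ms")
```

Under this simple model the bandwidth term is identical in both cases, so the whole gap comes from the per-copy latency being paid 1024 times instead of once.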

What are CUDA Engineers?

CUDA Engineers are software developers who specialize in using NVIDIA's CUDA (Compute Unified Device Architecture) platform to write programs that run on Graphics Processing Units (GPUs). They optimize and accelerate computational tasks by parallelizing code, making use of GPUs’ capabilities for high-performance computing. CUDA Engineers often work in fields like machine learning, scientific computing, and graphics, where large amounts of data need to be processed quickly. Their expertise includes proficiency in C/C++, CUDA programming, and understanding GPU hardware and parallel computing concepts.
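To make "parallelizing code" concrete, here is a minimal sketch in plain Python that mimics how a CUDA kernel typically assigns one array element to each thread. It is an emulation for illustration only: no GPU is involved, the loops run sequentially, and the names `vector_add_kernel` and `launch` are hypothetical. The index formula `block_idx * block_dim + thread_idx` mirrors the usual CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body that one emulated 'thread' runs: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(out):                        # guard threads past the end of the data
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch by looping over every block and thread.

    On a real GPU these iterations would run in parallel; here they run in order.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 10
a = list(range(n))
b = [10 * x for x in range(n)]
out = [0] * n

block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # round up so every element is covered

launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
print(out)
```

The bounds check matters because the grid size is rounded up: with 10 elements and 4 threads per block, 3 blocks provide 12 threads, and the last two must do nothing.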

What is the difference between a CUDA Engineer and a GPU Developer?

Aspect | CUDA Engineer | GPU Developer
Required Credentials | Bachelor's or Master's in Computer Science, Engineering, or related; knowledge of CUDA, C++, parallel programming | Bachelor's or Master's in Computer Science, Engineering, or related; experience with GPU programming, CUDA, OpenCL
Work Environment | Research labs, tech companies, hardware firms focusing on GPU acceleration | Software development teams, gaming, AI, scientific computing sectors
Employer & Industry Usage | Hardware manufacturers, AI companies, high-performance computing firms | Game development, scientific research, machine learning applications

While both roles involve GPU programming and CUDA expertise, a CUDA Engineer primarily focuses on developing and optimizing CUDA-based solutions for hardware acceleration. In contrast, a GPU Developer works on broader GPU programming tasks, including application development across various platforms. The roles often overlap but differ in scope and focus.

Senior Software Engineer, CUDA Deep Learning Systems

Nvidia Corporation

Santa Clara, CA • On-site

$143K - $189K/yr

Full-time

Posted 21 days ago


Job description

We are looking for an experienced and highly motivated software professional to work on pioneering initiatives and projects at the intersection of CUDA and Deep Learning Systems. As the complexity and scale of artificial intelligence continue to grow, the intersection of advanced deep learning architectures, massive-scale distributed computing, and low-level hardware optimization has never been more critical. Our team is dedicated to exploring and prototyping next-generation ideas that bridge the gap between deep learning algorithms and CUDA, pushing the boundaries of what is possible on modern accelerator architectures.
Join our dynamic, research-oriented team to help unlock maximum hardware performance for emerging AI workloads. You will be a crucial member of a highly technical group exploring uncharted territories in model optimization, custom kernel development, and cluster-scale AI systems design. If you are passionate about the fundamentals of deep learning and thrive on squeezing every ounce of performance out of advanced computing systems from a single GPU to supercomputer clusters, we want you on our team!
What you will be doing:
  • Explore, research, and prototype novel systems optimizations for advanced deep learning models at the intersection of high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping.
  • Architect and optimize distributed computing systems that scale seamlessly from a single node to massive, cluster-scale supercomputing environments.
  • Design, implement, and optimize custom high-performance CUDA kernels tailored to emerging neural network architectures and workloads.
  • Analyze complex hardware-software interactions to identify and resolve performance bottlenecks in both training and inference pipelines.
  • Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors and CUDA driver experts to co-design systems and algorithms that improve accelerator compute utilization, memory bandwidth, cross-node network communication efficiency and programmability.
  • Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
  • Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.

What we need to see:
  • BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
  • 8+ years of relevant industry experience or equivalent academic experience after degree achievement.
  • Strong proficiency in C++ and Python programming.
  • Solid background in the fundamentals of Deep Learning with a focus on transformers.
  • Strong understanding of distributed computing principles, multi-node scaling, and the unique performance challenges of cluster-scale execution.
  • Proven experience in systems programming, computer architecture, and low-level systems performance optimization.
  • Familiarity with deep learning accelerator architectures such as the GPU and hands-on experience with CUDA programming and kernel optimization.
  • A strong analytical approach with experience using profiling tools to deeply understand software performance on hardware.
  • Experience profiling and optimizing innovative vision models, generative AI architectures, or diffusion models.
  • Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile).

Ways to stand out from the crowd:
  • Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText, etc.).
  • Hands-on experience with CUDA, communication libraries (e.g., NCCL, MPI, UCX) and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).
  • Knowledge of numerical methods, low-precision arithmetic (e.g., NVFP4, MXFP4, FP8, INT8), and their implications on deep learning model accuracy and performance.
  • Familiarity with systems requirements for Reinforcement Learning (RL) or highly parallel simulation environments and/or research background in machine learning systems or adjacent fields.
  • Experience with machine learning, especially agentic systems, applied to systems problems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until May 18, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993