Strong systems engineering background (C++, CUDA, Python)
Senior Engineer II, GPU Kernel and Performance
Seattle, WA · On-site
$167K - $209K/yr
Design and implement high-performance GPU kernels using Triton and CUDA C++. * Precision Optimization: Develop and deploy state-of-the-art quantization techniques (FP8, INT8, and experimental FP4 ...
Responsibilities: • Design, build, and optimize massive GPU clusters for extreme-scale training and inference workloads • Develop and tune low-level CUDA kernels (GEMM, Attention, etc.), using ...
Solutions Architect, AI and ML
Seattle, WA · On-site
$71.75 - $94.50/hr
CUDA, RAPIDS, Triton etc.) • System-level experience specifically GPU-based systems • Experience with Deep Learning at scale • Familiarity with parallel programming and distributed computing ...
Responsibilities: • Design, build, and optimize massive GPU clusters for extreme-scale training and inference workloads • Develop and tune low-level CUDA kernels (GEMM, Attention, etc.), using ...
Senior Staff Software Engineer - HPC Integration
Bothell, WA · On-site
$187K - $245K/yr
CUDA Quantum) * Expert analysis skills in areas like statistical testing, modeling and general optimization * Knowledge of one or more domains of computational physics (PDEs, n-body, large-dimension ...
Solutions Architect, AI and ML
Redmond, WA · On-site
$70.50 - $93/hr
CUDA, RAPIDS, Triton etc.) • System-level experience specifically GPU-based systems • Experience with Deep Learning at scale • Familiarity with parallel programming and distributed computing ...
Solutions Architect, AI and ML
Seattle, WA · On-site
$71.75 - $94.50/hr
CUDA, RAPIDS, Triton etc.) • System-level experience specifically GPU-based systems • Experience with Deep Learning at scale • Familiarity with parallel programming and distributed computing ...
Responsibilities: • Design, build, and optimize massive GPU clusters for extreme-scale training and inference workloads • Develop and tune low-level CUDA kernels (GEMM, Attention, etc.), using ...
Solutions Architect, AI and ML
Redmond, WA · On-site
$70.50 - $93/hr
CUDA, RAPIDS, Triton etc.) • System-level experience specifically GPU-based systems • Experience with Deep Learning at scale • Familiarity with parallel programming and distributed computing ...
Strong knowledge of CUDA as applied to recent GPU microarchitectures (e.g., Ampere, Blackwell) and experience debugging/optimizing GPU kernels using tools like Nsight. * Strong knowledge of C++ and ...
Software Engineer - C++ GPU Performance
$168K - $239K/yr
Strong knowledge of CUDA as applied to recent GPU microarchitectures (e.g., Ampere, Blackwell) and experience debugging/optimizing GPU kernels using tools like Nsight. * Strong knowledge of C++ and ...
Strong knowledge of CUDA as applied to recent GPU microarchitectures (e.g., Ampere, Blackwell) and experience debugging/optimizing GPU kernels using tools like Nsight. * Strong knowledge of C++ and ...
... CUDA or Triton is a significant plus • A systematic approach to profiling and optimization -- you measure first, then optimize • Curiosity about diffusion inference, speculative decoding ...
Senior Software Engineer - C++ GPU Performance
$217K - $307K/yr
Strong knowledge of CUDA as applied to recent GPU microarchitectures (e.g., Ampere, Blackwell) and experience debugging/optimizing GPU kernels using tools like Nsight. * Strong knowledge of C++ and ...
Senior Software Engineer - C++ GPU Performance
Seattle, WA · On-site
$217K - $307K/yr
Strong knowledge of CUDA as applied to recent GPU microarchitectures (e.g., Ampere, Blackwell) and experience debugging/optimizing GPU kernels using tools like Nsight. * Strong knowledge of C++ and ...
Senior Software Engineer, CUTLASS Platform
Redmond, WA · On-site
$137K - $180K/yr
Collaborate with GPU architecture, CUDA, and NVVM/PTX compiler teams to provide feedback on programming models and to assess the performance of future GPU hardware features. What we need to see:
Our Compiler team is responsible for constructing and emitting the highest performance GPU machine instructions for Graphics (OpenGL, Vulkan, DX) and Compute (CUDA, PTX, OpenCL, Fortran, C++). This ...
Senior Deep Learning Software Engineer
Redmond, WA · Hybrid
$137K - $180K/yr
... CUDA and/or Triton. This is an exceptional opportunity for passionate software engineers straddling the boundaries of research and engineering, with a strong background in both machine learning ...
CUDA / Triton kernel work, even at a research or hobby scale * Publications or research projects in MLSys, model compression, or inference optimization * Familiarity with multimodal or streaming ...
CUDA information
See Bothell, WA salary details
| Salary range | Share of jobs |
|---|---|
| $124.6K - $134.2K | 0% |
| $134.2K - $143.9K | 0% |
| $143.9K - $153.5K | 0% |
| $153.5K - $163.1K | 4% |
| $163.1K - $172.7K | 0% |
| $172.7K - $182.3K | 0% |
| $182.3K - $191.9K | 0% |
| $191.9K - $201.5K | 2% |
| $201.5K - $211.1K | 0% |
| $211.1K - $220.7K | 0% |
| $220.7K - $230.3K | 94% |
$222.6K is the 25th percentile. Wages below this are outliers.
Range shown: $124.6K to $230.3K.
How much do CUDA jobs pay per year?
What are some common challenges faced when working as a CUDA Developer, and how can they be addressed?
What are the key skills and qualifications needed to thrive as a CUDA Developer, and why are they important?
What is the difference between a CUDA Developer and a GPU Developer?
| Aspect | CUDA Developer | GPU Developer |
|---|---|---|
| Required Credentials | Knowledge of CUDA programming, often with a background in computer science or engineering | Experience with GPU programming, CUDA, OpenCL, or similar; often requires a degree in computer science or related fields |
| Work Environment | Primarily focused on developing and optimizing CUDA-based applications for NVIDIA GPUs | Designing, developing, and maintaining GPU-accelerated applications across various platforms and hardware |
| Industry Usage | Used mainly in high-performance computing, AI, and scientific research involving NVIDIA GPUs | Applied across gaming, scientific computing, AI, and multimedia industries |
In summary, a CUDA Developer specializes in programming NVIDIA GPUs with CUDA, while a GPU Developer has a broader role that may involve various GPU programming tools across multiple platforms. CUDA is a subset of the skills a GPU Developer might possess, so the two roles are closely related but distinct.
What is a CUDA job?
A CUDA job typically involves developing, optimizing, and implementing parallel computing applications using NVIDIA's CUDA platform. CUDA (Compute Unified Device Architecture) enables developers to leverage the power of GPUs for high-performance computing tasks such as deep learning, simulations, and scientific computing. Professionals in this role often work with C, C++, or Python, using CUDA libraries and frameworks to accelerate processing. Strong knowledge of parallel programming, memory management, and GPU architecture is essential for success in this field.
What are CUDA developers?

$122K - $160K/yr
Other
Posted 23 days ago
Job description
We are seeking a highly skilled LLM Pre-training & Distributed Systems Engineer. This role is essential for orchestrating large-scale machine learning training runs and optimizing distributed infrastructure. The ideal candidate will have a deep understanding of GPU clusters and extensive experience in system engineering to ensure efficient and reliable training processes.
Responsibilities:
- Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM.
- Optimize networking (InfiniBand/RDMA) and memory management to prevent out-of-memory errors.
- Automate checkpointing and failure recovery during month-long training runs.
Required Skills:
- Deep expertise in 3D parallelism (Data, Tensor, Pipeline).
- Experience managing SLURM or Kubernetes-based GPU clusters.
- Strong systems engineering background (C++, CUDA, Python).