The role
We are looking for a Staff ML Performance Engineer to join our Training Tech team, optimizing large-scale ML jobs so that we can scale our models to the next order of magnitude. A successful candidate will increase the efficiency of training and inference workloads so that Wayve can train larger models faster.
Key responsibilities:
- Profile ML workloads to identify their bottlenecks, e.g. using NVIDIA Nsight Systems
- Design and implement efficiency improvements to maximize MFU and throughput, e.g. parallelism, model compilation, mixed precision
- Design and implement observability tools to identify bottlenecks and drive performance improvements, e.g. tracking MFU, throughput, and latency
- Design and implement benchmarking tools, e.g. to track efficiency gains or regressions
- Collaborate closely with Research teams to integrate training efficiency improvements and create a culture of performance optimization
About you
In order to set you up for success in this role, we're looking for the following skills and experience.
Essential
- 10+ years of industry experience driving performance engineering across ML systems, GPU compute infrastructure, distributed platforms, or a similar field.
- Experience optimizing large-scale jobs on GPU compute clusters.
- Experience working in platform teams and collaborating with research teams.
- Experience in writing, reporting, and tracking performance benchmarks in an open and accessible way.
- Ability to write high-quality, well-structured, and tested Python code.
- BS or MS in Machine Learning, Computer Science, Engineering, or a related technical discipline, or equivalent experience.
Desirable
- Experience working with concurrent, parallel and distributed computing.
- Experience using NVIDIA Nsight Systems or other system profilers.
- Experience implementing GPU kernels (CUDA, Triton, etc).
- Knowledge of computing fundamentals: what makes code fast, secure, and reliable.
This is a full-time role based in Sunnyvale, CA (hybrid). The reasonably estimated salary for this role ranges from $336,400 to $359,000, plus a competitive equity package. Actual compensation is based on the candidate's skills, qualifications, and experience.