Distillation Lead
$195K - $286K/yr
... quantization). - Publications at top-tier ML/CV venues (NeurIPS, ICML, CVPR, ICLR, ECCV) in model compression, efficient deep learning, or related areas. - Experience distilling large generative ...
$136K - $225K/yr
Solid understanding of fundamental deep learning concepts and computer vision techniques ... Experience with TensorRT and model quantization. The salary range for this role is an estimate based ...
The role spans both deep technical work and collaboration with teams closest to the customer. As a ... TensorRT, quantization) * Work with Operations and Product to understand customer needs and ...
Pittsburgh, PA · On-site +1
$141K - $249K/yr
Examples include designing new CUDA kernels, quantization-aware training and inference, and ... deep learning frameworks such as PyTorch or JAX. - Skilled in profiling CPU and GPU code using ...
| Aspect | Deep Learning Quantization | Machine Learning Engineer |
|---|---|---|
| Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills |
| Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments |
| Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries |
Deep learning quantization work focuses on reducing model size and improving inference speed through techniques such as weight and activation quantization, often targeting specialized hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for a wide range of applications. Both roles require knowledge of AI and programming, but quantization work is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly across model development and deployment.
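To make weight quantization concrete, here is a minimal sketch of symmetric, per-tensor int8 quantization in plain Python. It assumes a single scale shared by the whole tensor and uses lists in place of framework tensors; the function names are hypothetical and are not taken from any particular library or toolchain.

```python
def quantize_int8_symmetric(weights):
    """Hypothetical helper: map float weights to int8 codes with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest quantization step and clamp to [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8_symmetric(weights)
recovered = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, recovered))
print(codes, scale, max_error)
```

Because the scale is derived from the largest magnitude, no value needs clamping and each weight is recovered to within half a quantization step (scale / 2). Per-channel scales and asymmetric zero points are common refinements that this sketch deliberately omits.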

$195K - $286K/yr
Full-time
Medical, Dental, Vision, PTO
Posted 25 days ago
You will...
- Define and drive the technical strategy for model distillation and compression across Waabi's AI stack (perception, world models, and planning), with an eye toward both onboard deployment and simulation use cases.
- Design, implement, and scale state-of-the-art distillation and efficiency pipelines, which may include:
  - Distillation for generative models (diffusion, autoregressive, flow-matching, video models)
  - Quantization-aware training (QAT) and post-training quantization (PTQ)
  - Knowledge distillation (feature-level, response-based, and relation-based)
  - Structured and unstructured pruning and sparsification
  - Low-rank factorization and efficient architecture design
  - Speculative decoding and other inference-time efficiency techniques
- Collaborate closely with ML Platform, Infrastructure, Onboard, Autonomy, and Simulation teams to integrate compressed models into production pipelines and meet latency, memory, and throughput targets across deployment contexts.
- Define rigorous benchmarks and evaluation frameworks to characterize efficiency vs. quality trade-offs across models and hardware targets.
- Mentor and guide researchers and engineers working in the distillation and model efficiency space, setting a high technical bar and fostering a culture of rigorous experimentation.
- Champion best practices for model compression across the organization; disseminate knowledge through internal design reviews, documentation, and technical talks.
- Stay at the cutting edge of model efficiency research; contribute to the broader scientific community through publications and open-source contributions.
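Response-based knowledge distillation, one of the variants listed above, trains a student model to match the teacher's softened output distribution. The sketch below shows one widely used form of that loss: the KL divergence between temperature-scaled softmax outputs, multiplied by T². It is a plain-Python illustration under stated assumptions (a single example, logits as lists, hypothetical function names and a default temperature of 4.0), not a description of Waabi's pipeline; in practice this term is usually combined with a standard task loss on ground-truth labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def response_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Hypothetical helper: T^2 * KL(teacher || student) on softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl


teacher = [4.0, 1.0, -2.0]
print(response_kd_loss(teacher, teacher))          # identical outputs give 0.0
print(response_kd_loss([0.0, 0.0, 0.0], teacher))  # a mismatched student gives a positive loss
```

Raising the temperature flattens the teacher's distribution, so the student also learns from the relative probabilities the teacher assigns to non-target classes rather than only its top prediction.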
Qualifications:
- Deep distillation expertise: You have extensive hands-on experience designing and implementing distillation, quantization, pruning, and model compression techniques for large-scale neural networks, with demonstrated impact in production settings.
- Strong research and engineering foundation: A Bachelor's or Master's degree in Machine Learning, Computer Vision, Robotics, or a related field, or equivalent industry experience; hands-on experience in model distillation and efficiency is what matters most. Expert-level Python and PyTorch (or JAX) skills, with experience in large-scale distributed training.
- Technical leadership: You have a proven track record of setting technical direction and driving projects from conception to production. You inspire and elevate those around you through deep technical expertise and mentorship.
- Cross-functional collaboration: You have experience working closely with infrastructure, platform, and autonomy teams to deploy compressed models under real engineering constraints.
- Clear communicator: You can communicate complex technical trade-offs clearly to diverse audiences and drive alignment across research and engineering teams.
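Of the compression techniques named above, unstructured pruning is the simplest to illustrate. The sketch below applies magnitude pruning to a flat list of weights: it zeroes the smallest-magnitude fraction and returns a binary mask. The function name and the list-based representation are assumptions for illustration only; real pipelines typically prune framework tensors layer by layer or globally and then fine-tune to recover accuracy.

```python
def magnitude_prune(weights, sparsity):
    """Hypothetical helper: zero the smallest-magnitude `sparsity` fraction of weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    num_to_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first (stable for ties).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:num_to_prune])
    # Mask is 1 for kept weights and 0 for pruned ones.
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
pruned, mask = magnitude_prune(weights, 0.5)
print(pruned, mask)
```

Keeping the mask separate from the weights makes it easy to reapply during fine-tuning so that pruned entries stay at zero; structured pruning would instead remove whole channels or heads, which is what yields speedups on dense hardware.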
Bonus:
- Experience with hardware-aware optimization (TensorRT, ONNX, custom CUDA kernels, hardware-specific quantization).
- Publications at top-tier ML/CV venues (NeurIPS, ICML, CVPR, ICLR, ECCV) in model compression, efficient deep learning, or related areas.
- Experience distilling large generative models (diffusion models, LLMs, VLMs, or video models).
- Background in autonomous vehicles or robotics.