Deep Learning Compression Jobs (NOW HIRING)

Waymo

Senior Machine Learning Engineer, Runtime and Serving

Mountain View, CA · On-site

$123K - $169K/yr

You'll work across the entire ML stack from the system perspective, from efficient deep learning models, model compression, ML software (e.g. JAX, XLA, Triton, and CUDA), to . You will be pleasantly ...

Apple

Sr. Machine Learning Research Engineer, Siri Speech

Cupertino, CA

$181K - $318K/yr

We believe that the most impactful breakthroughs in deep learning emerge when we address real-world ... Preferred Qualifications Strong expertise in efficient machine learning, model compression and ...

Apptronik

Senior Perception Learning Engineer

Sunnyvale, CA · On-site

$122K - $167K/yr

... deep learning approaches. • Expertise in model acceleration, quantization, or compression (TensorRT, ONNX Runtime). • Familiarity with real-time frameworks and middleware such as ROS 2, GStreamer ...

Apptronik

Senior Perception Learning Engineer

Sunnyvale, CA · On-site

$122K - $167K/yr

Apptronik

Senior Perception Learning Engineer

Sunnyvale, CA · On-site

$190K - $235K/yr

Strong classical computer vision skills (geometry-based methods, feature extraction) complementing deep learning approaches. * Expertise in model acceleration, quantization, or compression (TensorRT ...

Apple

Machine Learning Video Codec Algorithm Engineer

Cupertino, CA · On-site

... deep passion for video coding and processing technologies ... Orchestrate R&D in emerging media compression technologies and standards. Design and implement ...

EnCharge AI

AI Research Engineer

About the Role: EnCharge AI is looking for an experienced AI Research Engineer to optimize deep learning models for deployment on edge AI platforms. You will work on model compression, quantization ...

Avride

Machine Learning Engineer

Austin, TX · On-site

Design, implement, and refine deep learning models to ensure efficiency, scalability, and ... Optimize inference performance, model compression, and deployment across various hardware platforms ...

Redhat

Principal Machine Learning Engineer

Boston, MA · On-site +1

$189K - $312K/yr

You will collaborate with our technical and research teams to develop LLM training and deployment pipelines, implement model compression algorithms, and productize deep learning research. If you are ...

Avride

Senior / Staff Machine Learning Engineer

Austin, TX · On-site

$124K - $171K/yr

Optimize inference performance, model compression, and deployment across various hardware platforms. * Explore and Apply Cutting-Edge ML Techniques: Stay current with advancements in deep learning ...

Avride

Senior / Staff Machine Learning Engineer

Austin, TX · On-site

$124K - $171K/yr

Apptronik

Senior Perception Learning Engineer

Sunnyvale, CA · On-site

$122K - $168K/yr

Apptronik

Senior Perception Learning Engineer

Sunnyvale, CA · On-site

$122K - $167K/yr

Anduril Industries

Senior Machine Learning Engineer, Sentry Tower

Irvine, CA · On-site

$122K - $168K/yr

Train and deploy deep learning models for real-time applications * Collaborate cross-functionally ... Compression * Experience in one or more of the following: * Visual Odometry, SLAM, Multi-view ...

Redhat

Senior Machine Learning Engineer

Boston, MA · On-site +1

$174K - $287K/yr

Waymo

Senior Machine Learning Engineer, Runtime and Serving

Mountain View, CA · On-site

$213K - $263K/yr

Waymo

Senior Machine Learning Engineer, Runtime and Serving

Mountain View, CA · On-site +1

$213K - $263K/yr

Apple

Video Codec Machine Learning Engineer, Audio & Media Technologies

San Diego, CA · On-site

$139K - $258K/yr

Exploring and applying the latest advancements in deep learning, neural video compression, generative AI, and computer vision to unlock new possibilities in video coding and processing. Leading and ...

Apple

Machine Learning Architect - Conversational Speech

Cupertino, CA · On-site

Deep, hands-on proficiency in modern deep learning, including large language models and end-to-end ... Experience with on-device ML deployment, including model compression, quantization, and latency ...

Ambarella

Staff System Software Engineer - Embedded AI & Algorithm

... compression, and low-power operation. If you enjoy great rewards, Ambarella has it all, great ... Training and optimization of deep learning/ML based computer vision algorithm for edge devices.

Showing results 1-20

Deep Learning Compression information

Salary chart markers: $11K, $83.9K, $140K

How much do deep learning compression jobs pay per year?

As of Jun 7, 2026, the average yearly pay for deep learning compression in the United States is $83,885.00, according to ZipRecruiter salary data. Most workers in this role earn between $72,000.00 and $139,000.00 per year, depending on experience, location, and employer.

What are the typical challenges faced when working on deep learning compression projects?

Professionals in deep learning compression often encounter challenges balancing model size reduction with maintaining high accuracy. Adapting compression techniques—such as pruning, quantization, or knowledge distillation—to different architectures and datasets requires both strong technical knowledge and experimentation. Collaboration with data scientists and software engineers is common, as solutions must be integrated into production systems without sacrificing performance. Staying up to date with rapid advances in compression research is also essential to remain effective and innovative in this role.
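The size-versus-accuracy trade-off described above can be illustrated with a small sketch of unstructured magnitude pruning. This is only a toy under stated assumptions: the function names `magnitude_prune` and `mean_abs_error` and the weight values are hypothetical, and the error on raw weights merely stands in for the accuracy loss that would be measured on a real model.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def mean_abs_error(original, modified):
    """Average absolute difference between two equal-length weight lists."""
    return sum(abs(x - y) for x, y in zip(original, modified)) / len(original)


# Toy weights (hypothetical values, not taken from any real model).
weights = [0.8, -0.05, 0.3, -0.9, 0.02, 0.4, -0.1, 0.6]

for sparsity in (0.25, 0.5, 0.75):
    pruned = magnitude_prune(weights, sparsity)
    kept = sum(1 for w in pruned if w != 0.0)
    err = mean_abs_error(weights, pruned)
    print(f"sparsity={sparsity:.2f} kept={kept} mean_abs_error={err:.4f}")
```

Higher sparsity leaves fewer non-zero weights but moves the weights further from their original values, which is the tension the answer above refers to.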

What are the key skills and qualifications needed to thrive as a Deep Learning Compression Engineer, and why are they important?

To thrive as a Deep Learning Compression Engineer, you need a strong background in deep learning, machine learning, and mathematics, typically supported by a degree in computer science or a related field. Proficiency with frameworks like TensorFlow or PyTorch, experience with model compression techniques (such as pruning, quantization, and knowledge distillation), and familiarity with hardware accelerators are essential. Strong problem-solving skills, attention to detail, and effective communication help you innovate and collaborate with research and engineering teams. These skills are critical for developing efficient AI models that meet performance and resource constraints in real-world applications.
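As one illustration of the knowledge distillation technique named above, the following is a minimal sketch of a commonly used soft-target loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The function names, the logits, and the temperature value are assumptions chosen for illustration, and production code would normally rely on a framework such as PyTorch rather than plain Python.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(f"distillation loss: {distillation_loss(student, teacher):.4f}")
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.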

What is the difference between Deep Learning Compression and Machine Learning Engineer roles?

Aspect	Deep Learning Compression	Machine Learning Engineer
Required Credentials	Bachelor's or Master's in Computer Science, AI, or related fields; knowledge of neural networks	Bachelor's or Master's in Computer Science, AI, or related fields; programming skills
Work Environment	Research labs, AI development teams, tech companies focusing on model optimization	Software development teams, AI startups, tech firms building ML applications
Industry Usage	AI model deployment, edge computing, mobile AI applications	Developing ML models, data analysis, AI product development

Deep Learning Compression focuses on reducing model size and improving efficiency of neural networks, often for deployment on limited hardware. Machine Learning Engineers develop, train, and optimize ML models across various applications. While both roles require knowledge of AI and neural networks, Deep Learning Compression specializes in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What is deep learning compression?

Deep learning compression refers to techniques used to reduce the size, memory footprint, and computational requirements of deep neural networks without significantly sacrificing their performance. This is important for deploying models on resource-constrained devices such as smartphones or embedded systems. Common methods include pruning, quantization, knowledge distillation, and low-rank factorization. These approaches help make deep learning models more efficient and practical for real-world applications.
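Of the methods listed above, quantization is perhaps the simplest to sketch. The example below shows symmetric per-tensor quantization to 8-bit integers in plain Python. The function names and weight values are hypothetical, and real toolchains (for example TensorRT or ONNX Runtime) use more elaborate calibration than this.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


# Toy weights (hypothetical values, not taken from any real model).
weights = [0.8, -0.05, 0.3, -0.9, 0.02, 0.4, -0.1, 0.6]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(f"scale={scale:.5f} max_abs_error={max_err:.5f}")
```

Assuming the original weights are stored as 32-bit floats, each value shrinks from four bytes to one, at the cost of a rounding error of at most about half the scale per weight.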

Infographic showing various Deep Learning Compression job openings in the United States as of May 2026, with employment types broken down into 100% Full Time. Highlights a 33% in-person and 67% hybrid job distribution, with an average salary of $83,885 per year, or $40.3 per hour.

Senior Machine Learning Engineer, Runtime and Serving

Waymo

Mountain View, CA • On-site

$123K - $169K/yr

Other

Posted 8 days ago

Job description

The ML Optimization team at Waymo provides a set of tools to support and automate the lifecycle of the machine learning workflow, including feature and experiment management, model development, optimization and monitoring. These efforts have resulted in making machine learning more accessible to teams at Waymo, including Perception, Planner, Research and Simulation.

We are looking for engineers with ML software and systems expertise to help build the next-generation Waymo onboard ML inference engine for the Waymo fundamental model. You'll work across the entire ML stack from the system perspective, from efficient deep learning models, model compression, ML software (e.g. JAX, XLA, Triton, and CUDA), to . You will be pleasantly challenged with deploying Waymo ML models on limited computation resources. In this hybrid role, you will report to the Senior Manager of Runtime and Optimization.

You will:

Architect and develop an efficient, high-performance ML runtime and serving system tailored for both onboard autonomous vehicle compute and large-scale, offboard data center environments.
Lead the integration and feature development for ML inference runtimes across both domains, balancing the strict real-time latency and memory constraints of onboard systems with the high-throughput, highly concurrent demands of offboard serving fleets.
Drive the strategic migration of ML workloads toward a JAX-native runtime architecture, which includes extending and modifying underlying ML compilers and runtimes (e.g., OpenXLA/PjRT, TensorRT).
Collaborate with world-class Waymo ML practitioners across perception, planner, and research to analyze system-level ML workloads and apply hardware-aware compute optimizations.
Design and build robust tooling for profiling, benchmarking, and identifying system-level bottlenecks across the end-to-end ML software stack.

You Have:

B.S. or M.S. in CS, EE, Deep Learning or a related field
5+ years of professional software engineering experience focused on building, scaling, or maintaining ML systems and infrastructure.
5+ years of production programming in C++.
3+ years of production experience in Python and major deep learning frameworks (e.g., PyTorch, JAX).
Experience optimizing ML software for hardware accelerators (e.g., GPUs, TPUs, custom silicon).
Experience building low-latency, highly concurrent distributed backend systems.

We Prefer:

PhD in CS, EE, Deep Learning or a related field.
Experience modifying ML compilers, runtimes, or inference engines (e.g., TensorRT, ONNX Runtime, OpenXLA/PjRT, TVM).
Experience building or scaling LLM serving systems, including expertise in distributed inference and performance optimization (e.g., KV/prefix caching, continuous batching).
Experience with custom kernel development (e.g., CUDA/CUDA Tile, Triton, JAX/Pallas).
Experience architecting unified serving APIs and optimizing tensor buffer management (e.g., zero-copy data transfer, shared memory) for complex, multi-model inference pipelines.

About Waymo

Sourced by ZipRecruiter

Industry

Internet and IT

Company size

1,001 - 5,000 Employees

Headquarters location

Mountain View, CA, US

Year founded

2009

Website

waymo.com
