Deep Learning Compression Jobs (NOW HIRING)

Leverage innovative model compression techniques to optimize performance-per-joule on custom silicon. * Be a core contributor to our deep learning and DSP development platform, advancing novel ...

Senior Deep Learning Engineer

Austin, TX · On-site +1

$130K - $180K/yr

We're hiring 3 Senior Deep Learning Engineers to join our Neural Networks team. Your primary focus ... Familiarity with model compression techniques like quantization, pruning, etc. These are permanent ...

Deep Learning Engineer II POSITION DUTIES: Lead the research, development, and deployment of ... Drive innovation in model compression, quantization, and efficient inference techniques to optimize ...

... compression techniques to transition large models for on-board compute usage. Responsibilities ... at least one deep learning framework (PyTorch, TensorFlow, JAX) $19 - $65 an hour Your ...

Optimize inference performance, model compression, and deployment across various hardware platforms. * Explore and Apply Cutting-Edge ML Techniques: Stay up to date with advancements in deep learning ...

Deep Learning Compression information

Salary scale: $11K · $83.9K (average) · $140K

How much do deep learning compression jobs pay per year?

As of Jun 7, 2026, the average yearly pay for deep learning compression jobs in the United States is $83,885, according to ZipRecruiter salary data. Most workers in this role earn between $72,000 and $139,000 per year, depending on experience, location, and employer.

What are the typical challenges faced when working on deep learning compression projects?

Professionals in deep learning compression often struggle to reduce model size while maintaining high accuracy. Adapting compression techniques (such as pruning, quantization, or knowledge distillation) to different architectures and datasets requires both strong technical knowledge and experimentation. Collaboration with data scientists and software engineers is common, because solutions must be integrated into production systems without sacrificing performance. Staying up to date with rapid advances in compression research is also essential to remain effective in this role.

What are the key skills and qualifications needed to thrive as a Deep Learning Compression Engineer, and why are they important?

To thrive as a Deep Learning Compression Engineer, you need a strong background in deep learning, machine learning, and mathematics, typically supported by a degree in computer science or a related field. Proficiency with frameworks like TensorFlow or PyTorch, experience with model compression techniques (such as pruning, quantization, and knowledge distillation), and familiarity with hardware accelerators are essential. Strong problem-solving skills, attention to detail, and effective communication help you innovate and collaborate with research and engineering teams. These skills are critical for developing efficient AI models that meet performance and resource constraints in real-world applications.

What is the difference between a Deep Learning Compression role and a Machine Learning Engineer role?

Aspect | Deep Learning Compression | Machine Learning Engineer
Required Credentials | Bachelor's or Master's in Computer Science, AI, or related fields; knowledge of neural networks | Bachelor's or Master's in Computer Science, AI, or related fields; programming skills
Work Environment | Research labs, AI development teams, tech companies focusing on model optimization | Software development teams, AI startups, tech firms building ML applications
Industry Usage | AI model deployment, edge computing, mobile AI applications | Developing ML models, data analysis, AI product development

Deep Learning Compression focuses on reducing model size and improving efficiency of neural networks, often for deployment on limited hardware. Machine Learning Engineers develop, train, and optimize ML models across various applications. While both roles require knowledge of AI and neural networks, Deep Learning Compression specializes in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What is deep learning compression?

Deep learning compression refers to techniques used to reduce the size, memory footprint, and computational requirements of deep neural networks without significantly sacrificing their performance. This is important for deploying models on resource-constrained devices such as smartphones or embedded systems. Common methods include pruning, quantization, knowledge distillation, and low-rank factorization. These approaches help make deep learning models more efficient and practical for real-world applications.
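
As a rough illustration of two of the methods named above, the sketch below applies magnitude pruning and symmetric int8 quantization to a small list of weights. It is a minimal example under stated assumptions, not a production recipe: the function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`) and the sample weights are hypothetical, and real projects would normally rely on the tooling built into a framework such as PyTorch or TensorFlow.

```python
def prune_by_magnitude(weights, sparsity):
    """Unstructured pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration
weights = [0.02, -1.27, 0.40, -0.05, 0.90, 0.003]
pruned = prune_by_magnitude(weights, sparsity=0.5)  # half the weights become 0.0
q, scale = quantize_int8(pruned)                    # small integers plus one float scale
restored = dequantize(q, scale)                     # approximation used at inference time
```

Pruning reduces the number of nonzero weights, while quantization stores each remaining weight in 8 bits instead of 32; the difference between `pruned` and `restored` is the rounding error that has to be weighed against accuracy.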
Infographic: Deep Learning Compression job openings in the United States as of May 2026. Employment type: 100% full time. Work arrangement: 33% in-person, 67% hybrid. Average salary: $83,885 per year, or $40.3 per hour.
Deep Learning Research Intern

FUTUREWEI TECHNOLOGIES INC

San Jose, CA • On-site

$18 - $59/hr

Other

Posted 9 days ago


Job description

Deep Learning Research Intern

(Embodied AI, Multimodal Foundation Models & Efficient Systems)

About Us

Futurewei is a well-funded independent research organization with a long history of R&D innovation in Silicon Valley. We are committed to open-source development, fundamental research, and advancing next-generation intelligent systems through collaboration and standards development.

About the Role

We are seeking a strong deep learning research intern to join our ASID team in San Jose, CA. This role focuses on building learning systems for embodied intelligence, emphasizing how multimodal foundation models can be trained, compressed, and deployed efficiently in embodied and interactive environments.

Our work goes beyond static perception. We study intelligence grounded in embodied experience (the interaction of perception, action, and environment over time) while ensuring models remain efficient, scalable, and deployable in real-world systems.

Core Research Focus Areas

The intern will contribute to one or more of the following interconnected research directions:

1. Multimodal Foundation Models

  • Fine-tuning and adaptation of large language models (LLMs), vision-language models (VLMs), and vision-language-action (VLA) models

  • Multimodal representation learning across vision, language, and action

  • Grounding foundation models in embodied experience and temporal interaction

2. Neural (Generative) Image and Video Compression

  • Learning-based image and video compression models

  • Efficient visual representations for perception and downstream embodied tasks

  • Joint optimization of compression efficiency, reconstruction quality, and task relevance

3. Embodied AI

  • Learning frameworks that couple perception, action, and environment dynamics

  • World models, predictive learning, and agent-centric representations

  • Embodied learning in simulation or real-world-inspired environments

4. Model Compression & Inference Acceleration for Embodied Systems

  • Model compression, pruning, quantization, and distillation

  • Efficient inference and deployment strategies for embodied and real-time applications

  • Hardware- and system-aware optimization for edge or robotic platforms

Responsibilities

  • Conduct research in one or more of the focus areas above

  • Design and implement learning algorithms and experimental pipelines

  • Develop prototype systems or demos for embodied and multimodal AI applications

  • Collaborate closely with researchers in a fast-paced, research-driven environment

Qualifications

  • MS or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Robotics, Mathematics, or a related field

  • Strong foundation in machine learning and deep learning

  • Experience or strong interest in multimodal models, embodied AI, compression, or efficient inference

  • Proficiency with PyTorch; experience with HuggingFace or similar frameworks is a plus

  • Solid Python programming skills

  • Research experience with publications in top conferences or journals preferred

  • Strong communication skills and ability to work effectively in a global research team

Location: San Jose, CA

Hourly intern pay range: $18 to $59, depending on degree-seeking academic program (PhD, Master's, Bachelor's, etc.), years of relevant experience, year in school, geographic location, credentials, qualifications, and other job-related factors.

A housing allowance and relocation benefit may be provided to intern candidates who meet the qualifications. Additional details on the compensation package will be provided to candidates during the interview process.

Employment Type: Intern