
Deep Learning Quantization Jobs in California (NOW HIRING)

Deep Learning Engineer II POSITION DUTIES: Lead the research, development, and deployment of ... Drive innovation in model compression, quantization, and efficient inference techniques to optimize ...



Deep Learning Quantization information

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What is the difference between a Deep Learning Quantization Engineer and a Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization work focuses on reducing model size and speeding up inference through techniques such as weight and activation quantization, often targeting specialized hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for a wide range of applications. Both roles require knowledge of AI and programming, but Deep Learning Quantization specializes in model optimization techniques, whereas Machine Learning Engineers work broadly across model development and deployment.
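To make "weight and activation quantization" concrete, here is a minimal sketch of a quantized dot product in plain Python. It assumes a symmetric, per-tensor int8 scheme (chosen here for illustration; real toolkits offer several variants): both operands are mapped to integers, the products are summed in an integer accumulator, and a single floating-point rescale happens at the end. The function names are hypothetical helpers, not any library's API.

```python
def symmetric_scale(values, num_bits=8):
    """Per-tensor symmetric scale: maps max |x| to the largest positive integer."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_symmetric(values, scale, num_bits=8):
    """Round each value to the nearest integer step and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def int8_dot(weights, activations):
    """Quantize weights and activations, accumulate in integers, rescale once."""
    w_scale = symmetric_scale(weights)
    a_scale = symmetric_scale(activations)
    qw = quantize_symmetric(weights, w_scale)
    qa = quantize_symmetric(activations, a_scale)
    # Integer accumulator (typically 32-bit on hardware; Python ints are unbounded).
    acc = sum(w * a for w, a in zip(qw, qa))
    return acc * w_scale * a_scale


weights = [0.5, -1.25, 0.75, 2.0]
activations = [1.0, 0.2, -0.4, 3.1]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(exact, approx)
```

The result differs slightly from the floating-point dot product because of rounding; keeping the multiply-accumulate loop in integers is what lets accelerators run it cheaply.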

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the 32-bit floating-point values typically used in training to lower bit-width formats such as 16-bit floating point or 8-bit integers, quantization significantly reduces the memory footprint and computational requirements of deep learning models; storing weights as 8-bit integers instead of 32-bit floats, for example, cuts their memory use by a factor of four. This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy. Quantization is widely used in model optimization for faster inference and lower power consumption.
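As an illustration of how 32-bit values are mapped to 8-bit integers, the sketch below implements one common scheme, asymmetric (affine) uint8 quantization, in plain Python: q = round(x / scale) + zero_point, clamped to [0, 255], and x ≈ (q - zero_point) * scale on the way back. The helper names `quantize_params`, `quantize`, and `dequantize` are hypothetical; frameworks such as PyTorch and TensorFlow Lite provide their own implementations with different details.

```python
def quantize_params(x_min, x_max, num_bits=8):
    """Compute scale and zero-point for asymmetric (affine) quantization."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-0.62, -0.1, 0.0, 0.35, 1.2]
scale, zp = quantize_params(min(weights), max(weights))
q = quantize(weights, scale, zp)
w_hat = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, w_hat))
print(q, max_err)
```

Each weight now fits in one byte instead of four, and as long as no value is clamped, the reconstruction error per value stays within half a quantization step (scale / 2).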

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, because lower precision introduces rounding and clipping error that can reduce model accuracy. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning, since supported bit-widths and operators vary across targets. Collaboration with data scientists, software engineers, and hardware specialists is often essential to deploy quantized models successfully at scale. Staying current with the latest quantization techniques and frameworks also helps in overcoming these challenges.
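
One concrete form of the accuracy trade-off is choosing the clipping range during calibration. The sketch below, in plain Python, compares a min-max range with a 99.9th-percentile range on a hypothetical calibration batch (random Gaussian activations plus one large outlier, generated here purely for illustration). The helper names are hypothetical, and the percentile value is an assumption, not a recommended setting.

```python
import random


def fake_quant(values, clip, num_bits=8):
    """Symmetric quantize-then-dequantize using a chosen clipping threshold (clip > 0)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = clip / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def percentile_abs(values, pct):
    """Approximate pct-th percentile of |values| by sorting."""
    s = sorted(abs(v) for v in values)
    idx = min(len(s) - 1, int(pct / 100 * len(s)))
    return s[idx]


random.seed(0)
# Hypothetical calibration batch: mostly small activations plus one large outlier.
acts = [random.gauss(0.0, 1.0) for _ in range(10_000)] + [80.0]

minmax_clip = max(abs(v) for v in acts)  # range stretched to cover the outlier
pct_clip = percentile_abs(acts, 99.9)    # range covering roughly 99.9% of values

bulk = acts[:-1]  # exclude the outlier to measure error on typical values
err_minmax = mse(bulk, fake_quant(bulk, minmax_clip))
err_pct = mse(bulk, fake_quant(bulk, pct_clip))
outlier_after_clip = fake_quant([80.0], pct_clip)[0]
print(err_minmax, err_pct, outlier_after_clip)
```

Clipping gives typical values a much finer step, at the cost of saturating the rare outlier. Whether that trade helps a given model has to be checked against task accuracy, which is part of why quantized deployments need testing and tuning.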

What cities in California are hiring for Deep Learning Quantization jobs?

Cities in California with the most Deep Learning Quantization job openings:
Infographic showing Deep Learning Quantization job openings in California as of May 2026, broken down by employment type into 6% internship and 94% full-time, and by work arrangement into 94% in-person and 6% remote.
Deep Learning Research Intern

FUTUREWEI TECHNOLOGIES INC

San Jose, CA • On-site

$18 - $59/hr

Other

Posted 9 days ago


Job description

Deep Learning Research Intern

(Embodied AI, Multimodal Foundation Models & Efficient Systems)

About Us

Futurewei is a well-funded independent research organization with a long history of R&D innovation in Silicon Valley. We are committed to open-source development, fundamental research, and advancing next-generation intelligent systems through collaboration and standards development.

About the Role

We are seeking a strong deep learning research intern to join our ASID team in San Jose, CA. This role focuses on building learning systems for embodied intelligence, emphasizing how multimodal foundation models can be trained, compressed, and deployed efficiently in embodied and interactive environments.

Our work goes beyond static perception. We study intelligence grounded in embodied experience (the interaction of perception, action, and environment over time) while ensuring that models remain efficient, scalable, and deployable in real-world systems.

Core Research Focus Areas

The intern will contribute to one or more of the following interconnected research directions:

1. Multimodal Foundation Models

  • Fine-tuning and adaptation of large language models (LLMs), vision-language models (VLMs), and vision-language-action (VLA) models

  • Multimodal representation learning across vision, language, and action

  • Grounding foundation models in embodied experience and temporal interaction

2. Neural (Generative) Image and Video Compression

  • Learning-based image and video compression models

  • Efficient visual representations for perception and downstream embodied tasks

  • Joint optimization of compression efficiency, reconstruction quality, and task relevance

3. Embodied AI

  • Learning frameworks that couple perception, action, and environment dynamics

  • World models, predictive learning, and agent-centric representations

  • Embodied learning in simulation or real-world-inspired environments

4. Model Compression & Inference Acceleration for Embodied Systems

  • Model compression, pruning, quantization, and distillation

  • Efficient inference and deployment strategies for embodied and real-time applications

  • Hardware- and system-aware optimization for edge or robotic platforms
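Of the compression techniques listed above, pruning is the simplest to illustrate. The sketch below shows unstructured magnitude pruning, one common baseline (not necessarily the approach used by this team): it zeroes the smallest-magnitude fraction of weights and returns a binary mask. The function name `magnitude_prune` is hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list.

    Returns the pruned weights and a 0/1 mask (1 = kept, 0 = pruned).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights), [1] * len(weights)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:k])
    mask = [0 if i in pruned_idx else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.15, 0.6]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(mask)  # [1, 0, 1, 1, 0, 0, 0, 1]
```

In practice, pruning is usually followed by fine-tuning to recover accuracy. Unstructured zeros like these typically translate into speedups only on kernels or hardware that exploit sparsity, which is one reason hardware-aware optimization matters for edge and robotic platforms.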

Responsibilities

  • Conduct research in one or more of the focus areas above

  • Design and implement learning algorithms and experimental pipelines

  • Develop prototype systems or demos for embodied and multimodal AI applications

  • Collaborate closely with researchers in a fast-paced, research-driven environment

Qualifications

  • MS or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, Robotics, Mathematics, or a related field

  • Strong foundation in machine learning and deep learning

  • Experience or strong interest in multimodal models, embodied AI, compression, or efficient inference

  • Proficiency with PyTorch; experience with HuggingFace or similar frameworks is a plus

  • Solid Python programming skills

  • Research experience with publications in top conferences or journals preferred

  • Strong communication skills and ability to work effectively in a global research team

Location: San Jose, CA

Hourly intern pay range: $18 to $59 per hour, depending on the degree-seeking academic program (PhD, Master's, Bachelor's, etc.), years of relevant experience, year in school, geographic location, credentials, qualifications, and other job-related factors.

A housing allowance and relocation benefits may be provided to intern candidates who meet the qualifications. Additional details on the compensation package will be provided to candidates during the interview process.

Employment Type: Intern