Deep Learning Quantization Jobs (NOW HIRING)

OR · Hybrid

$122.40K - $161.30K/yr

... like quantization, scheduling, memory management, and distributed inference to set the gold ... Scale performance of deep learning models across different architectures and types of NVIDIA ...

OR

$104.40K - $143.40K/yr

Work with deep learning compiler and architecture teams to analyze and validate sophisticated ... DL model internals depth: experience with quantization, operator fusion, mixed-precision, or graph ...

OR

$139.90K/yr

... like quantization, scheduling, memory management, and distributed inference to set the gold ... Scale performance of deep learning models across different architectures and types of NVIDIA ...


Deep Learning Quantization information

Salary scale: $11K, $83.9K (average), $140K

How much do deep learning quantization jobs pay per year?

As of Jun 3, 2026, the average yearly pay for deep learning quantization jobs in the United States is $83,885, according to ZipRecruiter salary data. Most workers in this role earn between $72,000 and $139,000 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
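The accuracy-versus-efficiency trade-off described above can be illustrated with a toy experiment. The sketch below is not a production recipe: the function names `fake_quantize` and `mean_abs_error` are hypothetical, the "weights" are an evenly spaced stand-in for real layer parameters, and the scheme is a simple symmetric per-tensor quantizer. It only shows that the rounding error grows as the bit width shrinks.

```python
def fake_quantize(values, bits):
    """Round values to a signed integer grid of the given bit width, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0  # one scale for the whole tensor
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # integer code, clamped to the grid
        out.append(q * scale)                        # float value after the round trip
    return out


def mean_abs_error(values, bits):
    """Average absolute difference between the original and round-tripped values."""
    approx = fake_quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, approx)) / len(values)


# Hypothetical stand-in for a layer's weights: 201 evenly spaced values in [-1, 1].
weights = [i / 100 for i in range(-100, 101)]
errors = {bits: mean_abs_error(weights, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```

On real models the effect of this error on task accuracy depends on the architecture and data, which is why per-device testing and tuning are usually needed.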

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the commonly used 32-bit floating-point values to lower bit-width formats such as 16-bit floating point or 8-bit integers, quantization significantly reduces the memory footprint and computational requirements of deep learning models. This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy levels. Quantization is widely used in model optimization for faster inference and lower power consumption.
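As a rough illustration of the idea, the sketch below maps a handful of floating-point values to 8-bit unsigned integers using an affine (scale plus zero-point) scheme, then maps them back. The function names and sample values are hypothetical, and real toolkits add calibration, per-channel scales, and hardware-specific details that are omitted here. Storing each value in 8 bits instead of 32 cuts the storage for those values by a factor of four.

```python
def quantize_uint8(values):
    """Map floats to integers in [0, 255] with an affine (scale + zero-point) scheme."""
    lo, hi = min(values), max(values)
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255
    if scale == 0:
        scale = 1.0                      # all inputs are zero; any scale works
    zero_point = round(-lo / scale)      # integer code that represents 0.0
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical sample weights, chosen only for illustration.
weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("integer codes:", q)
print("restored values:", restored)
print("largest round-trip error:", max_err)
```

The largest round-trip error stays within about half of one quantization step (`scale / 2`), which is the precision given up in exchange for the smaller representation.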

What is the difference between a Deep Learning Quantization Engineer and a Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

More about Deep Learning Quantization jobs
Infographic showing various Deep Learning Quantization job openings in the United States as of May 2026, with employment types broken down into 67% Full Time and 33% Contract. Highlights a 67% In-person and 33% Remote job distribution, with an average salary of $83,885 per year, or $40.3 per hour.

Staff Machine Learning Engineer - Autonomous Driving Model Quantization & Deployment

XPENG

Santa Clara, CA

Other

Posted 18 days ago


Job description

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.
The Mission: The challenge of Vision-Language-Action (VLA) models and Foundation Models isn't just their intelligence; it's their real-time execution at the edge. We are seeking a high-caliber Staff Machine Learning Engineer to bridge the gap between massive research models and production-ready L4 autonomous driving systems. You will lead the effort to optimize and deploy our VLA models onto vehicle-grade compute platforms for our global fleet.
Key Responsibilities:
  • Lead Optimization Strategy: Own the end-to-end quantization and optimization roadmap for large-scale multimodal models (Transformers, VLMs).
  • Model Compression: Apply and innovate in PTQ (Post-Training Quantization), QAT (Quantization-Aware Training), and pruning techniques to fit VLA models into strict memory and power envelopes.
  • Hardware-Software Co-design: Collaborate directly with model researchers to ensure architectures are "deployment-friendly" and with platform teams to influence future hardware requirements.
  • Production Excellence: Develop and maintain robust, safety-critical deployment stacks in Modern C++, ensuring 24/7 stability and deterministic performance on the road.
Basic Qualifications:
  • Proven Track Record: 5-8 years of experience in model deployment, quantization, or high-performance computing (HPC).
  • Core Technical Skills: Mastery of Modern C++ and deep experience with CUDA or other hardware acceleration libraries.
  • Deep Learning Expertise: Strong familiarity with PyTorch and deep knowledge of inference engines like TensorRT, ONNX Runtime, or TVM.
  • Quantization Depth: Hands-on experience with INT8/FP8/INT4 quantization and knowledge of the unique challenges in quantizing Large Language Models (LLMs) or Transformers.
  • Platform Knowledge: Solid understanding of computer architecture (Cache, Memory Bandwidth, SIMD) and experience with embedded/edge compute constraints.
  • Systems Thinking: Ability to debug complex performance bottlenecks across the entire software stack.
Preferred Qualifications:
  • Experience with VLA/VLM or other Foundation Model deployment.
  • Background in autonomous driving, robotics, or real-time safety-critical systems.
  • Contributions to open-source inference or compiler projects.
What we provide:
  • A fun, supportive, and engaging environment
  • Infrastructure and computational resources to support your ML model development and research
  • Opportunity to work on cutting-edge technologies with the top talent in the field
  • Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving
  • Competitive compensation package
  • Snacks, lunches, dinners, and fun activities

The base salary range for this full-time position is $215,280-$364,320, in addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.
We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status or marital status or any other prescribed category set forth in federal or state regulations.