Machine Learning Engineer Quantization Jobs (NOW HIRING)

As a Machine Learning Engineer, you will play a central role in translating cutting-edge machine ... Hands on experience with quantization techniques (AWQ, GPTQ, FP8/GGUF)

Sr. Machine Learning Engineer Location: New York, NY Sponsorship: Yes Relocation: Yes Industry ... model quantization, fixed point neural networks (CNN and RNN) * Excellent research and problem ...

As a Machine Learning Engineer, you will play a critical role in shaping the future of cooking ... Understanding of model optimization techniques such as quantization, pruning, and inference ...

Optimize models for production deployment, including ONNX / TensorRT / quantization / inference ... Electrical Engineering, Robotics, Computer Vision, Machine Learning, or a related field. * 3-5 ...

We're seeking a skilled Machine Learning Engineer to build and deploy production ML systems for the ... Experience with model compression techniques (quantization, pruning, distillation) * Contributions ...

As a Machine Learning Engineer at BetterHelp, you'll join a diverse team of licensed clinicians, engineers ... Optimize inference performance through quantization, distillation, batching, and model serving ...

Machine Learning Engineer, Senior

Austin, TX · On-site

$103K - $142K/yr

The company is seeking a Senior Machine Learning Engineer to design, train, and maintain models for ... quantization-aware training and knowledge distillation. • Prior experience in defense or other ...

... quantization / inference acceleration. • Work with deployment and platform teams to validate ... Engineering, Robotics, Computer Vision, Machine Learning, or a related field. • 3-5 years of ...

Machine Learning Engineer - Computer Vision & Robotics Tycho.AI is redefining the future of ... CUDA kernel development and model optimization (quantization, pruning, distillation). * Experience ...

Machine Learning Engineer - Edge

Dover, NH · On-site +1

$86K - $135K/yr

Machine Learning Engineer - Edge *Please consider before applying: This is a hybrid role, and ... Apply techniques such as model compression, quantization, pruning, and distillation to improve ...

Machine Learning Engineer Quantization information

Salary scale: $31.5K, $128.8K (average), $193.5K

How much do machine learning engineer quantization jobs pay per year?

As of Jul 4, 2026, the average yearly pay for machine learning engineer quantization in the United States is $128,769.00, according to ZipRecruiter salary data. Most workers in this role earn between $101,500.00 and $155,000.00 per year, depending on experience, location, and employer.

What are some common challenges Machine Learning Engineers face when implementing quantization techniques in production models?

Machine Learning Engineers working on quantization often encounter challenges such as balancing reduced model size and computational efficiency with maintaining acceptable accuracy levels. Adapting quantization methods to different hardware platforms can also require significant testing and optimization. Additionally, engineers must frequently address compatibility issues with existing deployment pipelines and ensure that quantization-aware training is properly integrated to minimize performance degradation. Collaboration with hardware and software teams is essential to streamline deployment and achieve optimal results.

What are the key skills and qualifications needed to thrive as a Machine Learning Engineer Quantization, and why are they important?

To thrive as a Machine Learning Engineer specializing in quantization, you need a solid background in machine learning, deep learning, and computer science, typically supported by a degree in a related field. Familiarity with quantization techniques, experience with frameworks such as TensorFlow Lite or PyTorch, and experience with hardware accelerators are crucial. Strong problem-solving skills, attention to detail, and effective collaboration set top performers apart. These capabilities are vital for deploying high-performing models efficiently on resource-constrained devices and for delivering scalable, real-world AI solutions.

What does a Machine Learning Engineer Quantization do?

A Machine Learning Engineer specializing in quantization focuses on optimizing machine learning models by reducing their size and computational requirements without significantly sacrificing accuracy. This involves converting model parameters and computations from high-precision formats (like 32-bit floating point) to lower-precision formats (such as 8-bit integers). Quantization enables faster inference, lower memory usage, and allows models to run efficiently on edge devices and mobile platforms. These engineers work closely with data scientists and hardware teams to implement, test, and validate quantized models in production environments.
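The conversion described above can be illustrated with a minimal sketch. It assumes symmetric per-tensor quantization, which is only one of several possible schemes, and the function names (quantize_int8, dequantize) and sample weights are hypothetical, chosen for illustration rather than taken from any framework or job posting.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit codes."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range used here.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights standing in for 32-bit floating-point parameters.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding step bounds the per-value error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)      # [50, -127, 0, 100]
print(max_error)  # no larger than scale / 2
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale. Small weights such as 0.003 collapse to zero, which is one source of the accuracy loss that quantization work has to manage.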

What is the difference between Machine Learning Engineer Quantization vs Data Scientist?

| Aspect | Machine Learning Engineer Quantization | Data Scientist |
| --- | --- | --- |
| Required Credentials | Bachelor's or master's in CS, ML, or related; certifications in ML or AI | Bachelor's or master's in statistics, CS, or related; certifications in data analysis or statistics |
| Work Environment | Developing optimized ML models, deploying quantized models for efficiency | Analyzing data, building predictive models, interpreting results |
| Industry Usage | Tech companies, AI hardware firms, embedded systems | Finance, healthcare, marketing, research institutions |

A Machine Learning Engineer specializing in quantization focuses on optimizing ML models for deployment efficiency, often working closely with hardware and software teams. A Data Scientist analyzes data and builds models to extract insights. Both roles require ML knowledge, but quantization engineers specialize in model compression techniques, whereas data scientists focus on data analysis and interpretation.

More about Machine Learning Engineer Quantization jobs
Infographic showing various Machine Learning Engineer Quantization job openings in the United States as of June 2026, with employment types broken down into 2% As Needed, 95% Full Time, 1% Part Time, and 2% Nights. Highlights an 87% Physical, 2% Hybrid, and 11% Remote job distribution, with an average salary of $128,769 per year, or $61.9 per hour.
Staff Machine Learning Engineer - Model Optimization & Quantization

Qualcomm

San Diego, CA • On-site

Full-time

Posted 23 days ago


Qualcomm rating: 9.6 out of 10

Based on 5 frontline employees who took The Breakroom Quiz

5th of 202 rated software companies


Job description

Job Summary:
Qualcomm Technologies, Inc. is seeking a Staff Machine Learning Engineer to join their AI Hub team. The role involves developing tools for optimizing and deploying machine learning models on edge and mobile hardware, focusing on model quantization and compression techniques.
Responsibilities:
• Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework (PTQ, QAT, mixed-precision, AdaScale, etc.)
• Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models
• Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization
• Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX
• Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners
• Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at scale
• Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware
• Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub
• Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub's growing model catalog
Qualifications:
Required:
• Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
• OR Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
• OR PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
Preferred:
• 3+ years of industry experience in machine learning, deep learning, or AI infrastructure
• Strong proficiency in Python, with hands-on experience in PyTorch, ONNX and/or TensorFlow
• Solid understanding of neural network architectures — CNNs, Transformers, LLMs, diffusion models, multimodal models
• Experience with model quantization techniques — PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods
• Hands-on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization
• Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar quantization frameworks is a strong plus
• Experience working with ONNX, TFLite/LiteRT, or other model interchange formats
• Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution
• Experience collaborating across teams or BUs to drive technical alignment and model delivery
• Proficiency with git and software development best practices
• Strong written and verbal communication skills — ability to write clean APIs, documentation, and engage directly with external developers
• Experience with C++ for performance-critical components is a bonus
• Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus
• Experience with automated evaluation pipelines and model benchmarking at scale is a plus
Company:
Qualcomm designs wireless technologies and semiconductors that power connectivity, communication, and smart devices. Founded in 1985, the company is headquartered in San Diego, CA, and has more than 10,000 employees. The company is currently Late Stage.

About Qualcomm

Sourced by ZipRecruiter

Qualcomm is enabling a world where everyone and everything can be intelligently connected. You interact with products and technologies made possible by Qualcomm every day, including 5G-enabled smartphones that double as pro-level cameras and gaming devices, smarter vehicles and cities, and the technology behind the smart, connected factories that manufactured your latest purchase. Our powerful connectivity solutions keep you connected—even in remote areas. Qualcomm 5G and AI innovations are the power behind the connected intelligent edge. You’ll find our technologies behind and inside the innovations that deliver significant value across multiple industries and to billions of people every day.

Industry: Technology, communication and media

Company size: 10,000+ Employees

Headquarters location: San Diego, CA, US

Year founded: 1985