Machine Learning Engineer Quantization Jobs in Sunnyvale, CA

As a Machine Learning Engineer, you will play a central role in translating cutting-edge machine ... Hands-on experience with quantization techniques (AWQ, GPTQ, FP8/GGUF)

Improve inference efficiency and model compression techniques, including quantization, pruning, and ... Engineering, or related fields. * Strong experience in machine learning, with a focus on edge AI ...

Machine Learning Engineer At Advex, we're working on solving the hardest problem in all of computer ... Pruning and model quantization We are seeking an individual who is strategic, capable of ...

Improve inference efficiency and model compression techniques, including quantization, pruning, and ... Engineering, Machine Learning, or related fields. * Must have prior experience managing a team ...

Machine Learning Engineer Quantization information

Sunnyvale, CA salary scale: $37K to $227.1K, with the average at $151.1K.

How much do machine learning engineer quantization jobs pay per year?

As of May 28, 2026, the average yearly pay for machine learning engineer quantization in Sunnyvale, CA is $151,130.00, according to ZipRecruiter salary data. Most workers in this role earn between $119,100.00 and $181,900.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Machine Learning Engineer Quantization, and why are they important?

To thrive as a Machine Learning Engineer Quantization, you need a solid background in machine learning, deep learning, and computer science, typically supported by a degree in a related field. Familiarity with quantization techniques, frameworks such as TensorFlow Lite or PyTorch, and experience with hardware accelerators are crucial. Strong problem-solving skills, attention to detail, and effective collaboration set top performers apart. These capabilities are vital for efficiently deploying high-performing models on resource-constrained devices and ensuring scalable, real-world AI solutions.

What are some common challenges Machine Learning Engineers face when implementing quantization techniques in production models?

Machine Learning Engineers working on quantization often encounter challenges such as balancing reduced model size and computational efficiency with maintaining acceptable accuracy levels. Adapting quantization methods to different hardware platforms can also require significant testing and optimization. Additionally, engineers must frequently address compatibility issues with existing deployment pipelines and ensure that quantization-aware training is properly integrated to minimize performance degradation. Collaboration with hardware and software teams is essential to streamline deployment and achieve optimal results.
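The size-versus-accuracy trade-off described above can be illustrated with a small sketch. It assumes simple symmetric, per-tensor rounding and uses made-up weights; the function name `roundtrip_error` is hypothetical and does not come from any particular framework.

```python
from typing import List


def roundtrip_error(values: List[float], bits: int) -> float:
    """Worst-case absolute error after symmetric quantization to `bits` bits."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to the top level; fall back to 1.0 for all zeros.
    scale = max_abs / levels if max_abs > 0 else 1.0
    restored = [max(-levels, min(levels, round(v / scale))) * scale for v in values]
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical weights, evenly spread over [-1, 1].
weights = [i / 50.0 - 1.0 for i in range(101)]
errors = {bits: roundtrip_error(weights, bits) for bits in (8, 4, 2)}
```

In this toy setup the worst-case reconstruction error grows as the bit width shrinks, which is the accuracy cost engineers weigh against the smaller model size. Real models typically need per-channel scales, calibration data, or quantization-aware training to keep that cost acceptable.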

What does a Machine Learning Engineer Quantization do?

A Machine Learning Engineer specializing in quantization focuses on optimizing machine learning models by reducing their size and computational requirements without significantly sacrificing accuracy. This involves converting model parameters and computations from high-precision formats (like 32-bit floating point) to lower-precision formats (such as 8-bit integers). Quantization enables faster inference and lower memory usage, and it allows models to run efficiently on edge devices and mobile platforms. These engineers work closely with data scientists and hardware teams to implement, test, and validate quantized models in production environments.
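As a rough illustration of the float-to-integer conversion described above, here is a minimal sketch of symmetric per-tensor quantization to signed 8-bit values. The function names and the sample weights are assumptions made for this example; production toolchains use more elaborate schemes.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to signed 8-bit codes."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights chosen for illustration.
weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored as one byte plus a single shared scale, instead of four bytes per 32-bit float, and the rounding error per weight is bounded by half the scale.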

What is the difference between Machine Learning Engineer Quantization vs Data Scientist?

Aspect | Machine Learning Engineer Quantization | Data Scientist
Required Credentials | Bachelor's or master's in CS, ML, or related; certifications in ML or AI | Bachelor's or master's in statistics, CS, or related; certifications in data analysis or statistics
Work Environment | Developing optimized ML models, deploying quantized models for efficiency | Analyzing data, building predictive models, interpreting results
Industry Usage | Tech companies, AI hardware firms, embedded systems | Finance, healthcare, marketing, research institutions

A Machine Learning Engineer Quantization optimizes ML models for deployment efficiency, often working closely with hardware and software teams, while a Data Scientist analyzes data and builds models to extract insights. Both roles require ML knowledge, but quantization engineers specialize in model compression techniques, whereas data scientists focus on data analysis and interpretation.


Machine Learning Engineer

Nace AI

Palo Alto, CA • On-site

Other

This job post has expired today. Applications are no longer accepted.


Job description

Role Overview:
As a Machine Learning Engineer, you will play a central role in translating cutting-edge machine learning research into scalable, production-ready solutions. You will collaborate closely with cross-functional teams to identify opportunities where ML can drive product value, architect robust model-centric systems, and ensure their seamless integration into real-world applications. The role requires a strong balance between theoretical understanding and engineering execution, with a focus on building reliable, maintainable, and high-impact AI-driven features that align with Nace.AI's strategic objectives.
Key Responsibilities:
  • Design, build, and maintain end-to-end ML systems, including synthetic data pipelines, model training, debugging, and performance evaluation.
  • Fine-tune large language models (LLMs) and implement meta-learning methods to enhance model generalization and efficiency.
  • Improve existing Nace.AI models by incorporating advancements from recent ML research.
Qualifications:
  • Hands-on experience training and fine-tuning large language models (LLMs) and vision-language models (VLMs), including practical work with pre-training, instruction tuning, and alignment techniques (GRPO, RLHF/DPO/PPO).
  • Hands-on experience with deep learning models, especially Transformers.
  • Ability to translate cutting-edge research from papers into clean, production-ready code (Paper to Code).
  • Proven experience scaling inference infrastructure for LLMs/VLMs, including expertise in model serving frameworks such as vLLM and TGI.
  • Proficient in Python with a strong track record of building substantial projects.
  • Solid foundation in computer science fundamentals (data structures, algorithms, design patterns).
  • BS degree in CS or related technical field.
  • Solid experience with ML frameworks and libraries (PyTorch, TensorFlow).
  • Self-starter comfortable working in a fast-paced, dynamic environment.
Preferred Qualifications:
  • MS/PhD in CS or related technical field.
  • Familiarity with data processing stacks such as Spark and Airflow.
  • Experience with multi-node GPU training.
  • Contributor to open-source ML projects.
  • Deep knowledge of linear programming.
  • Experience with advanced NLP and multimodal post-training (e.g., model distillation, quantization, deployment optimization).
  • Experience in inference-time optimization, with a deep understanding of serving optimizations for LLMs/VLMs.
  • Hands-on experience with quantization techniques (AWQ, GPTQ, FP8/GGUF).