
Deep Learning Quantization Jobs in Arizona (NOW HIRING)

Sr. Advanced AI Software Engineer

Nogales, AZ · On-site

$120K - $159K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Avondale, AZ · On-site

$119K - $157K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Sun City, AZ · On-site

$118K - $155K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Laveen, AZ · On-site

$116K - $154K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Phoenix, AZ · On-site

$115K - $152K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Phoenix, AZ · On-site

$121K - $160K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Mesa, AZ · On-site

$121K - $160K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Phoenix, AZ · On-site

$121K - $159K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Tolleson, AZ · On-site

$120K - $158K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Phoenix, AZ · On-site

$121K - $160K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Tempe, AZ · On-site

$119K - $157K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Guadalupe, AZ · On-site

$120K - $158K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Sr. Advanced AI Software Engineer

Youngtown, AZ · On-site

$117K - $154K/yr

Core AI / Machine Learning * Deep expertise in : Machine learning fundamentals (supervised ... Edge AI or model compression/quantization * AI safety research and explainability techniques

Deep Learning Quantization information

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What is the difference between a Deep Learning Quantization Engineer and a Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the commonly used 32-bit floating-point values to lower bit-width formats such as 16-bit floating point or 8-bit integers, quantization reduces the memory footprint and computational requirements of deep learning models. This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy levels. Quantization is widely used in model optimization for faster inference and lower power consumption.
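As a rough illustration of the idea above, the following minimal sketch maps a handful of floating-point weights to signed 8-bit integers using a single shared scale (a symmetric scheme), then maps them back. The function names and the sample weights are hypothetical and chosen only for illustration; production toolkits typically use more elaborate schemes, such as per-channel scales or zero points.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale (symmetric scheme)."""
    max_abs = max(abs(v) for v in values)
    # The largest magnitude maps to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


# Hypothetical weights standing in for a slice of a model's parameters.
weights = [0.42, -1.27, 0.05, 0.98, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("quantized:", q)
print("restored:", [round(r, 3) for r in restored])
# Storage cost: 4 bytes per float32 value versus 1 byte per int8 value.
print("float32 bytes:", 4 * len(weights), "int8 bytes:", len(weights))
```

Each restored value differs from the original by at most half a quantization step, which is the source of the small accuracy loss that quantized models have to tolerate.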

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
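The accuracy-versus-efficiency trade-off described above can be sketched by quantizing the same values at several bit widths and comparing the reconstruction error. This is only an illustration under stated assumptions: the weights are synthetic, the helper names are hypothetical, the 16-bit case uses an integer grid rather than a floating-point format, and the root-mean-square error of the weights is a stand-in for, not a measurement of, end-task model accuracy.

```python
import math


def fake_quantize(values, bits):
    """Round values onto a symmetric signed grid of the given bit width, then map back to floats."""
    levels = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    return [max(-levels, min(levels, round(v / scale))) * scale for v in values]


def rmse(a, b):
    """Root-mean-square difference between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Deterministic synthetic values standing in for a weight tensor (hypothetical data).
weights = [math.sin(0.37 * i) for i in range(256)]

errors = {bits: rmse(weights, fake_quantize(weights, bits)) for bits in (16, 8, 4)}
for bits, err in errors.items():
    print(f"{bits}-bit RMSE: {err:.6f}")
```

In this sketch the error grows as the bit width shrinks, which is why lower-precision deployments generally need the per-device testing and tuning mentioned above.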

What cities in Arizona are hiring for Deep Learning Quantization jobs?

Cities in Arizona with the most Deep Learning Quantization job openings:

Research Engineer -- Post-Training & Small Language Models (SLMs), Healthcare AI

Deloitte

Gilbert, AZ • On-site

Full-time

Posted 9 days ago


Deloitte rating

Company rating: 8.1 out of 10

Based on 86 frontline employees who took The Breakroom Quiz

58th of 138 rated financial services


Job description

Job Summary:
Deloitte is leading an AI-first initiative aimed at transforming the healthcare decision-making process through advanced modeling and reasoning systems. As a Research Engineer, you will design, train, and evaluate models that enhance clinical and operational decision-making, focusing on post-training methodologies and ensuring model behavior aligns with healthcare standards.
Responsibilities:
• Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
• Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
• Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.
• Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
• Curate, clean, synthesize, and evaluate large-scale instruction, preference, and domain-specific datasets, with rigorous filtering, deduplication, and quality control.
• Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical-expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale.
• Implement efficient fine-tuning strategies including LoRA, QLoRA, PEFT, and adapter-based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron-LM, Ray, or equivalent.
• Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT-LLM, or TGI.
• Train and optimize open-weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on-premise and cloud-hybrid deployment with strong performance-per-dollar.
• Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain-specific metrics.
• Build healthcare-grade evaluation - held-out clinical benchmarks, deployment regression gates, calibration and uncertainty, factuality against ground truth, and bias/fairness evaluation across patient populations and subgroups - co-designed with clinical experts.
• Apply PHI/HIPAA-aware data handling and produce model documentation suitable for regulated clinical use.
• Perform red teaming and adversarial testing to identify alignment failures, unsafe behaviors, jailbreak vulnerabilities, and regression risks; collaborate with agentic and application teams to improve tool use, grounding, and long-horizon reasoning.
Qualifications:
Required:
• Bachelor's degree in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, Computational Linguistics, or a related field.
• Demonstrated depth in training and post-training large transformer-based language models in production or research; this is your craft, not coursework or a one-off fine-tune. That depth should include SFT and at least one preference-optimization or RL method, evidenced by shipped models, releases, or research.
• Hands-on experience with reasoning-model training and/or verifiable-reward (RLVR) workflows.
• Strong understanding of modern post-training techniques: SFT, RLHF, PPO, DPO, GRPO, RLAIF, and preference optimization workflows.
• Experience with open-weight foundation models such as Llama, Qwen, Mistral, DeepSeek, or equivalent architectures.
• Strong expertise in PyTorch and modern deep-learning tooling; experience with distributed training frameworks such as DeepSpeed, FSDP, Megatron-LM, or Ray.
• Experience implementing efficient fine-tuning techniques such as LoRA, QLoRA, PEFT, and quantization-aware workflows.
• Deep understanding of transformer architectures, tokenization, attention mechanisms, decoding strategies, and model scaling trade-offs.
• Strong grasp of LLM evaluation methodologies, benchmarking, reward modeling, and alignment trade-offs; experience with large-scale and synthetic datasets, filtering, deduplication, and quality-control pipelines.
• Strong Python engineering skills and production-grade software practices; ability to work through ambiguous, highly complex technical problems in fast-moving environments.
• Ability to travel 0-50%, on average, based on the work you do and the clients and industries/sectors you serve.
• Limited immigration sponsorship may be available.
Preferred:
• Experience building or optimizing reasoning models, agentic models, or tool-using LLM systems.
• Familiarity with inference optimization frameworks such as vLLM, TensorRT-LLM, TGI, or Ollama.
• Experience with multimodal models, speech models, or domain-specific foundation models; experience using large-scale GPU clusters and distributed compute.
• Contributions to open-source AI projects, research publications, benchmark development, or model releases.
• Familiarity with safety, governance, and responsible-AI practices; experience in regulated or high-stakes industries such as healthcare, finance, insurance, or public sector.
Company:
Deloitte drives progress. Our firms around the world help clients become leaders wherever they choose to compete. Founded in 2008, the company is headquartered in Arlington, USA, with a team of 10001+ employees. The company is currently Late Stage.
