Job Summary:
Deloitte is leading an AI-first initiative aimed at transforming the healthcare decision-making process through advanced modeling and reasoning systems. As a Research Engineer, you will design, train, and evaluate models that enhance clinical and operational decision-making, focusing on post-training methodologies and ensuring model behavior aligns with healthcare standards.
Responsibilities:
• Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
• Build and optimize training workflows using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
• Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.
• Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
• Curate, clean, synthesize, and evaluate large-scale instruction, preference, and domain-specific datasets, with rigorous filtering, deduplication, and quality control.
• Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical-expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale.
• Implement efficient fine-tuning strategies including LoRA, QLoRA, PEFT, and adapter-based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron-LM, Ray, or equivalent.
• Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT-LLM, or TGI.
• Train and optimize open-weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on-premise and cloud-hybrid deployment with strong performance-per-dollar.
• Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain-specific metrics.
• Build healthcare-grade evaluation - held-out clinical benchmarks, deployment regression gates, calibration and uncertainty, factuality against ground truth, and bias/fairness evaluation across patient populations and subgroups - co-designed with clinical experts.
• Apply PHI/HIPAA-aware data handling and produce model documentation suitable for regulated clinical use.
• Perform red teaming and adversarial testing to identify alignment failures, unsafe behaviors, jailbreak vulnerabilities, and regression risks; collaborate with agentic and application teams to improve tool use, grounding, and long-horizon reasoning.
Qualifications:
Required:
• Bachelor's degree in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, Computational Linguistics, or a related field.
• Demonstrated depth in training and post-training large transformer-based language models in production or research - this is your craft, not coursework or a one-off fine-tune. That depth must include SFT and at least one preference-optimization or RL method, evidenced by shipped models, releases, or research.
• Hands-on experience with reasoning-model training and/or verifiable-reward (RLVR) workflows.
• Strong understanding of modern post-training techniques: SFT, RLHF, PPO, DPO, GRPO, RLAIF, and preference optimization workflows.
• Experience with open-weight foundation models such as Llama, Qwen, Mistral, DeepSeek, or equivalent architectures.
• Strong expertise in PyTorch and modern deep-learning tooling; experience with distributed training frameworks such as DeepSpeed, FSDP, Megatron-LM, or Ray.
• Experience implementing efficient fine-tuning techniques such as LoRA, QLoRA, PEFT, and quantization-aware workflows.
• Deep understanding of transformer architectures, tokenization, attention mechanisms, decoding strategies, and model scaling trade-offs.
• Strong grasp of LLM evaluation methodologies, benchmarking, reward modeling, and alignment trade-offs; experience with large-scale and synthetic datasets, filtering, deduplication, and quality-control pipelines.
• Strong Python engineering skills and production-grade software practices; ability to work through ambiguous, highly complex technical problems in fast-moving environments.
• Ability to travel 0-50%, on average, based on the work you do and the clients and industries/sectors you serve.
• Limited immigration sponsorship may be available.
Preferred:
• Experience building or optimizing reasoning models, agentic models, or tool-using LLM systems.
• Familiarity with inference optimization frameworks such as vLLM, TensorRT-LLM, TGI, or Ollama.
• Experience with multimodal models, speech models, or domain-specific foundation models; experience using large-scale GPU clusters and distributed compute.
• Contributions to open-source AI projects, research publications, benchmark development, or model releases.
• Familiarity with safety, governance, and responsible-AI practices; experience in regulated or high-stakes industries such as healthcare, finance, insurance, or public sector.
Company:
Deloitte drives progress. Our firms around the world help clients become leaders wherever they choose to compete. Founded in 2008, the company is headquartered in Arlington, USA, with a team of 10,001+ employees. The company is currently late-stage.