DeepSpeed Jobs (NOW HIRING)

DeepSpeed information

What are some common challenges faced by engineers working with DeepSpeed and how can they be addressed?

Engineers working with DeepSpeed often face challenges in optimizing large-scale model training, such as managing memory efficiency and tuning distributed training parameters. Configuring gradient accumulation, choosing a parallelism strategy, and ensuring compatibility across different hardware setups can be complex. Collaborating closely with data scientists, DevOps, and research teams helps address these challenges, as does staying current with the latest DeepSpeed releases and documentation. Regular code reviews and knowledge-sharing sessions also help engineers resolve technical problems and improve model performance.
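
As one illustration of the tuning involved, DeepSpeed expects the global batch size to equal the per-GPU micro-batch size times the gradient accumulation steps times the number of GPUs. The sketch below builds a config dictionary that satisfies that relationship. The helper name build_ds_config and all numeric values are hypothetical; the dictionary keys are standard DeepSpeed config fields, but a real job would need more settings (optimizer, scheduler, and so on).

```python
import json


def build_ds_config(micro_batch_per_gpu, grad_accum_steps, world_size,
                    zero_stage=2, use_bf16=True):
    """Build a minimal DeepSpeed-style config dict (illustrative only).

    build_ds_config is a hypothetical helper, not part of DeepSpeed.
    """
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    if min(micro_batch_per_gpu, grad_accum_steps, world_size) < 1:
        raise ValueError("batch sizes, accumulation steps, and GPU count must be >= 1")

    # Global batch = micro-batch per GPU * accumulation steps * number of GPUs.
    train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size

    config = {
        "train_batch_size": train_batch_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": {"stage": zero_stage},
    }
    # Enable exactly one reduced-precision mode.
    if use_bf16:
        config["bf16"] = {"enabled": True}
    else:
        config["fp16"] = {"enabled": True}
    return config


if __name__ == "__main__":
    # Assumed example: 16 GPUs, micro-batch of 4, 8 accumulation steps.
    print(json.dumps(build_ds_config(4, 8, 16), indent=2))
```

Deriving the global batch size from the other three values, rather than setting all four by hand, avoids a common class of mismatched-batch configuration errors when the GPU count changes.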

What is DeepSpeed?

DeepSpeed is an open-source deep learning optimization library developed by Microsoft for efficient distributed training of large-scale models. It helps researchers and engineers train models that are too large to fit in the memory of a single GPU through features such as ZeRO optimization, mixed-precision training, and advanced parallelism techniques. DeepSpeed is widely used in the machine learning community for its scalability and performance. The library integrates with PyTorch and supports training on multiple GPUs and across multiple machines.
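
To give a rough sense of why ZeRO matters for models that do not fit on one GPU, the sketch below estimates per-GPU memory for model states only, using the byte accounting from the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. The function name and the example model size are assumptions for illustration. Activations, temporary buffers, and fragmentation are ignored, so real usage will be higher.

```python
def zero_model_state_gb(num_params, num_gpus, stage):
    """Estimate per-GPU model-state memory in GB (1e9 bytes).

    Hypothetical helper. Assumes mixed-precision Adam and counts
    model states only (no activations or buffers).
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")

    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 weights, momentum, variance

    # Each ZeRO stage partitions one more kind of state across GPUs.
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus

    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    # Assumed example: 7.5 billion parameters on 64 GPUs.
    for stage in (0, 1, 2, 3):
        gb = zero_model_state_gb(7.5e9, 64, stage)
        print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
```

Under these assumptions, the unpartitioned baseline needs about 120 GB per GPU for a 7.5-billion-parameter model, while stage 3 on 64 GPUs needs under 2 GB for the same states.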

What is the difference between a DeepSpeed role and a Data Scientist?

Aspect | DeepSpeed | Data Scientist
Required credentials | Knowledge of machine learning frameworks, programming skills in Python, experience with AI model training | Degree in Data Science, Statistics, Computer Science, or related fields; strong analytical skills
Work environment | AI research labs, tech companies, cloud computing environments | Business, tech companies, research institutions
Industry usage | AI model training, deep learning optimization | Data analysis, predictive modeling, business insights

DeepSpeed roles focus on optimizing large-scale AI model training and deep learning performance, while Data Scientists analyze data to generate insights and build predictive models. Both require technical skills but serve different purposes within the AI and data ecosystem.

What are the key skills and qualifications needed to thrive as a DeepSpeed Engineer, and why are they important?

To thrive as a DeepSpeed Engineer, you need a solid background in machine learning, deep learning frameworks (such as PyTorch), and distributed systems, often supported by a degree in computer science or a related field. Proficiency with DeepSpeed, parallel computing libraries, and cloud platforms, along with familiarity with tools like CUDA and NCCL, is typically expected. Strong problem-solving abilities, collaboration, and adaptability are crucial soft skills for optimizing large-scale AI models and working with cross-functional teams. Mastering these skills ensures efficient development and deployment of high-performance, scalable AI solutions in demanding environments.
Infographic: DeepSpeed job openings in the United States as of June 2026. Employment type: 100% full-time. Work arrangement: 75% physical (on-site), 6% hybrid, 19% remote.
Research Engineer -- Post-Training & Small Language Models (SLMs), Healthcare AI

Deloitte

Arlington, VA • On-site

Full-time

This job post has expired today. Applications are no longer accepted.


Deloitte rating

Company rating: 8.1 out of 10

Based on 86 frontline employees who took The Breakroom Quiz

Ranked 58th of 138 rated financial services companies


Job description

Job Summary:
Deloitte is leading an AI-first initiative aimed at transforming the healthcare decision-making process through advanced modeling and reasoning systems. As a Research Engineer, you will design, train, and evaluate models that enhance clinical and operational decision-making, focusing on post-training methodologies and ensuring model behavior aligns with healthcare standards.
Responsibilities:
• Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
• Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
• Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.
• Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
• Curate, clean, synthesize, and evaluate large-scale instruction, preference, and domain-specific datasets, with rigorous filtering, deduplication, and quality control.
• Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical-expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale.
• Implement efficient fine-tuning strategies including LoRA, QLoRA, PEFT, and adapter-based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron-LM, Ray, or equivalent.
• Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT-LLM, or TGI.
• Train and optimize open-weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on-premise and cloud-hybrid deployment with strong performance-per-dollar.
• Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain-specific metrics.
• Build healthcare-grade evaluation - held-out clinical benchmarks, deployment regression gates, calibration and uncertainty, factuality against ground truth, and bias/fairness evaluation across patient populations and subgroups - co-designed with clinical experts.
• Apply PHI/HIPAA-aware data handling and produce model documentation suitable for regulated clinical use.
• Perform red teaming and adversarial testing to identify alignment failures, unsafe behaviors, jailbreak vulnerabilities, and regression risks; collaborate with agentic and application teams to improve tool use, grounding, and long-horizon reasoning.
Qualifications:
Required:
• Bachelor's degree in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, Computational Linguistics, or a related field.
• Demonstrated depth in training and post-training large transformer-based language models in production or research. This is your craft, not coursework or a one-off fine-tune. That depth should include SFT and at least one preference-optimization or RL method, evidenced by shipped models, releases, or research.
• Hands-on experience with reasoning-model training and/or verifiable-reward (RLVR) workflows.
• Strong understanding of modern post-training techniques: SFT, RLHF, PPO, DPO, GRPO, RLAIF, and preference optimization workflows.
• Experience with open-weight foundation models such as Llama, Qwen, Mistral, DeepSeek, or equivalent architectures.
• Strong expertise in PyTorch and modern deep-learning tooling; experience with distributed training frameworks such as DeepSpeed, FSDP, Megatron-LM, or Ray.
• Experience implementing efficient fine-tuning techniques such as LoRA, QLoRA, PEFT, and quantization-aware workflows.
• Deep understanding of transformer architectures, tokenization, attention mechanisms, decoding strategies, and model scaling trade-offs.
• Strong grasp of LLM evaluation methodologies, benchmarking, reward modeling, and alignment trade-offs; experience with large-scale and synthetic datasets, filtering, deduplication, and quality-control pipelines.
• Strong Python engineering skills and production-grade software practices; ability to work through ambiguous, highly complex technical problems in fast-moving environments.
• Ability to travel 0-50%, on average, based on the work you do and the clients and industries/sectors you serve.
• Limited immigration sponsorship may be available.
Preferred:
• Experience building or optimizing reasoning models, agentic models, or tool-using LLM systems.
• Familiarity with inference optimization frameworks such as vLLM, TensorRT-LLM, TGI, or Ollama.
• Experience with multimodal models, speech models, or domain-specific foundation models; experience using large-scale GPU clusters and distributed compute.
• Contributions to open-source AI projects, research publications, benchmark development, or model releases.
• Familiarity with safety, governance, and responsible-AI practices; experience in regulated or high-stakes industries such as healthcare, finance, insurance, or public sector.
Company:
Deloitte drives progress. Our firms around the world help clients become leaders wherever they choose to compete. Founded in 2008, the company is headquartered in Arlington, USA, with a team of more than 10,000 employees. The company is currently Late Stage.
