Palo Alto, CA or Seattle, WA (Hybrid/Remote) About the Team Centific AI Research advances ... DPO, IPO, KTO, offline preference optimization • Group-based methods: GRPO, RLOO, sample ...
Accounts Receivable Specialist
Phoenix, AZ · Remote
$20 - $26.50/hr
Overview MB2 Dental, a first-of-its-kind Dental Partnership Organization (DPO) founded in 2007 and ... This is a remote position that requires the selected candidate to report to our East Phoenix office ...
... DPO, GRPO), and joint embedding spaces, as well as speech and audio intelligence capabilities such ... This is a remote position. All communication and resumes must be in English. Responsibilities: The ...
Principal Applied Scientist, Agentic AI
$181K - $290K/yr
We collaborate closely with platform, product, and operations partners in a fast-moving, remote ... DPO for multi-objective optimization. * Develop reward models and objective formulations that ...
Research Intern - Applied Reinforcement Learning
$35 - $45/hr
... RLHF, DPO, PPO, and emerging methods - Design of reward functions, verifiers, and evaluation ... Palo Alto, CA (Preferred), Redmond, WA (Preferred) or Remote Duration: 3-6 months What We Offer ...
Forward Deployed Engineer (Inference & Post-Training)
San Francisco, CA · On-site +1
$270K - $300K/yr
Drive hands-on RL training runs and optimize system design; guide customers through LoRA, SFT, DPO ... remote work. The US base salary range for this full-time position is: $270,000 - $300,000 OTE ...
Regional Manager for Pediatrics
NY · Remote
$90/hr
Overview MB2 Dental, a first-of-its-kind Dental Partnership Organization (DPO) founded in 2007, is ... This is a remote position with required travel across the Northeast region. * Compensation: * $90 ...
Formal DPO experience is a plus but not required. * The ability to work cross-functionally with ... We're remote-first and async-heavy. Most of your influence will come through clear documentation ...
Software Engineer 5 - Model Runtime, AI Platform
$466K - $750K/yr
... and remote environments Preferred Qualifications * Deep experience with distributed training at scale (FSDP, parallelism strategies, checkpointing) or LLM post-training (SFT, RLHF, DPO/GRPO)
Security Practice Lead
OR · On-site +1
Familiarity with DPO workflows, privacy-by-design principles, and working with regulatory bodies What we offer: * Relocation to Bologna (Italy) or remote work. We are a hybrid company. * Italian and ...
Software Engineer 5 - Model Runtime, AI Platform
OR · On-site +1
$466K - $750K/yr
... remote environments Preferred Qualifications Deep experience with distributed training at scale (FSDP, parallelism strategies, checkpointing) or LLM post-training (SFT, RLHF, DPO/GRPO) Inference ...
Director of Operations
Chicago, IL · Remote
$80K - $100K/yr
MB2 Dental, a first-of-its-kind Dental Partnership Organization (DPO) founded in 2007 and based in ... This is a remote position that requires travel up to 50% of the time. Must reside in one of the ...
Director of Operations
Denver, CO · Remote
$80K - $100K/yr
MB2 Dental, a first-of-its-kind Dental Partnership Organization (DPO) founded in 2007 and based in ... This is a remote position that requires travel up to 50% of the time. Must reside in one of the ...
Remote DPO salary information
| Salary range (per year) | Share of jobs |
|---|---|
| $39K - $46.5K | 5% |
| $46.5K - $54K | 5% |
| $54K - $61.5K | 15% |
| $61.5K - $69K | 11% |
| $69K - $76.5K | 22% |
| $76.5K - $84K | 10% |
| $84K - $91.5K | 13% |
| $91.5K - $99K | 6% |
| $99K - $106.5K | 5% |
| $106.5K - $114K | 4% |
| $114K - $121.5K | 3% |
25th percentile: $61.5K. Median: $73.6K / yr. 75th percentile: $87.8K.
What is the difference between a Remote DPO and a Data Privacy Analyst?
| Aspect | Remote DPO | Data Privacy Analyst |
|---|---|---|
| Required Credentials | GDPR certification, legal or privacy background | Data protection certifications, analytical skills |
| Work Environment | Remote, compliance-focused | Remote or on-site, data analysis and reporting |
| Industry Usage | Legal, healthcare, finance, tech | Tech, finance, healthcare, consulting |
A Remote DPO (Data Protection Officer) primarily oversees data protection compliance and legal requirements, a role that often requires legal or privacy certifications. A Data Privacy Analyst, in contrast, focuses on analyzing data practices and ensuring privacy policies are followed, and may not need legal credentials. Both roles can be remote and are vital in industries that handle sensitive data, but the DPO has a broader scope of compliance and legal oversight.
Full-time
Posted 11 days ago
Job description
Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.
Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.
About the Job
Role: Applied Reinforcement Learning Engineer
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)
About the Team
Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. Our mission is to transform data, signals, and human insight into next-generation intelligent systems that redefine enterprise intelligence.
We're building a governed RL environment platform that enables enterprises to safely iterate and improve AI agent workflows through simulation-based learning, bridging human-labeled signal creation with automated RL training for high-stakes operations.
Role Overview
As an Applied RL Engineer, you will design and build RL environments that simulate complex enterprise workflows and train intelligent agents within them. You'll work at the intersection of RL research and production systems, translating customer requirements into bespoke simulation environments and post-training pipelines that deliver measurable improvements to AI agent performance.
This role requires deep expertise in both classical RL methodologies and modern LLM-based agent architectures. You'll shape our product direction and help make RL accessible to enterprise customers who need safe, compliant ways to improve their AI systems.
Core RL Competencies
Foundational RL
• MDPs & value methods: State/action spaces, Q-learning, DQN, Double DQN, Dueling DQN
• Policy gradient methods: REINFORCE, Actor-Critic, A2C/A3C, variance reduction
• Advanced optimization: PPO, TRPO, SAC, trust regions, entropy regularization
• TD learning: TD(0), TD(λ), eligibility traces, bootstrapping methods
LLM Alignment & Post-Training
• RLHF pipelines: Reward model training, preference learning, human feedback integration
• Direct optimization: DPO, IPO, KTO, offline preference optimization
• Group-based methods: GRPO, RLOO, sample-efficient policy improvement
• Reward modeling: Bradley-Terry models, reward hacking mitigation, KL constraints
Environment Design
• Gymnasium/OpenAI Gym: Custom environments, observation/action spaces, wrapper patterns
• Reward engineering: Sparse vs. dense rewards, potential-based shaping, intrinsic motivation
• Verifier design: Programmatic reward functions, outcome verification, ground-truth evaluation
• Simulation: Sim-to-real transfer, domain randomization, multi-agent dynamics
Advanced Techniques
• Offline RL: CQL, BCQ, IQL for learning from fixed datasets without environment interaction
• Model-based RL: World models, Dreamer, MuZero, learned dynamics
• Hierarchical RL: Options framework, goal-conditioned policies, temporal abstraction
• Imitation & exploration: Behavioral cloning, GAIL, curiosity-driven exploration, UCB
Key Responsibilities
• Design and build custom RL environments (digital twins) simulating enterprise workflows: document processing, compliance, onboarding, support automation
• Post-train LLM-based agents on domain-specific tasks using PPO, GRPO, DPO, and RLHF
• Build end-to-end pipelines converting human-labeled traces into RL training data
• Architect multi-step reasoning agents with tool-calling and closed learning loops
• Design reward functions, verifiers, and validation frameworks for pre-deployment testing
• Translate cutting-edge RL research into production systems; contribute to publications
Required Qualifications
• Deep RL expertise: 3+ years hands-on experience with environment design, reward engineering, policy optimization
• LLM post-training: Experience fine-tuning LLMs using RLHF, DPO, PPO, or similar
• Production skills: Software engineering beyond research with scalable pipelines and training infrastructure
• Agentic AI: Experience with LLM-based agents, tool use, multi-step reasoning
• Technical stack: Strong Python; Gymnasium, RLlib, Stable Baselines; PyTorch/JAX/TensorFlow
• Education: MS/PhD in CS, ML, or related field (or equivalent experience)
Preferred Qualifications
• Publications at NeurIPS, ICML, ICLR, ACL, or similar venues
• Enterprise workflow experience in healthcare, finance, logistics, or compliance
• Open-source contributions to CleanRL, TRL, veRL, or agent frameworks
• Experience with world models, synthetic data generation, and simulation
• Distributed training and large-scale RL experimentation
Why Join Centific
• Lead the frontier: Shape a new discipline at the intersection of RL, simulation, and enterprise AI
• Ship your science: See your research power real systems across healthcare, finance, and safety
• Collaborate with leaders: Work alongside NVIDIA, Microsoft, and the global AI community
• Build what matters: Create governed, compliant AI systems enterprises can trust
Salary: $150K - $300K Annually
Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.
About Centific
Sourced by ZipRecruiter
Industry: IT services
Company size: 5,001 - 10,000 Employees
Headquarters location: Redmond, WA, US
Year founded: 2020