... DPO, rejection sampling, RLHF/RLAIF, online RL, and model-based data improvement. • Design the ... research, internships, or open-source -- with frameworks such as verl, ms-swift, OpenRLHF, or ...
PhD Internships at TikTok aim to provide students with the opportunity to actively contribute to ... DPO). - Excellent communication and teamwork skills, capable of thriving in a fast-paced work ...
Member of Technical Staff - RL Research (New PhD Grad)
Seattle, WA · On-site
$250K - $350K/yr
Develop and scale post-training methods such as PPO, GRPO, DPO, rejection sampling, RLHF/RLAIF ... Exposure to RL/post-training pipelines through research, internships, or open-source - with ...
Research Intern, Agent RL Training
Mountain View, CA · On-site
$35 - $50/hr
... during your internship What We're Looking For Requirements * Highly motivated and committed ... DPO, PPO, GRPO, etc.) * Excellent taste in model behavior: able to reason about what "good" looks ...
Trackman Baseball Data Operations Employee
Stamford, CT · On-site
$17 - $18/hr
The internship starts in May and finishes at the conclusion of the major league baseball season ... For additional information about the position, please contact Dan Poeltl - DPO@trackman.com - 203 ...
LLM Post-training Engineer Intern (Research & Product) - 2026 Summer (BS/MS)
San Jose, CA · On-site
$45/hr
Internships at Our Company aim to provide students with hands-on experience in developing ... SFT/DPO/PPO), and model alignment. - Assist in building robust evaluation pipelines to measure ...
Member of Technical Staff (intern)
New York, NY · On-site
$18.25 - $23.75/hr
About the role This is an open internship role within our Technical Staff. If any of the below ... Develop and execute an experiment analyzing nuances between DPO and PPO in a fair and systematic ...
Agentic AI Engineer
Cary, IL · On-site
Hands-on with LoRA / PEFT, instruction tuning, preference optimization (DPO/GRPO), and rigorous ... Practical hands-on experience (coursework, internships, OSS, or serious side projects) with at ...
Agentic AI Engineer
Cary, NC · On-site
Hands-on with LoRA / PEFT, instruction tuning, preference optimization (DPO/GRPO), and rigorous ... Practical hands-on experience (coursework, internships, OSS, or serious side projects) with at ...
Research Scientist - Driven Agent Self-Evolution - Global Frontier Tech Recruitment Program - 202...
San Jose, CA · On-site
... RLHF, DPO, GRPO, self-play). • Strong programming skills in Python and proficiency with ML ... scale. • Internship experience at technology companies or research organizations. Company
Data Engineering Intern - AI & Analytics (Fall 2026)
Palo Alto, CA · On-site
$19.75 - $25.50/hr
We are looking for talented interns to join various teams within our Data, AI & Analytics ... DPO). * Curate, preprocess, and manage multimodal datasets, including audio, image, video, text ...
Jr. Graphic Designer
Carrollton, TX · On-site
$19.50 - $26/hr
MB2 Dental, a first-of-its-kind Dental Partnership Organization (DPO) founded in 2007, is seeking a ... Qualifications: * 1-2 years of graphic design experience, including internships or freelance work.
OR · On-site
... RLHF/DPO/RL), reward modeling, multi-agent or interactive simulation, behavioral or cognitive ... Our internship hourly rates are a standard pay based on the position, your location, year in school ...
Higharc is seeking Research Interns (PhD) to join our Special Projects team. You'll work at the ... RLHF, DPO, or other preference optimization methods * Experience with multi-GPU training and large ...
Agentic AI Research Intern
Santa Clara, CA · On-site
$40 - $50/hr
The internship will focus on building intelligent agents, generating high-quality trajectories ... Familiarity with training or adapting LLMs using SFT, RL, DPO/RLHF methods, or trajectory data.
Internship DPO information
| Hourly wage range | Share of jobs |
|---|---|
| $5.53 - $7.30 | 2% |
| $7.30 - $9.07 | 0% |
| $9.07 - $10.84 | 3% |
| $10.84 - $12.61 | 7% |
| $12.61 - $14.38 | 10% |
| $14.38 - $16.15 | 33% |
| $16.15 - $17.92 | 19% |
| $17.92 - $19.69 | 13% |
| $19.69 - $21.46 | 9% |
| $21.46 - $23.23 | 3% |
| $23.23 - $25 | 1% |
$14.52 is the 25th percentile. Wages below this are outliers.
$17.99 is the 75th percentile. Wages above this are outliers.
What is the difference between an Internship DPO and a Data Privacy Analyst?
| Aspect | Internship DPO | Data Privacy Analyst |
|---|---|---|
| Required Credentials | Typically a current student or recent graduate; no formal certification required | Relevant certifications such as CIPP or CIPM often preferred |
| Work Environment | Entry-level, learning-focused, often in a corporate or consultancy setting | Full-time, professional role with independent responsibilities |
| Employer & Industry Usage | Internships offered by companies, law firms, or consultancies in various industries | Established role in organizations handling data privacy compliance |
The main difference is that an Internship DPO is an entry-level learning position aimed at gaining experience, while a Data Privacy Analyst is a full-time professional role with more responsibilities and required expertise. Internships serve as a stepping stone toward becoming a Data Privacy Analyst.

Full-time
Posted 5 days ago
Job description
Nuance Labs is a pioneering company focused on building photorealistic, real-time AI avatars with emotional intelligence. They are seeking a deeply technical Member of Technical Staff to lead reinforcement learning and post-training for large-scale omni models, requiring a PhD graduate who can develop and scale their RL/post-training stack.
Responsibilities:
• Build Nuance’s RL/post-training stack from 0→1: rollout generation, policy optimization, reward/reference model serving, data feedback loops, evaluation, checkpointing, observability, and debugging.
• Develop and scale post-training methods such as PPO, GRPO, DPO, rejection sampling, RLHF/RLAIF, online RL, and model-based data improvement.
• Design the systems abstractions that connect research ideas to production-scale RL runs: trainers, rollout workers, reward models, evaluators, data queues, experience buffers, and checkpoint promotion.
• Build evaluation and feedback loops for omni behavior: turn-taking, interruption, timing, emotional response, audiovisual coherence, instruction following, and real-time interaction quality.
• Optimize the end-to-end post-training loop across rollout throughput, serving latency, GPU utilization, policy update efficiency, queueing, checkpoint overhead, and research iteration speed.
• Evolve the platform as algorithms, model architectures, reward definitions, data sources, and evaluation methods change.
Qualifications:
Required:
• A PhD — completed, or in its final stretch — in ML, RL, or a related field, with research depth shown through publications, a strong lab/advisor, or substantial open-source work.
• Solid understanding of RL/post-training methods: policy optimization, reward modeling, preference optimization, rejection sampling, KL control, evaluation, and data feedback loops.
• Ability to reason about model behavior and training dynamics: reward hacking, unstable rewards, distribution shift, stale policies, mode collapse, over-optimization, noisy preferences, and evaluation mismatch.
• Exposure to RL/post-training pipelines through research, internships, or open-source — with frameworks such as verl, ms-swift, OpenRLHF, or equivalent, and familiarity with rollout serving systems such as vLLM. You don’t need to have run these at production scale yet; you need to learn fast and go deep.
• Strong software engineering fundamentals and the appetite to build real systems, not just prototypes.
• Curiosity and adaptability toward new RL algorithms, model architectures, serving systems, evaluation methods, and research ideas.
Preferred:
• Hands-on experience with omni or multimodal post-training for audio-video-language models, especially long-context or real-time interactive systems.
• Experience with PPO, GRPO, DPO, online RL, RLHF/RLAIF, reward modeling, preference data, synthetic data generation, or model-based data improvement.
• Prior 0→1 experience building post-training systems, RL pipelines, agent training systems, evaluation platforms, or model improvement loops.
• Experience with adjacent areas such as distributed pretraining, data infrastructure, inference serving, simulation, human/AI feedback collection, or evaluation infrastructure.
• Publications or substantial open-source contributions in RL, post-training, alignment, evaluation, ML systems, or model behavior.
Company:
Nuance Labs, an AI research company, is developing the first human foundation model that understands and displays emotion in real time. Founded in 2024, the company is headquartered in Seattle, USA, with a team of 2-10 employees. The company is currently early stage.