Online RLHF Jobs (NOW HIRING)

Boston Dynamics

Research Scientist, RL for Dexterous Manipulation, Atlas

Waltham, MA · On-site +1

$175K - $220K/yr

Research reward modeling, and offline-to-online RL for large multimodal policies • Close the sim-2 ... Experience fine-tuning foundation models with RLHF, DPO, GRPO, or related methods • Familiarity ...

SAP

AI/ML Applied Data Scientist - Generative AI

Newport Beach, CA · On-site

Build offline and online evaluation harnesses, task success, trajectory quality, tool-call accuracy ... Preferred: • Familiarity with continued pretraining, fine-tuning pipelines (SFT, DPO, RLHF), or ...

Apple

AI/ML Engineer (GenAI), G&A Solutions Engineering (GSE)

Austin, TX

... from Retail, Online, and Resellers. These solutions are based on cutting-edge enterprise ... RLHF, PPO, GRPO) Demonstrated ability to quickly master emerging AI tools and integrate them into ...

Epsilon Health

Research Engineer - ML Infrastructure

San Francisco, CA · On-site

$126K - $166K/yr

Experience with reinforcement learning training pipelines (e.g., RLHF, reward modeling, or online learning systems) • Support A/B testing and experimentation workflows for model rollouts, including ...

Amazon

Applied Scientist, Core Search

Seattle, WA · On-site

Apply Reinforcement Learning (RLVR, RLHF), Direct Preference Optimization (DPO), and customer ... well as online experiments. About the team Core Search builds the next-generation LLM-powered ...

ByteDance

Research Scientist - Driven Agent Self-Evolution - Global Frontier Tech Recruitment Program - 2027 S

San Jose, CA · On-site

... RLHF, DPO, GRPO, self-play). • Strong programming skills in Python and proficiency with ML ... online learning, bandit methods, or evolutionary strategies applied to agent improvement. • ...

Jobs for Humanity

Spanish Search Quality Rater - (USA)

Miami, FL · Remote

Prompt engineering, SFT, RLHF, red teaming and adversarial model training, model output ranking ... Enjoy researching topics online. Project Details: • Job Title: Search Quality Rater • Location: US ...

DoorDash

Software Engineer, Machine Learning Infrastructure - Generative AI

San Francisco, CA · On-site

... offline/online eval pipelines, agent simulations, data pipelines, backend services, and ... RLHF/RLVR evaluation) - enabling the next generation of AI-powered products, agents, automation ...

Hop

Senior AI Engineer

Los Angeles, CA

$112K - $154K/yr

End-to-End ML Lifecycle: Own requirements → data prep → feature engineering → classical ML or LLM fine-tuning (LoRA, PEFT, RLHF) → offline/online evaluation → MLflow registry, with ...

Apple

Senior Manager - AI Data Operations

Austin, TX · On-site

Here on the Apple Store Online team, we are responsible for Apple's largest store. Our main goal is ... Preferred Qualifications: Experience designing RLHF or structured human feedback programs.

pony.ai