Online RLHF Jobs (NOW HIRING)

Member of Technical Staff -- RL Research (Experienced)

Seattle, WA · On-site

... RLHF/RLAIF, online RL, and model-based data improvement. • Design the systems abstractions that connect research ideas to production-scale RL runs: trainers, rollout workers, reward models ...

Member of Technical Staff - RL Research (Experienced)

Seattle, WA · On-site

$300K - $500K/yr

Develop and scale post-training methods such as PPO, GRPO, DPO, rejection sampling, RLHF/RLAIF, online RL, and model-based data improvement. * Design the systems abstractions that connect research ...

Staff Software Engineer, AI/ML

Seattle, WA · Hybrid

Experience with agent evaluation, offline/online experiments, and human feedback loops in production. * Direct experience with RLHF, RLAIF, DPO, PPO, GRPO, or related optimization techniques. * Prior ...

Member of Technical Staff -- RL Research (New PhD Grad)

Seattle, WA · On-site

Sciforium

LLM Training Engineer

San Francisco, CA · On-site

$155K - $220K/yr

Design offline + online environments that support RL-style training at scale * Instrument ... Post-training pipelines (SFT, RLHF/RLAIF, preference optimization, eval loops) * Building RL ...

Member of Technical Staff - RL Research (New PhD Grad)

Seattle, WA · On-site

$250K - $350K/yr

Staff Software Engineer, AI/ML

Seattle, WA · On-site

Research Scientist, RL for Dexterous Manipulation, Atlas

Waltham, MA · On-site

... online RL for large multimodal policies • Close the sim-2-real gap through tactile sensing ... models with RLHF, DPO, GRPO, or related methods • Familiarity with tactile sensing, multi ...

Applied Scientist II, Alexa International Team

Bellevue, WA · On-site

Build novel online & offline evaluation metrics and methodologies for multimodal personal digital assistants. * Fine-tune/post-train LLMs using techniques like SFT, DPO, RLHF, and RLAIF. * Set up ...

AI/ML Engineer (GenAI), G&A Solutions Engineering (GSE)

Austin, TX · On-site

... from Retail, Online, and Resellers. These solutions are based on cutting edge enterprise ... RLHF, PPO, GRPO) Demonstrated ability to quickly master emerging AI tools and integrate them into ...

Member of Technical Staff - RL Research (Experienced)

Seattle, WA · On-site

Zillow

Principal Applied Scientist, Agentic AI

$181K - $290K/yr

... RLHF, RLAIF, or DPO for multi-objective optimization. * Develop reward models and objective ... online and batch adaptation loops with strong guardrails. * Translate conversational logs ...

Boston Dynamics

Research Scientist, RL for Dexterous Manipulation, Atlas

Waltham, MA · On-site +1

$175K - $220K/yr

Research reward modeling, and offline-to-online RL for large multimodal policies * Close the sim-2 ... Experience fine-tuning foundation models with RLHF, DPO, GRPO, or related methods * Familiarity ...

AI/ML Engineer (GenAI), G&A Solutions Engineering (GSE)

Austin, TX

Research Scientist, RL for Dexterous Manipulation, Atlas

Waltham, MA · On-site

$175K - $220K/yr