Machine Learning Engineer
Atlanta, GA · On-site
... RLHF, PPO, DPO, or reward model training -- and understanding of how training data quality affects model behavior • Familiarity with RL frameworks (Gymnasium, dm_env) and the ability to design or ...
Kennesaw, GA · On-site
... RLHF, PPO, DPO, or reward model training -- and understanding of how training data quality affects model behavior • Familiarity with RL frameworks (Gymnasium, dm_env) and the ability to design or ...
Atlanta, GA · On-site +1
$80/hr
Refine RLHF Frameworks: provide the high-quality human feedback necessary to align models with human intent, safety, and helpfulness. * Analyze Model Reasoning: critically assess how an AI model ...
Atlanta, GA · On-site
Advance Workday's proprietary capabilities in pre-training, post-training (RLHF, DPO), and domain-specific alignment for HR and Finance workflows. * Publish & Open Source: Lead Workday's contribution ...
Atlanta, GA · On-site
... RLHF to improve model accuracy, robustness, and business relevance. • Develop and deploy AI agents and agentic workflows using frameworks such as LangChain, LangGraph, AgentSpace to automate multi ...
Atlanta, GA · On-site
$140K - $160K/yr
Apply advanced techniques including prompt engineering, Retrieval-Augmented Generation (RAG), model fine-tuning, and RLHF to improve model accuracy, robustness, and business relevance. * Develop and ...
... RLHF, RAG and Knowledge graph etc. • Experience in designing and implementing Model Context Protocol (MCP) servers to enable seamless integration between AI agents, enterprise systems, and external ...
Specialized expertise in other topics like fine-tuning, RLHF, RAG and Knowledge graph etc. * Experience in designing and implementing Model Context Protocol (MCP) servers to enable seamless ...
Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
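Several of the snippets above name GRPO among the post-training techniques. As a rough illustration only, the core of its group-relative advantage can be sketched in plain Python: sample several responses per prompt, score them, and standardize each reward against its own group. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions for this sketch; real implementations differ in these details and also include the clipped policy-gradient update, which is omitted here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the other samples drawn for the same prompt.

    Hypothetical helper for illustration; not taken from any specific library.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two sampled responses per prompt")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std: an assumption, implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical 0/1 correctness rewards for four responses to one prompt.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Responses scoring above the group mean get positive advantages and are reinforced. Because the baseline comes from the group itself, no separate value network is needed, which is the main motivation usually given for GRPO.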
An RLHF (Reinforcement Learning from Human Feedback) job involves training AI models using human feedback to improve their responses. Professionals in this role analyze model outputs, rate or compare them, and refine AI behavior through reinforcement learning techniques. These roles are common in AI research, content moderation, and chatbot development.
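The preference step behind many RLHF pipelines can be sketched as a pairwise reward-model loss. This minimal example assumes a Bradley-Terry formulation, in which the reward model should score the human-preferred response above the rejected one, and it uses plain floats in place of real model outputs; `pairwise_preference_loss` and the example scores are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(chosen_rewards, rejected_rewards):
    """Mean Bradley-Terry loss, -log sigmoid(r_chosen - r_rejected), over labelled pairs."""
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need two non-empty reward lists of equal length")
    losses = [-log_sigmoid(c - r) for c, r in zip(chosen_rewards, rejected_rewards)]
    return sum(losses) / len(losses)


# Hypothetical reward-model scores for three human-labelled (chosen, rejected) pairs.
chosen = [2.0, 0.5, 1.2]
rejected = [0.0, 0.7, -0.3]
print(round(pairwise_preference_loss(chosen, rejected), 4))
```

The loss shrinks as the model widens the margin between preferred and rejected responses; here the second pair, where the rejected response scores higher, contributes most of it.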
Full-time
This job post has expired today. Applications are no longer accepted.
About Us
We are AI researchers and builders who understand how to curate data and RL environments that truly improve models. We curated OpenThoughts, one of the best open reasoning datasets, and have trained SOTA models such as Bespoke-MiniCheck and Bespoke-MiniChart.
We have embarked on a journey to build environments: entire digital worlds that can be used to push the frontier of agents.
What You'll Be Working On
You will work directly with our research team on RL environment and task creation for agent training. This means designing observation spaces, action spaces, reward signals, and success criteria for new environments — and building the infrastructure that makes world-scale RL training possible. This is a high-ownership role; you will be building novel systems, not maintaining legacy ones.
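To make observation spaces, action spaces, reward signals, and success criteria concrete, here is a deliberately tiny, hypothetical environment. It mirrors the Gymnasium convention in which `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`, but it does not import Gymnasium, so it omits the `observation_space` and `action_space` objects a real `gymnasium.Env` subclass would declare. The corridor task, the step penalty, and the step limit are all illustrative assumptions.

```python
import random
from typing import Optional


class CorridorEnv:
    """Hypothetical toy task: move along a 1-D corridor to reach the rightmost cell.

    Observation: the agent's position, an int in [0, length - 1].
    Actions: 0 = move left, 1 = move right.
    """

    def __init__(self, length: int = 5, max_steps: int = 20):
        if length < 2:
            raise ValueError("length must be at least 2")
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0
        self.rng = random.Random()

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        if seed is not None:
            self.rng.seed(seed)
        # Start on any cell except the goal cell.
        self.pos = self.rng.randrange(self.length - 1)
        self.steps = 0
        return self.pos, {}

    def step(self, action: int):
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        self.steps += 1
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        # Success criterion: the agent reaches the rightmost cell.
        terminated = self.pos == self.length - 1
        # Reward signal: +1 on success, a small penalty for every other step.
        reward = 1.0 if terminated else -0.01
        # The episode is cut off (not failed) once the step budget runs out.
        truncated = not terminated and self.steps >= self.max_steps
        return self.pos, reward, terminated, truncated, {"success": terminated}


# Usage: roll out a fixed "always move right" policy.
env = CorridorEnv(length=5)
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(1)
print("success:", info["success"])
```

Separating `terminated` (the task ended) from `truncated` (time ran out) matters for agent training, because a truncated episode should not be treated as a failure when bootstrapping value estimates.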
Must-Have Skills
3+ years of ML engineering experience — model training, fine-tuning, or post-training pipelines in research or production
Strong Python and deep learning proficiency (PyTorch preferred; familiar with training loops, optimizers, mixed precision)
Hands-on experience with LLM post-training — SFT, RLHF, PPO, DPO, or reward model training — and understanding of how training data quality affects model behavior
Familiarity with RL frameworks (Gymnasium, dm_env) and the ability to design or modify reward functions for agent training objectives
Experience running experiments at scale on cloud or HPC (AWS, GCP, SLURM, or Ray)
Solid understanding of evaluation methodology — held-out sets, benchmark design, avoiding train/eval contamination
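As one illustration of avoiding train/eval contamination, a common heuristic is to flag evaluation items that share many word n-grams with the training corpus. The sketch below assumes lowercased word tokenization, a default of `n=8`, and a 50% overlap threshold; these values and the function names are hypothetical, and n-gram overlap catches only near-verbatim leakage, not paraphrased or translated copies.

```python
import re


def word_ngrams(text: str, n: int) -> set:
    """Lowercased word n-grams of `text`, as a set of tuples."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_contaminated(train_docs, eval_docs, n=8, threshold=0.5):
    """Indices of eval docs whose n-grams overlap the training set by >= threshold."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= word_ngrams(doc, n)
    flagged = []
    for idx, doc in enumerate(eval_docs):
        grams = word_ngrams(doc, n)
        if not grams:
            continue  # shorter than n words: too short to judge at this n
        overlap = len(grams & train_grams) / len(grams)
        if overlap >= threshold:
            flagged.append(idx)
    return flagged


# Usage with toy data: the first eval item is copied from the training text.
train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
eval_set = [
    "The quick brown fox jumps over the lazy dog near the river",
    "completely unrelated sentence about gradient descent and learning rates in practice",
]
print(flag_contaminated(train, eval_set, n=4))
```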