Responsibilities: • Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal settings. • ...
Go - Software Engineer, AI
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
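The middle step of that pipeline, where an engineer's ranking becomes a reward signal, can be sketched in a few lines. This is a minimal Python sketch that assumes the widely used Bradley-Terry pairwise loss for reward models. The listing does not say which formulation is used. The function names and sample scores are hypothetical.

```python
import math
from itertools import combinations


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def ranking_to_pairs(ranked_ids):
    """Turn a best-to-worst ranking into (chosen, rejected) pairs.

    combinations() keeps input order, so the first element of every pair
    is the higher-ranked (preferred) candidate; k candidates give k*(k-1)/2 pairs.
    """
    return list(combinations(ranked_ids, 2))


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected))."""
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-(reward_chosen - reward_rejected))


def mean_pairwise_loss(ranked_ids, rewards):
    """Average loss over every pair implied by one ranking (needs >= 2 items)."""
    pairs = ranking_to_pairs(ranked_ids)
    return sum(bradley_terry_loss(rewards[c], rewards[r]) for c, r in pairs) / len(pairs)


# Hypothetical data: an engineer ranked three completions b > a > c,
# and a reward model currently scores them as follows.
ranking = ["b", "a", "c"]
scores = {"a": 0.4, "b": 1.2, "c": -0.3}
print(ranking_to_pairs(ranking))  # [('b', 'a'), ('b', 'c'), ('a', 'c')]
print(mean_pairwise_loss(ranking, scores))
```

Training the reward model to lower this loss pushes its scores into the engineer's order. The reinforcement learning step then optimizes the code-generating model against those scores.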
Software Engineer, AI (Java)
Miami, FL · On-site +1
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Rust - Software Engineer, AI
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Preferred: • Demonstrated expertise in post-training methods and/or next-generation use cases for large language models, including instruction tuning, RLHF, tool use, reasoning, agents, and ...
Software Engineer, AI (TypeScript)
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Responsibilities: • You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimization (RLHF/DPO), and ...
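The "preference optimization (RLHF/DPO)" item can be made concrete with the DPO objective. DPO trains the policy directly on chosen/rejected pairs, without a separately trained reward model. This is a minimal Python sketch that assumes per-response log-probabilities under the policy and a frozen reference model have already been computed. The function names and the default beta=0.1 are illustrative choices, not taken from the listing.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    Each *_logp is the summed log-probability of a whole response under the
    policy being trained or under the frozen reference model.
    beta=0.1 is an illustrative default, not a recommended setting.
    """
    # Log-ratios act as implicit rewards: how much more the policy favors
    # each response than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)) written as softplus(-logits) for stability
    return softplus(-logits)


# Before training the policy equals the reference, so the loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
# Raising the chosen response's probability relative to the reference lowers the loss.
print(dpo_loss(-10.0, -15.0, -12.0, -15.0) < math.log(2))  # True
```

Because the reference log-probabilities are fixed, the only way to reduce this loss is to widen the policy's preference for the chosen response over the rejected one, relative to the reference.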
Software Engineer, AI (Go)
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Software Engineer, AI (Python)
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Software Engineer, AI (Rust)
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Software Engineer, AI (Java)
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
Proficiency in modern RLHF algorithms: PPO, DPO, GRPO, etc. * Hands-on experience training reward models and fine-tuning LLM/VLM/VLA * Knowledge of distributed RL training at scale * Proficiency with ...
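What distinguishes GRPO from PPO is how it computes advantages. Instead of a learned value function (critic), it samples a group of completions for the same prompt and standardizes each completion's reward against the rest of the group. This is a minimal Python sketch of that advantage step only; the policy update is not shown. The function name and rewards are hypothetical.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is standardized against the group mean and standard
    deviation, standing in for the critic that PPO would train.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for 4 completions of one prompt (e.g. 1 = tests pass, 0 = fail)
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

One consequence of this design is that a group in which every completion receives the same reward yields all-zero advantages. Such prompts contribute no learning signal.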
Required: • Hands-on experience with data generation and evaluation for LLM post-training • Experience training or fine-tuning models using SFT, instruction tuning, RLHF, DPO, or similar ...
C++ - Software Engineer, AI
Miami, FL · On-site
$30 - $70/hr
RLHF in one line: Generate code → expert engineers rank, edit, and justify → convert that feedback into reward signals → reinforcement learning tunes the model toward code you'd actually ship.
RLHF information
What are some common challenges faced by professionals working in Reinforcement Learning from Human Feedback (RLHF) roles?
What are RLHF jobs?
What are the key skills and qualifications needed to thrive as a Reinforcement Learning from Human Feedback (RLHF) Engineer, and why are they important?
What is the difference between RLHF and RN?
| Aspect | RLHF | RN |
|---|---|---|
| What it is | Reinforcement Learning from Human Feedback, a machine-learning technique for tuning AI models with human preference data | Registered Nurse, a licensed healthcare professional |
| Required Credentials | No professional license; roles typically ask for ML or software engineering experience, or domain expertise for feedback and evaluation work | Registered nurse licensure, possibly with additional certifications |
| Work Environment | AI research labs, technology companies, and AI training-data providers | Hospitals, clinics, long-term care facilities, and community health settings |
| Employer & Industry Usage | AI and software development | General healthcare and nursing services |
RLHF and RN are unrelated despite their similar-looking abbreviations. RLHF is a method for training AI models. RLHF jobs involve generating, ranking, or evaluating model outputs, or building the training pipelines that use that feedback. RN (Registered Nurse) is a licensed nursing role that provides patient care across medical settings.
What is an RLHF job?
An RLHF (Reinforcement Learning from Human Feedback) job involves training AI models using human feedback to improve their responses. Professionals in this role analyze model outputs, provide evaluations, and refine AI behavior through reinforcement learning techniques. These roles are common in AI research, content moderation, and chatbot development.
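The "refine AI behavior through reinforcement learning" step commonly combines a reward model's score with a penalty that keeps the tuned model close to its starting (reference) model. This is a minimal Python sketch of one common PPO-style reward shaping. It assumes per-token log-probabilities are available from both models. The function name and beta value are hypothetical, and the PPO update itself is not shown.

```python
def shaped_token_rewards(rm_score, policy_logps, ref_logps, beta=0.05):
    """Per-token rewards for PPO-style RLHF with a KL penalty (hypothetical helper).

    Every token is penalized by beta * (log pi(token) - log pi_ref(token)),
    a sample-based estimate of divergence from the reference model, and the
    reward model's scalar score for the whole response is added on the last token.
    """
    if len(policy_logps) != len(ref_logps) or not policy_logps:
        raise ValueError("need equal-length, non-empty log-prob lists")
    rewards = [-beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    rewards[-1] += rm_score
    return rewards


# Hypothetical per-token log-probs for a 3-token response with reward-model score 2.0.
rewards = shaped_token_rewards(2.0, [-1.0, -0.5, -2.0], [-1.5, -1.0, -1.0], beta=0.1)
print([round(r, 3) for r in rewards])  # [-0.05, -0.05, 2.1]
```

The penalty discourages the model from drifting toward outputs that the reward model over-scores but humans would not prefer. A larger beta keeps the model closer to the reference.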
Full-time
Posted 20 days ago
Job description
Scale AI is a company focused on developing reliable AI systems for significant decisions. They are looking for a Machine Learning Research Scientist to research and develop novel post-training techniques to enhance large language model capabilities, collaborating with researchers and engineers to optimize AI development practices.
Responsibilities:
• Research and develop novel post-training techniques, including SFT, RLHF, and reward modeling, to enhance LLM core capabilities in both text and multimodal settings.
• Design and experiment with new approaches to preference optimization.
• Analyze model behavior, identify weaknesses, and propose solutions for bias mitigation and model robustness.
• Publish research findings in top-tier AI conferences.
Qualifications:
Required:
• Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field.
• Deep understanding of deep learning, reinforcement learning, and large-scale model fine-tuning.
• Experience with post-training techniques such as RLHF, preference modeling, or instruction tuning.
• Excellent written and verbal communication skills.
• Published research in machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals.
Preferred:
• Previous experience in a customer-facing role.
Company:
Scale’s mission is to develop reliable AI systems for the world’s most important decisions. Founded in 2016, the company is headquartered in San Francisco, USA, with a team of 501-1000 employees. The company is currently Late Stage.
About Scale AI
Sourced by ZipRecruiter
Industry: Software development
Company size: 201-500 employees
Headquarters location: San Francisco, CA, US
Year founded: 2016