... preferences (RLHF) and related paradigms--turning preference/judgment signals into repeatable ... online RL pipelines). • Depth in deep learning, sequence modeling, and generative models. • ...
Apply Reinforcement Learning (RLVR, RLHF), Direct Preference Optimization (DPO), and customer ... well as online experiments. About the team Core Search builds the next-generation LLM-powered ...
Prompt engineering, SFT, RLHF, red teaming and adversarial model training, model output ranking ... Enjoy researching topics online Project Details * Job Title: Search Quality Rater * Location: US ...
Here on the Apple Store Online team, we are responsible for Apple's largest store. Our main goal is ... Experience designing RLHF or structured human feedback programs. Background in large language model ...
... from Retail, Online, and Resellers. These solutions are based on cutting edge enterprise ... techniques (RLHF, PPO, GRPO) * Demonstrated ability to quickly master emerging AI tools and ...
Build novel online & offline evaluation metrics and methodologies for multimodal personal digital assistants. * Fine-tune/post-train LLMs using techniques like SFT, DPO, RLHF, and RLAIF. * Set up ...
Machine Learning Engineer - Reinforcement Learning
Fremont, CA · On-site
$150K - $250K/yr
Design and evolve data + evaluation systems inspired by RL from human preferences (RLHF) and ... online RL pipelines). * Depth in deep learning, sequence modeling, and generative models.
Here on the Apple Store Online team, we are responsible for Apple's largest store. Our main goal is ... Preferred Qualifications Experience designing RLHF or structured human feedback programs.
Senior AI Engineer
$112K - $154K/yr
End-to-End ML Lifecycle: Own requirements → data prep → feature engineering → classical ML or LLM fine-tuning (LoRA, PEFT, RLHF) → offline/online evaluation → MLflow registry, with automated drift and quality alerts. * Data & Storage ...
Design experiments, define success metrics, and run rigorous offline and online evaluations (A/B ... Familiarity with LLM fine-tuning techniques (LoRA, RLHF, instruction tuning) and serving ...
Research Scientist
Charlottesville, VA · On-site
Experience with fine-tuning techniques (supervised fine-tuning, instruction tuning, RLHF, domain ... This position will not sponsor applicants requiring a visa How to Apply Please apply online through ...
Senior Data Scientist - Generative AI
San Mateo, CA · On-site +1
Conduct online experiments (A/B tests) and causal inference to quantify the impact of GenAI ... Expertise in the model training lifecycle is a plus (e.g., fine-tuning, RLHF, or synthetic data ...
Online RLHF information
| Salary range (per year) | Share of jobs |
|---|---|
| $17.5K - $23.7K | 12% |
| $23.7K - $30K | 18% |
| $30K - $36.2K | 15% |
| $36.2K - $42.4K | 33% |
| $42.4K - $48.6K | 12% |
| $48.6K - $54.9K | 0% |
| $54.9K - $61.1K | 0% |
| $61.1K - $67.3K | 0% |
| $67.3K - $73.5K | 0% |
| $73.5K - $79.8K | 0% |
| $79.8K - $86K | 9% |
$28.2K is the 25th percentile. Wages below this are outliers. The median wage is $37.1K / yr. Chart axis markers: $17.5K, $40.6K, $86K.
How much do online RLHF jobs pay per year?
What are some common challenges faced by Online RLHF (Reinforcement Learning from Human Feedback) specialists when collaborating with cross-functional teams?
What is the difference between Online RLHF vs Online RLHF?
Online RLHF and Online RLHF are the same term, so there is no difference to compare. RLHF stands for Reinforcement Learning from Human Feedback, a technique for fine-tuning machine learning models; it is unrelated to health coaching. The listings on this page that mention it are machine learning, AI engineering, data science, research, and search quality roles.
What are Online RLHF jobs?
What are the key skills and qualifications needed to thrive as an Online RLHF (Reinforcement Learning from Human Feedback) Specialist, and why are they important?
Full-time
Posted 10 days ago
Job description
Pony.ai is a global leader in autonomous mobility, recognized for its innovative technologies and services in the field. The role involves building scalable systems for training large generative models, implementing reinforcement learning methods, and shipping deep learning solutions to enhance self-driving behaviors.
Responsibilities:
• Build scalable systems for training and fine-tuning large generative models that produce realistic, informative driving behaviors for evaluation and scenario coverage.
• Implement and iterate on RL-style methods: algorithms, reward / preference objectives, and training setups suited to high-fidelity, insightful behaviors in simulation-aligned workflows (closed-loop evaluation mindset).
• Ship deep learning solutions (including LLM / VLM where appropriate) that improve human-led triaging, automate high-volume workflows, and support nuanced analysis of self-driving behavior to surface critical anomalies.
• Own production-oriented ML for fleet-scale assessment: training, optimization, monitoring, and iteration of models used to judge performance across large real-world exposure.
• Design and evolve data + evaluation systems inspired by RL from human preferences (RLHF) and related paradigms—turning preference/judgment signals into repeatable, scalable training and evaluation loops.
• Partner broadly with teams such as Prediction, Planning, Research, and platform/engineering leads to land cross-cutting improvements with clear metrics.
Qualifications:
Required:
• M.S. or Ph.D. in Computer Science, Machine Learning, AI, or a related field—or equivalent practical experience.
• Hands-on experience building and applying ML in production-grade settings, with a strong RL component (policy learning, preference/feedback optimization, or offline/online RL pipelines).
• Depth in deep learning, sequence modeling, and generative models.
• Demonstrated impact via strong publications or a clear history of shipping impactful ML systems end-to-end.
• Experience with large-scale distributed training and large-scale data processing.
• Ability to lead ambiguous technical work from problem framing through reliable delivery.
Preferred:
• Background in autonomous vehicles, robotics, or complex simulation environments.
• Strong grasp of modern RL and post-training techniques for LLMs, dLLMs, VLAs, and video generation.
• Hands-on integration of simulation platforms with ML training and evaluation workflows.
• Python fluency and frameworks such as PyTorch.
• Experience defining and operating metrics for complex, safety-critical AI systems.
• Technical leadership: influencing stakeholders, aligning teams, and raising the bar for evaluation rigor.
• Excellent communication—simple explanations of complex trade-offs.
Company:
Pony.ai develops autonomous driving technology for vehicles that operate using artificial intelligence and machine learning. Founded in 2016, the company is headquartered in Fremont, USA, with a team of 1001-5000 employees. The company is currently late stage.
About Pony.ai
Sourced by ZipRecruiter
Industry: IT services
Company size: 51 - 200 Employees
Headquarters location: Fremont, CA, US
Year founded: 2016