Online RLHF Jobs (NOW HIRING)

... RLHF, RLAIF, or DPO for multi-objective optimization. * Develop reward models and objective ... online and batch adaptation loops with strong guardrails. * Translate conversational logs ...

Here on the Apple Store Online team, we are responsible for Apple's largest store. Our main goal is ... Experience designing RLHF or structured human feedback programs. Background in large language model ...

Senior AI Engineer

Los Angeles, CA · On-site

$112.60K - $154.60K/yr

Own requirements, data prep, feature engineering, classical ML or LLM fine-tuning (LoRA, PEFT, RLHF), offline/online evaluation, and MLflow registry, with automated drift and quality alerts. * Data & Storage ...

Design experiments, define success metrics, and run rigorous offline and online evaluations (A/B ... Familiarity with LLM fine-tuning techniques (LoRA, RLHF, instruction tuning) and serving ...

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Passaic, NJ · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Hoboken, NJ · Remote

$143K - $286K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Hoboken, NJ · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Online RLHF information

$17.5K

$40.6K

$86K

How much do online RLHF jobs pay per year?

As of May 29, 2026, the average yearly pay for online RLHF jobs in the United States is $40,596.00, according to ZipRecruiter salary data. Most workers in this role earn between $25,000.00 and $43,500.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an Online RLHF (Reinforcement Learning from Human Feedback) Specialist, and why are they important?

To thrive as an Online RLHF Specialist, you need a strong background in machine learning, reinforcement learning, and data analysis, typically supported by a degree in computer science or a related field. Familiarity with technical tools like Python, PyTorch or TensorFlow, and experience with human feedback systems or annotation platforms are highly valuable. Strong problem-solving, attention to detail, and the ability to communicate complex concepts clearly are crucial soft skills. These qualifications ensure the effective training and evaluation of AI models, leading to more accurate and reliable machine learning systems.

What are some common challenges faced by Online RLHF (Reinforcement Learning from Human Feedback) specialists when collaborating with cross-functional teams?

Online RLHF specialists often work closely with machine learning engineers, data annotators, and product managers. A common challenge is ensuring that feedback from human annotators is accurately integrated into model training, which requires clear communication and well-defined annotation guidelines. Additionally, balancing the pace of model updates with the need for high-quality human feedback can be demanding. Effective collaboration and regular syncs are essential to maintain alignment and achieve project goals.

What are Online RLHF jobs?

Online RLHF (Reinforcement Learning from Human Feedback) jobs typically involve helping to train AI models by providing human feedback on their outputs. Workers in these roles might review model responses, rate the quality of generated text, or suggest improvements to help the AI learn to produce better results. These jobs are often remote and can be done part-time or as contract work. They play a crucial role in improving the safety, usefulness, and accuracy of AI systems by aligning them more closely with human preferences.

What is the difference between Online RLHF vs Online RLHF?

Aspect | Online RLHF | Online RLHF
Credentials | Typically requires certification in online health coaching or related fields | Typically requires certification in online health coaching or related fields
Work Environment | Remote, online platform-based | Remote, online platform-based
Industry Usage | Common in health and wellness sectors | Common in health and wellness sectors
Job Focus | Providing health guidance and support online | Providing health guidance and support online

Online RLHF and Online RLHF are the same role, often used interchangeably. Both involve providing health and wellness support remotely, requiring similar certifications and working within the online health industry. The key difference is often in terminology rather than job function.

Infographic showing various Online RLHF job openings in the United States as of May 2026, with employment types broken down into 86% full-time and 14% part-time. It highlights a 50% in-person and 50% remote job distribution, with an average salary of $40,596 per year, or $19.5 per hour.
Principal Applied Scientist, Agentic AI

Zillow

Remote

$181.80K - $290.40K/yr

Full-time

Posted 24 days ago


Zillow rating

Company rating: 8.8 out of 10

Based on 21 frontline employees who took The Breakroom Quiz

10th of 152 rated real estate companies


Job description

About the team
Zillow is investing deeply in next-generation AI and machine learning to power intelligent experiences across our products, helping customers and partners make better decisions in a complex, real-world domain. Our team brings together Applied Scientists, ML engineers, and Software engineers who own the full lifecycle of large-scale systems that combine modern foundation models with applied ML, from data and modeling through evaluation and deployment. We collaborate closely with platform, product, and operations partners in a fast-moving, remote-first environment where experimentation, learning, and shipping are core to how we work.
About the role
As a Principal Applied Scientist focused on RL post-training, you will lead the design and deployment of learning systems that shape how our models behave in real products. You will own the technical direction and strategy for post-training and adaptation of large models to align behavior with user value, safety, and business objectives. This is a high-impact principal IC role with broad influence across Zillow, working closely with senior leadership to ensure our investments translate into safer, more capable, and more trusted AI-powered experiences.
You will get to:
  • Lead the technical direction and strategy for RL post-training of production models, partnering with other scientists, engineers, and product leaders to align models with customer and business needs.
  • Design and implement post-training pipelines that combine techniques such as supervised fine-tuning on curated demonstrations, preference modeling and pairwise ranking, and RL-based alignment approaches like RLHF, RLAIF, or DPO for multi-objective optimization.
  • Develop reward models and objective formulations that balance constraints such as helpfulness, safety, fairness, compliance, and customer satisfaction, and iterate on them using human and AI feedback at scale through online and batch adaptation loops with strong guardrails.
  • Translate conversational logs, behavioral signals, and structured attributes into training, reward, and evaluation signals for post-training and reinforcement learning, turning heterogeneous data into actionable supervision.
  • Partner with model and platform teams to improve the efficiency and robustness of training and evaluation, including off-policy evaluation, replay strategies, controlled rollouts, and metrics and evaluation frameworks such as win-rates versus baselines, safety and quality metrics, and expert-review programs.
  • Mentor applied scientists and engineers, raising the technical bar in RL, post-training, and evaluation, and contributing to the broader AI roadmap at Zillow through thought leadership and guidance.
  • When appropriate, represent Zillow's work externally through talks, publications, or open-source contributions.

This role has been categorized as a Remote position. "Remote" employees do not have a permanent corporate office workplace and, instead, work from a physical location of their choice, which must be identified to the Company. U.S. employees may live in any of the 50 United States, with limited exceptions.
In California, Connecticut, Maryland, Massachusetts, New Jersey, New York, Washington state, and Washington DC the standard base pay range for this role is $191,300.00 - $305,700.00 annually. This base pay range is specific to these locations and may not be applicable to other locations.
In Colorado, Hawaii, Illinois, Minnesota, Nevada, Ohio, Rhode Island, and Vermont the standard base pay range for this role is $181,800.00 - $290,400.00 annually. The base pay range is specific to these locations and may not be applicable to other locations.
In addition to a competitive base salary this position is also eligible for equity awards based on factors such as experience, performance and location. Actual amounts will vary depending on experience, performance and location. Employees in this role will not be paid below the salary threshold for exempt employees in the state where they reside.
Who you are
  • You are an applied scientist who is excited to use reinforcement learning and post-training methods to shape how AI systems behave in complex, high-judgment settings, and you are comfortable owning ambiguous problems end-to-end, from framing the objective and data strategy to shipping models into production and measuring their impact.
  • You have a PhD or equivalent experience in Computer Science, Electrical Engineering, Statistics, or a related field, with emphasis in areas such as reinforcement learning, bandits, large language models, or applied machine learning.
  • You have strong, current expertise in post-training techniques (such as supervised fine-tuning, DPO, RLHF/RLAIF, preference modeling, and multi-objective optimization), in evaluation and monitoring of aligned models (including win-rate experiments, human and AI feedback loops, long-horizon evaluation, and safety or guardrail metrics), and in modern transformer-based models and tooling such as LLMs, multimodal models, vector search, and orchestration frameworks.
  • You have experience working with cross-functional partners (for example, engineering, product, design, operations, legal, and compliance) in domains where safety, trust, or regulation matter, such as marketplaces, finance, healthcare, or other high-stakes verticals.
  • You demonstrate technical leadership and mentorship, helping senior engineers and scientists grow, creating clarity amid ambiguity, and driving alignment across teams, and you communicate complex technical ideas clearly to both expert and non-expert audiences in writing and verbally.

Here at Zillow, we value the experience and perspective of candidates with non-traditional backgrounds. We encourage you to apply if you have transferable skills or related experiences.
Get to know us
At Zillow, we're reimagining how people move, both through the real estate market and through their careers. As the most-visited real estate platform in the U.S., we help customers navigate buying, selling, financing and renting with greater ease and confidence. Whether you're working in tech, sales, operations, or design, you'll be part of a company that's reshaping an industry and helping more people make home a reality.
Zillow is honored to be recognized among the best workplaces in the country. Zillow was named one of the FORTUNE 100 Best Companies to Work For® in 2025, and included on the PEOPLE Companies That Care® 2025 list, reflecting our commitment to creating an innovative, inclusive, and engaging culture where employees are empowered to grow.
No matter where you sit in the organization, your work will help drive innovation, support our customers, and move the industry (and your career) forward, together.
Zillow Group is an equal opportunity employer committed to fostering an inclusive, innovative environment with the best employees. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please contact your recruiter directly.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable state and local law.
Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
