Online RLHF Jobs (NOW HIRING)

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Senior AI Engineer

Los Angeles, CA

$112.60K - $154.60K/yr

End-to-End ML Lifecycle:  Own requirements → data prep → feature engineering → classical ML or LLM fine-tuning (LoRA, PEFT, RLHF) → offline/online evaluation → MLflow registry, with ...

Principal, Software Engineer

Yonkers, NY · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Bayonne, NJ · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Bronx, NY · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Kearny, NJ · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Principal, Software Engineer

Queens, NY · Remote

$132K - $264K/yr

Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning). Expertise in coding standards, multiple programming languages, and secure software ...

Online RLHF information

$17.5K

$40.6K

$86K

How much do Online RLHF jobs pay per year?

As of May 29, 2026, the average yearly pay for Online RLHF jobs in the United States is $40,596, according to ZipRecruiter salary data. Most workers in this role earn between $25,000 and $43,500 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an Online RLHF (Reinforcement Learning from Human Feedback) Specialist, and why are they important?

To thrive as an Online RLHF Specialist, you need a strong background in machine learning, reinforcement learning, and data analysis, typically supported by a degree in computer science or a related field. Familiarity with technical tools like Python, PyTorch or TensorFlow, and experience with human feedback systems or annotation platforms are highly valuable. Strong problem-solving, attention to detail, and the ability to communicate complex concepts clearly are crucial soft skills. These qualifications ensure the effective training and evaluation of AI models, leading to more accurate and reliable machine learning systems.

What are some common challenges faced by Online RLHF (Reinforcement Learning from Human Feedback) specialists when collaborating with cross-functional teams?

Online RLHF specialists often work closely with machine learning engineers, data annotators, and product managers. A common challenge is ensuring that feedback from human annotators is accurately integrated into model training, which requires clear communication and well-defined annotation guidelines. Additionally, balancing the pace of model updates with the need for high-quality human feedback can be demanding. Effective collaboration and regular syncs are essential to maintain alignment and achieve project goals.

What are Online RLHF jobs?

Online RLHF (Reinforcement Learning from Human Feedback) jobs typically involve helping to train AI models by providing human feedback on their outputs. Workers in these roles might review model responses, rate the quality of generated text, or suggest improvements to help the AI learn to produce better results. These jobs are often remote and can be done part-time or as contract work. They play a crucial role in improving the safety, usefulness, and accuracy of AI systems by aligning them more closely with human preferences.

What is the difference between Online RLHF and Online RLHF?

There is no difference: the two terms are identical and describe the same role. RLHF stands for Reinforcement Learning from Human Feedback, so the work involves giving human feedback on AI model outputs, usually remotely through an online platform. It is not a health and wellness coaching role and does not call for health-coaching certification.

More about Online RLHF jobs
Infographic showing Online RLHF job openings in the United States as of May 2026, with employment types broken down into 86% Full Time and 14% Part Time. It highlights a job distribution of 50% In-person and 50% Remote, with an average salary of $40,596 per year, or $19.5 per hour.
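
The yearly and hourly figures above can be cross-checked with a short calculation. This is a minimal sketch, assuming a 40-hour week and 52 paid weeks per year (2,080 hours); the page does not say which convention ZipRecruiter uses, and the function name `annual_to_hourly` is hypothetical.

```python
# Assumption: full-time year of 40 hours/week * 52 weeks = 2,080 hours.
HOURS_PER_YEAR = 40 * 52


def annual_to_hourly(annual_salary: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert a yearly salary to an hourly rate under the assumed hours per year."""
    return annual_salary / hours_per_year


if __name__ == "__main__":
    hourly = annual_to_hourly(40_596)
    print(f"${hourly:.2f} per hour")  # prints "$19.52 per hour"
```

Under that assumption the result is close to the $19.5 per hour quoted, which suggests the two figures describe the same average.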

Principal, Software Engineer

Walmart

Manhattan, NY • Remote

$132K - $264K/yr

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 26 days ago


Walmart rating

Company rating: 6.0 out of 10

Based on 21,562 frontline employees who took The Breakroom Quiz

22nd of 39 rated national retailers


Job description

Position Summary... What you'll do...
Role summary
As a Principal Software Engineer (ML), you will lead the design and development of production-grade AI systems that transform large-scale data into intelligent, deployable solutions. You will operate at the intersection of machine learning, Python engineering, and scalable infrastructure, building E2E pipelines that power real-world applications.
Your mission is to develop deployable ML systems - not just models. This includes designing feedback-driven learning systems (e.g., reinforcement learning loops), building robust data pipelines, and creating AI services capable of continuous learning and adaptation.
About the team
The Sandman/Ad-tech team consists of experienced Machine Learning Engineers, Data Scientists, and Full Stack technologists focused on delivering scalable, data-driven solutions for Walmart. This multidisciplinary group integrates AI, software engineering, and data science to develop autonomous AI agents, deploy reliable applications, and build core engineering capabilities at enterprise scale. Utilizing data, AI, and agile methods, the team drives innovation that enhances operational efficiency and supports Walmart’s mission. Collaboration within the team fosters the creation of intelligent systems that improve business performance across the organization.
What you'll do
- Design and deploy end-to-end ML pipelines using Python (data ingestion → training → evaluation → deployment → monitoring).
- Build production-ready, deployable code for ML services (APIs, batch + real-time inference systems).
- Develop and implement reinforcement learning / feedback loop systems (e.g., human-in-the-loop, reward modeling, online learning).
- Architect computer vision and image analysis solutions (classification, embeddings, multimodal systems).
- Integrate ML models into scalable distributed systems serving millions of users in real time.
- Lead development of AI agents and multi-step reasoning systems powered by LLMs and structured ML pipelines.
- Establish MLOps best practices (CI/CD for ML, model versioning, monitoring).
- Continuously improve system performance through experimentation, feedback loops, and optimization.
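
For readers unfamiliar with the feedback-driven systems this posting names (bandits, online learning), the sketch below shows one generic example: an epsilon-greedy bandit that updates its estimates after every piece of feedback. It is an illustration only, not Walmart's system. The class name, the `simulate` helper, and the click rates are all hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Tracks a running mean reward per arm and mostly picks the best arm."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon            # probability of exploring a random arm
        self.counts = [0] * n_arms        # how often each arm was chosen
        self.values = [0.0] * n_arms      # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(click_rates, rounds=2000, seed=0):
    """Run the online loop against made-up click rates (hypothetical data)."""
    bandit = EpsilonGreedyBandit(len(click_rates), seed=seed)
    feedback = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = 1.0 if feedback.random() < click_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    result = simulate([0.2, 0.5, 0.8])
    print(result.counts, [round(v, 2) for v in result.values])
```

The point of the sketch is the loop itself: choose an action, observe feedback, update the estimate, repeat. Production systems of the kind described would add logging, monitoring, and safeguards that are out of scope here.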
What you'll bring
- Extensive experience in software architecture, distributed systems, and scalable design patterns.
- Deep understanding of machine learning lifecycle: training, evaluation, deployment, monitoring.
- Hands-on experience with reinforcement learning or feedback-driven systems (bandits, RLHF, online learning).
- Expertise in coding standards, multiple programming languages, and secure software development lifecycle practices.
- Ability to conduct thorough requirement analysis, risk assessment, and solution scoping aligned with business objectives.
- Proven skills in test strategy development, automation tools, and defect management processes.
- Leadership in guiding technical teams, mentoring, and driving continuous improvement initiatives.

At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.

You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable. For information about PTO, see https://one.walmart.com/notices.

Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
For information about benefits and eligibility, see One.Walmart.
Hoboken, New Jersey US-10279: The annual salary range for this position is $132,000.00 - $264,000.00
Sunnyvale, California US-11349: The annual salary range for this position is $143,000.00 - $286,000.00
Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include:
- Stock

Minimum Qualifications...

Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.

Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 5 years’ experience in software engineering or related area.
Option 2: 7 years’ experience in software engineering or related area.

Preferred Qualifications...

Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.

Master’s degree in computer science, computer engineering, computer information systems, software engineering, or related area and 3 years' experience in software engineering or related area. We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart’s accessibility standards and guidelines for supporting an inclusive culture.

Primary Location...

221 River St, Hoboken, NJ 07030, United States of America

Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a no-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.

About Walmart

Sourced by ZipRecruiter

From our humble beginnings as a small discount retailer in Rogers, Ark., Walmart has opened thousands of stores in the U.S. and expanded internationally. Through innovation, we're creating a seamless experience to let customers shop anytime and anywhere online and in stores. We are creating opportunities and bringing value to customers and communities around the globe. Walmart operates approximately 10,500 stores and clubs in 19 countries and eCommerce websites. We employ 2.1 million associates around the world — nearly 1.6 million in the U.S. alone.

Industry

Retail, professional, labor and political organizations, specialized design services, transportation and warehousing and fitness and sports centers

Company size

10,000+ Employees

Headquarters location

Bentonville, AR, US
