
Online RLHF Jobs (NOW HIRING)

AI Applied Scientist

$225K - $280K/yr

... online * Drive measurable improvements to LLM judge quality (calibration, fine-tuning where ... Direct experience with LLM-based systems: judge models, RAG, prompt engineering, fine-tuning, RLHF ...

At Amazon Selection and Catalog Systems (ASCS), our mission is to power the online buying ... tuning, RLHF, prompt engineering, or agentic architectures - Experience with LLM/VLM serving ...

... RLHF, DPO). * Drive model selection decisions (SLMs vs. larger models) based on use-case ... Develop offline and online evaluation loops - including LLM-as-judge frameworks - that guide rapid ...

At Amazon Selection and Catalog Systems (ASCS), our mission is to power the online buying ... tuning, RLHF, or agentic architectures Amazon is an equal opportunity employer and does not ...

... RLHF, DPO). * Drive model selection decisions (SLMs vs. larger models) based on use-case ... Develop offline and online evaluation loops - including LLM-as-judge frameworks - that guide rapid ...

Software Engineering LMTS

Palo Alto, CA

$114K - $156K/yr

Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...

Software Engineering LMTS

Palo Alto, CA

$159K - $213K/yr

Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...

Software Engineering LMTS

Palo Alto, CA · On-site

$114K - $156K/yr

Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...

Software Engineering LMTS

Palo Alto, CA

$114K - $157K/yr

Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...


Online RLHF information


$17.5K

$40.6K

$86K

How much do online RLHF jobs pay per year?

As of Jun 21, 2026, the average yearly pay for online RLHF jobs in the United States is $40,596, according to ZipRecruiter salary data. Most workers in this role earn between $25,000 and $43,500 per year, depending on experience, location, and employer.

What are some common challenges faced by Online RLHF (Reinforcement Learning from Human Feedback) specialists when collaborating with cross-functional teams?

Online RLHF specialists often work closely with machine learning engineers, data annotators, and product managers. A common challenge is ensuring that feedback from human annotators is accurately integrated into model training, which requires clear communication and well-defined annotation guidelines. Additionally, balancing the pace of model updates with the need for high-quality human feedback can be demanding. Effective collaboration and regular syncs are essential to maintain alignment and achieve project goals.

What is the difference between Online RLHF vs Online RLHF?

Aspect | Online RLHF | Online RLHF
Credentials | Typically requires certification in online health coaching or related fields | Typically requires certification in online health coaching or related fields
Work Environment | Remote, online platform-based | Remote, online platform-based
Industry Usage | Common in health and wellness sectors | Common in health and wellness sectors
Job Focus | Providing health guidance and support online | Providing health guidance and support online

Online RLHF and Online RLHF are the same role, and the terms are often used interchangeably. Both involve providing health and wellness support remotely, requiring similar certifications and working within the online health industry. The key difference is often in terminology rather than job function.

What are Online RLHF jobs?

Online RLHF (Reinforcement Learning from Human Feedback) jobs typically involve helping to train AI models by providing human feedback on their outputs. Workers in these roles might review model responses, rate the quality of generated text, or suggest improvements to help the AI learn to produce better results. These jobs are often remote and can be done part-time or as contract work. They play a crucial role in improving the safety, usefulness, and accuracy of AI systems by aligning them more closely with human preferences.

What are the key skills and qualifications needed to thrive as an Online RLHF (Reinforcement Learning from Human Feedback) Specialist, and why are they important?

To thrive as an Online RLHF Specialist, you need a strong background in machine learning, reinforcement learning, and data analysis, typically supported by a degree in computer science or a related field. Familiarity with technical tools like Python, PyTorch or TensorFlow, and experience with human feedback systems or annotation platforms are highly valuable. Strong problem-solving, attention to detail, and the ability to communicate complex concepts clearly are crucial soft skills. These qualifications ensure the effective training and evaluation of AI models, leading to more accurate and reliable machine learning systems.
Principal AI Engineer

BMC Software, Inc.

Santa Clara, CA • On-site

Full-time

Posted 5 days ago


Job description

Basic Information
Job Name: Principal Innovator - USA (B)
Country: United States
State: California
City: Santa Clara
Date Published: 16-Jun-2026
Job ID: 47117
Travel: up to 10%
Additional Locations: Milpitas - Mountain View - East Foothills - Los Altos - Stanford - Santa Clara
Description and Requirements
BMC empowers nearly 80% of the Forbes Global 100 to accelerate business value, faster than humanly possible. Our industry-leading portfolio unlocks human and machine potential to drive business growth, innovation, and sustainable success. BMC does this in a simple and optimized way by connecting people, systems, and data that power the world's largest organizations so they can seize a competitive advantage.
We're looking for a Principal AI Engineer to architect, build, and harden the agentic AI systems that power our products. This is a hands-on, full-stack AI role: you'll work across the entire AI stack from the foundation (model lifecycle, experimentation, infrastructure) through shared services (agent orchestration, RAG and grounding, gateways and routing) up to the agents and workflows that reach customers. You'll set the technical direction for how we design agents, prove what works through rigorous evaluation, and turn promising prototypes into reliable, observable, governed production systems.
Here is how, through this role, you will contribute to BMC's and your own success:
  • Own the architecture for agentic systems end to end - reasoning and planning, tool/function calling, multi-agent coordination, routing, memory, and human-in-the-loop handoffs - and define reusable patterns and reference implementations.
  • Prototype new agentic capabilities quickly, then drive the ones that prove out into production-grade systems; write production-quality code and set the engineering bar.
  • Build and integrate across the stack: RAG and knowledge services, model/tool routing, prompt and context management, orchestration runtimes, and inference serving. Evaluate and adopt emerging models, frameworks, and protocols.
  • Stand up the evaluation and experimentation strategy that gates what we ship: offline suites and golden datasets, regression tests in CI/CD, online evaluators on production traffic, calibrated LLM-as-a-judge graders, and A/B experiments to ensure safe and reliable deployments of agents in production.
  • Define the metrics that matter (task success, groundedness, cost, safety, etc.) and build the tracing and observability to measure them across multi-turn interactions, closing the loop from error analysis to continuous improvement.
  • Set technical direction across teams, mentor engineers, and translate complex architectures into clear guidance for partners and customers.

To ensure you're set up for success, you will bring the following skillset & experience:
  • 8+ years of building and operating scalable, production-grade software, with significant recent depth in AI/ML.
  • Proven experience designing, building, and shipping agentic AI systems (autonomous/multi-step agentic workflows, multi-agent frameworks, and Generative AI copilots) in production.
  • Hands-on experience building evaluation and experimentation frameworks for LLM/agentic systems (offline + online evals, LLM-as-a-judge, benchmarking, CI/CD gates, production monitoring).
  • Strong agentic engineering skills, with modern agent/LLM frameworks (e.g., LangGraph, Google ADK, or comparable).
  • Experience with RAG and retrieval (vector stores, grounding, citations) and familiarity with AgentOps/LLMOps and production monitoring.
  • Solid software engineering fundamentals with productization tooling: Git, Docker, Kubernetes, CI/CD, and cloud (AWS / Azure / GCP).
  • The judgment to operate in fast-moving, ambiguous environments and the communication skills to lead without authority.
  • Experience with model customization (SFT, RLHF, DPO/GRPO) and eval/observability platforms, and with bridging applied research and engineering, is a nice-to-have.

Our commitment to you!
BMC's culture is built around its people. We have 6000+ brilliant minds working together across the globe. You won't be known just by your employee number, but for your true authentic self. BMC lets you be YOU!
If, after reading the above, you're unsure whether you meet the qualifications of this role but are deeply excited about BMC and this team, we still encourage you to apply! We want to attract talent from diverse backgrounds and experiences to ensure we face the world together with the best ideas!
BMC is committed to equal opportunity employment regardless of race, age, sex, creed, color, religion, citizenship status, sexual orientation, gender, gender expression, gender identity, national origin, disability, marital status, pregnancy, disabled veteran or status as a protected veteran. If you need a reasonable accommodation for any part of the application and hiring process, visit the accommodation request page.
BMC Software maintains a strict policy of not requesting any form of payment in exchange for employment opportunities, upholding a fair and ethical hiring process.
The annual base salary range represents the low and high end of the BMC salary range for this position. Actual salaries depend on a wide range of factors that are considered in making compensation decisions, including but not limited to skill sets; experience and training, licensure, and certifications; and other business and organizational needs.
The range listed is just one component of BMC's employee compensation package. Other rewards may include a variable plan and country specific benefits.
At BMC, it is not typical for an individual to be hired at or near the top of the range. A reasonable estimate of the current range is $175,800 - $293,000.