Online RLHF Jobs (NOW HIRING)

AI Research Engineer

New York, NY · On-site

$300K - $400K/yr

Stay current on LLM agents, RL (offline/online, RLHF/RLAIF), constrained decoding, and program synthesis. What Makes You A Great Fit: * PhD in CS/AI/ML (or equivalent research experience) with ...

AI Engineer

New York, NY · On-site

$200K - $300K/yr

Research or applied experience with LLM agents, RL (offline/online, RLHF/RLAIF), constrained decoding, or program synthesis. * Open-source contributions or publications in AI/ML venues. * Skill in ...

LLM Training Engineer

San Francisco, CA · On-site

$155K - $220K/yr

Design offline + online environments that support RL-style training at scale * Instrument ... Post-training pipelines (SFT, RLHF/RLAIF, preference optimization, eval loops) * Building RL ...

... RLHF, RLAIF, or DPO for multi-objective optimization. * Develop reward models and objective ... online and batch adaptation loops with strong guardrails. * Translate conversational logs ...

Online RLHF information

Salary chart values: $17.5K · $40.6K · $86K

How much do online RLHF jobs pay per year?

As of Jun 21, 2026, the average yearly pay for online RLHF jobs in the United States is $40,596, according to ZipRecruiter salary data. Most workers in this role earn between $25,000 and $43,500 per year, depending on experience, location, and employer.

What are some common challenges faced by Online RLHF (Reinforcement Learning from Human Feedback) specialists when collaborating with cross-functional teams?

Online RLHF specialists often work closely with machine learning engineers, data annotators, and product managers. A common challenge is ensuring that feedback from human annotators is accurately integrated into model training, which requires clear communication and well-defined annotation guidelines. Additionally, balancing the pace of model updates with the need for high-quality human feedback can be demanding. Effective collaboration and regular syncs are essential to maintain alignment and achieve project goals.

What is the difference between Online RLHF and Online RLHF?

Aspect | Online RLHF | Online RLHF
Credentials | Typically requires certification in online health coaching or related fields | Typically requires certification in online health coaching or related fields
Work Environment | Remote, online platform-based | Remote, online platform-based
Industry Usage | Common in health and wellness sectors | Common in health and wellness sectors
Job Focus | Providing health guidance and support online | Providing health guidance and support online

Online RLHF and Online RLHF are the same role, often used interchangeably. Both involve providing health and wellness support remotely, requiring similar certifications and working within the online health industry. The key difference is often in terminology rather than job function.

What are Online RLHF jobs?

Online RLHF (Reinforcement Learning from Human Feedback) jobs typically involve helping to train AI models by providing human feedback on their outputs. Workers in these roles might review model responses, rate the quality of generated text, or suggest improvements to help the AI learn to produce better results. These jobs are often remote and can be done part-time or as contract work. They play a crucial role in improving the safety, usefulness, and accuracy of AI systems by aligning them more closely with human preferences.

What are the key skills and qualifications needed to thrive as an Online RLHF (Reinforcement Learning from Human Feedback) Specialist, and why are they important?

To thrive as an Online RLHF Specialist, you need a strong background in machine learning, reinforcement learning, and data analysis, typically supported by a degree in computer science or a related field. Familiarity with technical tools like Python, PyTorch or TensorFlow, and experience with human feedback systems or annotation platforms are highly valuable. Strong problem-solving, attention to detail, and the ability to communicate complex concepts clearly are crucial soft skills. These qualifications ensure the effective training and evaluation of AI models, leading to more accurate and reliable machine learning systems.

AI Research Engineer

Normal Computing

New York, NY • On-site

$300K - $400K/yr

Full-time

Posted 15 days ago


Job description

Normal Computing | Incredible Opportunities
The Normal Team builds foundational software and hardware that help move technology forward - supporting the semiconductor industry, critical AI infrastructure, and the broader systems that power our world. We work as one team across New York, San Francisco, Copenhagen, Seoul, and London.
Your Role in Our Mission:
We're hiring an AI Researcher / AI Research Engineer to push the frontier of agentic LLMs and reinforcement learning for our agentic code generation tool, nectar. You'll design and run experiments, build agents, curate datasets from complex technical documents (e.g., chip specifications), and create rigorous evaluations. You'll write production-quality research code and work closely with engineering to ship improvements to customers. Leadership is not required; impact through research and building is.
Responsibilities:
  • Design and implement multi-agent and RL approaches for agentic code generation and tool use.
  • Build research prototypes that integrate with nectar; collaborate to productionize wins.
  • Create evaluation suites: task specs, pass/fail checkers, coverage, cost/latency dashboards.
  • Acquire and curate datasets from PDFs/logs/tables; generate synthetic data where appropriate; maintain data cards and licensing.
  • Analyze experiments with disciplined ablations; document results and decisions.
  • Stay current on LLM agents, RL (offline/online, RLHF/RLAIF), constrained decoding, and program synthesis.

What Makes You A Great Fit:
  • PhD in CS/AI/ML (or equivalent research experience) with publications ideally in multi-agent RL, agentic AI, or RL for language/code.
  • Strong Python and ML framework experience (PyTorch preferred; JAX/HF a plus).
  • Demonstrated ability to turn research into working systems; reproducibility mindset (tests, seeds, configs, logging).
  • Experience designing eval harnesses and success metrics for sequential/agentic tasks.
  • Comfortable with data acquisition/curation from documents/logs; good instincts about data quality and licenses.

Bonus Points For:
  • Research on program synthesis/codegen, constrained decoding, or execution-based rewards.
  • Experience with offline RL from tool traces or human corrections.
  • Open-source contributions (e.g., CleanRL, RLlib, AutoGen, LangGraph, CrewAI, Transformers).
  • Familiarity with semiconductor/chip domains or other complex technical specs.
  • Track record of shipping research to production and measuring impact.

Equal Employment Opportunity Statement
Normal Computing is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other legally protected status.
Accessibility Accommodations
Normal Computing is committed to providing reasonable accommodations to individuals with disabilities. If you need assistance or an accommodation due to a disability, please let us know at accommodations@normalcomputing.com.
Privacy Notice
By submitting your application, you agree that Normal Computing may collect, use, and store your personal information for employment-related purposes in accordance with our Privacy Policy.