Principal AI Engineer
Santa Clara, CA · On-site
... CI/CD, online evaluators on production traffic, calibrated LLM-as-a-judge graders, and A/B ... Experience with model customization (SFT, RLHF, DPO/GRPO), eval/observability platforms and ...
... online learning and recommendation systems. Experience working with machine learning or LLM model ... techniques (e.g., RLHF, reward model, DPO, PPO, GRPO, etc.), parameter-efficient fine-tuning ...
Experience in modeling user behavior including personalization, online learning and recommendation ... techniques (e.g., RLHF, reward model, DPO, PPO, GRPO, etc.), parameter-efficient fine-tuning ...
Santa Clara, CA · On-site
... CI/CD, online evaluators on production traffic, calibrated LLM-as-a-judge graders, and A/B ... Experience with model customization (SFT, RLHF, DPO/GRPO), eval/observability platforms and ...
$225K - $280K/yr
... online * Drive measurable improvements to LLM judge quality (calibration, fine-tuning where ... Direct experience with LLM-based systems: judge models, RAG, prompt engineering, fine-tuning, RLHF ...
Seattle, WA · On-site
At Amazon Selection and Catalog Systems (ASCS), our mission is to power the online buying ... tuning, RLHF, prompt engineering, or agentic architectures - Experience with LLM/VLM serving ...
Palo Alto, CA · On-site
... RLHF, DPO). * Drive model selection decisions (SLMs vs. larger models) based on use-case ... Develop offline and online evaluation loops - including LLM-as-judge frameworks - that guide rapid ...
Los Gatos, CA · On-site
D. in Computer Science or a related field with a specialization in post-training LLMs for downstream tasks, especially using RL (e.g., RLVR, RLHF, offline or online, policy- or value-based), and ...
Seattle, WA · On-site
At Amazon Selection and Catalog Systems (ASCS), our mission is to power the online buying ... tuning, RLHF, or agentic architectures Amazon is an equal opportunity employer and does not ...
Palo Alto, CA · On-site
... RLHF, DPO). * Drive model selection decisions (SLMs vs. larger models) based on use-case ... Develop offline and online evaluation loops - including LLM-as-judge frameworks - that guide rapid ...
Los Angeles, CA · On-site
D in Computer Science or a related field with a specialization in post-training LLMs for downstream tasks, especially using RL (e.g., RLVR, RLHF, offline or online, policy- or value-based), and ...
Cupertino, CA · On-site +1
$147K - $272K/yr
... online metrics, covering reasoning, tool use, and task success. Design and maintain verifiers ... RLHF, DPO, PPO). Strong software engineering fundamentals: debugging, testing, code reviews, and ...
D in Computer Science or a related field with a specialization in post-training LLMs for downstream tasks, especially using RL (e.g., RLVR, RLHF, offline or online, policy- or value-based), and ...
$114K - $156K/yr
Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...
$159K - $213K/yr
Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...
Palo Alto, CA · On-site
$114K - $156K/yr
Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...
$114K - $157K/yr
Design, implement, and iterate on reinforcement learning (RL) and continuous learning pipelines (e.g., RLHF, RLAIF, offline/online feedback loops). * Conduct rigorous experimentation, ablation ...
New York, NY · On-site
D in Computer Science or a related field with a specialization in post-training LLMs for downstream tasks, especially using RL (e.g., RLVR, RLHF, offline or online, policy- or value-based), and ...
Cupertino, CA · On-site +1
$147K - $272K/yr
... online metrics, covering reasoning, tool use, and task success. Design and maintain verifiers ... RLHF, DPO, PPO). Strong software engineering fundamentals: debugging, testing, code reviews, and ...
New York, NY · On-site
D. in Computer Science or a related field with a specialization in post-training LLMs for downstream tasks, especially using RL (e.g., RLVR, RLHF, offline or online, policy- or value-based), and ...
Salary distribution (annual wage range and share of jobs):

| Annual wage range | Share of jobs |
|---|---|
| $17.5K - $23.7K | 12% |
| $23.7K - $30K | 18% |
| $30K - $36.2K | 15% |
| $36.2K - $42.4K | 33% |
| $42.4K - $48.6K | 12% |
| $48.6K - $54.9K | 0% |
| $54.9K - $61.1K | 0% |
| $61.1K - $67.3K | 0% |
| $67.3K - $73.5K | 0% |
| $73.5K - $79.8K | 0% |
| $79.8K - $86K | 9% |

$28.2K is the 25th percentile. Wages below this are outliers. The median wage is $37.1K / yr. Chart axis markers: $17.5K, $40.6K, $86K.
| Aspect | Online RLHF | Online RLHF |
|---|---|---|
| Credentials | Typically requires certification in online health coaching or related fields | Typically requires certification in online health coaching or related fields |
| Work Environment | Remote, online platform-based | Remote, online platform-based |
| Industry Usage | Common in health and wellness sectors | Common in health and wellness sectors |
| Job Focus | Providing health guidance and support online | Providing health guidance and support online |
Online RLHF and Online RLHF are the same role, and the terms are often used interchangeably. Both involve providing health and wellness support remotely, require similar certifications, and sit within the online health industry. The difference is usually one of terminology rather than job function.
Sourced by ZipRecruiter
Computer and computer peripheral equipment and software wholesalers
5,001 - 10,000 Employees
Houston, TX, US
1980