Job description
Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem, comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets, to create contextual, multilingual, pre-trained datasets; fine-tuned, industry-specific LLMs; and RAG pipelines supported by vector databases. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.
Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers. We aim to help these organizations unlock significant business value by deploying GenAI at scale, helping to ensure they stay at the forefront of technological advancement and maintain a competitive edge in their respective markets.
About Job
Research Scientist, LLM Evaluation & Post-Training
Company: Centific
Location: Palo Alto, CA or Seattle, WA (Hybrid/Remote)
Type: Full-time
Role Overview
As a Research Scientist, LLM Evaluation & Post-Training, you will be at the frontier of how evaluation design, measurement strategy, and feedback signals drive model improvement across Centific's AI platform products. This is a high-impact individual contributor and collaborative research role that sits at the intersection of applied ML research, enterprise AI product development, and customer-facing scientific consulting.
You will lead research programs that define next-generation evaluation-driven post-training workflows, develop rigorous benchmark frameworks, and partner directly with leading AI organizations to deliver credible, actionable model improvement insights. This role offers the opportunity to shape Centific's internal research agenda, build reusable scientific assets, and publish at top-tier venues.
Key Responsibilities
- Research Agenda & Experimentation: Define and execute a rigorous research agenda focused on LLM evaluation and post-training, with emphasis on evaluation-driven model improvement. Design experiments to study how evaluation methodologies impact fine-tuning and post-training outcomes.
- Evaluation Framework Development: Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems, covering benchmark and task design, scoring methods, judge/model-assisted evaluation, human evaluation protocols, and robustness/stress testing.
- Advanced Evaluation Research: Lead research on frontier evaluation domains including long-context, cross-modal, and dynamic multi-turn evaluations. Study effectiveness and limitations of existing techniques and propose improved methodologies with clear validity and scalability tradeoffs.
- Model Behavior Analysis: Analyze model behavior and failure patterns; generate actionable recommendations for model improvement and evaluation redesign. Translate findings into practical improvements for customer solutions and Centific's internal platforms.
- Cross-Functional Collaboration: Partner with Language Data Scientists to integrate human-in-the-loop and synthetic data/evaluation strategies, and with AI/ML Research Engineers to translate research methods into scalable evaluation and post-training pipelines.
- Customer Engagement: Engage with customer technical stakeholders at leading AI organizations to understand evaluation goals, review methodologies, and provide expert scientific recommendations. Serve as a credible technical peer to research and engineering leaders.
- Knowledge & IP Creation: Contribute to internal benchmark datasets, reusable evaluation frameworks, and research assets. Produce high-quality technical documentation, internal research reports, and client-facing materials explaining methods, results, assumptions, and limitations.
- Thought Leadership: Contribute to Centific's position as a leader in LLM evaluation and post-training through publications, conference presentations, and open-source contributions.
Core Technical Competencies
You will provide technical depth and leadership across the following domains:
Evaluation Science & Benchmarking
- Expert-level benchmark dataset and test suite design for language and multimodal models
- Deep understanding of metric design, scoring reliability, and measurement validity
- Experience with human evaluation methods and quality assurance (rubric design, inter-rater reliability, adjudication frameworks)
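As an illustration of the inter-rater reliability work mentioned above, the following is a minimal sketch of Cohen's kappa for two annotators labeling the same items. It is not taken from Centific's tooling; the function name `cohens_kappa` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability both raters pick the same label independently,
    # given each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up rubric labels from two annotators on six model responses.
rater_a = ["good", "good", "bad", "bad", "good", "bad"]
rater_b = ["good", "bad", "bad", "bad", "good", "good"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is often preferred over percent agreement when label distributions are skewed.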
LLM & Post-Training Methods
- Strong understanding of post-training techniques (SFT, RLHF, RLAIF, DPO, PPO, GRPO) and how training objectives interact with evaluation outcomes
- Ability to reason about model behavior, failure modes, and performance tradeoffs across tasks and domains
- Familiarity with alignment, safety, and robustness considerations in model evaluation
Quantitative Analysis & Scientific Rigor
- Strong statistical analysis skills: sampling, uncertainty quantification, significance testing, error analysis, metric interpretation
- Ability to synthesize complex experimental findings into concise, actionable recommendations for engineering and business stakeholders
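To make the uncertainty quantification skills above concrete, here is a minimal sketch of a paired percentile bootstrap for the accuracy difference between two models scored on the same evaluation items. The function name `paired_bootstrap_ci`, its parameters, and the per-item scores are assumptions made for illustration, not part of the posting.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on shared items.

    correct_a / correct_b hold 1 (correct) or 0 (incorrect) per item, in the same order.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two equally long, non-empty score lists")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample item indices with replacement, keeping each item's pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up per-item results: model A is right on 80 of 100 items, model B on 70.
correct_a = [1] * 80 + [0] * 20
correct_b = [1] * 70 + [0] * 30
lower, upper = paired_bootstrap_ci(correct_a, correct_b)
print(f"95% CI for accuracy difference: [{lower:.3f}, {upper:.3f}]")
```

Resampling items in pairs, rather than resampling each model's scores separately, respects the fact that both models saw the same items, which usually tightens the interval.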
Required Qualifications
- Education: MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or a related quantitative field (PhD strongly preferred).
- Research Experience: 5+ years of relevant experience in applied ML research or research science, with substantial work in LLMs or foundation models (graduate research counts).
- LLM Evaluation Expertise: Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research.
- Experimental Design: Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems.
- Technical Proficiency: Strong Python coding skills for research experimentation, data processing, evaluation pipelines, statistical analysis, and visualization. Hands-on experience with modern ML frameworks (PyTorch, Hugging Face, JAX/TensorFlow).
- Evaluation Methodology: Ability to evaluate and compare human and automated evaluation methods, including tradeoffs in cost, reliability, validity, and scalability. Experience designing reproducible evaluation studies across datasets and model versions.
- Communication: Strong written and verbal communication skills; able to present nuanced technical conclusions, assumptions, and limitations clearly to both research and non-technical audiences.
Preferred Qualifications
- Post-Training Practice: Hands-on experience running fine-tuning or post-training experiments (SFT, preference optimization, RLHF/RLAIF-style workflows).
- Multimodal & Long-Context: Experience with multimodal evaluation (text-image, audio, video) and long-context benchmarking in real-world settings.
- Agentic Evaluation: Experience designing multi-turn, interactive, or agentic evaluation protocols.
- Scientific Contribution: Publications and/or open-source benchmark contributions in LLM evaluation, post-training, alignment, or related areas at top venues (NeurIPS, ICML, ICLR, ACL, EMNLP, etc.).
- Applied Research Consulting: Experience in customer-facing applied research, technical consulting, or cross-functional product/research collaboration.
- Safety & Governance: Familiarity with safety, trustworthiness, and governance considerations in GenAI evaluation.
Salary: $150K - $300K Annually
Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.
About Centific
Sourced by ZipRecruiter
Industry: IT services
Company size: 5,001 - 10,000 Employees
Headquarters location: Redmond, WA, US
Year founded: 2020