
LLM Trainer Jobs (NOW HIRING)

LLM Dataset Engineer

San Francisco, CA · On-site

$155K - $210K/yr

Post-Training & Alignment Data: Lead the development of high-quality post-training datasets ... Experience building massive LLM training sets from scratch, including raw web crawls (e.g., Common ...

Connect LLM capabilities with Luminary's Physics AI training/evaluation/inference pipelines, physics simulation solvers, mesh tools, and analytics APIs to enable end-to-end automation * Establish ...

You will collaborate with world-class talent in LLM training, on-device and server optimization, ML tools/platforms, datasets, and evaluation. You will develop reliable and scalable pipelines and ...

They are seeking an AI/LLM Engineer to design and implement advanced AI systems centered on large ... Gap International is a global business management consulting firm that provides executive training ...

About the Role EnCharge AI is seeking an LLM Inference Deployment Engineer to optimize, deploy, and ... Deploy and optimize LLMs (GPT, LLaMA, Mistral, Falcon, etc.) post-training from libraries like ...

The AI/LLM Engineer will lead the design and implementation of advanced systems centered on large ... Gap International is a global business management consulting firm that provides executive training ...

AI / LLM Engineering & Agentic Systems * Design, build, and deploy LLM powered applications using ... Develop Python based pipelines for model training, evaluation, and deployment * Apply prompt ...

OR · On-site

Researching innovative techniques in generative models, artificial data creation, user simulation, reward modeling, and data-quality estimation for LLM training. * Crafting and applying new methods ...

Target AI infrastructure (model serving, training pipelines, vector databases, GPU/MLOps tooling ... Build and extend LLM-powered applications (prompting, structured output, agentic workflows)


LLM Trainer information


How much do LLM Trainer jobs pay per hour?

As of Jun 13, 2026, the average hourly pay for an LLM Trainer in the United States is $36.91, according to ZipRecruiter salary data. Most workers in this role earn between $19.23 and $52.88 per hour, depending on experience, location, and employer.

What are some typical responsibilities and challenges faced by an LLM Trainer on a day-to-day basis?

LLM Trainers are responsible for designing and refining training datasets, developing prompts, evaluating model outputs, and working closely with engineers and data scientists to optimize large language models. Common challenges include maintaining data quality, mitigating model biases, and staying up-to-date with rapidly evolving AI research and best practices. You’ll often collaborate with cross-functional teams, communicate findings clearly, and adapt to new tools or methodologies. This dynamic environment offers opportunities for innovation and skill development, making it an excellent fit for those passionate about advancing AI technology.

What are the key skills and qualifications needed to thrive in the LLM Trainer position, and why are they important?

To thrive as an LLM Trainer, you need a deep understanding of natural language processing (NLP), machine learning principles, and data annotation techniques, often supported by a background in computer science or related fields. Familiarity with tools like Python, PyTorch or TensorFlow, data labeling platforms, and version control systems is essential, along with knowledge of prompt engineering and model fine-tuning. Strong analytical thinking, attention to detail, and collaborative communication skills are crucial soft skills for working with cross-functional AI teams. These competencies are important for developing high-quality language models that meet user needs and industry standards.

What is an LLM Trainer job?

An LLM Trainer is responsible for training and fine-tuning large language models (LLMs) to improve their accuracy, efficiency, and relevance for specific applications. This role involves curating and preprocessing training data, designing training methodologies, and evaluating model performance. LLM Trainers work closely with data scientists, engineers, and researchers to optimize models for tasks such as natural language understanding, text generation, and conversational AI. They also ensure ethical AI practices by mitigating biases and refining model outputs.

Infographic showing LLM Trainer job openings in the United States as of June 2026. Employment types: 6% Internship, 66% Full Time, and 28% Contract. Work arrangement: 72% In-person, 11% Hybrid, and 17% Remote. Average salary: $76,772 per year, or $36.91 per hour.
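
The annual and hourly figures quoted on this page can be related with simple arithmetic. The sketch below assumes a 40-hour week over 52 weeks (2,080 paid hours per year); that convention is an assumption made here for illustration, not something the page states, and the function names are hypothetical.

```python
# Assumption: a full-time year is 40 hours/week * 52 weeks = 2,080 paid hours.
HOURS_PER_YEAR = 40 * 52


def annual_to_hourly(annual_salary: float) -> float:
    """Convert an annual salary to an hourly rate, rounded to cents."""
    return round(annual_salary / HOURS_PER_YEAR, 2)


def hourly_to_annual(hourly_rate: float) -> float:
    """Convert an hourly rate to an annual salary, rounded to cents."""
    return round(hourly_rate * HOURS_PER_YEAR, 2)


if __name__ == "__main__":
    # Average annual salary quoted in the infographic.
    print(annual_to_hourly(76_772))
    # Typical hourly range quoted in the salary FAQ.
    print(hourly_to_annual(19.23), hourly_to_annual(52.88))
```

Under that assumption, the $76,772 annual average works out to the same $36.91 hourly average quoted in the salary answer.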

LLM Dataset Engineer

Sciforium

San Francisco, CA • On-site

$155K - $210K/yr

Full-time

Medical, Dental, Vision, Retirement

Posted 7 days ago


Job description

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. Backed by multi-million-dollar funding and direct sponsorship from AMD, with hands-on support from AMD engineers, the team is scaling rapidly to build the full stack powering frontier AI models and real-time applications.
Role Overview
Sciforium is seeking a highly technical and visionary LLM Dataset Engineer to lead the strategy, creation, and curation of the massive datasets that power our foundation models. We believe that in the era of LLMs, data is the primary competitive advantage. In this role, you will own the end-to-end data lifecycle, from raw web-scale crawling to the fine-grained human-alignment datasets that define model behavior.
This position is ideal for a scientist who views data as a high-scale engineering challenge and an analytical puzzle. You will not just "provide" data; you will design the taxonomies, filtering heuristics, and post-training pipelines that ensure our models are world-class in reasoning, safety, and multimodal understanding.
Key Responsibilities
  • Foundation Dataset Strategy: Own the end-to-end creation of pre-training datasets for LLMs. This includes defining the mix of web data, code, books, and technical papers to optimize for downstream model performance.
  • Petabyte-Scale Curation: Design and implement sophisticated pipelines for data cleaning, exact/fuzzy deduplication, and high-quality signal extraction from petabytes of raw, unstructured data.
  • Post-Training & Alignment Data: Lead the development of high-quality post-training datasets, including Supervised Fine-Tuning (SFT) instructions, multi-turn dialogues, and preference modeling data (RLHF/DPO).
  • Multimodal Expansion: Drive the acquisition and processing of vision and video data, navigating the complexities of multimodal alignment, video compression, and temporal data consistency.
  • High-Performance Engineering: Develop high-throughput data processing scripts using Python, leveraging multiprocessing and multithreading to handle massive-scale ingestion and transformation without bottlenecks.
  • Data Profiling & Analysis: Conduct deep-dive statistical analysis on training corpora to identify biases, gaps in knowledge, and quality regressions, ensuring the "diet" of the model is mathematically balanced.
  • Synthetic Data Generation: (Added Value) Design pipelines to generate high-reasoning synthetic data to augment gaps in natural datasets, utilizing existing models for data labeling and refinement.
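
For readers unfamiliar with the "exact/fuzzy deduplication" step named in the responsibilities above, here is a minimal illustrative sketch. It is not Sciforium's pipeline: the function names, the word 3-gram shingling, and the 0.8 Jaccard threshold are all assumptions chosen for clarity. The pairwise comparison is quadratic, so a petabyte-scale system would replace it with something like MinHash and locality-sensitive hashing and distribute the work across processes or machines.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, n: int = 3) -> set:
    """Return the set of word n-grams; short texts yield a single shingle."""
    words = normalize(text).split()
    if len(words) <= n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(docs, threshold: float = 0.8):
    """Keep the first copy of each document, dropping exact and near duplicates.

    Exact duplicates are caught by a SHA-256 hash of the normalized text.
    Near duplicates are caught by Jaccard similarity over word shingles
    (hypothetical threshold; tune it for real data).
    """
    seen_hashes = set()
    kept = []  # list of (document, shingle set) pairs
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        doc_shingles = shingles(doc)
        if any(jaccard(doc_shingles, other) >= threshold for _, other in kept):
            continue  # fuzzy duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append((doc, doc_shingles))
    return [doc for doc, _ in kept]
```

Calling `deduplicate` on a list of strings returns the surviving documents in their original order; raising `threshold` makes the fuzzy pass stricter, so fewer near duplicates are removed.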
Must-Haves
  • 5+ years of industry experience in Data Science or Machine Learning, with a proven track record of building and managing datasets for foundation models.
  • Deep Proficiency in Python: Expert-level skills with a focus on high-performance code, including multiprocessing, multithreading, and efficient memory management for large-scale data tasks.
  • Petabyte-Scale Experience: Demonstrated experience working with petabyte-scale datasets that have been directly used to train production-grade LLMs or Large Vision Models.
  • Dataset Reconstruction: Experience building massive LLM training sets from scratch, including raw web crawls (e.g., Common Crawl) and specialized domain data.
  • Post-Training Expertise: Hands-on experience building datasets for RLHF, DPO, and multi-turn instruction following, including the management of human-labeling workflows and quality gold-sets.
  • Data Tooling: Mastery of data-at-scale frameworks such as Spark, Ray, or high-performance data-loading formats (e.g., WebDataset, Parquet).
Nice-to-Haves
  • Computer Vision (CV) Curation: Experience building large-scale image or video datasets from scratch (e.g., LAION-style pipelines).
  • Multimodal Crawling: Familiarity with large-scale crawling of multimodal data and the associated challenges of video processing, codecs, and compression.
  • Taxonomy Design: Experience in designing complex labeling schemas for reasoning, coding, and mathematical benchmarks.
  • Research Background: A Master's or PhD in a quantitative field with a focus on data-centric AI or information retrieval.

Benefits include
  • Medical, dental, and vision insurance
  • 401k plan
  • Daily lunch, snacks, and beverages
  • Flexible time off
  • Competitive salary and equity

Equal opportunity
Sciforium is an equal opportunity employer. All applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.