Overview
We are seeking a seasoned techno-functional leader to drive the development and execution of large-scale LLM training programs. This leader will partner with our clients' research teams at leading LLM labs to:
- Identify opportunities for building training datasets to improve model capabilities and performance
- Generate these datasets with high quality and speed
- Build automation tools and processes for scalability
- Deliver the datasets so that they are easily usable by our clients
Key Responsibilities
Operational Leadership & Performance Management
- Lead and scale global delivery teams of 100+, distributed across functions, regions, and levels (ICs, leads, and managers)
- Implement performance management systems that go beyond managerial reporting, using data-driven metrics, tools, and products to assess productivity, quality, and output consistency
- Build strong operational structures that allow for transparency, accountability, and early detection of underperformance
- Partner with cross-functional leads to optimize workflows and improve internal tool adoption for delivery efficiency
Data Quality & Scripting-Driven Automation
- Own the quality, accuracy, and scalability of data generated for LLM training
- Move beyond manual QA layers by leveraging Python scripting, APIs, and automation frameworks to measure, validate, and improve dataset integrity
- Design and oversee tools or scripts for data validation, annotation accuracy checks, and pipeline consistency
- Ensure datasets adhere to compliance standards (PII, GDPR, HIPAA) and can be programmatically tested for usability and quality
LLM Training & Evaluation
- Lead generation and delivery of high-quality, scalable datasets focused on SFT, RLHF, reasoning, and agentic workflows
- Oversee the entire data lifecycle, from client intake through annotation workflow design to delivery
- Partner with product, research, and engineering teams to implement evaluation metrics (e.g., win rate, inter-annotator agreement, and pairwise preference scoring)
Client Partnership & Communication
- Serve as the primary point of contact for enterprise AI clients; manage expectations, delivery timelines, and escalations
- Build relationships with engineering and research stakeholders by delivering consistently high-quality data
- Communicate effectively across technical and non-technical audiences; provide transparency through structured updates and quality reporting
Team Development & Tooling
- Recruit, mentor, and coach cross-functional leaders (Eng, Data, Ops, and Program Management)
- Drive adoption and improvement of internal tools (e.g., task management systems, quality dashboards)
- Champion continuous improvement across data quality, tools, and delivery processes
Required Qualifications
- 10+ years of experience leading large-scale technical delivery organizations, ideally across AI, ML, or data operations
- Bachelor's degree in Engineering, Computer Science, or an equivalent technical discipline
- Demonstrated ability to act as a strategic business partner with our clients, researchers, and engineers at leading LLM labs
- Proven success in building and scaling multi-level, high-performance teams with globally distributed operations
- Experience managing managers
- Experience with skip-level performance management
- Hands-on technical fluency: ability to write and review data validation scripts
- Demonstrated experience managing dataset generation or annotation for machine learning model evaluation and/or training
- Familiarity with ML tools and data workflows (e.g., Hugging Face, LangChain, Weights & Biases, Databricks)
Preferred Qualifications
- Experience evaluating large language model performance and/or improving model performance via fine-tuning
- Strong understanding of data quality frameworks, including automation, tooling, and manual processes
- Experience with AI data annotation, model evaluation, and fine-tuning platforms
- Strong communication and storytelling skills with executive stakeholders
Location: SF Bay Area (Hybrid)
Compensation: $255,000 to $325,000 OTE + Equity