Role Overview
As an Applied Research intern at Labelbox, you will design, build, and productionize evaluation and post-training systems for frontier LLMs and multimodal models. You'll own continuous, high-quality evals and benchmarks (reasoning, code, agent/tool-use, long-context, vision-language, and more), create and curate post-training datasets (human + synthetic), and prototype RLHF/RLAIF/RLVR/RM/DPO-style training loops to measure and improve real-world task and agent performance.
Your Impact
- Build and own evaluation and benchmark suites for reasoning, code, agents, long-context, and V/LLMs.
- Create post-training datasets at scale: design preference/critique pipelines (human + synthetic), and target hard failures surfaced by evals.
- Experiment with and prototype RLHF/RLAIF/RLVR/RM/DPO-style training loops to improve real-world task and agent performance.
- Land research in product: ship improvements into Labelbox workflows, services, and customer-facing evaluation/quality features; quantify impact with customer and internal metrics.
- Engage with customer research teams: run pilots, co-design benchmarks, and share practical findings through internal research reports, blog posts, talks, and published papers.
What You Bring
- A strong foundation in AI and machine learning, backed by a Ph.D. or Master's degree in Computer Science, Machine Learning, AI, or a related field (in-progress degrees are acceptable for intern positions).
- A deep understanding of frontier autoregressive and diffusion multimodal models, along with the human and synthetic data strategies needed to optimize them.
- Passion for and experience with LLM evaluation and benchmarking.
- Expertise in constructing, measuring, and refining high-quality training data.
- The ability to bridge research and application by interpreting new findings and translating them into functional prototypes.
- A track record of publishing in top-tier AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL) and contributing to the broader research community.
- Proficiency in Python and experience with deep learning frameworks like PyTorch, JAX, or TensorFlow.
- Exceptional communication and collaboration skills.
Applied Research at Labelbox
At Labelbox Applied Research, we're committed to pushing the boundaries of AI and data-centric machine learning, with a particular focus on advancing human-AI interaction techniques. We believe that high-quality human data and sophisticated human feedback integration methods are key to unlocking the next generation of AI capabilities. Our research team works at the intersection of machine learning, human-computer interaction, and AI ethics to develop innovative solutions that can be practically applied in real-world scenarios.