Annotation Data Scientist, Evaluation Integrity (Siri)
Cambridge, MA · On-site
$154K - $274K/yr
This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that ...
Data Annotation Technician
Kirkland, WA · On-site
Data Annotation Technician Join Q Analysts and become part of a world-class organization. Q ... Analyze content and use best judgement for proper answers * Meet daily, weekly and monthly velocity ...
Senior Frontend Engineer, Annotation Tools
Sunnyvale, CA · On-site
$143K - $197K/yr
... strong UX judgment when they don't. You'll help shape frontend systems that scale gracefully ... Expertise building annotation, labeling, or complex data-visualization tools. Experience integrating ...
This person will work closely with ML Engineers to manage and analyze our human and automated data annotation processes, and to develop, test, and refine LLM judges for generative AI model evaluation.
Remote | Network Data Annotation & Infrastructure Specialist -- $55-$80/hour
New York, NY · Remote
$55 - $80/hr
Network Data Review & Annotation * Review network-related data such as logs, configurations ... Maintain accuracy, consistency, and professional judgment across submitted work Ideal Profile ...
Senior Frontend Engineer, Annotation Tools
$181K - $318K/yr
... strong UX judgment when they don't. You'll help shape frontend systems that scale gracefully ... Expertise building annotation, labeling, or complex data-visualization tools. Experience ...
Data Annotation Technician
Kirkland, WA · On-site +1
Q Analysts is looking for Data Annotation Technicians to support Ground Truth Data Collection ... Analyze content and use best judgement for proper answers * Meet daily, weekly and monthly velocity ...
Data Scientist - Survey Design, Data Annotation, and Machine Learning Evaluation
Cupertino, CA · On-site
... Large Language Model judges. We are looking for a skilled Data Scientist to join our Machine ... A successful candidate is experienced in survey design, data annotation, LLM prompt engineering and ...
This role leads two tightly coupled but distinct capabilities: (1) governing canonical annotation standards and judgment frameworks, and (2) applying those standards at scale through operational ...
Apply annotation guidelines consistently while exercising independent judgment on ambiguous or edge-case scenarios. * Identify and flag unsafe, incomplete, or anomalous driving behaviors (e.g ...
Remote | Biology Research Data Review Scientist -- $40-$60/hour
New York, NY · On-site +1
$40 - $60/hr
Apply consistent annotation standards across database records and supporting source materials * Maintain accuracy, clarity, and professional judgment across submitted review work Ideal Profile Strong ...
Remote Video Annotators
Hampton, VA · Remote
Apply Advanced Annotation Capability by using the following skills: Use your experience with ... Judgment by using the following skills: Strong understanding of traffic rules and right-of-way ...
Conduct detailed data annotation and quality assurance of natural language datasets following ... These tools assist our recruitment team but do not replace human judgment. Final hiring decisions ...
Gain familiarity with existing literature on data annotation and LLM as judge * Understand NIST's role and ongoing efforts in assessing and measuring the validity and reliability of AI-related risks ...
... systems, and annotation and/or study participant guidelines. Responsibilities: Taxonomy ... Automated Judge Development: Shape the development, training and fine-tuning, and validation of ...
This role focuses on annotation, data quality, prompt evaluation, and the creation of high-quality ... These tools assist our recruitment team but do not replace human judgment. Final hiring decisions ...
Annotation Judge information
What is an Annotation Judge?
What are the key skills and qualifications needed to thrive as an Annotation Judge, and why are they important?
What is the difference between Annotation Judge vs Data Annotator?
| Aspect | Annotation Judge | Data Annotator |
|---|---|---|
| Credentials | Typically requires basic education, sometimes certification in data labeling | Usually requires similar or less formal education, often on-the-job training |
| Work Environment | Office or remote, working with data labeling platforms | Office or remote, performing data labeling tasks |
| Industry Usage | Used across AI, machine learning, and data science projects | Common in AI, machine learning, and data preparation workflows |
| Search & Comparison Intent | Often compared for roles involving data review and quality control | Compared for entry-level data labeling roles |
The main difference between an Annotation Judge and a Data Annotator lies in their roles. Annotation Judges typically review and validate annotations made by Data Annotators, ensuring quality and accuracy. Data Annotators perform the initial labeling of data. Both roles are essential in AI data pipelines, with Annotation Judges focusing on quality control and Data Annotators on data preparation.
What are some common challenges faced by Annotation Judges, and how can they effectively overcome them?

Full-time
Posted 2 days ago
Apple rating
8.1
Based on 661 frontline employees who took The Breakroom Quiz
6th of 30 rated technology retailers
Job description
Play a part in the ongoing revolution in human-computer interaction. Siri is evolving, and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal behind every Siri release.
Within the Siri evaluation organization, the Human Evaluation sub-team is responsible for answering the question: can we trust our evals? We do that by designing human-in-the-loop (HITL) annotation tasks that scrutinize every moving part of an agentic evaluation: the simulated user agent, the conversation it has with Siri, and the automated evaluators that grade the exchange. This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that directly informs pre-ship model and product decisions.
As an Annotation Data Scientist on the Evaluation Integrity team, you will design and run HITL annotation projects that evaluate the quality and authenticity of agentic user personae, the validity of agent-to-agent conversations, and the reliability of LLM-as-judge and rule-based evaluators against Siri's product specifications. You will own annotation initiatives end-to-end, from rubric design and tooling, through annotator calibration, to data science analysis that turns annotator judgments into actionable signal for modeling, planning, and product teams.
Bachelor's or Master's degree in a quantitative or related field such as Data Science, Computer Science, Linguistics, Statistics, or Cognitive Science, or equivalent job-related experience.
5+ years of hands-on experience working with human-annotated datasets or human-in-the-loop evaluation methodologies for machine learning, natural language processing, or large language model systems.
5+ years of experience using Python for data processing, analysis, and prototyping, including experience with libraries such as pandas, Jupyter, and at least one data visualization library.
Experience designing, implementing, and communicating annotation schemas, rubrics, or ontologies for machine learning training or evaluation data.
Experience managing multiple concurrent dataset curation efforts, including scoping work, iterating on guidelines, coordinating with in-house or vendor annotators, and monitoring annotator performance metrics such as accuracy, throughput, and inter-annotator agreement.
Experience specifying or designing custom annotation tooling in collaboration with software engineers.
Experience evaluating LLM-powered or agentic systems, including familiarity with LLM-as-judge methodologies, rubric-based grading, or trajectory and tool-call evaluation.
Familiarity with statistical methods that address accuracy and variability in human annotation data, such as inter-annotator agreement, Cohen's or Fleiss' kappa, Krippendorff's alpha, or bootstrapping.
Data-querying experience with SQL, Spark, or similar, and comfort working with large, complex, real-world datasets.
Experience building pre-ship evaluation pipelines for conversational or assistant products.
Experience with prompt engineering, or with designing simulated user personae for agent evaluation.
Experience running annotation programs across multiple locales or at large scale.
Excellent written and verbal communication skills, with the ability to explain technical topics clearly to data scientists, engineers, annotators, and cross-functional partners.
Proven ability to collaborate effectively across functions and drive projects of varying sizes and scopes, knowing when to dive deep and when to delegate.
About Apple
Sourced by ZipRecruiter
Imagine what you could do here! At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Dynamic, intelligent people and inspiring, innovative technologies are the norm here. The people who work here have reinvented entire industries with Apple hardware products. The same passion for innovation that goes into our products also applies to our practices, strengthening our dedication to leave the world better than we found it.
Industry
Computer and electronic product manufacturing
Company size
10,000+ Employees
Headquarters location
Cupertino, CA, US
Year founded
1976