Data Annotation Engineer Jobs in Austin, TX (NOW HIRING)

Delivery Lead

Austin, TX · Remote

$110K - $140K/yr

... data creation to annotation to delivery. We design and create datasets from scratch, recruit and ... Partner with Product and Engineering to evolve internal tooling, automation, and operational ...

Collaborate with Coordinators and Leads regarding routing conflicts or missing data. * Execute ... Engineering or equivalent experience. * Strong experience in detailed annotation and drawing ...


Data Annotation Engineer information

Salary scale: $51K, $146.1K (average), $195.2K

How much do data annotation engineer jobs pay per year?

As of Jun 12, 2026, the average yearly pay for a data annotation engineer in Austin, TX is $146,130.00, according to ZipRecruiter salary data. Most workers in this role earn between $83,200.00 and $194,200.00 per year, depending on experience, location, and employer.

What are the main challenges faced by Data Annotation Engineers in their daily work?

One of the main challenges Data Annotation Engineers face is ensuring consistent accuracy and quality in labeling large and often complex datasets. Attention to detail is critical, as even small errors can significantly affect machine learning model performance. Additionally, engineers must frequently adapt to evolving annotation guidelines and emerging data types, which requires ongoing learning and flexibility. Collaboration with data scientists and project managers is common to clarify requirements and resolve ambiguities, making strong communication skills essential for success.

What are the key skills and qualifications needed to thrive in the Data Annotation Engineer position, and why are they important?

To thrive as a Data Annotation Engineer, you need a strong background in data analysis, attention to detail, and familiarity with annotation processes, often supported by a degree in computer science or a related field. Proficiency with annotation tools like Labelbox, CVAT, or VIA, and understanding of data formats used in machine learning, is commonly required. Excellent communication, collaboration, and organizational skills help you effectively manage projects and cooperate with cross-functional teams. These abilities are crucial for delivering high-quality labeled data, which directly impacts the performance of AI and machine learning models.

Is data annotation real or fake?

Data annotation is a real and essential process in machine learning where human annotators label data such as images, text, or audio to train AI models. Data annotation engineers perform this work using specialized tools and quality standards to ensure accurate and reliable datasets.

What is a data annotation engineer?

A data annotation engineer is a professional responsible for labeling and annotating data, such as images, text, or videos, to train machine learning models. They often use specialized tools and follow guidelines to ensure data quality and accuracy, supporting AI development and data-driven applications.

How hard is it to get a job with data annotation tech?

Getting a job as a Data Annotation Engineer typically requires basic computer skills, attention to detail, and familiarity with annotation tools or platforms. Entry-level positions are often accessible with minimal formal education, but having knowledge of machine learning concepts or experience with data labeling can improve job prospects.

Does data annotation really pay you?

Data annotation engineers are typically paid for their work, often earning hourly wages or project-based fees depending on the employer or platform. Compensation varies based on experience, skill level, and the complexity of annotation tasks, which may involve using tools like labeling software or AI platforms.

What is a Data Annotation Engineer job?

A Data Annotation Engineer is responsible for labeling and annotating data—such as text, images, audio, or video—to train machine learning models. They ensure that data is accurately categorized and structured to improve model performance. This role often involves using specialized annotation tools, following detailed guidelines, and working closely with data scientists and AI teams. Data Annotation Engineers play a crucial role in the development of AI applications by providing high-quality labeled datasets for supervised learning.

Applied Data Scientist, LLM Evaluation

Driver AI Inc.

Austin, TX • Remote

Other

Medical, Dental, Vision, Life, Retirement

Posted 19 days ago


Job description

Introduction

At Driver, we're building systems that turn source code into human language. The tech stack includes a core compiler-like engine, a heavily asynchronous/distributed backend server, and a frontend web application that provides a rich user experience.

About Driver

We're an early-stage startup backed by Y Combinator and Google Ventures that combines first-principles technical approaches and applied LLM expertise to tackle context engineering at scale. Driver builds the context layer that employees and AI agents alike use in developing software.

Working at Driver

Driver is an early-stage but fast-growing startup. As such, we take advantage of what startups excel at: delivery speed, flexibility, and the enjoyment of working with a small, close-knit team.

Organizational and engineering values at Driver include first-principles thinking, correct by construction, writing things down, experimentation and iteration, pragmatism, commitment to effective communication and transparency, autonomy, and ambition.

Job Overview

Title: Applied Data Scientist, LLM Evaluation

Location: Remote or Austin, TX

Our value is directly tied to the quality of our content at scale. The platform generates technical documentation across a complex, multi-stage pipeline - producing multiple content types at different levels of abstraction, from individual code elements up to high-level summaries. Today, changes to models, context strategies, or pipeline architecture are evaluated largely through manual review and intuition. There is no systematic way to answer: "Did this change make our output better, worse, or the same - and for which languages, repo sizes, and content types?"

This is a hard problem. LLM outputs are non-deterministic - identical inputs produce different outputs across runs, and small variations at early pipeline stages compound into meaningfully different end-user content downstream. Evaluating quality requires methodology that accounts for this: statistical reasoning over multiple runs, understanding of cascade effects through the pipeline, and rubrics that balance human judgment with automated signals.
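As a rough illustration of the multi-run reasoning described above, the sketch below compares two hypothetical pipeline variants by bootstrapping the difference in mean quality score across repeated runs. The scores, the 0-to-1 scale, and the function name `bootstrap_mean_diff_ci` are assumptions made for illustration; nothing here is taken from Driver's actual methodology.

```python
import random
import statistics


def bootstrap_mean_diff_ci(scores_a, scores_b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_b) - mean(scores_a).

    Each list holds one quality score per independent run of a variant.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each variant's runs with replacement.
        resample_a = rng.choices(scores_a, k=len(scores_a))
        resample_b = rng.choices(scores_b, k=len(scores_b))
        diffs.append(statistics.fmean(resample_b) - statistics.fmean(resample_a))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical rubric scores (0 to 1) from repeated runs on the same repository.
baseline = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72, 0.67]
candidate = [0.78, 0.75, 0.80, 0.74, 0.79, 0.77, 0.76, 0.81]

low, high = bootstrap_mean_diff_ci(baseline, candidate)
print(f"95% CI for mean difference: [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the difference between variants is unlikely to be run-to-run noise alone. A real evaluation would also need to account for variation across repositories and content types, which this sketch ignores.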

This role builds the evaluation function from scratch. You'll define what "good" means for our generated content, build the infrastructure to measure it, and create the experimental framework that lets the team ship changes with confidence.

What You'll Do

You'll own the LLM evaluation strategy at Driver - from first principles to production infrastructure. This is a foundational role: you're not joining an existing eval team, you're building it. As the function matures, you'll seed and grow a team around it.

Define quality metrics and build evaluation datasets. Establish what "good" looks like for each content type across the pipeline. Build and curate gold-standard evaluation datasets across languages and repo archetypes (monorepos, microservices, libraries, applications). Design rubrics that capture accuracy, completeness, usefulness, and readability.

Build benchmarking and experimentation infrastructure. Create automated evaluation pipelines that score output against reference datasets. Instrument the content generation pipeline to support A/B comparisons - run the same codebase through two strategies and compare results. Build tooling for LLM-as-judge evaluation and regression detection. Integrate evaluation into CI so pipeline changes come with quality evidence.

Develop automated quality signals at scale. Build quality checks that flag degraded output without requiring human review of every document. Monitor content quality trends over time. Design sampling strategies for human review that maximize signal with minimal annotation effort.
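One simple way to approach the sampling problem mentioned above is stratified sampling: draw a fixed number of documents from each (language, content type) group so that rare groups still reach human reviewers. The sketch below assumes a hypothetical document record with `language` and `content_type` fields; the field names and the helper `stratified_sample` are illustrative only.

```python
import random
from collections import defaultdict


def stratified_sample(docs, key, per_stratum, seed=0):
    """Pick up to `per_stratum` documents from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for doc in docs:
        strata[key(doc)].append(doc)
    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Small strata contribute everything they have.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical generated documents tagged with language and content type.
docs = [
    {"id": i, "language": lang, "content_type": ctype}
    for i, (lang, ctype) in enumerate(
        [("python", "summary")] * 50
        + [("python", "element")] * 30
        + [("rust", "summary")] * 3
        + [("rust", "element")] * 1
    )
]

review_batch = stratified_sample(
    docs, key=lambda d: (d["language"], d["content_type"]), per_stratum=2
)
print(len(review_batch))
```

A uniform random sample of the same size would mostly return Python summaries; stratifying trades some statistical efficiency on the dominant group for coverage of the rare ones.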

Quantify tradeoffs and inform decisions. Run experiments on model selection, context strategies, and pipeline architecture changes. Quantify cost/quality/latency tradeoffs. Partner with the engineering team to turn evaluation insights into shipped improvements.

Qualifications

Education: Bachelor's, Master's, or PhD in Statistics, Machine Learning, Data Science, Computational Linguistics, or a related quantitative field.

Experience: Minimum of 3 to 5 years in applied science, ML engineering, or data science roles with a focus on evaluation, NLP, or generative AI. 7+ years of experience preferred.

Required Technical Skills

  • Strong statistical foundations: experimental design, hypothesis testing, confidence intervals, effect sizes, power analysis.
  • Experience designing and running evaluations for LLM or NLP systems - you've thought carefully about what "better" means when outputs are open-ended text.
  • Proficient in Python and the scientific/data stack (pandas, NumPy, SciPy, scikit-learn).
  • Comfortable working in Jupyter notebooks for exploration and prototyping, and turning that work into automated pipelines.
  • Experience with LLM-as-judge approaches, inter-annotator agreement, and rubric design for subjective quality assessment.
  • Familiarity with the practical challenges of non-deterministic systems: variance decomposition, multi-run methodology, distinguishing signal from noise at scale.
  • Strong data storytelling - you can turn experiment results into clear recommendations that drive engineering and product decisions.
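
For readers unfamiliar with inter-annotator agreement, listed among the skills above, a common starting point is Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The sketch below is a minimal pure-Python version; the labels and the pairing of a human reviewer with an LLM judge are hypothetical examples.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected if each rater labeled independently at their own rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts on eight generated documents.
human = ["good", "good", "bad", "good", "bad", "bad", "good", "bad"]
judge = ["good", "good", "bad", "bad", "bad", "bad", "good", "good"]

kappa = cohens_kappa(human, judge)
print(round(kappa, 3))  # 0.5
```

Here the raters agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.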

Preferred and Nice-to-Have Technical Skills

  • Experience with LLM APIs and prompt engineering across multiple providers.
  • Familiarity with evaluation frameworks (e.g., RAGAS, DeepEval, custom harnesses).
  • Experience building data pipelines or ETL workflows (Airflow, Dagster, or similar).
  • Comfort with SQL and working directly against production data stores.
  • Experience with visualization tools (Matplotlib, Plotly, Streamlit) for building internal dashboards and reports.
  • Background in code understanding, developer tools, or technical documentation.
  • Experience building or managing annotation pipelines and human evaluation workflows.

Benefits

  • Competitive Compensation Packages - Cash & Equity
  • Flexible Work Culture
  • Unlimited Time Off + 12 Paid Company Holidays
  • Insurance - Health, Dental, & Vision
  • Life Insurance & FSA Accounts
  • 401(k) Retirement Accounts - Traditional, Roth, or Both
  • Quarterly Team Offsites

Driver is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.