Entry Level Machine Learning Data Annotation Jobs in California

... AI, machine learning, and large-scale data infrastructure Company: Abaka AI provides accurate and efficient AI data services, including data collection, data cleaning, data annotation, and OTS ...

Required: • Bachelor's degree in computer science, machine learning, data science, electrical engineering, or a similar discipline • Proficient in Python • Foundational understanding of ...

Computer Vision AI & ML Engineer

San Mateo, CA · On-site

$127K - $149K/yr

We believe massive scale through data-driven machine learning is the key to unlocking these ... Design labeling strategies and tooling for automated annotation, QA workflows, dataset management ...

Entry Level Machine Learning Data Annotation information

What is the difference between Entry Level Machine Learning Data Annotation vs Entry Level Data Labeling Specialist?

| Aspect | Entry Level Machine Learning Data Annotation | Entry Level Data Labeling Specialist |
|---|---|---|
| Credentials | Basic understanding of data annotation tools; no formal certification required | Similar; often no formal certification needed |
| Work Environment | Remote or on-site, working with AI teams and datasets | Remote or on-site, focusing on labeling data for AI/ML projects |
| Industry Usage | Primarily in AI, machine learning, and data science companies | Used across tech, automotive, healthcare, and other industries |
| Search & Comparison Intent | Commonly compared for entry-level roles in AI data prep | Often compared as a similar entry-level data labeling role |

Both roles involve preparing data for machine learning models, with similar entry-level requirements. The main difference lies in terminology and specific job focus, but they often overlap in skills and work environment.

What are the most commonly searched types of Machine Learning Data Annotation jobs in California? The most popular types of Machine Learning Data Annotation jobs in California are:
What are popular job titles related to Entry Level Machine Learning Data Annotation jobs in California? For Entry Level Machine Learning Data Annotation jobs in California, the most frequently searched job titles are:
What job categories do people searching Entry Level Machine Learning Data Annotation jobs in California look for? The top searched job categories for Entry Level Machine Learning Data Annotation jobs in California are:
What cities in California are hiring for Entry Level Machine Learning Data Annotation jobs? Cities in California with the most Entry Level Machine Learning Data Annotation job openings:
Infographic showing Entry Level Machine Learning Data Annotation job openings in California as of June 2026, with employment types broken down into 95% Full Time, 4% Part Time, and 1% Contract. It also highlights a 97% Physical (on-site), 1% Hybrid, and 2% Remote job distribution.

Machine Learning Engineer, Data Quality

Rime Labs

San Francisco, CA

$134K - $162K/yr

Other

Posted 13 days ago


Job description

Machine Learning Engineer, Data Quality
Rime builds voice AI for enterprises running customer experiences at scale. Our text-to-speech models are purpose-built for high-volume conversational deployments, engineered for the pronunciation accuracy, latency, and deployment flexibility that production environments actually demand.
We started from a different premise than the rest of the field: voice AI isn't bottlenecked by model architecture. It's bottlenecked by data. So before we trained a single model, we built our own corpus: full-duplex, studio-quality conversational speech, recorded and annotated by PhD linguists. That's our moat. It's also why enterprises pick Rime when pilots need to convert into production.
We're backed by top-tier investors including Unusual Ventures, and we've built a team at the intersection of product, research, and craft. Building voice models is an art. We intend to master it. The path is the craft itself: the loop between theory and practice - the shared mental model of how things should behave, met by the reality that doesn't quite conform, sharpened by the meeting.
Role Overview
We're hiring a Machine Learning Engineer, Data Quality to own the operational data pipeline that produces our training corpus end-to-end - and to bring a vision for where it should go next. We take that seriously: if you can plan an overhaul, justify it, and orchestrate the human and machine migration work, we'll do it together.
This is a sociotechnical role. You'll be in the loop on everything and talking to everyone who touches the data across 42+ languages: 50+ annotators, 32+ external vendors, an in-house recording studio, and the systems behind them - ingestion, quality assurance, pre-processing, cataloging, export to training. At any given moment, dozens of deliverables are in flight, each on its own clock.
The people who thrive here want to listen to the audio clips and design the system that scales their judgment to the next million. You don't need deep expertise across the whole stack on day one - you need the judgment to know what good looks like at each stage, and the engineering depth to build (or learn to build) the parts that need building.
What You'll Own
  • Linguist- and annotation-team-facing tooling: annotation UI, project-management (PM) workflow, QC dashboards. This is the surface the frontline uses every day.
  • Vendor data QA workflows: A large share of incoming data arrives from vendors in various states and needs to pass QA before it can be trusted. The tooling, routing, and tracking for that work is yours.
  • Quality systems across the network: The signals, dashboards, and review loops that surface when a corner of the network is drifting - a vendor's transcripts getting sloppy, an annotator's inter-annotator agreement (IAA) slipping, a language's gold set going stale - before it lands in the training pool.
  • End-to-end audio annotation pipeline: Some stages currently exist only as prototypes; productionizing and rebuilding them is already in flight.
  • Dataset versioning and experimenter tooling: the model team will want to subset the vetted pool ("speakers X/Y/Z, duration 3-12s, quality > 0.8") into reproducible training manifests. The query interface, manifest format, and lineage tracking are all yours.
  • Pipelines for full- and half-duplex training data
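The dataset-versioning item above describes subsetting the vetted pool ("speakers X/Y/Z, duration 3-12s, quality > 0.8") into reproducible training manifests with lineage tracking. The posting does not describe Rime's actual query interface or manifest format, so this is only a minimal sketch under stated assumptions: a hypothetical in-memory `Clip` record, a hypothetical `build_manifest` helper, an assumed `pool_version` label, and an inclusive duration range. It filters on the example predicates, sorts by clip ID so the result does not depend on pool order, and derives a manifest ID by hashing the query together with the exact clip IDs it resolved to.

```python
import hashlib
import json
from dataclasses import dataclass


# Hypothetical clip record; the real catalog schema is not described in the posting.
@dataclass(frozen=True)
class Clip:
    clip_id: str
    speaker: str
    duration_s: float
    quality: float


def build_manifest(pool, speakers, min_s, max_s, min_quality, pool_version):
    """Hypothetical helper: subset the vetted pool into a reproducible manifest."""
    # Filter, e.g. speakers X/Y/Z, duration 3-12 s (inclusive, assumed), quality > 0.8.
    selected = sorted(
        (c for c in pool
         if c.speaker in speakers
         and min_s <= c.duration_s <= max_s
         and c.quality > min_quality),
        key=lambda c: c.clip_id,  # stable order, independent of pool order
    )
    query = {
        "speakers": sorted(speakers),
        "duration_s": [min_s, max_s],
        "quality_gt": min_quality,
        "pool_version": pool_version,  # assumed label for the pool snapshot
    }
    clip_ids = [c.clip_id for c in selected]
    # Lineage: the ID hashes the query, the pool version, and the resolved clips.
    payload = json.dumps({"query": query, "clips": clip_ids}, sort_keys=True)
    manifest_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return {"manifest_id": manifest_id, "query": query, "clips": clip_ids}


if __name__ == "__main__":
    pool = [
        Clip("a1", "X", 5.0, 0.90),
        Clip("a2", "Y", 2.0, 0.95),   # too short
        Clip("a3", "Z", 10.0, 0.70),  # quality too low
        Clip("a4", "W", 6.0, 0.99),   # speaker not requested
        Clip("a0", "Y", 12.0, 0.85),
    ]
    manifest = build_manifest(pool, {"X", "Y", "Z"}, 3.0, 12.0, 0.8, "pool-v1")
    print(manifest["clips"])  # ['a0', 'a1']
```

Hashing the resolved clip list, not just the query, means the same query run against a changed pool yields a different manifest ID, which is one simple way to make lineage explicit.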
What We're Looking For
  • Instinct for data quality. You can tell good data from bad. You know what "bad" looks like in this specific domain - not just generic "anomalies," but the particular ways audio and transcripts go wrong.
  • Willing to look at the data. Open the file. Listen to the clip. Read the transcript. You don't outsource the first-pass checks to a script.
  • Opinionated, and curious when challenged. You arrive with a perspective informed by what you've seen work and what you've seen fail - and you're equally interested in pressure-testing it. A "what about..." question isn't a threat; it's where the work happens.
  • Project sense. You can hold a lot of moving parts in your head - what's in flight, what's blocked, what's about to slip - and keep the picture clear enough that others can step into it.
  • Designs, doesn't just execute. You want to take on more design responsibility over time, not less. You're looking for a role where you (co-)own things end-to-end, not one where someone hands you tasks to implement.
  • Comfort being out of your depth at the boundary. You'll sometimes debug code you didn't write in tools you don't use daily. You should find this energizing, not threatening.
  • Solid software and data engineering fundamentals. Python, schemas you can reason about, production data pipelines you've built and operated on cloud-native infrastructure.
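The list above asks for knowing "the particular ways audio and transcripts go wrong" while still opening the file and listening. Once a reviewer knows what "bad" looks like, a few of those failure modes can be encoded as cheap flags that decide which clips get a human listen first. The sketch below is an illustration only: `first_pass_qc` is a hypothetical helper, audio is assumed to be mono float samples in [-1.0, 1.0], and every threshold is an illustrative assumption rather than a value from the posting. It checks duration, full-scale clipping, near-silence via RMS level in dBFS, and whether the transcript is empty or implausibly long for the audio.

```python
import math


def first_pass_qc(samples, sample_rate, transcript,
                  min_s=0.5, max_s=30.0,
                  clip_level=0.999, max_clip_frac=0.001,
                  min_rms_dbfs=-50.0, max_words_per_s=6.0):
    """Hypothetical triage helper: return issue tags for one clip ([] = no flags).

    `samples` are mono floats in [-1.0, 1.0]. All thresholds are illustrative
    assumptions, not values from the posting.
    """
    if not samples:
        return ["empty_audio"]
    issues = []
    duration_s = len(samples) / sample_rate
    if not (min_s <= duration_s <= max_s):
        issues.append("duration_out_of_range")
    # Clipping: too many samples pinned at (or near) full scale.
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    if clipped / len(samples) > max_clip_frac:
        issues.append("clipping")
    # Near-silence: overall RMS level below a dBFS floor.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
    if rms_dbfs < min_rms_dbfs:
        issues.append("near_silent")
    # Transcript sanity: empty, or more words than the audio could plausibly hold.
    words = transcript.split()
    if not words:
        issues.append("empty_transcript")
    elif len(words) / duration_s > max_words_per_s:
        issues.append("transcript_too_long_for_audio")
    return issues
```

A flag here only narrows where to listen; it does not replace opening the clip, which is the point the posting makes about first-pass checks.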
Nice to have - in rough order from hardest-to-acquire to most learnable:
  • Audio pipeline tooling: ffmpeg, Silero VAD, faster-whisper, neural audio codecs (Encodec, SNAC, SoundStream).
  • TTS frontend work: G2P (phonemizer, g2p-en), text normalization (NeMo TN or equivalent), prosody and phoneme alignment.
  • Annotation platforms: Label Studio, Argilla, or equivalent - particularly customizing or replacing them.
  • Direct experience with our stack: GCP (Cloud Run, Cloud Batch, GCS, Pub/Sub), Supabase / Postgres. AWS or Azure experience maps fine.
Why Join Rime
  • Build the data infrastructure behind a category-defining voice AI company.
  • The pipelines you build determine what models we can train.
  • Meaningful equity upside.
  • High ownership, high standards, low bureaucracy.
What We Offer
  • Competitive base + meaningful early-stage equity
  • Remote-friendly
  • Visa sponsorship available
  • Access to a proprietary, full-duplex, studio-quality conversational speech corpus
  • Compute and tooling to do the work
  • Direct influence on the future of voice AI
At Rime, we...
  • Are outliers
  • Cut through the hype to focus on the craft
  • Move fast with agency and freedom
  • Maintain a growth mindset, finding joy in the struggle
  • Do the right things, knowing that it'll lead to making money

If that sounds like you too, you'll be a great fit for Rime!