Data Scientist, Model Development
About the Company
Kadence is partnered with an AI-native healthtech company in the Bay Area that is building the infrastructure behind modern healthcare operations, turning fragmented healthcare data into reliable, operational systems that power automation, analytics, billing, and customer deployments at scale.
Medical coding is unforgiving: a model error isn't just a bad prediction; it can become a billing or compliance issue. We need someone who can systematically raise model quality through rigorous data science, evaluation, prompt engineering, and domain learning.
About the Role
We're hiring a Data Scientist, Model Development, for this company. You'll build and improve specialty-specific medical coding agents, working closely with engineering, clinical coding experts, auditors, and customers to turn model failures into measurable improvements.
This role blends applied data science, LLM agent development, evaluation design, and healthcare data work. You'll build the feedback loop that turns audited decisions, clinical edge cases, and production failures into stronger agents. We need someone who reasons carefully about data quality, evaluation design, and real deployment risk, not just someone who can train a model.
What You'll Do
- Model development: Build and iterate specialty-specific coding agents; turn expert feedback into prompt, evaluation, and logic improvements; run structured improvement cycles with regression checks; define readiness criteria for customer-facing models.
- Evaluation & quality systems: Design gold-standard datasets and evaluation harnesses; build regression testing so changes don't silently break working behavior; develop metrics meaningful to engineers and clinicians; build confidence/uncertainty scoring.
- Data & pipelines: Curate datasets for training, eval, and customer delivery; support data lineage and auditability; partner with engineering on schemas; support customer pilots and audits.
- LLM & agent development: Design and iterate prompts for clinical evidence extraction and coding recommendations; build agent scaffolding (prompt chains, eval loops); work toward model-agnostic evaluation across foundation models.
- Clinical collaboration: Work directly with certified medical coders to understand coding rules and edge cases; translate clinical feedback into model-development tasks; help non-technical stakeholders understand model behavior and tradeoffs.
What We're Looking For
Required
- 2+ years in data science, applied ML, ML engineering, or LLM-based product development
- Strong Python; experience with messy, real-world datasets
- Experience building evaluation frameworks, benchmarks, or regression/quality systems
- Solid statistical reasoning (precision/recall, confidence intervals, error analysis)
- Experience with LLMs, prompt engineering, or agentic systems
- Ability to debug model failures through deep example review and structured iteration
- Strong communication with non-technical domain experts
- Comfort with ambiguity in an early-stage startup environment
Preferred
- Healthcare, revenue cycle, medical coding, claims, EHR, or clinical NLP experience
- Familiarity with CPT, ICD-10, payer rules, NCCI edits, or CMS/Medicare rules
- Human-in-the-loop ML, labeling workflows, or expert feedback loops
- Document AI, OCR, PDF parsing, or clinical note processing
- Experience with Claude, OpenAI, or other frontier LLM APIs
- Data versioning, experiment tracking, or ML observability
- Familiarity with HIPAA, SOC 2, or PHI handling
What Success Looks Like
- 30 days: Understand current agents, audit workflows, datasets, and major failure modes.
- 60 days: Own model improvement workstreams; build structured evaluation loops; translate auditor feedback into measurable gains.
- 90 days: Help establish a repeatable model development system: curated datasets, regression checks, error analysis, and confidence metrics.
Location
Bay Area, CA - local candidates strongly preferred.
Compensation
$140,000 - $180,000 base, before equity.