Mechanistic Interpretability Jobs (NOW HIRING)

AI Foundations - Research Software Engineer

Cambridge, MA · On-site

$225K/yr

Preferred: • Experience with or a strong understanding of mechanistic interpretability for AI models. • Proficiency in systems programming languages, particularly Rust or C++. • Proven ...

CTGT

Machine Learning Engineer: LLM Interpretability & Systems

San Francisco, CA · On-site

$175K - $250K/yr

Take ideas from mechanistic interpretability and related work and turn them into code that runs in production, making research into reality. * Work directly with model internals to improve behavior ...

Research Engineer, Interpretability

San Francisco, CA · On-site +1

How can we trust them?" The Interpretability team at Anthropic is working to reverse-engineer how trained models work because we believe that a mechanistic understanding is the most robust way to ...

Research Engineer, Interpretability

San Francisco, CA · On-site

Postdoctoral Researcher at Polymathic AI

New York, NY · On-site

... as mechanistic interpretability of scientific foundation models. The rush to build foundation models has led to the development of large machine learning models in Astrophysics, fluid dynamics ...

AI Foundations - Research Software Engineer

Cambridge, MA · On-site

$224K/yr

Experience with or a strong understanding of mechanistic interpretability for AI models. * Proficiency in systems programming languages, particularly Rust or C++. * Proven experience designing ...

Umd

Postdoctoral Associate - AI Security

College Park, MD · Hybrid

Experience with mechanistic interpretability and/or alternative approaches to understanding model internals (e.g., activation analysis, circuit-level reasoning, representation learning). Background ...

Field Team - Member of Technical Staff

San Francisco, CA · On-site

Familiarity with interpretability, mechanistic interpretability, or model internals (sparse autoencoders, feature steering, etc.). Our values Goodfire is looking for individuals who embody our values ...

University of Maryland

Postdoctoral Associate - AI Security

College Park, MD · On-site

... mechanistic interpretability. Key Responsibilities • Conduct original research in AI security, including adversarial machine learning, model robustness, and secure AI system design. • Develop and ...

Field Team - Member of Technical Staff

San Francisco, CA

$200K - $325K/yr

Research Scientist (Prof Sedoc)

New York, NY · On-site

$61K - $102K/yr

Hands-on experience with LLM internals: activations, representations, fine-tuning, and/or mechanistic interpretability * Strong Python and PyTorch skills; comfortable working with large model weights

AI Engineer / Research Scientist (Senior, Staff), Explainable AI

Austin, TX

Experience in explainable and interpretable AI, such as feature attribution methods like LIME and SHAP, example- or influence-based attribution, or mechanistic interpretability. * Track record of ...

FAR.AI

$170K - $270K/yr

Mechanistic Interpretability: finding issues with Sparse Autoencoders, probing deception using Among Us, understanding learned planning in Sokoban, and interpretable data attribution. Red-teaming ...

AI Engineer / Research Scientist (Senior, Staff), Explainable AI

Austin, TX · On-site

AI Strategy Leader, R&D

Mountain View, CA · On-site

In this pivotal role, you will translate cutting-edge AI trends, such as agentic and multi-agent systems, advanced reasoning frameworks, embodied/physical AI, mechanistic interpretability, efficient ...

Senior Program Scientist, AI and Advanced Computing Institute

Manhattan, NY · On-site

$100K - $137K/yr

... in mechanistic interpretability or auditing methods • Expertise in multi-agent systems and emergent behavior • Expertise in AI-accelerated simulation frameworks • Familiarity with how AI is ...

Senior Program Scientist, AI and Advanced Computing Institute

Manhattan, NY · On-site

$100K - $137K/yr

AI model evaluation and red-teaming for scientific reliability, Mechanistic interpretability or auditing methods, Multi-agent systems and emergent behavior, AI-accelerated simulation frameworks • ...
