Mechanistic Interpretability Jobs (NOW HIRING)

Vmax

Research Fellowship - Mechanistic Interpretability

San Francisco, CA · On-site

We use the tools of mechanistic interpretability to enhance reinforcement learning by generating intrinsic rewards as a supplement or alternative to downstream human-generated verifiers. This 3 to 6 ...

Vmax

Research Fellowship - Mechanistic Interpretability

San Francisco, CA

Radical Numerics, Inc

Member of Technical Staff, Mechanistic Interpretability

San Francisco, CA · On-site

About the Role As a Member of Technical Staff, Mechanistic Interpretability at Radical Numerics, you will study how multimodal genome language models represent, process, and reason about information ...

Vmax

Member of Technical Staff - Mechanistic Interpretability

San Francisco, CA · On-site

$300K - $500K/yr

We use the tools of mechanistic interpretability to enhance reinforcement learning by generating intrinsic rewards as a supplement or alternative to downstream human-generated verifiers.

Vmax

Member of Technical Staff - Mechanistic Interpretability

San Francisco, CA

$300K - $500K/yr

We use the tools of mechanistic interpretability to enhance reinforcement learning by generating intrinsic rewards as a supplement or alternative to downstream human-generated verifiers.

OpenAI

Researcher, Interpretability

San Francisco, CA · On-site

They are seeking a researcher passionate about understanding deep networks, who will develop and carry out a research plan in mechanistic interpretability, collaborating closely with a motivated team ...

Anthropic

Research Scientist, Interpretability

San Francisco, CA · On-site +1

We're focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as trying to do ...

Anthropic

Research Scientist, Interpretability

San Francisco, CA · On-site

We're focused on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as trying to do ...

OpenAI

Researcher, Interpretability

San Francisco, CA · On-site

$295K - $445K/yr

You will develop and carry out a research plan in mechanistic interpretability, in close collaboration with a highly motivated team. You will play a critical role in helping OpenAI ensure future ...

Anthropic

[Expression of Interest] Research Manager, Interpretability

San Francisco, CA · On-site

We believe that a mechanistic understanding is the most robust way to make advanced systems safe. People mean many different things by "interpretability". We're focused on mechanistic ...

HHM Talent

Machine Learning Researcher

San Francisco, CA · On-site

$150K - $300K/yr

Position Overview The Machine Learning Researcher will own cutting-edge research in mechanistic interpretability, context compression, and transformer training. Researchers drive projects end-to-end ...

HHM CPAs

Machine Learning Researcher

San Francisco, CA · On-site

$150K - $300K/yr

HHM CPAs

Machine Learning Researcher

San Francisco, CA

$150K - $300K/yr

David Joseph & Company

ML Researcher

San Francisco, CA

$150K - $300K/yr

Strong ML fundamentals -- transformers, mechanistic interpretability, LLM research * A high-agency, self-directed, experiment-driven researcher (not RAG- or chatbot-only) * A spiky profile ...

Output, Inc

Member of the Technical Staff, Interpretability

New York, NY · On-site

$150K - $350K/yr

You have a strong publication record at top-tier venues (e.g., NeurIPS, ICML, ICLR) with contributions to mechanistic interpretability, representation analysis, probing methods, or model ...

Output Biosciences

Member of the Technical Staff, Interpretability

New York, NY · On-site

$150K - $350K/yr

You have a strong publication record at top-tier venues (e.g., NeurIPS, ICML, ICLR) with contributions to mechanistic interpretability, representation analysis, probing methods, or model ...

Scale AI

Research Scientist, Safety Post Training

Manhattan, NY · On-site

Preferred : • Experience with mechanistic interpretability, probing, or other techniques for understanding model internals. • Familiarity with red-teaming or adversarial evaluation of post ...

Goodfire

Field Team - Member of Technical Staff

San Francisco, CA · On-site

... interpretability, mechanistic interpretability, or model internals (sparse autoencoders, feature steering, etc.). Company : Goodfire is an AI research lab using interpretability to turn AI into ...

Mechanistic Interpretability information

Salary chart (values as shown): $31K, $36.3K, $50.5K

How much do mechanistic interpretability jobs pay per year?

As of Jul 28, 2026, the average yearly pay for mechanistic interpretability roles in the United States is $36,260, according to ZipRecruiter salary data. Most workers in this role earn between $33,500 and $34,000 per year, depending on experience, location, and employer.

How do you become a mechanistic interpretability researcher?

To become a mechanistic interpretability researcher, you typically need a strong background in machine learning, deep learning, and programming (e.g., Python). Expertise gained through an advanced degree such as a master's or Ph.D. in computer science, neuroscience, or a related field, along with experience analyzing neural networks and using interpretability tools, is essential for this role.

What is the difference between a mechanistic interpretability researcher and a data scientist?

Aspect	Mechanistic Interpretability	Data Scientist
Required credentials	Advanced degrees in AI, ML, or related fields	Degree in Data Science, Statistics, or Computer Science
Work environment	Research labs, AI development teams	Business, tech companies, consulting firms
Industry usage	AI research, model transparency, safety	Data analysis, predictive modeling, insights
Search intent	Understanding model internals, interpretability techniques	Data analysis, insights, model building

Mechanistic Interpretability focuses on understanding how AI models work internally, often requiring deep technical expertise. Data Scientists analyze data to build models and extract insights. While both roles involve data and algorithms, Mechanistic Interpretability is more research-oriented, emphasizing transparency and safety of AI systems, whereas Data Scientists focus on practical data analysis and modeling for business applications.

Is ML a high-paying job?

Mechanistic interpretability is a specialized area within machine learning that often requires advanced skills in deep learning, programming, and mathematics. Salaries for machine learning roles vary widely depending on experience, location, and industry, but generally, ML jobs tend to be well-compensated compared to many other tech roles, especially at senior levels or in research positions. Entry-level positions may offer lower salaries, but experienced professionals in this field can earn high six-figure incomes or more.

Which 5 jobs will survive AI?

Mechanistic interpretability is a specialized field within AI research focused on understanding how models work. Jobs in AI safety, research, and development that require deep technical expertise and critical thinking are likely to persist, as they involve tasks that are difficult to automate. Roles emphasizing creativity, complex problem-solving, and human judgment, such as AI ethicists or interdisciplinary researchers, are also expected to remain relevant.

How does mechanistic interpretability work?

Mechanistic interpretability involves analyzing neural networks by examining their internal components, such as neurons and weights, to understand how they process information. It often requires techniques like feature visualization, circuit analysis, and the use of specialized tools to trace decision pathways, helping researchers identify how specific features influence model outputs.
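
As a rough illustration of tracing how an internal component influences the output, the sketch below ablates (zeroes out) each hidden neuron of a tiny two-layer network and measures how much the output changes. Everything here is an assumption made for illustration: the network, its hand-picked weights, and the helper names `forward` and `ablation_effects` are hypothetical and do not come from any listing or tool mentioned on this page.

```python
import numpy as np

def forward(x, W1, W2, ablate=None):
    """Two-layer ReLU network; optionally zero out one hidden neuron."""
    h = np.maximum(0.0, x @ W1)
    if ablate is not None:
        h = h.copy()
        h[..., ablate] = 0.0  # causal intervention: remove this neuron's activation
    return h @ W2

def ablation_effects(x, W1, W2):
    """Mean absolute change in the output when each hidden neuron is zeroed."""
    base = forward(x, W1, W2)
    return np.array([
        np.abs(base - forward(x, W1, W2, ablate=j)).mean()
        for j in range(W1.shape[1])
    ])

# Hypothetical hand-picked weights: neuron 0 drives the output strongly,
# neuron 1 has zero output weight, neuron 2 has a small output weight.
W1 = np.array([[1.0, 0.0, 0.5],
               [0.0, 1.0, 0.5]])
W2 = np.array([[2.0], [0.0], [0.1]])

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(256, 2))  # synthetic inputs
effects = ablation_effects(x, W1, W2)
print(effects)  # one score per hidden neuron; larger means more influence
```

In this toy setup the neuron with the largest output weight gets the largest score and the neuron with zero output weight gets a score of zero. Real models need far more careful interventions, but the same compare-before-and-after idea applies.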

Mechanistic Interpretability jobs near you

Infographic: Mechanistic Interpretability job openings in the United States as of July 2026. Employment type: 100% full-time. Work arrangement: 100% in-person. Average salary: $36,260 per year, or $17.4 per hour.

Research Fellowship - Mechanistic Interpretability

Vmax

San Francisco, CA • On-site

Full-time

Posted 9 days ago

Job description

About Vmax

Vmax is an applied research lab developing AI capable of open-ended learning. We are building systems to exceed humans in all capacities by optimizing beyond the local maxima of learning from human expertise.

About the role

LLMs are fantastically powerful and there is a rapidly growing corpus of work devoted to understanding their internal representations and computations. We use the tools of mechanistic interpretability to enhance reinforcement learning by generating intrinsic rewards as a supplement or alternative to downstream human-generated verifiers.

This 3 to 6 month fellowship is for PhD students or equivalent early-career researchers who want to work at the intersection of mechanistic interpretability and reinforcement learning. You will own a focused research project, work closely with Vmax technical staff, and contribute to research publications.

Responsibilities

Develop mechanistic interpretability methods for understanding internal representations, features, circuits, and computations in language models and agents.
Investigate how model internals can be used to generate intrinsic rewards, auxiliary objectives, diagnostics, or training signals for reinforcement learning.
Design and run experiments that test whether interpretability-derived signals improve learning, exploration, generalization, robustness, or sample efficiency.
Compare internally derived rewards against baselines such as human-generated verifiers, reward models, task-level outcome rewards, and standard intrinsic motivation methods.
Use techniques such as probing, activation analysis, sparse autoencoders, causal interventions, feature attribution, or representation analysis to study model behavior.
Analyze failure modes, including reward hacking, spurious features, non-causal correlations, objective misspecification, and overfitting to narrow evaluation distributions.
Build research code, evaluation harnesses, and experimental infrastructure that make results reproducible and useful to the broader team.
Communicate research progress clearly through written updates, internal presentations, and final project outputs.

Role Requirements

Currently enrolled in a PhD program in machine learning, computer science, artificial intelligence, computational neuroscience, mathematics, or a related technical field. Exceptional candidates with equivalent research experience may also be considered.
Track record of research excellence or strong research promise, demonstrated through publications, preprints, open-source work, technical projects, competitions, or publicly available artifacts.
Working understanding of reinforcement learning.
Familiarity with mechanistic interpretability, representation analysis, or empirical methods for understanding neural networks.
Strong programming ability in Python and experience with at least one major ML framework such as PyTorch or JAX.
Clear written and verbal communication of technical ideas.

Nice to have

Experience with LLM post-training methods.
Familiarity with intrinsic motivation, unsupervised RL, auxiliary objectives, representation learning for RL, or curiosity-driven learning.
Experience with scalable ML experimentation, distributed training, experiment tracking, or reproducible research infrastructure.
Interest in turning mechanistic understanding into practical training methods, rather than only analyzing models after training.

Role specific location policy

This role is based in our San Francisco office; for exceptional candidates we are willing to consider a hybrid arrangement.
