Diffusion Model Jobs (NOW HIRING)

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video ... Lead research initiatives in advanced diffusion models for text-to-video generation, focusing on ...

Real-time Video Researcher

Palo Alto, CA · On-site

$185K - $400K/yr

Work on diffusion model distillation and develop diffusion-based world models for video applications. Train and finetune autoregressive models and diffusion models with a focus on real-time ...

Research Engineer

New York, NY · On-site

$200K - $300K/yr

To do this we're developing cutting-edge diffusion models and designing novel, personalized interfaces. We're a small team of creative builders in NYC with a rare combination of taste and deep AI ...

Diffusion Model information

Salary chart (hourly pay): $30, $52, $96

How much do diffusion model jobs pay per hour?

As of Jun 4, 2026, the average hourly pay for diffusion model jobs in the United States is $52.18, according to ZipRecruiter salary data. Most workers in this role earn between $38.46 and $96.15 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Diffusion Model Engineer, and why are they important?

To thrive as a Diffusion Model Engineer, you need a strong background in machine learning, deep learning, mathematics, and programming, usually supported by a degree in computer science or a related field. Familiarity with frameworks like PyTorch or TensorFlow, experience with large-scale data processing, and knowledge of diffusion model architectures are typically required. Creativity, problem-solving, and effective communication are crucial soft skills for collaborating with multidisciplinary teams and advancing research. These skills enable the development and implementation of cutting-edge generative models that drive innovation in AI applications.

What are some common challenges faced by professionals working with diffusion models, and how can these be addressed?

Professionals working with diffusion models often encounter challenges related to computational resource demands, model stability, and data quality. Training large diffusion models can require significant GPU resources and careful tuning to prevent issues like training instability or slow convergence. Collaborating closely with data engineers and domain experts helps ensure high-quality, diverse datasets, which are critical for realistic outputs. Staying up to date with the latest research and best practices can also help address these challenges and advance your skills in this rapidly evolving field.

What are diffusion models in machine learning?

Diffusion models are a type of generative model in machine learning that create data, such as images, by simulating a process where noise is gradually removed from a random signal. These models learn to reverse a diffusion process, transforming noisy data into structured outputs that resemble real examples from the training set. They have gained popularity for producing high-quality, realistic images and other media. Diffusion models are used in various applications, including image synthesis, inpainting, and audio generation.
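The noising and denoising idea described above can be illustrated with a minimal NumPy sketch. It assumes a DDPM-style formulation with a linear noise schedule. The function names (`make_alpha_bar`, `add_noise`, `recover_x0`), the schedule values, and the 4x4 toy "image" are all illustrative assumptions, not something taken from this page or from any specific employer's system. A trained network would normally predict the noise; here the true noise is passed in directly to show what that prediction is used for.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear noise schedule (assumed values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward process: blend clean data x0 with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def recover_x0(xt, eps_hat, t, alpha_bar):
    """Invert the forward blend, given an estimate eps_hat of the added noise."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((4, 4))           # stand-in for a tiny image
xt, eps = add_noise(x0, 500, alpha_bar, rng)

# In a real model, eps_hat comes from a neural network; using the true
# noise here shows that a perfect prediction recovers the clean data.
x0_hat = recover_x0(xt, eps, 500, alpha_bar)
```

In practice the noise estimate is imperfect, so samplers remove noise over many small steps rather than in a single inversion.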

What is the difference between a Diffusion Model role and a Data Scientist?

Aspect | Diffusion Model | Data Scientist
Required Credentials | Typically a background in machine learning, statistics, or computer science | Degree in data science, statistics, computer science, or related fields
Work Environment | Research labs, AI development teams, tech companies | Business, tech firms, consulting, research institutions
Industry Usage | Used in AI image generation, generative modeling | Analyzing data, building predictive models, data visualization

While both roles involve data and algorithms, a Diffusion Model role focuses on developing generative AI models, whereas a Data Scientist analyzes data to inform business decisions. Understanding these differences helps in choosing the right career path or job focus.

Infographic showing various Diffusion Model job openings in the United States as of May 2026, with employment types broken down into 83% Full Time, 13% Part Time, 1% Temporary, and 3% Contract. Highlights a 79% Physical, 1% Hybrid, and 20% Remote job distribution, with an average salary of $108,534 per year, or $52.2 per hour.
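The annual and hourly figures quoted on this page line up if one assumes a full-time year of 2,080 hours (40 hours a week for 52 weeks). That hours figure is an assumption made for this sketch, not something the page states, and the helper name `annual_from_hourly` is hypothetical.

```python
HOURS_PER_YEAR = 40 * 52  # assumed full-time schedule: 2,080 hours

def annual_from_hourly(hourly, hours_per_year=HOURS_PER_YEAR):
    """Convert an hourly rate to a rounded annual figure."""
    return round(hourly * hours_per_year)

average = annual_from_hourly(52.18)  # average hourly pay quoted above
low = annual_from_hourly(38.46)      # lower end of the typical range
high = annual_from_hourly(96.15)     # upper end of the typical range
```

Under that assumption the average works out to 108,534, matching the annual salary in the infographic, and the typical range spans roughly $80,000 to $200,000 per year.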

Member of Technical Staff - Diffusion Model

Moonlake AI

San Francisco, CA • On-site

Full-time

Posted 28 days ago


Job description

Introducing Moonlake, AI for creating world simulations.
About Moonlake
Moonlake is building the frontier of interactive world models: systems that generate, simulate, and reason over 3D environments for embodied AI, robotics and gaming. We develop the simulation infrastructure to build worlds (e.g., assets, scenes, digital twins) at scale.
Our team sits at the intersection of:
  • Embodied AI
  • Robotics simulation
  • Interactive 3D worlds
  • World models
  • Real-time generation
  • AI infrastructure

Moonlake is building the next generation of AI infrastructure for interactive digital worlds. Our mission is to enable anyone to create, simulate, and interact with rich environments using natural language and multimodal inputs, turning simple ideas into worlds with structure, logic, and agents that can perceive and act.
Our team has raised $28M in seed funding from NVIDIA Ventures, Threshold Ventures, AIX ventures and notable angels including Naval Ravikant and Jeff Dean to build the foundational layer for the future of AI - powering everything from creative tools and games to robotics training, simulations, and digital twins. Our goal is to make building and experimenting with these environments as accessible and scalable as publishing video on the internet.
We are looking for exceptional research engineers and applied researchers to help push the frontier of interactive AI.
The Role
We're looking for a Member of Technical Staff - Diffusion Models to help design and train the next generation of multimodal generative systems powering Moonlake's interactive world platform.
This is a research-heavy role focused on:
  • Diffusion architectures
  • Video generation
  • Conditioning systems
  • Multimodal generation
  • Control and personalization
  • Large-scale training

The ideal candidate combines:
  • Strong ML research fundamentals
  • Practical systems intuition
  • Experience training generative models at scale
  • Deep curiosity around interactive world generation

This role has a very high technical bar. Successful candidates typically have:
  • Published research
  • Strong generative modeling experience
  • Video generation or graphics-related experience
  • Prior work on frontier multimodal systems
What You'll Do
  • Build and iterate on diffusion architectures across:
    • 2D
    • 3D
    • Image
    • Video
    • Audio
  • Develop conditioning and control systems for multimodal generation
  • Improve generation quality, controllability, consistency, and efficiency
  • Train large-scale generative models
  • Build systems for editing, personalization, and controllable generation
  • Collaborate closely with infrastructure, world-modeling, and product teams
  • Push generation systems toward real-time and interactive applications
Scope of Work
Modeling & Architecture
  • Build and improve diffusion architectures
  • Video diffusion systems
  • Multimodal generation pipelines
  • Latent-space modeling
  • Real-time generation architectures
  • Interactive generation systems

Conditioning & Multi-Modal Learning
  • Text conditioning
  • Image conditioning
  • Pose/layout/control signals
  • Multi-modal encoders
  • Guidance strategies
  • Structured generation control

Training & Optimization
  • Large-scale diffusion training
  • Distributed training systems
  • Sample quality vs. compute optimization
  • Distillation techniques
  • Consistency models
  • One-step generation systems
  • Efficient generation pipelines

Control & Alignment
  • ControlNet
  • LoRA
  • IP-Adapters
  • Style / identity / geometry conditioning
  • Editing pipelines
  • Inpainting systems
  • Personalization systems
  • DreamBooth and custom tuning workflows
What We're Looking For
  • Strong ML research background
  • Deep understanding of diffusion models and generative architectures
  • Experience training large-scale generative systems
  • Strong grasp of optimization, scaling, and multimodal learning
  • Ability to work across both research and implementation
  • Strong engineering fundamentals
  • Ability to iterate quickly in a fast-moving research environment
Bonus Points
  • Experience with 3D generation or world models
  • Robotics simulation or embodied AI familiarity
  • Interactive generation systems
  • Real-time inference optimization
  • Graphics or game-engine experience
  • Experience building production-grade generation pipelines
Why This Role Matters
Moonlake is not building static image generators.
The company is building systems capable of generating:
  • Interactive worlds
  • Dynamic simulations
  • Controllable environments
  • Real-time multimodal experiences

The diffusion stack is foundational to making these systems coherent, controllable, scalable, and interactive.
You'll help define the generation systems behind the next generation of world-model AI.
We are committed to being an on-site, in-person team currently based in San Francisco.