Multimodal Learning Jobs in California (NOW HIRING)

Deeproute.ai

Member of Technical Staff (MTS) - Multimodal Foundation Models

Fremont, CA · On-site

Focus: Multimodal Foundation Models · Representation Learning · Method Innovation. We are looking for strong technical builders and researchers who deeply understand foundation models and ...

Stand Insurance

Machine Learning Engineer - Multimodal Modeling

San Francisco, CA · On-site

$250K - $295K/yr

As a Machine Learning Engineer on the Applied Science team, you will design, train, and deploy Stand's flagship AI capabilities, with a central focus on the multimodal meshing of our Stand World ...

ByteDance

Research Scientist in Multimodal Interaction and World Model - Seed - Graduates - 2027 Start (PhD)

San Jose, CA · On-site

... learning. • Improve agent capabilities such as perception, memory, decision-making, and tool use ... etc. • Experience in multimodal learning, reinforcement learning, or agent systems. • ...

Xaira Therapeutics

AI Scientist - Biomedical Multimodal Modeling

South San Francisco, CA

$170K - $240K/yr

... multimodal data ... Our approach combines representation learning and generative modeling to capture structure ...

Eluvio

Last Minute AI-Machine Learning Summer Internship (Gen AI - Multimodal)

Berkeley, CA · On-site

Help train and develop multimodal learning models using advanced learning techniques including RAG, self-supervised learning, semi-supervised, and transductive learning. Requirements Desired ...

Hark

Member of Technical Staff, Multimodal Vision

San Jose, CA

$180K - $450K/yr

Experience with large-scale machine learning systems and distributed training. * Strong background ... Experience with multimodal systems (vision + text, vision + audio) or real-time AI systems is a ...

Tacit

Machine Learning Scientist

San Francisco, CA · On-site

$180K - $270K/yr

As a Machine Learning Scientist, you will develop cutting-edge AI models to integrate and decode complex, multimodal data streams from our custom sensing hardware. You'll play a pivotal role in ...

NVIDIA

Senior Applied AI Researcher, Digital Biology

Santa Clara, CA · On-site

$102K - $125K/yr

The role involves conceptualizing and implementing deep learning architectures for biological data, developing multimodal learning systems, and collaborating with a diverse team to advance ...

Apple

Machine Learning Research Engineer, SIML - ISE

Cupertino, CA · On-site

Description As a Machine Learning Research Engineer, you will help design and develop models and algorithms for multimodal perception and reasoning leveraging Vision-Language Models (VLMs) and ...

AI Research Scientist (Technical Leadership), Multimodal - Monetization GenAI

Menlo Park, CA

$219K/yr

Research expertise in video generation/understanding, multimodal learning, or diffusion models * Demonstrated significant industry influence in the field of AI and/or recently published research in ...

Apple

Machine Learning Research Engineer, SIML - ISE

Cupertino, CA

$150K - $277K/yr

Cisco

Senior AI Researcher

San Francisco, CA · On-site

Our research spans foundation models, agentic AI, multimodal learning, reasoning systems, scalable training algorithms, evaluation science, inference optimization, and AI systems infrastructure. We ...

TikTok

Machine Learning Scientist Intern (TikTok-Content Ecology-LLM application) - 2026 Start (PhD)

San Jose, CA · On-site

$60/hr

... learning, and recommendation algorithms. We develop cutting-edge AI capabilities that power ... Our work includes: - Short Video Content Understanding - Building multimodal AI models to analyze ...

The Bot Company

Machine Learning: Multimodal Foundation Models

San Francisco, CA · On-site

$200K - $350K/yr

Machine Learning: Multimodal Foundation Models We are building unified foundation models that natively reason across text, image, video, and kinematics to drive intelligent robotic policies. You will ...

Motional

Senior Machine Learning Engineer, Data Mining

San Francisco, CA · On-site +1

$144K - $190K/yr

Omnitag, our ML-powered multimodal data mining framework, is the engine that powers this discovery. As a Senior Machine Learning Engineer on the Data Mining team, your mission is to build the "Brain ...

Multimodal Learning information

What is multimodal learning?

Multimodal learning is an area of machine learning that involves integrating and processing information from multiple types of data, such as text, images, audio, and video. The goal is to create models that can understand and make predictions based on more than one data modality, similar to how humans use various senses. This approach is used in applications like speech recognition with visual cues, image captioning, and video analysis. By combining different data types, multimodal learning systems can often achieve better accuracy and robustness than models trained on a single modality.

What is the difference between Multimodal Learning and Data Scientist roles?

Aspect	Multimodal Learning	Data Scientist
Required Credentials	Advanced degrees in AI, Machine Learning, or Computer Science	Bachelor's or Master's in Data Science, Statistics, or related fields
Work Environment	Research labs, AI development teams, academia	Business, tech companies, analytics teams
Industry Usage	AI research, multimedia applications, robotics	Data analysis, predictive modeling, business insights

Multimodal learning roles focus on developing AI models that process and integrate multiple data types such as images, text, and audio. Data scientists analyze data to extract insights, build models, and support decision-making. Both roles involve data and algorithms, but multimodal learning roles specialize in model development for complex data integration, whereas data scientists work more broadly across data analysis and interpretation.

What are the key skills and qualifications needed to thrive as a Multimodal Learning Specialist, and why are they important?

To excel as a Multimodal Learning Specialist, you need a solid background in machine learning, data science, and computer vision, often supported by an advanced degree in a related field. Familiarity with deep learning frameworks like TensorFlow or PyTorch, experience integrating data from diverse sources (e.g., text, audio, images), and knowledge of relevant algorithms are crucial. Strong problem-solving abilities, creativity, and effective collaboration are standout soft skills for this role. These competencies are vital for developing innovative models that can process and interpret complex, multi-source data to drive impactful AI solutions.

What are some common challenges faced by professionals working in multimodal learning roles, and how can they be addressed?

Professionals in multimodal learning frequently encounter challenges related to integrating and aligning data from multiple sources, such as text, images, audio, or video. Ensuring data quality and consistency across modalities can be complex, and developing models that effectively combine heterogeneous information often requires advanced technical skills and innovative thinking. Collaboration with domain experts and other data scientists is key to overcoming these obstacles, as is staying up to date with the latest research and tools in machine learning. Regular team meetings and cross-disciplinary workshops can help foster a collaborative environment and promote knowledge sharing.

Infographic showing various Multimodal Learning job openings in California as of July 2026, with employment types broken down into 1% As Needed, 71% Full Time, 25% Part Time, 1% Temporary, and 2% Contract. Highlights an 86% Physical, 2% Hybrid, and 12% Remote job distribution.

Member of Technical Staff (MTS) - Multimodal Foundation Models

Deeproute.ai

Fremont, CA

Other

Posted 29 days ago

Job description

Focus

Multimodal Foundation Models · Representation Learning · Method Innovation
We are looking for strong technical builders and researchers who deeply understand foundation models and representation learning beyond simply applying existing frameworks.

Ideal candidates should have:

Strong experimental rigor
Solid systems and modeling intuition
Hands-on engineering ability
Interest in scalable multimodal AI systems for real-world autonomy

We value people who can bridge research and production, and who care about robustness, scalability, efficiency, and practical deployment in large-scale autonomous driving systems.

Responsibilities

1. Large-Scale Foundation Model Pretraining

Develop scalable pretraining pipelines for large-scale multimodal driving data
Design and optimize training strategies for:

Vision-language-action models
Video foundation models
Long-context temporal modeling
Multimodal representation alignment

Improve:

Training stability
Data efficiency
Scaling efficiency
Representation robustness

Work on distributed training systems and large-scale model optimization using frameworks such as:

PyTorch Distributed
DeepSpeed
Megatron-LM

2. Representation Learning & Method Innovation

Design and improve self-supervised and multimodal learning methods for real-world autonomous driving systems
Conduct architecture-level research on:

Vision Transformers (ViT)
Video / temporal architectures
Multimodal fusion and alignment
Embedding and retrieval systems
Long-context and memory-efficient architectures

Explore and improve:

Pretraining objectives
Loss functions
Training paradigms
Generalization and robustness

Analyze model behavior through:

Rigorous ablation studies
Failure case analysis
Representation probing and evaluation

3. Efficient Foundation Models & Scalable Deployment

Improve the efficiency, scalability, and deployability of large multimodal foundation models for real-world autonomous driving systems
Work on areas such as:

Model quantization
Knowledge distillation
Efficient attention mechanisms
Sparse architectures and Mixture-of-Experts (MoE)
Long-context and memory-efficient modeling
Inference acceleration and serving optimization
Training and inference system efficiency

Optimize model throughput, latency, memory usage, and deployment performance for large-scale production environments

Requirements

MS or PhD in:

Computer Vision
Machine Learning
Robotics
Computer Science
Related fields

Strong understanding of:

Foundation models
Self-supervised learning
Representation learning
Multimodal learning
Large-scale pretraining

Hands-on experience with methods such as:

CLIP
DINO / DINOv2
MAE
Contrastive learning
Masked modeling
MoE or scalable transformer architectures

Experience with one or more of the following is highly valued:

Video foundation models
Long-context modeling
Retrieval systems
Efficient inference
Distributed training
Model compression and deployment optimization

Strong publication record in top-tier venues is preferred:

CVPR
ICCV
ECCV
NeurIPS
ICLR
ICML
