Multimodal Learning Jobs (NOW HIRING)

Member of Technical Staff (MTS) - Multimodal Foundation Models

Fremont, CA · On-site

Focus: Multimodal Foundation Models · Representation Learning · Method Innovation. We are looking for strong technical builders and researchers who deeply understand foundation models and representation ...

New York University

[EOI] Postdoctoral Associate in Multimodal AI | Professor Saining Xie

New York, NY · On-site

$62K - $125K/yr

* Conducting original research in multimodal learning, including model design, training, and evaluation
* Developing scalable methods for aligning and integrating diverse data modalities

Stand Insurance

Machine Learning Engineer - Multimodal Modeling

San Francisco, CA · On-site

$250K - $295K/yr

As a Machine Learning Engineer on the Applied Science team, you will design, train, and deploy Stand's flagship AI capabilities, with a central focus on the multimodal meshing of our Stand World ...

ByteDance

Research Scientist in Multimodal Interaction and World Model - Seed - Graduates - 2027 Start (PhD)

San Jose, CA · On-site

... learning. • Improve agent capabilities such as perception, memory, decision-making, and tool use ... etc. • Experience in multimodal learning, reinforcement learning, or agent systems. • ...

Xaira Therapeutics

AI Scientist - Biomedical Multimodal Modeling

South San Francisco, CA

$170K - $240K/yr

... multimodal data ... Our approach combines representation learning and generative modeling to capture structure ...

Last Minute AI-Machine Learning Summer Internship (Gen AI - Multimodal)

Berkeley, CA

Help train and develop multimodal learning models using advanced learning techniques including RAG, self-supervised learning, semi-supervised, and transductive learning. Requirements Desired ...

Senior Staff Machine Learning Scientist, Assets

OR · On-site +1

$91K - $124K/yr

We're looking for a Senior Staff Machine Learning Scientist to help us solve challenging problems ... Design, implement, train, and optimize large-scale vision and multimodal foundation models across ...

Member of Technical Staff, Multimodal Vision

San Jose, CA

$180K - $450K/yr

Experience with large-scale machine learning systems and distributed training. * Strong background ... Experience with multimodal systems (vision + text, vision + audio) or real-time AI systems is a ...

Senior Staff Machine Learning Scientist, Assets

$93K - $127K/yr

Member of Technical Staff, Multimodal Vision

San Jose, CA · On-site

$180K - $450K/yr

Multimodal Learning jobs near you

NVIDIA AI

Senior Applied AI Researcher, Digital Biology

Santa Clara, CA · On-site

$107K - $146K/yr

The role focuses on applied research in deep learning architectures for biological data, including the development of multimodal learning systems and digital twin systems for healthcare. ...

Amazon

2026 Fall Applied Science Internship - Natural Language Processing and Speech Technologies - United

Seattle, WA · On-site

$17 - $22.75/hr

NLP/NLU, LLMs, Reinforcement Learning, Human Feedback/HITL, Deep Learning, Speech Recognition, Conversational AI, Natural Language Modeling, Multimodal Learning. In this role, you will work alongside ...

NVIDIA

Senior Applied AI Researcher, Digital Biology

Santa Clara, CA · On-site

$102K - $125K/yr

The role involves conceptualizing and implementing deep learning architectures for biological data, developing multimodal learning systems, and collaborating with a diverse team to advance ...

Apple

Machine Learning Research Engineer, SIML - ISE

Cupertino, CA · On-site

Description: As a Machine Learning Research Engineer, you will help design and develop models and algorithms for multimodal perception and reasoning leveraging Vision-Language Models (VLMs) and ...

AI Research Scientist (Technical Leadership), Multimodal - Monetization GenAI

New York, NY

$219K/yr

Research expertise in video generation/understanding, multimodal learning, or diffusion models * Demonstrated significant industry influence in the field of AI and/or recently published research in ...

Roblox

[2026] Senior Machine Learning Engineer, Account Identity - PhD Early Career

San Mateo, CA · On-site

$119K - $163K/yr

Expertise in one or more areas: computer vision, multimodal learning, deepfake detection, facial representation, adversarial machine learning, or VLM/LLM. * Strong coding skills with proficiency in ...

Multimodal Learning information

Salary chart values (per year): $21K · $61.7K · $114.5K

How much do multimodal learning jobs pay per year?

As of Jul 21, 2026, the average yearly pay for multimodal learning jobs in the United States is $61,692, according to ZipRecruiter salary data. Most workers in these roles earn between $41,000 and $72,000 per year, depending on experience, location, and employer.

What is multimodal learning?

Multimodal learning is an area of machine learning that involves integrating and processing information from multiple types of data, such as text, images, audio, and video. The goal is to create models that can understand and make predictions based on more than one data modality, similar to how humans use various senses. This approach is used in applications like speech recognition with visual cues, image captioning, and video analysis. By combining different data types, multimodal learning systems can achieve better accuracy and more robust understanding.

What is the difference between Multimodal Learning and Data Scientist roles?

Aspect	Multimodal Learning	Data Scientist
Required Credentials	Advanced degrees in AI, Machine Learning, or Computer Science	Bachelor's or Master's in Data Science, Statistics, or related fields
Work Environment	Research labs, AI development teams, academia	Business, tech companies, analytics teams
Industry Usage	AI research, multimedia applications, robotics	Data analysis, predictive modeling, business insights

Multimodal Learning focuses on developing AI models that process and integrate multiple data types like images, text, and audio. Data Scientists analyze data to extract insights, build models, and support decision-making. While both roles involve data and algorithms, Multimodal Learning is specialized in AI model development for complex data integration, whereas Data Scientists work broadly across data analysis and interpretation.

What are the key skills and qualifications needed to thrive as a Multimodal Learning Specialist, and why are they important?

To excel as a Multimodal Learning Specialist, you need a solid background in machine learning, data science, and computer vision, often supported by an advanced degree in a related field. Familiarity with deep learning frameworks like TensorFlow or PyTorch, experience integrating data from diverse sources (e.g., text, audio, images), and knowledge of relevant algorithms are crucial. Strong problem-solving abilities, creativity, and effective collaboration are standout soft skills for this role. These competencies are vital for developing innovative models that can process and interpret complex, multi-source data to drive impactful AI solutions.

What are some common challenges faced by professionals working in multimodal learning roles, and how can they be addressed?

Professionals in multimodal learning frequently encounter challenges related to integrating and aligning data from multiple sources, such as text, images, audio, or video. Ensuring data quality and consistency across modalities can be complex, and developing models that effectively combine heterogeneous information often requires advanced technical skills and innovative thinking. Collaboration with domain experts and other data scientists is key to overcoming these obstacles, as is staying up to date with the latest research and tools in machine learning. Regular team meetings and cross-disciplinary workshops can help foster a collaborative environment and promote knowledge sharing.

Infographic showing various Multimodal Learning job openings in the United States as of July 2026, with employment types broken down into 1% As Needed, 72% Full Time, 25% Part Time, 1% Temporary, and 1% Contract. Highlights an 86% Physical, 2% Hybrid, and 12% Remote job distribution, with an average salary of $61,692 per year, or $29.7 per hour.
