The University of Bristol's School of Physiology, Pharmacology and Neuroscience is seeking a Senior Research Associate in Multimodal Learning. The role involves conducting research on audio-visual ...
Focus: Multimodal Foundation Models · Representation Learning · Method Innovation. We are looking for strong technical builders and researchers who deeply understand foundation models and representation ...
... multimodal data ... Our approach combines representation learning and generative modeling to capture structure ...
Help train and develop multimodal learning models using advanced learning techniques including RAG, self-supervised learning, semi-supervised, and transductive learning. Requirements Desired ...
Senior Staff Machine Learning Scientist, Assets
OR · On-site +1
$91K - $124K/yr
Design, implement, train, and optimize large-scale vision and multimodal foundation models across ... Proficiency in modern deep learning frameworks such as PyTorch and TensorFlow. * Demonstrated ...
Member of Technical Staff, Multimodal Vision
San Jose, CA · On-site
$180K - $450K/yr
Experience with large-scale machine learning systems and distributed training. * Strong background ... Experience with multimodal systems (vision + text, vision + audio) or real-time AI systems is a ...
Senior Staff Machine Learning Scientist, Assets
$93K - $127K/yr
Design, implement, train, and optimize large-scale vision and multimodal foundation models across ... Proficiency in modern deep learning frameworks such as PyTorch and TensorFlow. * Demonstrated ...
[EOI] Postdoctoral Associate in Multimodal AI | Professor Saining Xie
New York, NY · On-site
$62K - $125K/yr
Conducting original research in multimodal learning, including model design, training, and evaluation * Developing scalable methods for aligning and integrating diverse data modalities
2026 Fall Applied Science Internship - Natural Language Processing and Speech Technologies - United
Seattle, WA · On-site
$17 - $22.75/hr
NLP/NLU, LLMs, Reinforcement Learning, Human Feedback/HITL, Deep Learning, Speech Recognition, Conversational AI, Natural Language Modeling, Multimodal Learning. In this role, you will work alongside ...
Description As a Machine Learning Research Engineer, you will help design and develop models and algorithms for multimodal perception and reasoning leveraging Vision-Language Models (VLMs) and ...
Postdoctoral Associate
Cambridge, MA · On-site
POSTDOCTORAL ASSOCIATE, MACHINE LEARNING FOR HEALTH, Medical Engineering & Science (IMES), will develop machine learning methods for latent representation learning from complex, multimodal, time ...
Staff Machine Learning Engineer
Boston, MA · On-site +1
Omnitag, our ML-powered multimodal data mining framework, is the engine that powers this discovery. As a Staff Machine Learning Engineer, you will serve as a technical leader defining the roadmap and ...
Staff Machine Learning Engineer
Pittsburgh, PA · On-site +1
Omnitag, our ML-powered multimodal data mining framework, is the engine that powers this discovery. As a Staff Machine Learning Engineer, you will serve as a technical leader defining the roadmap and ...
Expertise in multimodal learning integrating text, images, and structured molecular data. * Experience with omics data analysis (genomics, transcriptomics, proteomics) and knowledge graph
Multimodal Learning information
| Salary range | Share of jobs |
|---|---|
| $21K - $29.5K | 10% |
| $29.5K - $38K | 14% |
| $38K - $46.5K | 10% |
| $46.5K - $55K | 12% |
| $55K - $63.5K | 20% |
| $63.5K - $72K | 15% |
| $72K - $80.5K | 4% |
| $80.5K - $89K | 2% |
| $89K - $97.5K | 4% |
| $97.5K - $106K | 0% |
| $106K - $114.5K | 9% |
$39.2K is the 25th percentile. Wages below this are outliers.
The median wage is $57K / yr.
$68.8K is the 75th percentile. Wages above this are outliers.
Salary scale markers: $21K, $61.7K, $114.5K.
How much do multimodal learning jobs pay per year?
What is multimodal learning?
What is the difference between Multimodal Learning vs Data Scientist?
| Aspect | Multimodal Learning | Data Scientist |
|---|---|---|
| Required Credentials | Advanced degrees in AI, Machine Learning, or Computer Science | Bachelor's or Master's in Data Science, Statistics, or related fields |
| Work Environment | Research labs, AI development teams, academia | Business, tech companies, analytics teams |
| Industry Usage | AI research, multimedia applications, robotics | Data analysis, predictive modeling, business insights |
Multimodal Learning focuses on developing AI models that process and integrate multiple data types like images, text, and audio. Data Scientists analyze data to extract insights, build models, and support decision-making. While both roles involve data and algorithms, Multimodal Learning is specialized in AI model development for complex data integration, whereas Data Scientists work broadly across data analysis and interpretation.
What are the key skills and qualifications needed to thrive as a Multimodal Learning Specialist, and why are they important?
What are some common challenges faced by professionals working in multimodal learning roles, and how can they be addressed?

Senior Research Associate in Multimodal Learning
Bristol, CT • On-site
Full-time
Posted 4 days ago
Job description
The University of Bristol's School of Physiology, Pharmacology and Neuroscience is seeking a Senior Research Associate in Multimodal Learning. The role involves conducting research on audio-visual understanding for smart hearing aids, collaborating with other researchers, and publishing findings in top-tier venues.
Responsibilities:
• Conducting novel research in multimodal audio-visual understanding: designing, training and evaluating audio-visual understanding in conversational settings. This will include hands-on research using the latest deep learning approaches.
• Preparing API packages with low latency that will be integrated with partner demonstrations on a quarterly basis.
• Presenting your work in regular meetings, taking feedback and integrating the goals of the project into your individual research directions.
• Publishing in top-tier venues (conferences and journals). Communicating your work to the best possible audience.
• Collaborating with other researchers (postdocs and faculty) in the WeHear project.
• Co-advising junior PGR students.
Qualifications:
Required:
• PhD [near submission, submitted or graduated] in Multimodal Understanding, preferably with expertise in audio understanding, video understanding or multimodal visual models.
• Prior degree in computer science, engineering or mathematics
• Detailed knowledge of video understanding state-of-the-art, approaches, datasets and problems, preferably with expertise in egocentric datasets.
• Prior knowledge of egocentric audio-visual devices that work in real time like Meta Aria Glasses (Gen1 or Gen2) and Apple Vision Pro.
• Experience in handling audio-video data, for learning and inference
• Experience in modelling deep learning approaches
• Experience and evidence of publishing at high-calibre conferences and journals (at least one first-author paper in a major venue, CVPR/ICCV/ECCV/ICASSP/NeurIPS/PAMI/IJCV/ICLR, in the past 3 years).
• Excellent programming skills (Python)
• Proficiency in deep learning frameworks (PyTorch)
Company:
Research within the School of Physiology, Pharmacology and Neuroscience is conducted across Neuroscience, Cardiovascular and Cell Signalling. Founded in 1876, the company is headquartered in Bristol, GB, with a team of 51-200 employees. The company is currently at the growth stage.