Vision Language Model Jobs (NOW HIRING)

Sony Corporation

Research Intern - Multimodal Foundation Model for Vision

New York, NY · On-site +1

$50/hr

Conduct fundamental and innovative development in low-cost yet powerful vision-language models (VLM), unified models, automatic model compression, optimization and deployment on cloud and edge.

CNPC USA Corporation

AI Solutions Architect

Houston, TX · On-site

The ideal candidate will have advanced working knowledge of data analytics, modern machine learning algorithms, foundation models, large language models, vision-language models, small language models ...

VLM & VFM Forward Deployed Engineer

Palo Alto, CA · On-site

$150K - $300K/yr

We're looking for a Vision Language Model (VLM) & Visual Foundation Model (VFM) Forward Deployed Engineer to operate at the forefront of visual and multi-modal intelligence deployment in industry ...

Objectways Technologies LLC

Vision-Language-Action (VLA) Annotator

$25/hr

We are looking for a detail-oriented and technically capable Vision-Language-Action (VLA) Annotator ... models. Your work directly impacts the safety and performance of AI systems operating in the real ...

Applied Researcher, Vision Language Models/VLM - TikTok

San Jose, CA · On-site

$244.80K - $450K/yr

We are looking for researchers in LLM, VLM and Omni Model domain who are experienced in single ... Employees have day one access to medical, dental, and vision insurance, a 401(k) savings plan with ...

Senior Machine Learning Engineer, Computer Vision/VLM

San Diego, CA · On-site

$110.90K - $152.40K/yr

Develop and prototype novel prompting strategies for Vision-Language Models (VLMs) to elicit complex, causal reasoning about driving scenarios. * Collaborate closely with the ML Infra, Perception ...

Staff R&D AI Engineer

Austin, TX · On-site +1

You'll architect and implement Vision-Language-Action (VLA) models, advance reinforcement learning applications, and push the boundaries of multimodal AI integration. This role combines deep ...

University of Pittsburgh

Post Doctoral Associate

This position will play a vital role in driving high-quality outcomes in artificial intelligence in medicine, with a focus on large language models, natural language processing (NLP), and vision ...

AI Research Scientist, VLM (vision language models)

Bellevue, WA · On-site

$184K - $257K/yr

... vision, NLP, speech • Experience writing software and executing complex experiments involving large AI models and datasets • Must obtain work authorization in the country of employment at the ...

Robotics AI Engineer Sr. Staff/Principal Engineer - Embodied AI/Vision Language Action Models

San Diego, CA · On-site

$200.80K - $301.20K/yr

Design and develop models that connect vision, language, and action for real world robotic ... Knowledge of model compression techniques (quantization, distillation) for efficient deployment.

Staff Vision-Language-Action Robotics ML Engineer

San Carlos, CA · On-site

A leading AI research company in California is seeking an experienced Machine Learning Engineer to develop Vision-Language-Action models for robotics. The ideal candidate has over 5 years of ...

Machine Learning Research Engineer, SIML - ISE

Cupertino, CA · On-site

$252.90K/yr

This role requires experience in vision-language models, and ability to fine-tune/adapt/distill multi-modal LLMs. You will be part of a fast-paced, impact-driven Applied Research organization working ...

Computer Vision Manager

Manhattan, NY · Remote

$300K/yr

You'll play a pivotal role in building advanced vision pipelines (detection, segmentation, transformers, 3D vision) and integrating them with large language models (LLMs) and vision-language models ...

AI/Machine Learning Engineer - Geospatial (TS/SCI) with Security Clearance

Herndon, VA · On-site +1

$175K - $250K/yr

AI/Machine Learning Engineer - Vision Language Models / Multimodal AI (NGA) Location: Springfield or Herndon, VA (onsite) Clearance: TS/SCI (CI Poly preferred) Position Type: Full-Time, Direct Hire ...

Vision Language Model information

How much do vision language model jobs pay per hour?

As of Jun 4, 2026, the average hourly pay for vision language model jobs in the United States is $31.37, according to ZipRecruiter salary data. Most workers in these roles earn between $18.99 and $39.18 per hour, depending on experience, location, and employer.
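
To compare these hourly figures with listings that quote annual salaries, a rough conversion can help. The sketch below assumes a full-time schedule of 40 hours per week for 52 weeks (2,080 hours per year); that hour count and the function name annual_from_hourly are illustrative assumptions, not something the salary data specifies, and published annual figures may be computed differently.

```python
# Rough hourly-to-annual conversion.
# ASSUMPTION: full-time work, 40 hours/week for 52 weeks (2,080 hours/year).
HOURS_PER_YEAR = 40 * 52


def annual_from_hourly(hourly_rate: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Return the approximate annual pay for a given hourly rate."""
    return round(hourly_rate * hours_per_year, 2)


# Hourly figures quoted in the paragraph above.
rates = {"low end": 18.99, "average": 31.37, "high end": 39.18}

for label, rate in rates.items():
    annual = annual_from_hourly(rate)
    print(f"{label}: ${rate:.2f}/hr is roughly ${annual:,.2f}/yr")
```

Part-time or contract roles would need a smaller hours_per_year value, so the result is only a ballpark for comparison.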

What are the key skills and qualifications needed to thrive as a Vision Language Model Engineer, and why are they important?

To thrive as a Vision Language Model Engineer, you need a strong background in computer vision, natural language processing, machine learning, and often a graduate degree in computer science or a related field. Proficiency with deep learning frameworks such as TensorFlow or PyTorch, experience with large-scale datasets, and familiarity with model deployment tools are typically required. Strong problem-solving skills, creativity, and effective collaboration abilities help you stand out in this rapidly evolving field. These skills are essential for developing advanced AI systems that accurately interpret and generate language grounded in visual data, driving innovation in applications like image captioning and visual question answering.

What are some common challenges faced by professionals working with Vision Language Models, and how can they be addressed?

Professionals working with Vision Language Models often encounter challenges such as aligning visual and textual data, handling large-scale datasets, and ensuring model interpretability. Dealing with noisy or incomplete data from either modality can affect model performance, so strong data preprocessing and augmentation skills are essential. Collaboration with multidisciplinary teams—including data engineers, machine learning researchers, and domain experts—is key to refining models and deploying them effectively. Staying updated with the latest advancements and leveraging open-source resources can also help address these challenges.

What is a Vision Language Model?

A Vision Language Model (VLM) is an artificial intelligence system designed to understand and generate information using both visual data (like images or videos) and textual data (like written language). These models are trained on large datasets containing images paired with descriptive text, allowing them to perform tasks such as image captioning, visual question answering, and multimodal content generation. VLMs use advanced machine learning techniques to learn the relationships between visual elements and language, making them valuable for applications that require an integrated understanding of both modalities. They are widely used in fields such as robotics, accessibility technology, and automated content creation.

What is the difference between a Vision Language Model Engineer and a Computer Vision Engineer?

Aspect	Vision Language Model Engineer	Computer Vision Engineer
Required credentials	Advanced degrees in AI, Machine Learning, or related fields	Degree in Computer Science, Electrical Engineering, or related fields
Work environment	Research labs, AI startups, tech companies focusing on multimodal AI	Tech companies, research institutions, industries applying image analysis
Industry usage	Developing multimodal AI systems combining vision and language	Creating algorithms for image recognition, object detection, and analysis
Search and comparison intent	Understanding roles in AI development involving vision and language	Focus on technical image processing and computer vision applications

While both roles involve working with visual data, a Vision Language Model Engineer specializes in integrating visual and textual information using advanced AI techniques, often in research or product development. In contrast, a Computer Vision Engineer focuses on developing algorithms for analyzing and interpreting visual data, primarily in applications like image recognition and object detection.

Infographic showing Vision Language Model job openings in the United States as of May 2026, with employment types broken down into 1% As Needed, 39% Full Time, 55% Part Time, 1% Temporary, 3% Contract, and 1% Nights. It highlights a job distribution of 91% Physical, 3% Hybrid, and 6% Remote, with an average salary of $65,246 per year, or $31.4 per hour.

AI Solutions Architect

Houston, TX • On-site

This job post has expired today. Applications are no longer accepted.

Job description

Company Profile:
CNPC USA is a subsidiary of China National Petroleum Corporation and serves as the North American headquarters. Our mission is to drive innovation through advanced research and development of next-generation technologies for oil and gas exploration and production.
Job Summary:
CNPC USA is seeking a highly experienced AI Solutions Architect to lead the design, prototyping, implementation, and integration of artificial intelligence, machine learning, generative AI, and industrial analytics solutions for oil and gas technology applications. This position is a key technical role responsible for translating open-ended business and technical challenges into scalable AI system architectures, decision-support tools, digital workflows, and production-ready analytical solutions.
The ideal candidate will have advanced working knowledge of data analytics, modern machine learning algorithms, foundation models, large language models, vision-language models, small language models, optimization methods, operations research, and modern decision science. This role will work closely with subject-matter experts, product champions, product managers, designers, and software engineers to develop AI-enabled solutions that support CNPC USA technology development, product commercialization, and energy-domain digital transformation.
Key Responsibilities:

Conduct exploratory and undirected technology development to address open-ended AI/ML problems and questions in the energy domain.
Participate in data science, artificial intelligence, machine learning, industrial analytics, decision science, and operations research initiatives.
Develop, prototype, and evaluate solutions using modern deep learning methods, foundation models, generative AI, modern NLP, vision-language models, small language models, and time-series analytics.
Research and assess next-generation technologies for inference, predictive modeling, general-purpose data-driven modeling, and optimization of complex systems.
Engineer appropriate system-level AI solutions in collaboration with subject-matter experts, product champions, product managers, designers, and software engineers.
Work with software engineering teams to integrate AI solutions into business workflows, cloud environments, data platforms, and production applications.
Prototype end-to-end data solutions across multiple cross-functional teams in high-visibility roles.
Generate innovative ideas, establish new technology development directions, and shape and execute technical projects from concept through deployment.
Maintain state-of-the-art knowledge and contribute to technical discussions, architecture reviews, project reviews, and expert assessments in related areas of responsibility.
Communicate sophisticated AI concepts, plans, recommendations, and results effectively to management, clients, technical stakeholders, and the broader business community.
Prepare oral and written reports, presentations, technical memoranda, project documentation, and executive-level summaries.
Work effectively with peers, management, operations groups, and outside organizations to advance technology development and deployment.
Participate in relevant technical reviews and audits of projects as requested.
Review, mentor, and coach junior team members while defining and promoting standards, best practices, reusable architectures, and lessons learned.
Actively disseminate knowledge through webinars, talks, tutorials, technical communities, and internal training activities.

Minimum Education & Experience Requirements:

Master's degree in Operations Research, Industrial Engineering, Applied Mathematics, Computer Science, or a related STEM field, or foreign equivalent.
Three (3) years of post-baccalaureate experience in the job offered or in any AI/data science-related job title.

Applicants must have three (3) years of experience in each of the following:

AI and data science in the decision science and operations research space using software implementation technology.
Markov decision process methods and applications.
Data mining for analytics and decision making.
LLM-based generative AI solution development.
Vision-language model and small language model system development.
Modern NLP development in AI.
Computational intelligence and non-convex optimization techniques.
Time-series analysis techniques using statistics and AI.
Applied mathematics and statistics.
Cloud development tools and cloud environments for AI, data mining, and large-scale data systems.
Optimization solver tools, including CPLEX.
Programming languages and frameworks for modern AI and data science, including Python, R, TensorFlow, and PyTorch.

Preferred Experience:

Experience applying AI/ML, optimization, and decision science to oilfield, drilling, completion, reservoir, production, or other oil and gas related domains.
Experience architecting end-to-end AI systems, including data pipelines, model development, model serving, evaluation, monitoring, and workflow integration.
Experience with generative AI application patterns such as retrieval-augmented generation, domain-specific copilots, multimodal AI workflows, and human-in-the-loop decision support.
Experience translating ambiguous business needs into technical roadmaps, architecture options, proof-of-concept demonstrations, and scalable implementation plans.
Experience leading cross-functional technical discussions and mentoring engineers or data scientists on AI solution design and best practices.

Physical Demands:
The physical demands described here are representative of those that must be met by an employee to successfully perform the essential job functions.
While performing the duties of this job, the employee is regularly required to talk or hear. This is a sedentary role; however, some filing, bending, and the ability to lift 20 lbs. are required.
Travel:
This position requires 5-10% domestic and international travel for internal workshops, project work sessions, technical workshops, conferences, and customer presentations. Local travel between other CNPC USA locations and testing or partner facilities may be required.
Work Arrangement:
Telecommuting is permitted for less than 50% of each week, within the same geographic location as the assigned CNPC USA office.
Supervisory Responsibility:
This position has no direct supervisory responsibilities; however, it does act as a mentor and technical point of contact for less experienced engineers, data scientists, and AI/ML team members.
CNPC USA is an Equal Opportunity Employer (EOE). Qualified applicants are considered regardless of race, color, age, sex, sexual orientation, religion, disability, ethnicity, national origin, marital status, veteran status, or any other legally protected status.
Disclaimer: The job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee. Other duties, responsibilities and activities may change or be assigned at any time with or without notice.
