Image Captioning Jobs (NOW HIRING)

... image captioning, question answering, language models, etc. Learn more about our innovative research: The Atlanta Area base salary range for this full-time position is $137,500-$168,200, which can ...

Experience with multimodal and vision-language models for image understanding, captioning, or visual analysis. * Experience with cloud-based AI infrastructure for training, fine-tuning, and serving ...

Image Captioning information

Hourly pay chart markers: $19, $46, $69

How much do image captioning jobs pay per hour?

As of Jun 10, 2026, the average hourly pay for image captioning in the United States is $46.80, according to ZipRecruiter salary data. Most workers in this role earn between $38.22 and $52.16 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive in the Image Captioning position, and why are they important?

To thrive in an Image Captioning role, you need strong attention to detail, language proficiency, and an ability to interpret visual content accurately. Familiarity with digital annotation tools, content management systems, or image labeling platforms is often required. Exceptional communication and time management skills help you handle large volumes of images and collaborate with team members or editors. These abilities ensure captions are clear, contextually relevant, and consistently meet quality and deadline standards.

What are the typical responsibilities of someone working in image captioning?

Professionals in image captioning are primarily responsible for examining photos, graphics, or other visual data and crafting concise, accurate, and contextually appropriate captions. This process often involves using specialized software to annotate or tag images, ensuring consistency with style guidelines, and collaborating with editors, data teams, or project managers to align with project objectives. Daily tasks may also include reviewing and revising captions based on feedback, managing large batches of content, and maintaining organization within digital asset systems. The role is detail-oriented and can be performed individually or as part of a larger content or machine learning team depending on the employer.

What is an Image Captioning job?

An Image Captioning job involves generating descriptive text for images using artificial intelligence or human expertise. Professionals in this field work with machine learning models, datasets, and natural language processing to create accurate and contextually relevant captions. This role is essential for improving accessibility, content organization, and searchability of visual media. It is commonly used in applications like social media, e-commerce, and automated reporting.

More about Image Captioning jobs
What are the most commonly searched types of Image Captioning jobs? The most popular types of Image Captioning jobs are:
Machine Learning Engineer - Camera & Photos, Creative Foundations

Apple

San Diego, CA • On-site

Full-time

Posted 19 days ago


Apple rating: 8.1 out of 10

Based on 661 frontline employees who took The Breakroom Quiz

6th of 30 rated technology retailers


Job description

We're looking for a Machine Learning Engineer and Researcher to join the Creative Foundations team within Camera & Photos. In this role, you won't just implement models - you'll invent them. You'll work at the intersection of cutting-edge ML research and the features that hundreds of millions of people use every day to capture, relive, and share their most meaningful moments... This is a role for someone who gets excited about turning a theoretical breakthrough into a magical user experience. You'll bridge the gap between what's possible in research and what's shippable in product - translating state-of-the-art advances in image understanding into intelligent systems that feel intuitive and delightful. We're especially drawn to those with a passion for photography and experience with the visual and creative domains that make this work so meaningful.
As a Machine Learning Engineer on the Creative Foundations team, you will pioneer novel approaches to image understanding - designing architectures, training strategies, and intelligent systems that push the boundaries of what our camera and photo experiences can do. You'll continuously survey state-of-the-art research, rapidly prototype high-potential ideas, and translate them into shippable features - while also leveraging model introspection and interpretability techniques to deeply understand why models behave the way they do and guide decisions accordingly. You'll collaborate across disciplines with product designers, software engineers, and aesthetic science researchers in an environment that values diverse perspectives, research rigor, and agility in an ever-evolving ML landscape.
MS or PhD in Computer Science, Machine Learning, Artificial Intelligence, Electrical Engineering, Applied Mathematics, Statistics, or a related field - or equivalent practical experience demonstrating deep ML expertise.
Experience in machine learning, computer vision, or a related field (academic or industry), with a strong portfolio of building and shipping models or publishing research.
Deep understanding of modern ML architectures and techniques - including (but not limited to) transformers, diffusion models, contrastive learning, multi-modal models, and efficient neural network design and optimization.
Proficiency in ML frameworks such as PyTorch, and comfort working across the full model lifecycle from research exploration using large-scale data to production deployment.
Experience with image understanding tasks such as semantic segmentation, scene recognition, image captioning, visual question answering, image aesthetics, or image retrieval.
Strong fundamental software engineering background.
A track record of creative problem-solving - taking an ambiguous challenge and finding an elegant, sometimes unconventional, ML-driven solution.
A genuine passion for pushing the boundaries of what's possible with machine learning and a deep curiosity for how intelligent systems can transform everyday experiences.
Published research at top-tier venues (CVPR, ICCV, ECCV, NeurIPS, ICML, SIGGRAPH, etc.) is valued - but so is a strong portfolio of impactful shipped features or open-source contributions.
Comfort navigating ambiguity and working in a fast-moving R&D environment where the problem definition evolves alongside the solution.
A personal connection to photography or visual storytelling - whether through a creative practice, a deep appreciation for the craft, or simply an obsession with what makes a great image.
Specific computer vision experience in the areas of Semantic Image Understanding, Diffusion for Image Generation, Style Transfer, Computational Photography, Image Enhancement (Super-Resolution, Denoising, etc.), Aesthetic Quality Assessment, Personalization (Few-Shot Adaptation).


About Apple

Sourced by ZipRecruiter

Imagine what you could do here! At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Dynamic, intelligent people and inspiring, innovative technologies are the norm here. The people who work here have reinvented entire industries with Apple hardware products. The same real passion for innovation that goes into our products also applies to our practices, strengthening our dedication to leaving the world better than we found it.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Cupertino, CA, US

Year founded

1976