
Image Captioning Jobs (NOW HIRING)

Digital Network Operator

Santa Monica, CA

$20.25 - $23.25/hr

Additional responsibilities include enabling and disabling Closed Captioning as needed to meet ... Execute daily content operations processes, including image replacement and descriptions as well as ...

Image Captioning information

How much do image captioning jobs pay per hour?

As of Jun 11, 2026, the average hourly pay for image captioning in the United States is $46.80, according to ZipRecruiter salary data. Most workers in this role earn between $38.22 and $52.16 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive in the Image Captioning position, and why are they important?

To thrive in an Image Captioning role, you need strong attention to detail, language proficiency, and an ability to interpret visual content accurately. Familiarity with digital annotation tools, content management systems, or image labeling platforms is often required. Exceptional communication and time management skills help you handle large volumes of images and collaborate with team members or editors. These abilities ensure captions are clear, contextually relevant, and consistently meet quality and deadline standards.

What are the typical responsibilities of someone working in image captioning?

Professionals in image captioning are primarily responsible for examining photos, graphics, or other visual data and crafting concise, accurate, and contextually appropriate captions. This process often involves using specialized software to annotate or tag images, ensuring consistency with style guidelines, and collaborating with editors, data teams, or project managers to align with project objectives. Daily tasks may also include reviewing and revising captions based on feedback, managing large batches of content, and maintaining organization within digital asset systems. The role is detail-oriented and can be performed individually or as part of a larger content or machine learning team depending on the employer.

How much do caption writers get paid?

The pay for image caption writers varies depending on experience, project complexity, and whether they work freelance or full-time. Freelance caption writers typically earn between $0.05 and $0.50 per caption, while full-time positions may offer salaries ranging from $30,000 to $70,000 annually. Skills in image analysis and familiarity with captioning tools can also influence earnings.

How do I get hired as a caption typer?

To get hired as a caption typer, develop strong typing skills, attention to detail, and familiarity with captioning software or tools. Building a portfolio of sample captions and applying to companies that offer remote captioning jobs can improve your chances of employment.

Can I get paid to caption videos?

Yes. Captioning jobs typically involve creating descriptive text for images or videos and can be paid positions or freelance opportunities. Pay depends on the employer, project scope, and your skills in visual analysis and language. Some roles may require familiarity with captioning tools or software and a good understanding of visual content.

What is an Image Captioning job?

An Image Captioning job involves generating descriptive text for images using artificial intelligence or human expertise. Professionals in this field work with machine learning models, datasets, and natural language processing to create accurate and contextually relevant captions. This role is essential for improving the accessibility, organization, and searchability of visual media. Image captioning is commonly used in applications such as social media, e-commerce, and automated reporting.

Senior Multimodal AI Researcher, Audio

Dolby

Atlanta, GA • On-site

Full-time

Posted 18 hours ago


Job description

Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you'll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent. We're big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
The Advanced Technology Group (ATG) is the research division of the company. ATG's mission is to look ahead, deliver insights, and innovate technological solutions that will fuel Dolby's continued growth. Our researchers have a broad range of expertise related to computer science and electrical engineering, such as AI/ML, algorithms, digital signal processing, audio engineering, image processing, computer vision, data science & analytics, distributed systems, cloud, edge & mobile computing, computer networking, and IoT.
Dolby is looking for a talented Senior Multimodal AI Researcher, Audio to join Dolby's research efforts and drive innovation in multimodal AI for audio applications, multimodal representations, and generative modeling for audio, speech, and music. You will join the Machine Reasoning and Perception team, a group of top-tier researchers working on challenging problems in multimodal AI for entertainment applications. You will focus on the creation and implementation of multimodal and audio AI technologies, from the underlying theoretical concepts to the development of prototypes and demonstrations, with the goal of creating new experiences.
You will drive key innovations for Dolby's core business, enabling Dolby and its customers to build products that push the boundaries of sound and multimedia experiences.
Summary
You will push the boundaries of the state of the art in audio and multimodal technologies. The ideal candidate has a strong background in deep learning, in terms of both conceptual understanding and practical experience, with previous exposure to audio applications. A core aspect of this role is keeping up to date with the literature and implementing and innovating on the bleeding edge of generative models, self-supervised learning, and multimodal learning.
With the explosion of large language models and natural language processing, you will partner closely with Dolby's worldwide AI research staff, which actively pursues the integration of such models into audio and media experiences. You will be able to hit the ground running, innovate, and contribute to such projects. Consequently, experience with language models, question answering, vision-language models, captioning, etc. would be highly beneficial.
We are looking for candidates with experience in any of the following:
  • Generative modeling for audio applications (diffusion models, autoregressive models, masked generative transformers).
  • Multimodal semantic understanding and multimodal reasoning.
  • Multimodal representations (audio-video, audio-text, audio-video-text).
  • Multimodal AI architectures, with a focus on generating audio, music, and speech (text-to-audio, video-to-audio, image-to-audio).
  • Self-supervised and semi-supervised learning.
  • AI-driven audio enhancement, processing, and generation (for speech and music), such as speech enhancement and analysis, source separation, text-to-speech, text-to-music, music information retrieval, and audio classification.
  • LLMs for audio applications.

What You Will Accomplish
  • Partner closely with other domain experts to refine and execute Dolby's technical strategy in artificial intelligence and machine learning.
  • Use deep learning to create new solutions (including foundation models) and enhance existing applications.
  • Push the state-of-the-art and develop intellectual property.
  • Transfer technology to product groups.
  • Establish research collaborations with external university partners.
  • Mentor interns on novel research problems.
  • Publish papers in top-tier conferences and journals.
  • Advise internal leaders on recent deep learning advancements in the industry and academia to further influence research direction and business decisions.

Key Requirements
  • Ph.D. in Computer Science or similar field.
  • A strong background in deep learning, in terms of both conceptual understanding and practical experience.
  • Technical knowledge of audio fundamentals.
  • Deep passion for audio, music, and multimedia applications.
  • Deep knowledge of current machine learning literature.
  • A strong publication record in major machine learning conferences (e.g., NeurIPS, ICLR, ICML) or top domain-specific conferences (e.g., ACL, CVPR, ICASSP, Interspeech) is desirable.
  • Highly skilled in Python and one or more popular deep learning frameworks (TensorFlow or PyTorch).
  • Ability to envision new technologies and turn them into innovative products.
  • Good communication and collaboration skills.

Learn more about our innovative research: https://www.dolby.com/about/innovation/empowering/
The Atlanta Area base salary range for this full-time position is $140,700-$170,000 (the range can vary outside this location), plus bonus and benefits; some roles may also include equity. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, competencies, experience, market demands, internal parity, and relevant education or training. Your recruiter can share more about the specific salary range, perks, and benefits for your location during the hiring process.
Dolby will consider qualified applicants with criminal histories in a manner consistent with the requirements of San Francisco Police Code, Article 49, and Administrative Code, Article 12.
Equal Employment Opportunity:
Dolby is proud to be an equal opportunity employer. Our success depends on the combined skills and talents of all our employees. We are committed to making employment decisions without regard to race, religious creed, color, age, sex, sexual orientation, gender identity, national origin, religion, marital status, family status, medical condition, disability, military service, pregnancy, childbirth and related medical conditions or any other classification protected by federal, state, and local laws and ordinances.