Text To Speech Jobs (NOW HIRING)

Catapult Solutions Group

Deep Learning Scientist, Speech Synthesis

Santa Clara, CA · On-site +1

The ideal candidate has strong experience in speech synthesis (Text-to-Speech) or Speech-to-Text, deep learning, and Python development. Success in this role requires the ability to analyze model ...

Deepgram

Research Engineer, Machine Learning Systems

Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text (STT) and text-to-speech (TTS). They are seeking a highly ...

Speechify

Software Engineer, Platform - Cary, NC, USA

Speechify is a company dedicated to making reading accessible for everyone through its innovative text-to-speech products. The role involves building and maintaining backend services to enhance user ...

DEEPREC.AI

ML Researcher, Speech

San Francisco, CA · On-site

$200K - $250K/yr

You'll work across the core building blocks of that roadmap, such as: speech-to-text, text-to-speech, neural audio codecs, and getting LLMs to understand and reason over audio directly. You'll take ...

Delviom LLC

Sr AI/ML Engineer with Python + AWS

Plano, TX · On-site

Understanding of chat bots (speech to text, text to speech) * Strong understanding of API design (REST, GraphQL), microservices architecture, and event-driven systems * Must have Hands-on experience ...

Pioneer Bible Translators

Human Language Technology Specialist

Dallas, TX · On-site

Whether text-to-text, text-to-speech, speech-to-text, or speech-to-speech, machine translation can help overlooked communities finally be understood in the world. HLT will bring critical educational ...

ASAPP

Lead AI/ML Engineer

Mountain View, CA · On-site

$170K - $190K/yr

You will lead the design and delivery of end-to-end voice AI solutions, combining large language models with speech technologies such as speech-to-text, text-to-speech, and real-time streaming audio ...

Weekday AI

Southern American English Voice Actor (AI Speech & Voice Modeling)

$50/hr

This role is for one of our clients Compensation: $50 per hour Join an innovative AI research initiative focused on developing next-generation text-to-speech (TTS) systems capable of producing ...

Real Soft, Inc.

Information Technology_USA - USA_Engineer

Jacksonville, FL · On-site

$49 - $67/hr

... text/text-to-speech. Role Descriptions: 4+ years of commercial software development experience.Design and implement scalable CCaaS and IVA solutions leveraging leading Cloud and enterprise ...

Zyphra

Research Engineer - Audio & Speech Models

San Francisco, CA · On-site

Expertise and intuition for training models in the audio domain, including text-to-speech, ASR, speech-to-speech, speech-emotion-recognition, or other models * Experience in training audio ...

Zyphra Technologies Inc

Research Engineer - Audio & Speech Models

San Francisco, CA · On-site

Real Soft, Inc.

Information Technology_USA - USA_Product Architect

Jacksonville, FL · On-site

... Text-to-Speech), and NLP/LLM pipelines. • Create frameworks for conversational flows, prompt engineering, retrieval-augmented generation (RAG), and context management. Solution Development • ...

TriOptus LLC

Voice Applications Operations Engineer

Plano, TX · On-site

Speech recognition & Text to Speech * Web & Cloud technologies Skills: * IVR, Genesys, Java, Telephony, SQL, UNIX, Windows, Oracle

Deepgram

Software Engineer - Deepgram for Restaurants

Deepgram is the leading voice AI platform for developers building speech-to-text and text-to-speech offerings. They are seeking a Software Engineer to join their new business unit focused on ...

Deepgram

Software Engineer - Applied AI (Senior or Staff Level)

Manhattan, NY · On-site

$134K - $177K/yr

Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text and text-to-speech. They are seeking a Software Engineer to ...

Speechify

Software Engineer, Platform - Virginia Beach, VA, USA

Virginia Beach, VA

Speechify's text-to-speech reading products include its iOS app, Android App, Mac App, Chrome Extension, and Web App. Google recently named Speechify the Chrome Extension of the Year and Apple named ...

Speechify

Software Engineer, Platform - Saint Paul, MN, USA

Saint Paul, MN · On-site

Speechify

Software Engineer, Platform - Tulsa, OK, USA

Tulsa, OK · On-site

Speechify

Software Engineer, Platform - College Park, MD, USA

College Park, MD

Text To Speech information

What is a Text To Speech (TTS) job?

A Text To Speech (TTS) job typically involves converting written text into spoken audio using specialized software or AI technology. Professionals in this field may work on developing, fine-tuning, or implementing TTS systems for various applications, such as virtual assistants, accessibility tools, or audiobooks. The role can also include tasks like voice data collection, script editing, and quality assurance of generated speech. TTS jobs are important for making digital content more accessible to people with visual impairments or reading difficulties. The field combines elements of linguistics, software engineering, and artificial intelligence.

What are some common challenges faced by professionals working in Text to Speech (TTS) development roles?

Professionals in Text to Speech development often encounter challenges such as fine-tuning synthetic voices to sound natural and expressive, handling diverse accents or languages, and optimizing algorithms for various platforms. Collaboration with linguists, UX designers, and software engineers is frequent, as ensuring accessibility and seamless integration across applications is a top priority. Staying updated on advances in AI and deep learning is essential, as the field evolves rapidly and demands continuous improvement in both technical and creative aspects.

What is the difference between Text To Speech and Voice Actor roles?

Aspect	Text To Speech	Voice Actor
Required Credentials	None or basic audio editing skills	Voice training, acting skills, often professional demos
Work Environment	Software, digital platforms, remote	Recording studios, on-location, remote
Industry Usage	Automation, AI, tech companies	Media, entertainment, advertising
Search & Comparison Intent	Automated voice solutions, TTS technology	Voice acting, narration, character voices

Text To Speech involves using software to convert written text into spoken words, primarily for automation and digital applications. Voice Actors, on the other hand, provide human voice recordings for media, entertainment, and advertising. While TTS is tech-driven and often used in AI and accessibility tools, Voice Actors bring emotional nuance and personality to their performances. Both roles are essential in their respective industries, but they differ significantly in skills, environment, and purpose.

What are the key skills and qualifications needed to thrive as a Text to Speech Engineer, and why are they important?

To thrive as a Text to Speech Engineer, you need a strong background in computer science, linguistics, and digital signal processing, often supported by a relevant degree. Experience with machine learning frameworks, neural speech synthesis models (such as Tacotron or WaveNet), and programming languages such as Python or C++ is typically required. Creativity, analytical thinking, and cross-functional communication skills help you collaborate with diverse teams and innovate in voice technology. These skills ensure the development of accurate, natural-sounding speech systems that meet user and client needs.

Infographic showing various Text To Speech job openings in the United States as of July 2026, with employment types broken down into 4% As Needed, 64% Full Time, 23% Part Time, 1% Temporary, and 8% Contract. Highlights a 95% Physical and 5% Remote job distribution.

Deep Learning Scientist, Speech Synthesis

Catapult Solutions Group

Santa Clara, CA • On-site, Remote

Contractor

Posted 25 days ago

Job description

Deep Learning Scientist - Speech Synthesis
Location: 100% Remote (Anywhere in the U.S.)
Duration: 6-Month Contract
Position Overview
We are seeking a Deep Learning Scientist - Speech Synthesis to support the development of next-generation speech AI technologies. This role focuses on training and optimizing speech models, improving model performance, and solving complex machine learning challenges related to speech applications.
The ideal candidate has strong experience in speech synthesis (Text-to-Speech) or Speech-to-Text, deep learning, and Python development. Success in this role requires the ability to analyze model behavior, diagnose training issues, and improve model performance, not just collect or evaluate data.
Key Responsibilities

Train and optimize speech synthesis models, including mel spectrogram and vocoder models.
Analyze training metrics, validation losses, and model performance to identify root causes of model issues and recommend improvements.
Benchmark and optimize speech models across multiple use cases.
Improve speech data preparation, augmentation, filtering, and dataset quality.
Develop and refine high-quality training datasets for speech AI models.
Measure and characterize model accuracy, quality, and bias.
Collaborate with cross-functional teams to develop and deliver new speech AI features.
Participate in software development, design reviews, testing, and code reviews.
Troubleshoot technical issues and contribute to continuous model improvements.

Required Qualifications

Master's degree or Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Applied Mathematics, Linguistics, Computational Linguistics, or a related field (or equivalent experience).
3+ years of relevant industry experience.
Strong Python programming skills.
Strong understanding of machine learning and deep learning concepts.
Experience with Text-to-Speech (TTS), Speech Synthesis, or Speech-to-Text (STT) technologies.
Hands-on experience training deep learning models using PyTorch.
Ability to analyze training behavior, validation losses, and model performance to troubleshoot and improve machine learning models.
Knowledge of speech signal processing concepts, including FFT, MFCC, and mel spectrograms.
Strong understanding of software development fundamentals.
Experience using version control systems such as Git, Gerrit, or GitLab.
Excellent communication and collaboration skills.

Preferred Qualifications

Experience with deep learning architectures such as CNNs, RNNs, LSTMs, and Transformers.
Experience with voice cloning or multilingual speech systems.
Knowledge of text normalization (TN), inverse text normalization (ITN), or grapheme-to-phoneme (G2P) systems.
Fluency in one or more languages such as Spanish, Mandarin, German, Japanese, Russian, French, Arabic, Hindi, Korean, Italian, or Portuguese.
Interest in linguistics, phonetics, and speech technologies.
Strong C++ programming skills.
Familiarity with GPU technologies such as CUDA, cuDNN, or TensorRT.
Experience deploying machine learning models to cloud, data center, or embedded environments.

What We're Looking For
The ideal candidate is someone who enjoys solving difficult machine learning problems and has hands-on experience training speech models. Beyond building models, we're looking for someone who can investigate why a model is underperforming, analyze validation losses, identify root causes, and improve overall model quality and performance.
Additional Information

100% remote position within the United States.
No specific U.S. time zone requirement.
This is a contract opportunity.
Opportunity to contribute to cutting-edge speech AI and deep learning technologies.

About Catapult Solutions Group

Sourced by ZipRecruiter

Industry: Recruiting and staffing services
Company size: 201 - 500 employees
Headquarters location: Plano, TX, US
Year founded: 2013
Website: catapultsg.com
