Voice Model Jobs (NOW HIRING)

You will join the Grok Voice Model team to help build the world's best voice AI. We deliver smooth, natural, low-latency spoken interactions - expressive, multilingual, and reliable across devices ...

Collaborate directly with Grok Voice model, media, and product teams to deliver end-to-end experiences. * Drive performance, reliability, and quality of voice interactions at global scale. * Move ...

AI Engineer

Pittsburgh, PA · Remote

$70 - $76/hr

In this role, you will work on developing, training, and refining AI models for voice synthesis, voice cloning, speech recognition, and/or voice transformation. Your work will contribute to cutting ...

Research Engineer, Voice

Palo Alto, CA · On-site

$225K - $325K/yr

Pi is a personal AI agent powered by Inflection AI's foundation model, proving that AI can be ... Research, develop, and optimize neural models for voice and audio, including text-to-speech ...

Research Engineer, Voice

Palo Alto, CA · On-site

$225K - $325K/yr

Research, develop, and optimize neural models for voice and audio, including text-to-speech, automatic speech recognition, audio generation, and spoken dialogue systems. * Build and maintain ...

Building and optimizing voice agents from scratch, integrating Large Language Models (LLMs), and ensuring low-latency processing of Automatic Speech Recognition (ASR) and Text-to-Speech (TTS)

Voice Model information

How much do voice model jobs pay per hour?

As of Jun 4, 2026, the average hourly pay for a Voice Model in the United States is $48.17, according to ZipRecruiter salary data. Most workers in this role earn between $39.18 and $60.10 per hour, depending on experience, location, and employer.
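
To put the hourly figures above in annual terms, here is a minimal sketch of the conversion. It assumes a full-time schedule of 40 hours per week for 52 weeks (2,080 paid hours per year); that schedule is an assumption for illustration, not something stated in the salary data, and the function name is hypothetical.

```python
# Assumption: full-time work, 40 hours/week for 52 weeks (2,080 hours/year).
HOURS_PER_YEAR = 40 * 52


def annual_from_hourly(hourly_rate: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly rate to an approximate annual figure."""
    return hourly_rate * hours_per_year


if __name__ == "__main__":
    # Hourly figures quoted in the answer above.
    for label, rate in [("low", 39.18), ("average", 48.17), ("high", 60.10)]:
        print(f"{label}: ${annual_from_hourly(rate):,.0f}/yr")
```

Part-time or freelance schedules would give lower annual totals, so treat the result as an upper-end estimate for those arrangements.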

What is a Voice Model job?

A Voice Model job involves providing high-quality voice recordings that are used to create or enhance AI-powered speech systems. These recordings help train text-to-speech (TTS) models, virtual assistants, and other voice-enabled applications. Voice Models may work on projects requiring natural speech, emotional expression, or specific accents and tones. The role can involve reading scripts, responding to prompts, or improvising speech patterns to capture a variety of vocal nuances.

What are the key skills and qualifications needed to thrive in the Voice Model position, and why are they important?

To thrive as a Voice Model, you need an excellent command of vocal techniques, clear diction, and the ability to adjust your voice for various styles or characters, often supported by formal vocal or acting training. Familiarity with professional recording equipment, audio editing software, and sometimes home studio setups is essential. Adaptability, reliability, and taking direction well are important soft skills for succeeding in client-driven environments. These skills and qualities ensure Voice Models can deliver high-quality vocal performances that meet diverse client needs across advertising, entertainment, and media industries.

What are the typical work arrangements and environments for a Voice Model?

Voice Models often work on a freelance basis or as part of talent agencies, providing voice recordings for commercials, animations, video games, and audiobooks. Many professionals operate from home studios using specialized equipment, while some projects require sessions at recording studios with directors and sound engineers present. Flexibility is important, as schedules can include last-minute bookings and varied project durations. Collaboration with creative teams, such as producers and scriptwriters, is common to ensure that the final product matches the intended vision. This dynamic environment offers both autonomy and opportunities for skill development across different media industries.

Does voices.com actually pay?

Voices.com is a platform that connects voice actors with clients and generally facilitates payments for completed work. Payment terms and schedules depend on individual projects and agreements made between voice models and clients, with the platform often acting as an intermediary. Voice models should review platform policies and client contracts to understand payment processes fully.
Infographic: Voice Model job openings in the United States as of May 2026. Employment types: 73% full-time, 18% part-time, and 9% temporary. Work arrangement: 100% in-person. Average salary: $100,198 per year, or $48.2 per hour.

Member of Technical Staff - Voice Model

xAI

Palo Alto, CA

$150K - $450K/yr

Other

Medical, Dental, Vision, Life, Retirement

Posted 19 days ago


Job description

ABOUT THE ROLE:

You will join the Grok Voice Model team to help build the world's best voice AI. We deliver smooth, natural, low-latency spoken interactions - expressive, multilingual, and reliable across devices and real-time scenarios. We own the full training pipeline: massive data curation, premium audio processing, frontier speech-language pre-training, and intensive post-training to push quality, speed, and stability to the limit.

Our goal: make talking to AI feel like conversing with the most charming, kind, and knowledgeable person imaginable. We're seeking exceptionally smart, execution-oriented engineers to help us get there.

RESPONSIBILITIES:
  • Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows to enable high-quality model training and evaluation.
  • Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques to ensure Grok Voice responses are accurate, factually grounded, natural and idiomatic in spoken style, conversational in tone, and fluent across multiple languages.
  • Build and iterate a comprehensive evaluation framework covering objective metrics (accuracy, quality, latency, expressiveness), human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance.
  • Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and handle the full lifecycle from prototype to global-scale deployment for stable, low-latency, delightful voice experiences.

BASIC QUALIFICATIONS:
  • Python expert with deep proficiency in writing clean, efficient code for AI/ML systems.
  • Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.
  • Proficiency in pre-training and post-training speech-language models using JAX/PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.
  • Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing to drive model improvements.
  • Experience building or working with large-scale distributed training and inference systems on Kubernetes.
  • Proactive, self-driven attitude - ready to grind in a fast-paced, high-caliber team to deliver outstanding voice AI experiences.

COMPENSATION AND BENEFITS:

$150,000 - $450,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.