The Member of Technical Staff - Voice Model will join the Grok Voice Model team to develop advanced voice AI systems, ensuring high-quality spoken interactions through innovative data processing and ...
Member of Technical Staff - Voice Model
Palo Alto, CA · On-site
$150K - $450K/yr
You will join the Grok Voice Model team to help build the world's best voice AI. We deliver smooth, natural, low-latency spoken interactions - expressive, multilingual, and reliable across devices ...
The role involves joining the Grok Voice Model team to develop advanced voice AI, focusing on building and refining speech data curation and processing pipelines for high-quality model training and ...
Senior Machine Learning Engineer, Voice AI
San Francisco, CA · On-site
$123K - $169K/yr
Responsibilities : • Own the model serving stack that powers Together's voice platform across STT, TTS, and speech-to-speech. • Work directly with state-of-the-art accelerators (H100s, H200s ...
Senior Machine Learning Engineer, Voice AI
San Francisco, CA · On-site
$144K - $190K/yr
Own the model serving stack that powers Together's voice platform across STT, TTS, and speech-to-speech. * Work directly with state-of-the-art accelerators (H100s, H200s, B200s) to optimize voice ...
Software Engineer - Voice AI (Inference Runtime)
New York, NY · On-site
$165K - $330K/yr
Develop world-class model serving stack for state-of-the-art open-source voice models - reduce end-to-end and tail latency (p95/p99), increase throughput, and improve GPU efficiency via profiling ...
Member of Technical Staff - Voice Product
Palo Alto, CA · On-site
$180K - $440K/yr
Collaborate directly with Grok Voice model, media, and product teams to deliver end-to-end experiences. * Drive performance, reliability, and quality of voice interactions at global scale. * Move ...
Member of Technical Staff - Voice Product
$170K - $197K/yr
Responsibilities : • Own backend engineering for scalable, low-latency voice infrastructure and model integrations. • Collaborate directly with Grok Voice model, media, and product teams to ...
Senior Platform Engineer, Voice AI
San Francisco, CA · On-site
$123K - $169K/yr
... voice model endpoints that handles bursty, real-time traffic patterns -- accounting for concurrent connection limits, streaming state, and hard latency ceilings. • Implement voice-specific API ...
AI Engineer
Pittsburgh, PA · Remote
$70 - $76/hr
In this role, you will work on developing, training, and refining AI models for voice synthesis, voice cloning, speech recognition, and/or voice transformation. Your work will contribute to cutting ...
Senior Platform Engineer, Voice AI
San Francisco, CA · On-site
$144K - $190K/yr
Design and ship autoscaling for voice model endpoints that handles bursty, real-time traffic patterns - accounting for concurrent connection limits, streaming state, and hard latency ceilings.
Voice Model information
How much do voice model jobs pay per hour?
The median wage is $48.34 / hr. $39.38 is the 25th percentile; wages below this are outliers. $58.57 is the 75th percentile; wages above this are outliers.
Hourly wage range: share of jobs
$5.29 - $11.80: 10% of jobs
$11.80 - $18.31: 0% of jobs
$18.31 - $24.83: 2% of jobs
$24.83 - $31.34: 0% of jobs
$31.34 - $37.85: 8% of jobs
$37.85 - $44.36: 18% of jobs
$44.36 - $50.87: 19% of jobs
$50.87 - $57.39: 16% of jobs
$57.39 - $63.90: 11% of jobs
$63.90 - $70.41: 8% of jobs
$70.41 - $76.92: 7% of jobs
What is a Voice Model job?
A Voice Model job involves providing high-quality voice recordings that are used to create or enhance AI-powered speech systems. These recordings help train text-to-speech (TTS) models, virtual assistants, and other voice-enabled applications. Voice Models may work on projects requiring natural speech, emotional expression, or specific accents and tones. The role can involve reading scripts, responding to prompts, or improvising speech patterns to capture a variety of vocal nuances.
What are the typical work arrangements and environments for a Voice Model?
Voice Models often work on a freelance basis or as part of talent agencies, providing voice recordings for commercials, animations, video games, and audiobooks. Many professionals operate from home studios using specialized equipment, while some projects require sessions at recording studios with directors and sound engineers present. Flexibility is important, as schedules can include last-minute bookings and varied project durations. Collaboration with creative teams, such as producers and scriptwriters, is common to ensure that the final product matches the intended vision. This dynamic environment offers both autonomy and opportunities for skill development across different media industries.
What are the key skills and qualifications needed to thrive in the Voice Model position, and why are they important?
To thrive as a Voice Model, you need an excellent command of vocal techniques, clear diction, and the ability to adjust your voice for various styles or characters, often supported by formal vocal or acting training. Familiarity with professional recording equipment, audio editing software, and sometimes home studio setups is essential. Adaptability, reliability, and taking direction well are important soft skills for succeeding in client-driven environments. These skills and qualities ensure Voice Models can deliver high-quality vocal performances that meet diverse client needs across advertising, entertainment, and media industries.

Full-time
Posted 15 days ago
Job description
xAI is focused on creating AI systems that enhance humanity's understanding of the universe. The Member of Technical Staff - Voice Model will join the Grok Voice Model team to develop advanced voice AI systems, ensuring high-quality spoken interactions through innovative data processing and model training techniques.
Responsibilities:
• Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows to enable high-quality model training and evaluation.
• Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques to ensure Grok Voice responses are accurate, factually grounded, natural and idiomatic in spoken style, conversational in tone, and fluent across multiple languages.
• Build and iterate a comprehensive evaluation framework covering objective metrics (accuracy, quality, latency, expressiveness), human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance.
• Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and handle the full lifecycle from prototype to global-scale deployment for stable, low-latency, delightful voice experiences.
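As an illustration of the objective accuracy metrics mentioned in the responsibilities above, word error rate (WER) is one widely used measure for transcribed speech. The posting does not name any specific metric, so the choice of WER and the function name `word_error_rate` are assumptions made for this example. It is a minimal sketch, not a description of xAI's evaluation framework.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no text normalization (casing, punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A real evaluation pipeline would normally normalize text before scoring.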
Qualifications:
Required:
• Python expert with deep proficiency in writing clean, efficient code for AI/ML systems.
• Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.
• Proficiency in pre-training and post-training speech-language models using JAX/PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.
• Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing to drive model improvements.
• Experience building or working with large-scale distributed training and inference systems on Kubernetes.
• Proactive, self-driven attitude — ready to grind in a fast-paced, high-caliber team to deliver outstanding voice AI experiences.
Company:
xAI is an artificial intelligence startup that develops AI solutions and tools to enhance reasoning and search capabilities. It is a sub-organization of SpaceX. Founded in 2023, the company is headquartered in Palo Alto, USA, and has 1001-5000 employees. The company is currently late-stage.