Audio Machine Learning Jobs in California (NOW HIRING)

Machine Learning Researcher, Audio Location: San Francisco, CA or Remote About Bland At Bland.com, our mission is to empower enterprises to build AI phone agents at scale. Based in San Francisco, we ...

Innovate in audio machine learning through fundamental and applied research, advancing the state-of-the-art in audio playback, capture, generation, and editing. * Research and develop novel ML models ...

Machine Learning Manager

San Francisco, CA · On-site

$180K - $250K/yr

Knowledgeable in at least one focus area of machine learning, such as computer vision, audio, or NLP * 2+ years experience managing machine learning teams * You have an ability to understand and make ...

You will explore Picture Quality (PQ) and Audio Quality (AQ) improvements using AI in a resource ... Hands-on experience with Machine Learning / Deep Learning frameworks like TensorFlow or PyTorch

Design, develop, and deploy deep-learning-based and classical DSP audio algorithms for our SPU ... Desired Skills and Experience Deep learning, Machine learning, DSP, Python, PyTorch Benefits ...

Machine Learning Engineer

San Francisco, CA · On-site

$120K - $180K/yr

You have successfully trained and deployed a deep learning machine model (image, NLP, video, or audio) into production, with measurably improved performance over baseline, either in industry or as a ...

Machine Learning Engineer

Santa Monica, CA · On-site +1

$165K - $200K/yr

As Senior Machine Learning Engineer, you will work alongside gameplay engineers, producers ... Develop internal tools using ML to streamline workflows for production, art, animation, or audio ...

Machine Learning Engineer

Santa Monica, CA · On-site

$165K - $200K/yr

As Senior Machine Learning Engineer, you will work alongside gameplay engineers, producers ... Develop internal tools using ML to streamline workflows for production, art, animation, or audio ...

What You'll Do * Design and implement scalable machine learning pipelines for large-scale 3D ... Analyze diverse sensor inputs, including RGBD imagery, LiDAR point clouds, 360 photos, audio, and ...

Audio Machine Learning information

Salary chart values for California: $29.1K, $83.3K (average), $169.3K

How much do audio machine learning jobs pay per year?

As of Jul 1, 2026, the average yearly pay for audio machine learning in California is $83,350.00, according to ZipRecruiter salary data. Most workers in this role earn between $49,300.00 and $111,500.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive in the Audio Machine Learning position, and why are they important?

To thrive in Audio Machine Learning, you need a strong background in machine learning, digital signal processing, and proficiency with programming languages such as Python or MATLAB, typically supported by a relevant degree in computer science, electrical engineering, or a related field. Familiarity with frameworks like TensorFlow or PyTorch, experience with audio libraries (e.g., Librosa), and knowledge of cloud computing tools are highly valued, as are certifications in AI or data science. Strong problem-solving skills, creativity, and effective communication are essential soft skills for success in this field. These skills are crucial for developing innovative solutions, collaborating across multidisciplinary teams, and addressing complex audio data challenges in real-world projects.

Will machine learning engineers (MLEs) be replaced by AI?

In Audio Machine Learning roles, AI tools and automation are increasingly used to assist with tasks like data processing and model deployment. However, machine learning engineers remain essential for designing, tuning, and maintaining complex models, making complete replacement unlikely in the near term. Human expertise remains critical for interpreting results and ensuring system performance.

What are the typical daily responsibilities of someone working in Audio Machine Learning?

Professionals in Audio Machine Learning typically spend their days designing, developing, and optimizing machine learning models tailored to audio data, such as speech or music recognition systems. They may also preprocess large datasets, extract and engineer relevant features, and collaborate closely with data scientists, audio engineers, and software developers to integrate their work into larger applications. Regular tasks often include running experiments, evaluating model performance, tuning hyperparameters, and keeping up with the latest advancements in the field. Team meetings, code reviews, and presenting findings to stakeholders are also common parts of the workweek.

What is an Audio Machine Learning job?

An Audio Machine Learning job involves developing algorithms and models that analyze, process, and generate audio data. Responsibilities typically include working with speech recognition, music analysis, sound classification, and audio enhancement. Professionals in this field use deep learning, signal processing, and neural networks to improve audio-based applications like voice assistants, noise reduction systems, and music recommendation engines. They often work with datasets of speech, music, or environmental sounds to build models that understand and manipulate audio signals effectively.

Which 5 jobs will survive AI?

Audio Machine Learning specialists are likely to continue in demand as AI advances because their expertise in developing and refining audio recognition systems requires specialized skills that are difficult to automate fully. Roles involving creative audio design, audio engineering, and human oversight of AI systems are also expected to persist. These jobs often require a combination of technical knowledge, domain expertise, and critical thinking that AI cannot easily replace.

What engineer makes $500,000 a year?

Senior audio machine learning engineers with extensive experience and advanced skills in deep learning and signal processing, often working at large tech companies or specialized research labs, can earn salaries approaching or exceeding $500,000 annually. Compensation typically includes base salary, bonuses, and stock options, especially in high-demand industries like AI and audio processing.

Do audio engineers get paid well?

Audio engineers typically earn competitive salaries that vary based on experience, location, and industry sector. Entry-level positions may start lower, but experienced professionals working in recording studios, broadcasting, or live sound often have higher earnings, especially with specialized skills and certifications. Overall, the profession offers the potential for good compensation, particularly for those with technical expertise and a strong portfolio.
Infographic showing various Audio Machine Learning job openings in California as of June 2026, with employment types broken down into 3% As Needed, 80% Full Time, 11% Part Time, 1% Temporary, and 5% Contract. Highlights an 87% Physical, 2% Hybrid, and 11% Remote job distribution, with an average salary of $83,350 per year, or $40.1 per hour.
Machine Learning Researcher, Audio

Bland

San Francisco, CA • On-site

$140K - $250K/yr

Full-time

Medical, Dental, Vision

Posted 12 days ago


Job description

Machine Learning Researcher, Audio
Location: San Francisco, CA or Remote
About Bland
At Bland.com, our mission is to empower enterprises to build AI phone agents at scale. Based in San Francisco, we are a fast-growing team reimagining how customers interact with businesses through voice. We have raised $65 million from leading Silicon Valley investors, including Emergence Capital, Scale Venture Partners, Y Combinator, and founders of Twilio, Affirm, and ElevenLabs.
Voice is quickly becoming the primary interface between businesses and their customers. We are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.
The Role: Machine Learning Researcher, Audio
As a Machine Learning Researcher at Bland, you'll be working on foundational research and development across the core components of our voice stack: speech-to-text, large language models, neural audio codecs, and text-to-speech. Your work will define how our agents understand, reason, and speak in real time at enterprise scale.
This is not a narrow research role. You will take ideas from theory to large-scale training to production inference systems serving millions of calls per day. You will design new modeling approaches, validate them with rigorous experimentation, and collaborate with engineering teams to deploy them into real customer environments.
What You Will Do
Build and Scale Next-Generation TTS Systems
  • Design and train large-scale text-to-speech models capable of expressive, controllable, human-sounding output.
  • Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.
  • Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.
  • Optimize for real-time, low-latency inference in production.

Advance Speech-to-Text Modeling
  • Build and fine-tune large-scale ASR systems robust to accents, noise, telephony artifacts, and code-switching.
  • Leverage self-supervised pretraining and large-scale weak supervision.
  • Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.

Pioneer Neural Audio Codecs
  • Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
  • Explore discrete and continuous latent representations for scalable speech modeling.
  • Design codec architectures that enable downstream generative modeling and controllable synthesis.

Develop Scalable Training Pipelines
  • Curate and process massive audio datasets across languages, speakers, and environments.
  • Design staged training curricula and data filtering strategies.
  • Scale training across distributed GPU clusters, with a focus on cost, throughput, and reliability.

Run Rigorous Experiments
  • Design ablation studies that isolate the impact of architectural changes.
  • Measure improvements using both objective metrics and perceptual evaluations.
  • Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.

What Makes You a Great Fit
Deep Research Foundations
  • Experience with self-supervised learning, multimodal modeling, or generative modeling.
  • Ability to derive new formulations and implement them efficiently.

Expertise in Voice Modeling
  • Hands-on experience building or scaling TTS, STT, or neural audio codec systems.
  • Familiarity with large-scale speech datasets and real-world audio variability.
  • Strong intuition for audio quality, prosody, and conversational dynamics.

Systems and Hardware Awareness
  • Experience training and serving large models on modern accelerators.
  • Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
  • Understanding of real-time constraints in telephony or streaming environments.

Experimental Rigor
  • Track record of designing controlled experiments and meaningful ablations.
  • Comfortable working with both offline benchmarks and live production metrics.
  • Ability to move quickly from hypothesis to validation.

Builder Mentality
  • Comfortable in fast-moving startup environments.
  • Strong ownership mindset from research through deployment.
  • Excited by ambiguous, unsolved problems.

How You Show Up
  • You treat unsolved problems as opportunities to invent new paradigms.
  • You identify the single experiment that can validate an idea in days, not months.
  • You measure everything and let data drive decisions.
  • You are obsessed with making voice agents sound truly human.
  • You use AI tools aggressively to amplify your own impact and accelerate research cycles.

Bonus Points
  • Experience with large-scale distributed training.
  • Research publications or open-source contributions in speech or language AI.
  • Background in real-time speech systems or telephony.
  • PhD in ML, AI, or a related field, or equivalent research impact.

Benefits and Compensation
  • Healthcare, dental, vision, all the good stuff
  • Meaningful equity in a fast-growing company
  • Every tool you need to succeed
  • Beautiful office in Jackson Square, SF with rooftop views
  • Competitive salary: $160,000 to $250,000

If you are energized by building and scaling TTS models, pioneering neural audio codecs, and pushing the boundaries of speech-to-text systems, we would love to hear from you.