
AI Reliability Engineer Jobs (NOW HIRING)

Reliability Engineer

Cupertino, CA · On-site

$2.0K/mo

About Etched: Etched is building AI chips that are hard-coded for individual model architectures ... Reliability Engineer: We are seeking a skilled and detail-oriented Reliability Engineer to join our ...

Staff Hardware Reliability Engineer

Boston, MA · On-site

$158K - $237K/yr

As a Hardware Reliability Engineer at Shield AI, you will be responsible for ensuring the robustness and long-term performance of our VBAT ...

Digital - Principal SRE (AI Engineer)

Columbus, OH · On-site +1

$53.50 - $71.25/hr

Description: The Digital - Principal SRE (AI Engineer) role is a position that blends expertise in artificial intelligence, machine learning, and reliability engineering. This professional is ...

Digital - Principal SRE (AI Engineer)

Columbus, OH · On-site +1

$55 - $73.25/hr

Description: The Digital - Principal SRE (AI Engineer) role is a position that blends expertise in artificial intelligence, machine learning, and reliability engineering. This professional is ...

Reliability Engineer

Raleigh, NC · On-site

$99K - $125K/yr

Position: Reliability Engineer. Location: Raleigh, NC. Responsibilities: Test, Validation ... Develop and execute reliability and qualification test plans specific to AI-scale cable assemblies ...

Site Reliability Engineer

San Francisco, CA · On-site

$130K - $500K/yr

We partner with leading AI labs and enterprises to provide the human intelligence essential to AI ... About the Role: As a Site Reliability Engineer (SRE) at Mercor, you'll own production reliability ...

Reliability Engineer

Yocumtown, PA · On-site

$98K - $123K/yr

Position: Reliability Engineer. Location: Etters, PA. Responsibilities: Test, Validation ... Develop and execute reliability and qualification test plans specific to AI-scale cable assemblies ...

Site Reliability Engineer

Austin, TX · On-site

$56.50 - $75/hr

We work at the frontier of AI, tackling big, real-world problems for global enterprises across ... This is a hands-on role for an engineer who enjoys owning reliability end-to-end and working ...

Reliability Engineer

Costa Mesa, CA

$108K - $136K/yr

Anduril's family of systems is powered by Lattice OS, an AI-powered operating system that turns ... Anduril's Reliability Engineering organization is seeking an experienced Reliability Engineer to ...

Reliability Engineer

Costa Mesa, CA · On-site

$110K - $138K/yr

Anduril's family of systems is powered by Lattice OS, an AI-powered operating system that turns ... Anduril's Reliability Engineering organization is seeking an experienced Reliability Engineer to ...

Site Reliability Engineer

Frederick, MD · On-site

$56.75 - $75.25/hr

Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights. Lead adoption of AI-enhanced SRE practices, including intelligent remediation and ...

AI Reliability Engineer salary information: $61K · $118K · $141K

How much do AI reliability engineer jobs pay per year?

As of Jun 19, 2026, the average yearly pay for an AI reliability engineer in the United States is $117,973, according to ZipRecruiter salary data. Most workers in this role earn between $102,500 and $129,000 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an AI Reliability Engineer, and why are they important?

To thrive as an AI Reliability Engineer, you need a solid background in computer science or engineering, expertise in AI/ML concepts, and experience with software testing and reliability methodologies. Familiarity with tools like TensorFlow, PyTorch, CI/CD pipelines, and reliability testing frameworks, along with certifications in cloud platforms (e.g., AWS Certified Machine Learning), is highly valuable. Analytical thinking, problem-solving abilities, and strong collaboration skills set top performers apart in this role. These skills ensure robust, dependable AI systems that meet performance standards and maintain trust in critical applications.

What is the difference between an AI Reliability Engineer and a Data Scientist?

| Aspect | AI Reliability Engineer | Data Scientist |
|---|---|---|
| Required credentials | Bachelor's or master's in CS, engineering, or a related field; certifications in AI/ML | Bachelor's or master's in CS, statistics, or a related field; certifications in data analysis or ML |
| Work environment | Tech companies, AI-focused teams, engineering departments | Research labs, tech firms, analytics teams |
| Employer and industry usage | AI product development, machine learning systems, reliability testing | Data analysis, predictive modeling, business insights |

While both roles involve AI and ML, AI Reliability Engineers focus on ensuring AI system robustness and uptime, whereas Data Scientists analyze data to generate insights and build models. The roles often collaborate but serve different primary functions within AI projects.

What are AI Reliability Engineers?

AI Reliability Engineers are professionals responsible for ensuring that artificial intelligence systems function reliably, safely, and effectively over time. They work on monitoring AI models in production, identifying and mitigating potential failures, and improving the robustness of AI systems. Their tasks often include testing, validation, performance monitoring, and implementing best practices for maintaining AI infrastructure. By focusing on reliability, they help organizations deploy AI solutions that are dependable and trustworthy in real-world environments.

What are some common challenges AI Reliability Engineers face when ensuring model robustness in production environments?

AI Reliability Engineers often encounter challenges such as monitoring model performance for drift or unexpected behavior, managing data-quality issues, and implementing automated alerting for anomalies. In production, models must behave consistently and remain reliable across varying conditions and data inputs. Close collaboration with data scientists, software engineers, and DevOps teams is essential to address these challenges and to keep improving model reliability and uptime.
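As an illustration of turning "monitoring for drift" into an automated alert, the sketch below computes the population stability index (PSI), one commonly used drift metric that compares a feature's distribution in production against a reference sample. None of the listings prescribe this method. The function names, the equal-width binning, and the 0.2 alert threshold (a widely quoted rule of thumb) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    production: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI of one numeric feature (illustrative sketch).

    Compares the share of values in each bin for a reference sample
    (e.g. training data) and a production sample. Equal-width bins over
    the combined range are a simplifying assumption.
    """
    lo = min(min(reference), min(production))
    hi = max(max(reference), max(production))
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant data

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    prod_shares = bin_shares(production)
    return sum(
        (q - p) * math.log(q / p) for p, q in zip(ref_shares, prod_shares)
    )


def drift_alert(
    reference: Sequence[float],
    production: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb cutoff; tune per feature
) -> bool:
    """Return True when PSI exceeds the (hypothetical) alerting threshold."""
    return population_stability_index(reference, production) > threshold
```

A PSI near 0 means the two samples are distributed alike, and larger values signal a shift. Real monitoring setups often derive bins from quantiles of the reference sample and track PSI per feature over time rather than as a single check.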
More about AI Reliability Engineer jobs
What cities are hiring for AI Reliability Engineer jobs? Cities with the most AI Reliability Engineer job openings:
What states have the most AI Reliability Engineer jobs? States with the most job openings for AI Reliability Engineer jobs include:
What job categories do people searching AI Reliability Engineer jobs look for? The top searched job categories for AI Reliability Engineer jobs are:
Infographic showing AI Reliability Engineer job openings in the United States as of June 2026. Employment types break down into 75% full-time and 25% contract, and work arrangements into 87% in-person and 13% remote. The average salary is $117,973 per year, or $56.7 per hour.
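The hourly figure is consistent with dividing the annual average by a standard full-time year of 2,080 hours (40 hours a week for 52 weeks). The source does not state its divisor, so treat 2,080 hours as an assumption:

```latex
\[
\frac{\$117{,}973\ \text{per year}}{40\ \tfrac{\text{h}}{\text{week}} \times 52\ \text{weeks}}
= \frac{\$117{,}973}{2{,}080\ \text{h}} \approx \$56.72\ \text{per hour}
\]
```

Rounded to one decimal place, this gives the $56.7 per hour shown.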

Reliability Engineer

Etched

Cupertino, CA • On-site

$2.0K/mo

Full-time

Medical, Dental, Vision

Posted 25 days ago


Job description

About Etched
Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep chain-of-thought reasoning.
Reliability Engineer
We are seeking a skilled and detail-oriented Reliability Engineer to join our team. As a Reliability Engineer at Etched, you will play a critical role in ensuring that all components and systems meet our rigorous reliability standards, essential for our datacenter applications. This position requires a deep understanding of reliability engineering principles, as well as experience working with suppliers, ODMs, and JDMs.
Representative Projects:
  • Lead the development, implementation, and management of reliability standards for all suppliers working with Etched. Ensure that all components and systems meet or exceed the required reliability benchmarks.
  • Review and verify reliability reports from suppliers, ensuring accuracy and adherence to Etched's standards. Provide guidance and feedback to suppliers to ensure continuous improvement in reliability performance.
  • Collaborate with cross-functional teams to review and recommend component selection criteria based on reliability performance. Ensure that all selected components are capable of meeting the long-term reliability requirements of our datacenter applications.
  • Evaluate and approve reliability test plans proposed by external vendors. Ensure that the test methodologies and conditions are sufficient to validate long-term reliability under expected operating conditions.
  • Conduct in-depth analysis of reliability data provided by suppliers and vendors. Identify trends, potential issues, and areas for improvement to enhance overall reliability.
  • Work closely with ODMs (Original Design Manufacturers) and JDMs (Joint Design Manufacturers) to ensure that all products meet Etched's quality and reliability standards. Provide technical guidance and support to maintain maximum operational uptime and long-term reliability.
  • Review and establish reliability metrics and standards for silicon components, ensuring they meet the stringent requirements for long-term reliability in datacenter environments.

You may be a good fit if you have:
  • Bachelor's or Master's degree in Reliability Engineering, Electrical Engineering, or a related field.
  • 5+ years of experience in reliability engineering; a focus on datacenter applications is preferred.
  • Strong understanding of reliability standards, testing methodologies, and data analysis techniques. Experience with DFMEA, PFMEA, and SPC engineering analysis is desired.
  • Experience working with suppliers, ODMs, and JDMs in a high-tech environment.
  • Excellent communication skills, with the ability to convey complex technical concepts to diverse stakeholders.
  • Proven ability to manage multiple projects and deliver results in a fast-paced environment.

We encourage you to apply even if you do not believe you meet every single qualification.
How we're different:
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in Cupertino, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
Benefits:
  • Full medical, dental, and vision packages, with 100% of premiums covered (90% for dependents)
  • Housing subsidy of $2,000/month for those living within walking distance of the office
  • Daily lunch and dinner in our office
  • Relocation support for those moving to Cupertino