AI Reliability Engineer Jobs (NOW HIRING)

Site Reliability Engineer

Frederick, MD · Hybrid

$56.75 - $75.25/hr

Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights * Lead adoption of AI-enhanced SRE practices, including intelligent remediation and ...

Site Reliability Engineer - NYC

New York, NY · On-site

$62.25 - $82.75/hr

We democratize AI through high-performance, optimized, open-source and cutting-edge models ... What you will do As a Site Reliability Engineer, you balance the day-to-day operations on ...

Site Reliability Engineer - NYC

New York, NY · Remote

$58.25 - $77.50/hr

We democratize AI through high-performance, optimized, open-source and cutting-edge models ... What you will do As a Site Reliability Engineer, you balance the day-to-day operations on ...

SRE Engineer

Redmond, WA · On-site

$63.75 - $84.75/hr

Deploy and manage AI resources on Microsoft Azure, including AI Foundry and RAG solutions * Monitor and ensure service uptime, availability, reliability, and latency * Track and integrate SRE metrics ...

Hardware Reliability Engineer II (R4675)

Boston, MA · On-site

$111.40K - $140.10K/yr

Follow Shield AI on LinkedIn, X, Instagram, and YouTube. As a Hardware Reliability Engineer ... Engineer II) at Shield AI, you will support efforts to ensure the robustness and long-term ...

Site Reliability Engineer

Washington, DC · On-site

$64.25 - $85.50/hr

The role focuses on ensuring operational reliability and optimizing system performance for enterprise AI systems. Responsibilities : • Apply core reliability engineering principles to ensure high ...

Site Reliability Engineer

Frederick, CO · Hybrid

$61 - $81/hr

... AI/ML infrastructure, and zero-trust principles. You'll combine DevOps and SRE practices to support mission-driven scientific and clinical programs, emphasizing automation, reliability, compliance ...

Site Reliability Engineer

Frederick, MD · Hybrid

$56.75 - $75.25/hr

... AI/ML infrastructure, and zero-trust principles. You'll combine DevOps and SRE practices to support mission-driven scientific and clinical programs, emphasizing automation, reliability, compliance ...

Site Reliability Engineer (SRE)

Parsippany, NJ · On-site

$57.25 - $76.25/hr

We are looking for a talented Site Reliability Engineer (SRE) with a strong background in Google ... Familiarity with Google BI and AI/ML tools a plus (Looker, BigQuery ML, Vertex AI, etc.) Experience ...

Define and lead WEX's AI-Powered Reliability Engineering strategy, driving adoption of SRE agents across the software lifecycle-from design and development through deployment and operations, to ...

Ability to design or integrate AI-driven workflows for operational efficiency and reliability * Familiarity with building or integrating autonomous agents for DevOps/SRE use cases Cloud & Multi-Cloud ...

SRE Architect, AI-Powered Reliability

Portland, ME · On-site

$58.25 - $77.50/hr

Define and lead WEX's AI-Powered Reliability Engineering strategy, driving adoption of SRE agents across the software lifecycle-from design and development through deployment and operations, to ...

AI Reliability Engineer information

Salary details: $61K · $118K · $141K

How much do AI Reliability Engineer jobs pay per year?

As of May 29, 2026, the average yearly pay for an AI Reliability Engineer in the United States is $117,973, according to ZipRecruiter salary data. Most workers in this role earn between $102,500 and $129,000 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an AI Reliability Engineer, and why are they important?

To thrive as an AI Reliability Engineer, you need a solid background in computer science or engineering, expertise in AI/ML concepts, and experience with software testing and reliability methodologies. Familiarity with tools like TensorFlow, PyTorch, CI/CD pipelines, and reliability testing frameworks, along with certifications in cloud platforms (e.g., AWS Certified Machine Learning), is highly valuable. Analytical thinking, problem-solving abilities, and strong collaboration skills set top performers apart in this role. These skills ensure robust, dependable AI systems that meet performance standards and maintain trust in critical applications.

What are some common challenges AI Reliability Engineers face when ensuring model robustness in production environments?

AI Reliability Engineers often encounter challenges such as monitoring AI model performance for drift or unexpected behavior, managing data quality issues, and implementing automated alerting systems for anomalies. In production, it's crucial to ensure that AI models operate consistently and remain reliable under varying conditions and data inputs. Collaborating closely with data scientists, software engineers, and DevOps teams is essential to address these challenges and to continuously improve model reliability and uptime.

What are AI Reliability Engineers?

AI Reliability Engineers are professionals responsible for ensuring that artificial intelligence systems function reliably, safely, and effectively over time. They work on monitoring AI models in production, identifying and mitigating potential failures, and improving the robustness of AI systems. Their tasks often include testing, validation, performance monitoring, and implementing best practices for maintaining AI infrastructure. By focusing on reliability, they help organizations deploy AI solutions that are dependable and trustworthy in real-world environments.

What is a $900,000 AI job?

A $900,000 AI job typically refers to highly senior roles such as AI executives, chief AI officers, or lead AI engineers at top technology companies, often involving advanced expertise in machine learning, deep learning, and AI strategy. These positions usually require extensive experience, specialized skills, and may include performance-based bonuses or stock options that contribute to the high total compensation.

What is the difference between an AI Reliability Engineer and a Data Scientist?

Aspect | AI Reliability Engineer | Data Scientist
Required Credentials | Bachelor's or master's in CS, engineering, or related; certifications in AI/ML | Bachelor's or master's in CS, statistics, or related; certifications in data analysis or ML
Work Environment | Tech companies, AI-focused teams, engineering departments | Research labs, tech firms, analytics teams
Employer & Industry Usage | AI product development, machine learning systems, reliability testing | Data analysis, predictive modeling, business insights

While both roles involve AI and ML, AI Reliability Engineers focus on ensuring AI system robustness and uptime, whereas Data Scientists analyze data to generate insights and models. The roles often collaborate but serve different primary functions within AI projects.


Site Reliability Engineer

Axle

Frederick, MD • Hybrid

$56.75 - $75.25/hr

Full-time

Medical, Dental, Vision, Retirement, PTO

Posted 22 days ago


Job description

(ID: 2025-1135)

Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country including multiple institutes at the National Institutes of Health (NIH).

Benefits We Offer:

  • 100% Medical, Dental & Vision Coverage for Employees
  • Paid Time Off and Paid Holidays
  • 401(k) match up to 5%
  • Educational Benefits for Career Growth
  • Employee Referral Bonus
  • Flexible Spending Accounts:
    • Healthcare (FSA)
    • Parking Reimbursement Account (PRK)
    • Dependent Care Assistant Program (DCAP)
    • Transportation Reimbursement Account (TRN)

The Site Reliability Engineer role centers on modernizing and consolidating a complex multi-cloud environment across AWS, Azure, and GCP, building a scalable, secure, and observable platform from the ground up using Kubernetes, AI/ML infrastructure, and zero-trust principles. You'll combine DevOps and SRE practices to support mission-driven scientific and clinical programs, emphasizing automation, reliability, compliance, and proactive monitoring while enabling innovation through AI-driven tooling. The team culture is highly collaborative and growth-oriented, valuing experimentation, continuous learning, and cross-functional leadership, with opportunities to shape future multi-cloud and platform engineering solutions.

Responsibilities:

  • Design and implement enterprise-grade monitoring and observability frameworks (metrics, logs, traces) across distributed systems using enterprise Splunk, Grafana, and OpenTelemetry tools

  • Establish and manage SLIs, SLOs, and error budgets to drive reliability improvements

  • Develop and maintain real-time asset inventory systems across cloud, on-prem, and hybrid environments

  • Automate workload onboarding and offboarding processes, ensuring standardization and governance

  • Track system ownership, dependencies, and lifecycle states for operational transparency

  • Build proactive detection mechanisms using AIOps and intelligent alerting to minimize incident impact

  • Design and operate scalable, resilient, and secure infrastructure platforms across cloud and hybrid environments

  • Implement automated compliance tracking and enforcement aligned with organizational and regulatory standards (e.g., NIST, FISMA, FedRAMP)

  • Embed ITIL processes (incident, change, problem, configuration management) into SRE workflows

  • Build and maintain automated deployment environments and pipelines that enforce security, compliance, and operational standards

  • Develop "golden paths" and standardized platform templates for consistent workload deployment

  • Automate provisioning, patching, configuration management, and environment lifecycle

  • Leverage AI/ML coding assistants and vibe coding practices to rapidly develop automation scripts, tools, and internal platforms

  • Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights

  • Lead adoption of AI-enhanced SRE practices, including intelligent remediation and predictive operations

  • Champion DevOps and SRE practices including Infrastructure as Code, CI/CD, observability, and reliability engineering

  • Build developer-friendly platforms ("golden paths") that simplify deployments, reduce friction, and improve velocity

  • Enable and optimize infrastructure for AI/ML workloads, including data pipelines, storage systems, inference environments, and GPU-enabled and high-performance compute workloads

  • Build and manage containerized and orchestrated platforms (Docker, Kubernetes)

  • Support cloud migration, modernization, and platform standardization initiatives

  • Ensure systems meet security, compliance, backup, and disaster recovery requirements

  • Evangelize and promote best practices in DevOps, SRE, and platform engineering to developer communities

  • Stay abreast of new technologies in your areas, including but not limited to AIOps, MLOps, cloud computing and deployment, site reliability engineering, infrastructure automation, security best practices, and data engineering

Requirements:

  • Must have a total of 6+ years of experience in DevOps/SRE roles with monitoring and observability tools (Prometheus, Grafana, ELK, or cloud-native equivalents) for on-prem and cloud-hosted workloads

  • Must have 4+ years of hands-on Linux experience that includes Ubuntu/CentOS/Red Hat operating systems, containers, dependency management, and administration support

  • Must have 4+ years of experience automating Infrastructure-as-Code (IaC) deployments to one of the following cloud platforms: Amazon AWS, Google GCP, or Microsoft Azure

  • Must have 4+ years with CI/CD and automation tools such as Terraform, Ansible, Chef, Puppet, Jenkins, GitHub Actions

  • Strong scripting skills (Python, Bash, PowerShell or similar)

  • Must be proficient using vibe coding and coding assistants to develop scripts, tools and applications for the DevOps and SRE use cases

  • Proficiency in debugging, troubleshooting, and/or deploying SQL and/or NoSQL databases, object storage, web servers, and open-source programming stacks (Node.js, R, Python, .NET Core, Java) is desired but not mandatory

  • Must be willing to learn new technologies and to adopt and adapt to emerging technologies or needs from project to project

  • Cloud certifications are preferred

  • Certifications in Grafana, Splunk, Docker, and Kubernetes are preferred but optional

Disclaimer: The above description is meant to illustrate the general nature of work and level of effort being performed by individuals assigned to this position or job description. It is not intended as a complete list of all skills, responsibilities, duties, and/or assignments required. Individuals may be required to perform duties outside of their position, job description, or responsibilities as needed.

The diversity of Axle's employees is a tremendous asset. We are firmly committed to providing equal opportunity in all aspects of employment and will not tolerate any illegal discrimination or harassment based on age, race, gender, religion, national origin, disability, marital status, covered veteran status, sexual orientation, status with respect to public assistance, or other characteristics protected under state, federal, or local law. We also seek to deter those who aid, abet, or induce discrimination or coerce others to discriminate.

Accessibility: If you need an accommodation as part of the employment process, please contact careers@axleinfo.com

This role has a market-competitive salary with an anticipated base compensation range listed below. Actual salaries will vary depending on a candidate's experience, qualifications, skills, and location.

Salary Range
$140,000 - $155,000 USD