AI Reliability Engineer Jobs (NOW HIRING)

Digital - Principal SRE (AI Engineer)

Columbus, OH · On-site +1

$53.50 - $71.25/hr

Description The Digital - Principal SRE (AI Engineer) role is a position that blends expertise in artificial intelligence, machine learning, and reliability engineering. This professional is ...

Site Reliability Engineer (SRE)

Atlanta, GA · On-site

$54.75 - $72.75/hr

Required: • Passionate about building reliable, scalable systems using modern, AI-enabled ... E principles in a production environment • Strong background in Linux, networking, and system ...

New

Site Reliability Engineer

San Francisco, CA · On-site

$130K - $500K/yr

We partner with leading AI labs and enterprises to provide the human intelligence essential to AI ... About the Role As a Site Reliability Engineer (SRE) at Mercor, you'll own production reliability ...

Site Reliability Engineer

Austin, TX · On-site

$56.50 - $75/hr

We work at the frontier of AI, tackling big, real-world problems for global enterprises across ... We are looking for a Site Reliability Engineer to help design, build, and operate the platforms ...

Reliability Engineer

Costa Mesa, CA · On-site

$110K - $138.40K/yr

Anduril's family of systems is powered by Lattice OS, an AI-powered operating system that turns ... Anduril's Reliability Engineering organization is seeking an experienced Reliability Engineer to ...

Staff Site Reliability Engineer

$58.25 - $77.50/hr

Wand AI is a company focused on integrating AI into labor to create a hybrid workforce. They are seeking a highly experienced Senior Staff SRE Engineer to build and operate SRE practices at scale ...

Site Reliability Engineer

Frederick, MD · Hybrid

$56.75 - $75.25/hr

Integrate AI-driven tooling into DevOps pipelines for code quality, security scanning, and operational insights * Lead adoption of AI-enhanced SRE practices, including intelligent remediation and ...

Platform Reliability Engineer

New York, NY · On-site

$120K - $190K/yr

Our AI-first, cloud-native approach delivers real-time intelligence and interactive business ... Role Overview We are seeking a Platform Reliability Engineer (SRE) to ensure the scalability ...

AI Reliability Engineer information

Salary figures shown: $61K, $118K, $141K

How much do AI Reliability Engineer jobs pay per year?

As of May 29, 2026, the average yearly pay for an AI Reliability Engineer in the United States is $117,973.00, according to ZipRecruiter salary data. Most workers in this role earn between $102,500.00 and $129,000.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an AI Reliability Engineer, and why are they important?

To thrive as an AI Reliability Engineer, you need a solid background in computer science or engineering, expertise in AI/ML concepts, and experience with software testing and reliability methodologies. Familiarity with tools like TensorFlow, PyTorch, CI/CD pipelines, and reliability testing frameworks, along with certifications in cloud platforms (e.g., AWS Certified Machine Learning), is highly valuable. Analytical thinking, problem-solving abilities, and strong collaboration skills set top performers apart in this role. These skills ensure robust, dependable AI systems that meet performance standards and maintain trust in critical applications.

What are some common challenges AI Reliability Engineers face when ensuring model robustness in production environments?

AI Reliability Engineers often encounter challenges such as monitoring AI model performance for drift or unexpected behavior, managing data quality issues, and implementing automated alerting systems for anomalies. In production, it's crucial to ensure that AI models operate consistently and remain reliable under varying conditions and data inputs. Collaborating closely with data scientists, software engineers, and DevOps teams is essential to address these challenges and to continuously improve model reliability and uptime.
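As a rough illustration of the drift monitoring mentioned above, here is a minimal sketch that compares one input feature's production values against a baseline sample using the Population Stability Index (PSI). The function name, the bin count, and the 0.2 alert threshold are assumptions chosen for the example (0.2 is a commonly quoted rule of thumb, not a standard), and both samples are assumed to be non-empty.

```python
import math
from typing import Sequence

# Assumed alert threshold; a common rule of thumb, tune it per feature.
PSI_ALERT_THRESHOLD = 0.2


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Return the PSI between a baseline sample and a current sample."""
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 if the
    # baseline is constant so we never divide by zero.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    base_frac = fractions(baseline)
    cur_frac = fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_frac, cur_frac))


def has_drifted(baseline: Sequence[float], current: Sequence[float]) -> bool:
    """True when the PSI exceeds the assumed alert threshold."""
    return population_stability_index(baseline, current) > PSI_ALERT_THRESHOLD
```

In practice a check like `has_drifted` would run on a schedule per feature and feed whatever alerting system the team already uses; identical samples give a PSI of zero, and the value grows as the two distributions diverge.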

What are AI Reliability Engineers?

AI Reliability Engineers are professionals responsible for ensuring that artificial intelligence systems function reliably, safely, and effectively over time. They work on monitoring AI models in production, identifying and mitigating potential failures, and improving the robustness of AI systems. Their tasks often include testing, validation, performance monitoring, and implementing best practices for maintaining AI infrastructure. By focusing on reliability, they help organizations deploy AI solutions that are dependable and trustworthy in real-world environments.

What is a $900,000 AI job?

A $900,000 AI job typically refers to highly senior roles such as AI executives, chief AI officers, or lead AI engineers at top technology companies, often involving advanced expertise in machine learning, deep learning, and AI strategy. These positions usually require extensive experience, specialized skills, and may include performance-based bonuses or stock options that contribute to the high total compensation.

What is the difference between AI Reliability Engineer vs Data Scientist?

| Aspect | AI Reliability Engineer | Data Scientist |
| --- | --- | --- |
| Required Credentials | Bachelor's or master's in CS, engineering, or related; certifications in AI/ML | Bachelor's or master's in CS, statistics, or related; certifications in data analysis or ML |
| Work Environment | Tech companies, AI-focused teams, engineering departments | Research labs, tech firms, analytics teams |
| Employer & Industry Usage | AI product development, machine learning systems, reliability testing | Data analysis, predictive modeling, business insights |

While both roles involve AI and ML, AI Reliability Engineers focus on ensuring AI system robustness and uptime, whereas Data Scientists analyze data to generate insights and models. The roles often collaborate but serve different primary functions within AI projects.

Digital - Principal SRE (AI Engineer)

Huntington

Columbus, OH • On-site, Remote

$53.50 - $71.25/hr

Other

Posted 26 days ago


Job description

Description The Digital - Principal SRE (AI Engineer) role is a position that blends expertise in artificial intelligence, machine learning, and reliability engineering. This professional is responsible for designing, deploying, and maintaining AI-driven solutions while ensuring the reliability, scalability, and performance of digital platforms and services. The ideal candidate will work closely with Digital SRE engineers, data scientists, DevOps, and operations teams to deliver robust, efficient, and automated systems that support business goals.

Job Description Summary: The IS Technical Specialist provides technical and consultative support on the most complex technical matters. This role typically reports to the Head of Digital SRE and may involve on-call responsibilities. The position provides opportunities to work on cutting-edge AI solutions, collaborate with cross-segment teams, and drive reliability for mission-critical digital services.

Duties and Responsibilities:

• Design, develop, and implement AI-driven systems and automation tools to enhance the reliability and efficiency of digital platforms.
• Monitor the health, availability, and performance of AI-enabled applications and infrastructure using SRE best practices.
• Collaborate with cross-functional teams to integrate machine learning models into production environments, ensuring seamless deployment and operation.
• Establish and enforce service-level objectives (SLOs), error budgets, and incident response procedures for AI-driven services.
• Identify, troubleshoot, and resolve complex incidents related to AI systems, leveraging observability and monitoring tools.
• Drive continuous improvement by analyzing post-incident reviews, automating manual tasks, and optimizing system performance.
• Stay up to date with advancements in AI, SRE, and cloud technologies, recommending innovative solutions to enhance digital reliability.
• Document processes and runbooks for operational transparency and knowledge sharing.
• AI Platform Integration: Develop abstraction layers across AI providers (Google, OpenAI, etc.) to enable seamless integration and enablement.
• Conduct design workshops, POCs, and code-with sessions to shape data-driven agent workflows with stakeholders, fostering trust and adoption.
• Measure & Improve: Define and use key metrics, test harnesses, and evaluation plans to measure agent accuracy, latency, safety, and cost effectiveness.
• Knowledge Sharing: Craft reusable patterns, documentation, and best practices to influence internal assets and client roadmaps.
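For readers unfamiliar with the SLOs and error budgets named in the duties, the arithmetic can be sketched as follows. This is an illustrative example only, not Huntington's tooling: the `ErrorBudget` class, its field names, and the request-based SLO are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means an untouched budget; zero or below means it is spent.
        if self.allowed_failures == 0:
            return 0.0  # a 100% target leaves no budget to spend
        return 1.0 - self.failed_requests / self.allowed_failures

    @property
    def exhausted(self) -> bool:
        return self.remaining_fraction <= 0.0


# Usage: a 99.9% target over one million requests allows about 1,000
# failures, so 250 failures leaves roughly three quarters of the budget.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
```

Teams typically tie a policy to the remaining fraction, for example pausing risky releases once the budget is exhausted; the exact policy is a team decision and is not specified in the posting.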

Basic Qualifications:

• Bachelor's degree in Computer Science, Engineering, Data Science, or a related field and experience.
• 5+ years of hands-on experience with AI/ML engineering, SRE, DevOps, or related roles.
• Hands-on programming skills in Python, Java, or similar languages, with hands-on experience developing and deploying machine learning models.
• Hands-on experience with cloud platforms (e.g., AWS, GCP) and containerization technologies (Docker, Kubernetes).
• Familiarity with observability tools (Prometheus, Grafana, ELK stack) and the ServiceNow incident management platform.
• Solid understanding of SRE principles: monitoring, alerting, SLOs, error budgets, and automation.
• Hands-on experience with infrastructure-as-code (Terraform, Ansible) and CI/CD pipelines.

Preferred Qualifications:

• Excellent problem-solving skills, attention to detail, and ability to work in a fast-paced, collaborative environment.
• Strong communication and documentation abilities.
• Experience operationalizing large language models (LLMs) or generative AI systems in production settings.
• Background in MLOps, data engineering, and/or cloud-native AI deployment.
• Knowledge of security best practices for AI and cloud infrastructure.
• Contributions to open source AI/SRE projects or relevant technical communities.

Exempt Status: (Yes = not eligible for overtime pay) (No = eligible for overtime pay) Yes

Workplace Type: Office

Our Approach to Office Workplace Type

Certain positions outside our branch network may be eligible for a flexible work arrangement.

We're combining the best of both worlds: in-office and work from home. Our approach enables our teams to deepen connections, maintain a strong community, and do their best work. Remote roles will also have the opportunity to come together in our offices for moments that matter.

Specific work arrangements will be provided by the hiring team.

Huntington is an Equal Opportunity Employer.

Tobacco-Free Hiring Practice: Visit Huntington's Career Web Site for more details.

Note to Agency Recruiters: Huntington Bank will not pay a fee for any placement resulting from the receipt of an unsolicited resume. All unsolicited resumes sent to any Huntington Bank colleagues, directly or indirectly, will be considered Huntington Bank property. Recruiting agencies must have a valid, written and fully executed Master Service Agreement and Statement of Work for consideration.
