AI Reliability Engineer Jobs in Reston, VA (NOW HIRING)

Manage incident response, root cause analysis, and post-mortem processes for the AI platform ... , DevOps, or production operations. * Extensive experience with cloud-native infrastructure ...

Site Reliability Engineer

Herndon, VA · On-site

$86.80K - $198K/yr

Site Reliability Engineer The Opportunity: Engineering to make a system more resilient and ... Candidate AI Usage Policy AI is a part of our daily work at Booz Allen, and we are committed to the ...

Leidos Digital Modernization sector is seeking an experienced Senior Reliability Engineer to ... Knowledge of AI/ML model serving and deployment. * Experience in participating in Engineering ...

Site Reliability Engineer - Hybrid

Reston, VA · On-site

$59.25 - $78.75/hr

AI/ML: We have certain machine learning projects which the SRE interacts with. So, AI/ML experience is a plus to have. * Previous Fannie Mae experience is a plus. Overall years of experience: 8+ ...

AI Reliability Engineer information

Reston, VA salary figures: $63.5K, $122.7K (average), $146.7K

How much do AI Reliability Engineer jobs pay per year?

As of Jun 2, 2026, the average yearly pay for an AI Reliability Engineer in Reston, VA is $122,734.00, according to ZipRecruiter salary data. Most workers in this role earn between $106,600.00 and $134,200.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an AI Reliability Engineer, and why are they important?

To thrive as an AI Reliability Engineer, you need a solid background in computer science or engineering, expertise in AI/ML concepts, and experience with software testing and reliability methodologies. Familiarity with tools like TensorFlow, PyTorch, CI/CD pipelines, and reliability testing frameworks, along with certifications in cloud platforms (e.g., AWS Certified Machine Learning), is highly valuable. Analytical thinking, problem-solving abilities, and strong collaboration skills set top performers apart in this role. These skills ensure robust, dependable AI systems that meet performance standards and maintain trust in critical applications.

What are some common challenges AI Reliability Engineers face when ensuring model robustness in production environments?

AI Reliability Engineers often encounter challenges such as monitoring model performance for drift or unexpected behavior, managing data quality issues, and implementing automated alerting for anomalies. In production, models must behave consistently and remain reliable under varying conditions and data inputs. Close collaboration with data scientists, software engineers, and DevOps teams is essential to address these challenges and to continuously improve model reliability and uptime.
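
As a rough illustration of the drift monitoring and automated alerting mentioned above, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a current sample of one numeric feature. The function names, the equal-width binning, and the 0.2 alert threshold are assumptions chosen for illustration (0.2 is a common rule of thumb, not a standard); real deployments would tune these per model and feature.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 if every baseline value is identical.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    base_frac = fractions(baseline)
    curr_frac = fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_frac, curr_frac))


def drift_alert(
    baseline: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb threshold
) -> bool:
    """Return True when the PSI exceeds the (hypothetical) alert threshold."""
    return population_stability_index(baseline, current) > threshold
```

A check like this would typically run on a schedule against recent production inputs, with a True result routed to whatever alerting system the team already uses.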

What are AI Reliability Engineers?

AI Reliability Engineers are professionals responsible for ensuring that artificial intelligence systems function reliably, safely, and effectively over time. They work on monitoring AI models in production, identifying and mitigating potential failures, and improving the robustness of AI systems. Their tasks often include testing, validation, performance monitoring, and implementing best practices for maintaining AI infrastructure. By focusing on reliability, they help organizations deploy AI solutions that are dependable and trustworthy in real-world environments.

What is a $900,000 AI job?

A $900,000 AI job typically refers to highly senior roles such as AI executives, chief AI officers, or lead AI engineers at top technology companies, often involving advanced expertise in machine learning, deep learning, and AI strategy. These positions usually require extensive experience, specialized skills, and may include performance-based bonuses or stock options that contribute to the high total compensation.

What is the difference between an AI Reliability Engineer and a Data Scientist?

Aspect | AI Reliability Engineer | Data Scientist
Required Credentials | Bachelor's or master's in CS, engineering, or related; certifications in AI/ML | Bachelor's or master's in CS, statistics, or related; certifications in data analysis or ML
Work Environment | Tech companies, AI-focused teams, engineering departments | Research labs, tech firms, analytics teams
Employer & Industry Usage | AI product development, machine learning systems, reliability testing | Data analysis, predictive modeling, business insights

While both roles involve AI and ML, AI Reliability Engineers focus on ensuring AI system robustness and uptime, whereas Data Scientists analyze data to generate insights and models. The roles often collaborate but serve different primary functions within AI projects.

Site Reliability Engineer

$65.50 - $87.25/hr

Other

Posted 28 days ago


Accenture Federal Services rating

Company rating: 8.4 out of 10

Based on 19 frontline employees who took The Breakroom Quiz

47th of 425 rated business services


Job description

The work

As a Site Reliability Engineer, you will play a pivotal role in advancing operational AI adoption within a cutting-edge Hub-and-Spoke architecture. Your primary focus will be on ensuring the reliability, scalability, and continuous monitoring of enterprise AI systems that support mission-critical applications and enterprise AI governance.

Key responsibilities:

  • Ensure the reliability, scalability, and performance of enterprise AI systems within a modern Hub-and-Spoke architecture
  • Lead incident response efforts to minimize downtime and maintain service continuity
  • Implement and manage SLOs/SLAs, capacity planning, and performance optimization strategies 
  • Operate and enhance observability platforms using OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
  • Drive FinOps practices to optimize operational costs and resource utilization
  • Collaborate with cross-functional teams in AI, DevSecOps, data engineering, platform engineering, and cybersecurity
  • Integrate monitoring and continuous feedback mechanisms for mission applications and agentic AI systems
  • Support enterprise AI governance and scalable software delivery through robust operational workflows
  • Proactively identify and resolve reliability and performance issues in production environments
  • You will be responsible for incident response, performance optimization, and capacity planning, working closely with cross-functional teams to integrate AI, DevSecOps, data engineering, and cybersecurity into seamless operational workflows
  • Your expertise will be essential in maintaining robust observability operations and supporting scalable software delivery for agentic AI systems
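
To make the SLO/SLA responsibility above concrete, here is a minimal sketch of error-budget arithmetic for an availability SLO. The function names, the 30-day window, and the example target are assumptions for illustration and are not taken from the posting. For instance, a 99.9% target over 30 days allows roughly 43.2 minutes of downtime.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(
    slo_target: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget
```

Teams often use the remaining fraction to decide when to slow releases or prioritize reliability work; the exact policy is a local choice.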

Here's what you need:

  • Experience with OpenTelemetry, Prometheus, Grafana, Loki, and Tempo to enhance system observability and performance
  • Hands-on experience with SLO/SLA management, FinOps practices, and advanced monitoring techniques to proactively identify and resolve issues before they impact mission outcomes
  • Exposure to complex integration efforts, continuous delivery pipelines, and mission-focused operational environments will help you excel in this role
  • Experience with reliability engineering, incident response, and FinOps

Eligibility requirements:

  • Must be a U.S. citizen
  • An active TS/SCI clearance is required
