Ai Reliability Engineer Jobs in Reston, VA (NOW HIRING)

Lead Site Reliability Engineer

Washington, DC · On-site

$64.50 - $85.75/hr

Integrate and optimize NVIDIA GPU infrastructure for AI/ML training and inference workloads ... reliability, systems engineering, or hardware operations roles * Deep expertise with physical ...

Site Reliability Engineer

Vienna, VA · On-site

$57.25 - $76/hr

We're seeking a skilled and proactive Site Reliability Engineer to join our team, ensuring the stability, security, and efficiency of our technological resources as we deliver cutting-edge AI ...

Ai Reliability Engineer information

See Reston, VA salary details

$63.5K

$122.7K

$146.7K

How much do AI Reliability Engineer jobs pay per year?

As of Jun 10, 2026, the average yearly pay for an AI Reliability Engineer in Reston, VA is $122,734.00, according to ZipRecruiter salary data. Most workers in this role earn between $106,600.00 and $134,200.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an AI Reliability Engineer, and why are they important?

To thrive as an AI Reliability Engineer, you need a solid background in computer science or engineering, expertise in AI/ML concepts, and experience with software testing and reliability methodologies. Familiarity with tools like TensorFlow, PyTorch, CI/CD pipelines, and reliability testing frameworks, along with certifications in cloud platforms (e.g., AWS Certified Machine Learning), is highly valuable. Analytical thinking, problem-solving abilities, and strong collaboration skills set top performers apart in this role. These skills ensure robust, dependable AI systems that meet performance standards and maintain trust in critical applications.

What is the difference between Ai Reliability Engineer vs Data Scientist?

| Aspect | Ai Reliability Engineer | Data Scientist |
| --- | --- | --- |
| Required Credentials | Bachelor's or master's in CS, engineering, or related; certifications in AI/ML | Bachelor's or master's in CS, statistics, or related; certifications in data analysis or ML |
| Work Environment | Tech companies, AI-focused teams, engineering departments | Research labs, tech firms, analytics teams |
| Employer & Industry Usage | AI product development, machine learning systems, reliability testing | Data analysis, predictive modeling, business insights |

While both roles involve AI and ML, Ai Reliability Engineers focus on ensuring AI system robustness and uptime, whereas Data Scientists analyze data to generate insights and models. The roles often collaborate but serve different primary functions within AI projects.

What are AI Reliability Engineers?

AI Reliability Engineers are professionals responsible for ensuring that artificial intelligence systems function reliably, safely, and effectively over time. They work on monitoring AI models in production, identifying and mitigating potential failures, and improving the robustness of AI systems. Their tasks often include testing, validation, performance monitoring, and implementing best practices for maintaining AI infrastructure. By focusing on reliability, they help organizations deploy AI solutions that are dependable and trustworthy in real-world environments.

What is a $900,000 AI job?

A $900,000 AI job typically refers to highly senior roles such as AI executives, chief AI officers, or lead AI engineers at top technology companies, often involving advanced expertise in machine learning, deep learning, and AI strategy. These positions usually require extensive experience, specialized skills, and may include performance-based bonuses or stock options that contribute to the high total compensation.

What are some common challenges Ai Reliability Engineers face when ensuring model robustness in production environments?

Ai Reliability Engineers often encounter challenges such as monitoring AI model performance for drift or unexpected behavior, managing data quality issues, and implementing automated alerting systems for anomalies. In production, it's crucial to ensure that AI models operate consistently and remain reliable under varying conditions and data inputs. Collaborating closely with data scientists, software engineers, and DevOps teams is essential to address these challenges and to continuously improve model reliability and uptime.
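
As a rough illustration of the drift monitoring and automated alerting described above, the sketch below compares a baseline sample of one numeric feature with a live sample using the Population Stability Index (PSI). The function names, the bin count, and the 0.2 alert threshold are assumptions chosen for this example (0.2 is a common rule of thumb, not something taken from any posting here), and a production system would normally rely on a dedicated monitoring stack.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    live: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one numeric feature; larger means more drift."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def bin_fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_alert(
    baseline: Sequence[float],
    live: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb threshold
) -> bool:
    """Return True when PSI exceeds the alerting threshold."""
    return population_stability_index(baseline, live) > threshold
```

For example, calling `drift_alert(training_values, last_hour_values)` on a schedule and paging the on-call engineer when it returns `True` would be one simple way to wire such a check into alerting; both argument names are hypothetical.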
Site Reliability Engineer - TS/SCI with Poly

GDIT

Washington, DC

$64.50 - $85.75/hr

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 25 days ago


General Dynamics Information Technology rating

Company rating: 7.8 out of 10

Based on 62 frontline employees who took The Breakroom Quiz

71st of 204 rated IT services


Job description

Type of Requisition:

Pipeline

Clearance Level Must Currently Possess:

Top Secret SCI + Polygraph

Clearance Level Must Be Able to Obtain:

Top Secret SCI + Polygraph

Public Trust/Other Required:

None

Job Family:

IT Infrastructure and Operations

Job Qualifications:

Skills:

Automation Tools, Enterprise Infrastructures, Enterprise Operations, Site Reliability Engineering

Certifications:

None

Experience:

5+ years of related experience

US Citizenship Required:

Yes

Job Description:

SITE RELIABILITY ENGINEER (SRE)

Own your opportunity. Make your impact

As a Site Reliability Engineer (SRE) supporting the CIO Infrastructure Services (CIS) program, you will help maintain the reliability, scalability, and performance of enterprise infrastructure services deployed across more than 250 global sites. You will engineer and optimize systems, automate operational workflows, strengthen monitoring capabilities, and ensure the stability and resilience of mission-critical environments.

You will partner closely with Engineering, Operations, Tech Refresh, Cybersecurity, and Data Center teams to ensure seamless integration of new capabilities into a high-availability production environment, helping the Defense Intelligence Enterprise remain secure, connected, and mission-ready.

HOW A SITE RELIABILITY ENGINEER WILL MAKE AN IMPACT

  • Ensure the reliability, availability, and performance of enterprise IT systems across global environments
  • Develop automation solutions that reduce manual effort, streamline operational tasks, and improve system resiliency
  • Build and maintain monitoring, alerting, and observability capabilities supporting 24/7/365 enterprise operations
  • Perform root cause analysis (RCA), corrective action planning, and long-term problem remediation for infrastructure issues
  • Partner with engineering teams to validate, test, and integrate new systems, upgrades, baselines, and enhancements into production
  • Improve system performance through configuration tuning, capacity planning, and optimization of compute, storage, network, and virtualized environments
  • Develop and maintain infrastructure-as-code, scripts, and operational automation to support consistent and repeatable deployments
  • Support enterprise incident response, including triage, escalation, and service restoration for high-visibility events
  • Maintain operational documentation including SOPs, runbooks, baselines, dashboards, and architectural diagrams
  • Ensure compliance with ITIL/ITSM processes, including Incident, Problem, Change, and Configuration Management
  • Strengthen the enterprise security posture by supporting patching, vulnerability remediation, and RMF-related configuration updates
  • Coordinate with global operations teams to ensure service continuity, readiness, and adherence to SLAs and KPIs
  • Leverage analytics, metrics, and monitoring data to identify performance trends and drive continuous service improvement initiatives

WHAT YOU'LL NEED TO SUCCEED

Required:

  • CLEARANCE: Active TS/SCI with CI Polygraph
  • EDUCATION: Bachelor's degree in computer science, engineering, IT, or related technical field
    (Additional experience may substitute for degree)
  • 8+ years of experience in site reliability engineering, systems engineering, enterprise operations, or DevOps roles
  • Hands-on experience with automation tools (PowerShell, Python, Ansible, Terraform, etc.)
  • Strong experience supporting enterprise infrastructure domains including server compute, storage, virtualization, networking, and monitoring
  • Experience with enterprise monitoring platforms (e.g., SolarWinds, SCOM, Splunk, Nagios, ELK)
  • Strong understanding of ITIL/ITSM workflows and operational governance processes
  • Demonstrated ability to troubleshoot complex technical issues across distributed enterprise environments
  • Strong communication and collaboration skills working across multidisciplinary technical teams
  • Excellent communication and stakeholder engagement skills
  • US citizenship required
  • LOCATION: Onsite

Preferred:

  • ITIL v4 Foundations certification
  • Experience supporting the client, DoDIIS, or Intelligence Community environments
  • Familiarity with CMMC, NIST 800-53, policies, and RMF processes
  • Experience with ServiceNow/Service Central and automated ticketing workflows
  • Experience supporting hybrid cloud, virtual desktop infrastructure (VDI), or hyperconverged platforms

GDIT IS YOUR PLACE
At GDIT, the mission is our purpose, and our people are at the center of everything we do.
Growth: AI-powered career tool that identifies career steps and learning opportunities
Support: An internal mobility team focused on helping you achieve your career goals
Rewards: Comprehensive benefits and wellness packages, 401K with company match, and competitive pay and paid time off
Community: Award-winning culture of innovation and a military-friendly workplace
OWN YOUR OPPORTUNITY
Explore an enterprise IT career at GDIT and you'll find endless opportunities to grow alongside colleagues who share your desire to drive operations forward.

#CIS

The likely salary range for this position is $128,039 - $173,229. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.

Scheduled Weekly Hours:

40

Travel Required:

None

Telecommuting Options:

Onsite

Work Location:

USA MD Annapolis Junction

Additional Work Locations:

USA CO Colorado Springs, USA DC Washington, USA FL MacDill AFB, USA VA Springfield

Total Rewards at GDIT:

Our benefits package for all US-based employees includes a variety of medical plan options, some with Health Savings Accounts, dental plan options, a vision plan, and a 401(k) plan offering the ability to contribute both pre and post-tax dollars up to the IRS annual limits and receive a company match. To encourage work/life balance, GDIT offers employees full flex work weeks where possible and a variety of paid time off plans, including vacation, sick and personal time, holidays, paid parental, military, bereavement and jury duty leave. To ensure our employees are able to protect their income, other offerings such as short and long-term disability benefits, life, accidental death and dismemberment, personal accident, critical illness and business travel and accident insurance are provided or available. We regularly review our Total Rewards package to ensure our offerings are competitive and reflect what our employees have told us they value most.

We are GDIT. A global technology and professional services company that delivers consulting, technology and mission services to every major agency across the U.S. government, defense and intelligence community. Our 26,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate across 50 countries worldwide, offering leading capabilities in digital modernization, AI/ML, Cloud, Cyber and application development. Together with our clients, we strive to create a safer, smarter world by harnessing the power of deep expertise and advanced technology.

Join our Talent Community to stay up to date on our career opportunities and events at gdit.com/tc.

Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans

About General Dynamics Information Technology

Sourced by ZipRecruiter

GDIT is a global technology and professional services company that delivers technology solutions and mission services to every major agency across the U.S. government, defense, and intelligence community. Its 30,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. The company operates across 50+ countries worldwide, offering leading capabilities in digital modernization, AI/ML, cloud, cyber, and application development.

Industry

IT services

Company size

10,000+ Employees

Headquarters location

Falls Church, VA, US