
Reliability Manager Jobs in Washington (NOW HIRING)

Site Reliability Engineer (SRE) (TS)

Washington, DC · On-site

$64.50 - $85.75/hr

Koniag Management Solutions, LLC (KMS), a Koniag Government Services (KGS) company, is hiring a Site Reliability Engineer (SRE). Position requires an active Top Secret/SCI clearance with ability to ...

SRE ENGINEER/ MANAGER

Reston, VA · On-site

$59.25 - $78.75/hr

Job Summary (Sr. Manager SRE): - Design, implement, and manage scalable, secure, and fault-tolerant cloud infrastructure using AWS, Azure, or GCP. - Automate infrastructure provisioning and ...

Site Reliability Engineer (DevOps/SRE), location is Remote. The start date is Targeting June 29 for ... Manage infrastructure through Terraform and configuration management tools * Support Kubernetes and ...

New

Site Reliability Engineer

Sterling, VA · On-site

$56.50 - $75/hr

The SRE executes and analyzes manual IT operations/admin tasks (log analysis, performance tuning, patch management, testing, and incident response) and converts them to automated tasks. The SRE works ...

Site Reliability Engineer

Sterling, VA

$56.50 - $75/hr

The SRE executes and analyzes manual IT operations/admin tasks (log analysis, performance tuning, patch management, testing, and incident response) and converts them to automated tasks. The SRE works ...

Reliability Manager information

Salary chart values: $70.2K, $133.1K (average), $190.8K

How much do reliability manager jobs pay per year?

As of Jun 1, 2026, the average yearly pay for a reliability manager in Washington is $133,066, according to ZipRecruiter salary data. Most workers in this role earn between $107,000 and $158,600 per year, depending on experience, location, and employer.

What does a Reliability Manager do?

A Reliability Manager is responsible for ensuring that equipment, processes, and systems operate efficiently and consistently to minimize downtime and maximize performance. They develop and implement reliability strategies, conduct root cause analyses, and oversee preventive and predictive maintenance programs. Their role involves working closely with maintenance teams, engineers, and production staff to improve asset reliability and extend equipment lifespan. Additionally, they analyze failure data, recommend improvements, and help optimize operational costs through reliability-centered maintenance practices.

What are the key skills and qualifications needed to thrive in the Reliability Manager position, and why are they important?

A Reliability Manager needs strong analytical skills, a solid background in engineering or maintenance, and experience with reliability-centered maintenance methodologies. Familiarity with tools like Failure Mode and Effects Analysis (FMEA) and Root Cause Analysis (RCA) is often required, as are certifications such as Certified Reliability Engineer (CRE). Leadership, problem-solving, and the ability to communicate complex technical information clearly are crucial soft skills for this role. These skills help ensure equipment uptime, optimize maintenance processes, and foster a culture of continuous improvement within the organization.

What are some typical daily responsibilities of a Reliability Manager?

A Reliability Manager typically spends their day analyzing equipment performance data, identifying trends, and implementing strategies to reduce downtime and improve asset reliability. They lead investigations into failures using proven methodologies like Root Cause Analysis, and work closely with maintenance, engineering, and operations teams to develop maintenance plans and improvements. Regular responsibilities also include managing reliability projects, training staff in best practices, and ensuring compliance with safety and regulatory standards. This role often requires a balance of hands-on technical work and cross-functional collaboration to drive operational excellence.
Infographic showing various Reliability Manager job openings in Washington as of May 2026, with employment types broken down into 52% Full Time, 44% Part Time, 2% Temporary, and 2% Contract. Highlights an 88% Physical, 2% Hybrid, and 10% Remote job distribution, with an average salary of $133,066 per year, or $64 per hour.

Site Reliability Engineer (SRE) (TS)

KGS

Washington, DC • On-site

$64.50 - $85.75/hr

Other

Medical, Dental, Vision, Retirement, PTO

Posted 13 days ago


Job description

Koniag Management Solutions, LLC (KMS), a Koniag Government Services (KGS) company, is hiring a Site Reliability Engineer (SRE). The position requires an active Top Secret/SCI clearance with the ability to meet additional security requirements. Please do not apply if you do not possess the required Top Secret clearance.

We offer competitive compensation and an extraordinary benefits package including health, dental, and vision insurance, 401(k) with company matching, flexible spending accounts, paid holidays, three weeks of paid time off, and more.

Position Summary:

We are seeking an experienced Site Reliability Engineer (SRE) to blend software engineering and systems administration practices to ensure the reliability, availability, and performance of mission-critical applications. This role focuses on automation, observability, and incident response while upholding strict Service Level Objectives (SLOs). The SRE will help build resilient systems that scale, automate manual processes, manage fleet-wide configurations, and ensure robust system monitoring. The selected candidate will support operations at Joint Base Anacostia–Bolling and must maintain an active TS/SCI clearance.

Key Responsibilities:

  • Ensure application reliability, performance, and availability through automation, monitoring, and systems engineering.
  • Develop infrastructure-as-code (IaC) solutions using Terraform, Ansible, and Desired State Configuration (DSC).
  • Build and manage containerized workloads using Kubernetes, Rancher, Docker, Helm, and related ecosystem tools.
  • Support service mesh and networking constructs such as Cilium, load balancing, ingress management, and distributed storage.
  • Engineer and maintain storage and object systems including Rook, Ceph, MinIO, and S3-compatible platforms.
  • Implement and maintain comprehensive observability platforms (metrics, logging, tracing) to support SLO monitoring and incident response.
  • Lead and participate in incident response activities, postmortem analysis, and reliability engineering improvements.
  • Develop automations, scripts, and tools using Python, PowerShell, and shell scripting.
  • Support CI/CD pipelines and cloud-native deployment methodologies.
  • Collaborate with development and operations teams to embed SRE practices into the application lifecycle.

Required Technical Certifications (at least two):

  • Security+
  • Cloud Associate (such as AWS Solutions Architect Associate, Azure AZ-104, or Google Cloud Associate Cloud Engineer)
  • Terraform Associate
  • Cloud Professional/Architect (such as AWS Solutions Architect Professional or Azure Architect Expert)
  • CKA (Certified Kubernetes Administrator)

Preferred Certifications (Plus):

  • CKA (if not used to meet required cert)
  • RHCSA
  • AWS DevOps Engineer or AZ-400
  • CCSP
  • Advanced observability certifications (Datadog, New Relic, Dynatrace, etc.)
  • Formal incident management or SRE-focused training

Required Technical Knowledge:

Strong understanding of the following technologies:

  • Kubernetes, Rancher, Helm, Docker
  • Cilium, Rook, Ceph, MinIO, S3, Portworx
  • Load balancing, ingress, and service networking
  • Ansible, Terraform, Desired State Configuration
  • Python, PowerShell, and scripting/automation
  • Distributed systems, cloud computing, and microservices architecture
  • Monitoring/observability practices and tools
  • Incident response frameworks and SLO-based operations

Preferred Experience:

  • Building scalable, fault-tolerant cloud-native systems across hybrid or multi-cloud environments.
  • Developing or supporting enterprise CI/CD pipelines.
  • Managing complex Kubernetes clusters across on-prem and cloud platforms.
  • Implementing enterprise observability stacks (e.g., Prometheus, Loki, Grafana, ELK, OpenTelemetry).
  • Supporting large-scale infrastructure within DoD or Intelligence Community environments.

Requirements:

TS/SCI security clearance is required; candidates without one will not be considered.


Our Equal Employment Opportunity Policy:
 

The company is an equal opportunity employer. The company shall not discriminate against any employee or applicant because of race, color, religion, creed, ethnicity, sex, sexual orientation, gender or gender identity (except where gender is a bona fide occupational qualification), national origin or ancestry, age, disability, citizenship, military/veteran status, marital status, genetic information or any other characteristic protected by applicable federal, state, or local law. We are committed to equal employment opportunity in all decisions related to employment, promotion, wages, benefits, and all other privileges, terms, and conditions of employment.

 
The company is dedicated to seeking all qualified applicants. If you require an accommodation to navigate or to apply to a position on our website, please contact Heaven Wood via e-mail at accommodations@koniag-gs.com or by calling 703-488-9377 to request accommodations. 

 

About our Company:
 

Koniag Government Services (KGS) is an Alaska Native-owned corporation supporting the values and traditions of our native communities through an agile employee and corporate culture that delivers Enterprise Solutions, Professional Services, and Operational Management to Federal Government Agencies. As a wholly owned subsidiary of Koniag, we apply our proven commercial solutions to a deep knowledge of Defense and Civilian missions to provide forward-leaning technical, professional, and operational solutions. KGS enables successful mission outcomes for our customers through solution-oriented business partnerships and a commitment to exceptional service delivery. We ensure long-term success with a continuous improvement approach while balancing the collective interests of our customers, employees, and native communities. For more information, please visit www.koniag-gs.com.

 Equal Opportunity Employer/Veterans/Disabled. Shareholder Preference in accordance with Public Law 88-352
