Linux Site Reliability Engineer Jobs in Washington

Site Reliability Engineer

Sterling, VA

$56.50 - $75/hr

The Site Reliability Engineer (SRE) works closely with the contract leadership ... Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including ...

Site Reliability Engineer

Sterling, VA · On-site

$56.50 - $75/hr

Site Reliability Engineer Location: Sterling, VA Clearance: TS/SCI Poly This position is ... Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including ...

SRE Engineer

Bethesda, MD · On-site

$61 - $81/hr

Job Title: SRE Engineer Location: Bethesda, Maryland Duration: 12 months, 3 days a week onsite at ... Required Linux Advanced (6-9 years experience) Required UNIX Advanced (6-9 years experience ...

Site Reliability Engineer (SRE)

Vienna, VA · On-site

$57.25 - $76/hr

The AWS Site Reliability Engineer (SRE) is responsible for the operational health, availability, and performance of the AWS and Databricks environments built by the Platform Engineering team. You ...

Senior Site Reliability Engineer

McLean, VA · On-site +1

$57.50 - $76.50/hr

As an SRE, your primary responsibility is to combine aspects of software engineering with ... Experience with Linux and Windows operating systems, along with scripting tools and techniques such ...

Site Reliability Engineer

Washington, DC · On-site

$64.25 - $85.50/hr

MANTECH seeks a motivated Site Reliability Engineer (SRE) for a new initiative that supports the rapid design and operation of enterprise-scale AI and data capabilities. The role focuses on ensuring ...

Senior Site Reliability Engineer

McLean, VA

$57.50 - $76.50/hr

As an SRE, your primary responsibility is to combine aspects of software engineering with ... Experience with Linux and Windows operating systems, along with scripting tools and techniques such ...

Linux Site Reliability Engineer information

What are the key skills and qualifications needed to thrive as a Linux Site Reliability Engineer, and why are they important?

To thrive as a Linux Site Reliability Engineer, you need deep expertise in Linux system administration, scripting (such as Bash or Python), and a solid understanding of networking concepts, usually backed by a computer science degree or equivalent experience. Familiarity with configuration management tools (like Ansible, Puppet, or Chef), containerization (Docker, Kubernetes), and cloud platforms (AWS, GCP, or Azure) is typically required, along with relevant certifications like RHCE or AWS Certified SysOps Administrator. Strong problem-solving skills, effective communication, and the ability to work under pressure are crucial soft skills for this role. These competencies ensure the reliability, scalability, and security of complex infrastructure, minimizing downtime and supporting seamless operations.

What are some common challenges faced by Linux Site Reliability Engineers when scaling infrastructure, and how can they be addressed?

Linux Site Reliability Engineers often encounter challenges related to maintaining system stability and performance as infrastructure scales. Issues such as configuration drift, automation bottlenecks, and monitoring gaps can arise when managing numerous servers or services. Addressing these challenges typically involves implementing robust configuration management tools, investing in automated deployment pipelines, and enhancing observability through comprehensive monitoring and alerting solutions. Collaboration with development and operations teams is essential to ensure that scalability solutions align with business needs and technical requirements.
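As a rough illustration of how configuration drift can be detected, the sketch below compares files on a host against a baseline of expected SHA-256 digests. The function names `file_digest` and `find_drift` and the baseline format are hypothetical, chosen for this example only; configuration management tools such as those named above handle this far more completely.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(baseline: dict[str, str], root: Path) -> dict[str, str]:
    """Compare files under `root` against a baseline of expected digests.

    `baseline` maps a path relative to `root` to its expected digest
    (a hypothetical format for this sketch). Returns a mapping of
    relative path -> "missing" or "modified" for every file that no
    longer matches. Files absent from the baseline are ignored.
    """
    drift = {}
    for rel_path, expected in baseline.items():
        target = root / rel_path
        if not target.is_file():
            drift[rel_path] = "missing"
        elif file_digest(target) != expected:
            drift[rel_path] = "modified"
    return drift
```

Run on a schedule across many servers, a check like this turns silent drift into a report that can feed alerting or trigger a re-apply of the intended configuration.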

What is a Linux Site Reliability Engineer?

A Linux Site Reliability Engineer (SRE) is an IT professional responsible for ensuring the reliability, scalability, and performance of systems running on the Linux operating system. They bridge the gap between software development and operations by automating processes, monitoring infrastructure, and managing incidents. Linux SREs focus on system availability, building tools for deployment and monitoring, and improving system robustness through best practices and automation. Their work helps organizations deliver reliable online services and quickly recover from outages or system failures.

What is the difference between a Linux Site Reliability Engineer and a Linux DevOps Engineer?

| Aspect | Linux Site Reliability Engineer | Linux DevOps Engineer |
| --- | --- | --- |
| Credentials | Linux certifications, SRE-specific training | Linux certifications, DevOps tools certifications |
| Work Environment | Focus on system reliability, monitoring, incident response | Focus on automation, CI/CD pipelines, deployment |
| Employer & Industry | Tech companies, cloud providers, large enterprises | Startups, tech firms, software development teams |
| Search & Comparison Intent | Understanding reliability roles, incident management | Automation, deployment, continuous integration |

While both roles involve Linux expertise and often overlap, a Linux Site Reliability Engineer primarily focuses on maintaining system reliability, monitoring, and incident response, whereas a Linux DevOps Engineer emphasizes automation, continuous integration, and deployment processes. Where the line falls in practice depends on organizational needs.


$64.25 - $85.50/hr

Other

Posted 20 days ago


Job description

Job Title: Site Reliability Engineer (SRE)

Location: Washington, DC (Onsite)

Clearance: TS/SCI

Position Overview

Seeking a highly motivated Site Reliability Engineer (SRE) to support mission-critical enterprise applications and infrastructure in a high-availability environment. The SRE will be responsible for ensuring system reliability, performance, scalability, and operational efficiency through proactive monitoring, automation, and rapid incident response.

This role bridges development and operations, partnering closely with engineering teams to ensure new capabilities are delivered without compromising production stability. The ideal candidate brings strong Linux expertise, automation skills, and hands-on experience with cloud-native and containerized environments.

Key Responsibilities

Monitoring & Performance

· Monitor system health, availability, and performance using enterprise observability tools

· Analyze metrics and logs to proactively detect and remediate issues

· Tune alerting to reduce noise and prioritize mission impact

Incident Management & Reliability

· Respond to and resolve production incidents across distributed environments

· Perform root cause analysis and lead post-incident reviews

· Implement corrective and preventive actions to improve resilience

· Participate in on-call rotation for outages, upgrades, and urgent activities

Automation & DevOps Enablement

· Automate repetitive operational tasks to improve efficiency and reduce human error

· Support CI/CD pipelines and automated deployment workflows

· Develop scripts and tooling to improve reliability and repeatability

Platform & Infrastructure Support

· Maintain Linux/Unix systems and containerized workloads

· Support Kubernetes/Docker environments and microservices architectures

· Assist with configuration management and environment standardization

· Ensure secure and compliant system configurations

Collaboration & Continuous Improvement

· Partner with development teams to improve service reliability and performance

· Support backlog refinement and reliability engineering initiatives

· Document runbooks, procedures, and knowledge articles

· Contribute to continuous service improvement efforts

Required Qualifications

Education & Experience

· Bachelor’s degree in Computer Science, Engineering, or related technical field

· Minimum 5 years of relevant technical experience

· At least 3 years of systems programming or SRE/DevOps experience

Technical Skills

· Strong proficiency in Python, Bash, or similar scripting languages

· Hands-on experience with Linux/Unix administration

· Experience with Kubernetes and Docker

· Familiarity with cloud platforms (AWS, Azure, or Google Cloud)

· Experience with monitoring and logging tools (e.g., Grafana, Kibana, Prometheus, ELK)

· Working knowledge of CI/CD tools (e.g., GitLab, Jenkins, ArgoCD)

· Understanding of microservices architecture and DevOps practices

· Experience with Git-based workflows

Infrastructure & Networking

· Knowledge of networking fundamentals, load balancers, and firewalls

· Experience with identity and access management (IAM, SSH, VPN, security groups)

· Experience deploying to on-premises or data center environments

Professional Skills

· Strong analytical and troubleshooting abilities

· Excellent time management and ability to work independently

· Effective written and verbal communication skills

· Experience using Jira and Confluence in an Agile environment

Preferred Qualifications

· Experience defining or working with SLIs, SLOs, and error budgets

· Familiarity with Helm and Kubernetes deployment pipelines

· Experience supporting high-availability or mission-critical systems

· Knowledge of security best practices and compliance frameworks
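For readers less familiar with the SLO and error-budget terms listed under Preferred Qualifications, the following is a minimal sketch of the underlying arithmetic, assuming a simple time-based availability SLO. The function names and the 30-day default window are illustrative assumptions, not part of the posting.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window.

    `slo_target` is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction in (0, 1]")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    if budget == 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1.0 - downtime_minutes / budget
```

Under these assumptions, a 99.9% target over 30 days allows about 43.2 minutes of downtime, so an outage of 21.6 minutes would consume half of the budget.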