Linux Site Reliability Engineer Jobs in Washington

Site Reliability Engineer

Sterling, VA

$56.50 - $75/hr

The Site Reliability Engineer (SRE) works closely with the contract leadership ... Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including ...

Site Reliability Engineer

Sterling, VA · On-site

$56.50 - $75/hr

Site Reliability Engineer Location: Sterling, VA Clearance: TS/SCI Poly This position is ... Linux/Unix Systems Administration: Strong knowledge of Linux/Unix operating systems, including ...

Site Reliability Engineer (SRE)

Reston, VA

$59.25 - $78.75/hr

The SRE team is responsible for maintaining the existing systems, supporting our development teams, and implementing innovative solutions. You will work alongside software developers, testers, and ...

SRE Engineer

Bethesda, MD · On-site

$61 - $81/hr

Job Title: SRE Engineer Location: Bethesda, Maryland Duration: 12 months, 3 days a week onsite at ... Required Linux Advanced (6-9 years experience) Required UNIX Advanced (6-9 years experience ...

Site Reliability Engineer

Washington, DC · On-site

$112K - $179K/yr

The SRE will drive automation initiatives, observability improvements, and incident response ... Strong Linux administration experience. * Experience with cloud-native technologies and automation ...

Associate Site Reliability Engineer

Millersville, MD · On-site +1

$55.50 - $73.50/hr

The Associate Site Reliability Engineer will help apply software engineering and operational best ... Foundational understanding of Linux systems, cloud infrastructure concepts, networking basics, and ...

Linux Site Reliability Engineer information

What are some common challenges faced by Linux Site Reliability Engineers when scaling infrastructure, and how can they be addressed?

Linux Site Reliability Engineers often encounter challenges related to maintaining system stability and performance as infrastructure scales. Issues such as configuration drift, automation bottlenecks, and monitoring gaps can arise when managing numerous servers or services. Addressing these challenges typically involves implementing robust configuration management tools, investing in automated deployment pipelines, and enhancing observability through comprehensive monitoring and alerting solutions. Collaboration with development and operations teams is essential to ensure that scalability solutions align with business needs and technical requirements.
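As a rough illustration of how configuration drift can be caught, the sketch below compares SHA-256 digests of configuration files against a previously recorded baseline. The function names `fingerprint` and `detect_drift` are hypothetical and not taken from any particular configuration management tool; in practice such a tool would usually handle both detection and remediation.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each file path to the SHA-256 hex digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def detect_drift(baseline, current):
    """Compare two fingerprints and report what differs from the baseline."""
    # Files present in both snapshots whose contents no longer match.
    changed = sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p])
    # Files that exist now but were not in the baseline.
    added = sorted(current.keys() - baseline.keys())
    # Files recorded in the baseline that are now missing.
    removed = sorted(baseline.keys() - current.keys())
    return {"changed": changed, "added": added, "removed": removed}
```

A baseline would be captured once with `fingerprint(...)` on a known-good host, stored, and later compared against a fresh fingerprint from each server; any non-empty list in the result signals drift worth investigating.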

What are the key skills and qualifications needed to thrive as a Linux Site Reliability Engineer, and why are they important?

To thrive as a Linux Site Reliability Engineer, you need deep expertise in Linux system administration, scripting (such as Bash or Python), and a solid understanding of networking concepts, usually backed by a computer science degree or equivalent experience. Familiarity with configuration management tools (like Ansible, Puppet, or Chef), containerization (Docker, Kubernetes), and cloud platforms (AWS, GCP, or Azure) is typically required, along with relevant certifications like RHCE or AWS Certified SysOps Administrator. Strong problem-solving skills, effective communication, and the ability to work under pressure are crucial soft skills for this role. These competencies ensure the reliability, scalability, and security of complex infrastructure, minimizing downtime and supporting seamless operations.

Who gets paid more, SRE or DevOps?

Generally, Site Reliability Engineers (SREs) tend to have higher salaries than DevOps engineers due to their specialized focus on system reliability, automation, and incident management. Both roles require strong skills in cloud platforms, scripting, and monitoring tools, but SREs often have more advanced expertise in reliability engineering practices, which can lead to higher compensation.

Will AI replace SRE jobs?

AI is expected to augment the work of Linux Site Reliability Engineers by automating routine tasks such as monitoring, incident response, and log analysis. However, SRE roles require complex problem-solving, system design, and decision-making that currently cannot be fully replaced by AI, making human expertise essential. SREs will likely focus more on overseeing automation tools and managing system reliability rather than being replaced entirely.

What engineer makes $500,000 a year?

Senior Linux Site Reliability Engineers, and engineers in similar high-level roles in cloud infrastructure and large-scale systems, can earn $500,000 or more annually, especially once bonuses and stock options are included. These positions typically require extensive experience and advanced skills in automation, scripting, and cloud platforms, and they often involve leadership responsibilities.

What engineers make $300,000 a year?

Senior Linux Site Reliability Engineers with extensive experience and advanced skills in automation, cloud platforms, and monitoring tools can earn $300,000 or more annually, especially in high-cost-of-living areas or at large tech companies. Reaching this salary often requires specialized certifications, leadership roles, and a strong track record of managing complex infrastructure at scale.

What is the difference between a Linux Site Reliability Engineer and a Linux DevOps Engineer?

| Aspect | Linux Site Reliability Engineer | Linux DevOps Engineer |
| --- | --- | --- |
| Credentials | Linux certifications, SRE-specific training | Linux certifications, DevOps tools certifications |
| Work Environment | Focus on system reliability, monitoring, incident response | Focus on automation, CI/CD pipelines, deployment |
| Employer & Industry | Tech companies, cloud providers, large enterprises | Startups, tech firms, software development teams |
| Search & Comparison Intent | Understanding reliability roles, incident management | Automation, deployment, continuous integration |

While both roles involve Linux expertise, a Linux Site Reliability Engineer primarily focuses on maintaining system reliability, monitoring, and incident response. In contrast, a Linux DevOps Engineer emphasizes automation, continuous integration, and deployment processes. Both roles require Linux skills and often overlap, but their core responsibilities differ based on organizational needs.

What is a Linux Site Reliability Engineer?

A Linux Site Reliability Engineer (SRE) is an IT professional responsible for ensuring the reliability, scalability, and performance of systems running on the Linux operating system. They bridge the gap between software development and operations by automating processes, monitoring infrastructure, and managing incidents. Linux SREs focus on system availability, building tools for deployment and monitoring, and improving system robustness through best practices and automation. Their work helps organizations deliver reliable online services and quickly recover from outages or system failures.

Site Reliability Engineer

Input Technology Solutions

Washington, DC • On-site

$64.25 - $85.50/hr

Other

Posted 12 days ago


Job description

Job Title: Site Reliability Engineer (SRE)

Location: Washington, DC (Onsite)

Clearance: TS/SCI

Position Overview

Seeking a highly motivated Site Reliability Engineer (SRE) to support mission-critical enterprise applications and infrastructure in a high-availability environment. The SRE will be responsible for ensuring system reliability, performance, scalability, and operational efficiency through proactive monitoring, automation, and rapid incident response.

This role bridges development and operations, partnering closely with engineering teams to ensure new capabilities are delivered without compromising production stability. The ideal candidate brings strong Linux expertise, automation skills, and hands-on experience with cloud-native and containerized environments.

Key Responsibilities

Monitoring & Performance

· Monitor system health, availability, and performance using enterprise observability tools

· Analyze metrics and logs to proactively detect and remediate issues

· Tune alerting to reduce noise and prioritize mission impact

Incident Management & Reliability

· Respond to and resolve production incidents across distributed environments

· Perform root cause analysis and lead post-incident reviews

· Implement corrective and preventive actions to improve resilience

· Participate in on-call rotation for outages, upgrades, and urgent activities

Automation & DevOps Enablement

· Automate repetitive operational tasks to improve efficiency and reduce human error

· Support CI/CD pipelines and automated deployment workflows

· Develop scripts and tooling to improve reliability and repeatability

Platform & Infrastructure Support

· Maintain Linux/Unix systems and containerized workloads

· Support Kubernetes/Docker environments and microservices architectures

· Assist with configuration management and environment standardization

· Ensure secure and compliant system configurations

Collaboration & Continuous Improvement

· Partner with development teams to improve service reliability and performance

· Support backlog refinement and reliability engineering initiatives

· Document runbooks, procedures, and knowledge articles

· Contribute to continuous service improvement efforts

Required Qualifications

Education & Experience

· Bachelor’s degree in Computer Science, Engineering, or related technical field

· Minimum 5 years of relevant technical experience

· At least 3 years of systems programming or SRE/DevOps experience

Technical Skills

· Strong proficiency in Python, Bash, or similar scripting languages

· Hands-on experience with Linux/Unix administration

· Experience with Kubernetes and Docker

· Familiarity with cloud platforms (AWS, Azure, or Google Cloud)

· Experience with monitoring and logging tools (e.g., Grafana, Kibana, Prometheus, ELK)

· Working knowledge of CI/CD tools (e.g., GitLab, Jenkins, ArgoCD)

· Understanding of microservices architecture and DevOps practices

· Experience with Git-based workflows

Infrastructure & Networking

· Knowledge of networking fundamentals, load balancers, and firewalls

· Experience with identity and access management (IAM, SSH, VPN, security groups)

· Experience deploying to on-premises or data center environments

Professional Skills

· Strong analytical and troubleshooting abilities

· Excellent time management and ability to work independently

· Effective written and verbal communication skills

· Experience using Jira and Confluence in an Agile environment

Preferred Qualifications

· Experience defining or working with SLIs, SLOs, and error budgets

· Familiarity with Helm and Kubernetes deployment pipelines

· Experience supporting high-availability or mission-critical systems

· Knowledge of security best practices and compliance frameworks
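
For readers unfamiliar with the SLO and error-budget terms mentioned above, the sketch below shows the usual arithmetic: an error budget is the share of requests an SLO allows to fail over a measurement window. The function names and sample figures are illustrative assumptions and are not part of this posting.

```python
def error_budget(slo_target, total_requests):
    """Number of failed requests the SLO permits over the window."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target (or an empty window) leaves nothing to spend.
        raise ValueError("no error budget available for this target and window")
    return 1.0 - failed_requests / budget
```

For example, a 99.9% availability target over 1,000,000 requests allows about 1,000 failures; if 250 requests have failed so far, roughly 75% of the budget remains.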