Site Reliability Engineer Jobs in Springfield, VA

Site Reliability Engineer

McLean, VA · On-site

$125K - $200K/yr

As a Site Reliability Engineer (SRE), you will help design, build, and operate reliable, secure, and observable cloud-native systems that support mission-critical applications and services.

Site Reliability Engineer

McLean, VA · On-site

$125K - $200K/yr

As a Site Reliability Engineer (SRE), you will help design, build, and operate reliable, secure, and observable cloud-native systems that support mission-critical applications and services. You will ...

Senior Technology Site Reliability Engineer

Reston, VA · On-site

$59.25 - $78.75/hr

Cooley is seeking a Senior Site Reliability Engineer to join the Infrastructure & Development Operations team. Position summary: The Senior Technology Site ...

Staff Site Reliability Engineer

Reston, VA

$59.25 - $78.75/hr

The Site Reliability Engineering team drives reliability strategy, elevates engineering standards, and owns some of the most complex and consequential work on the platform. As a Staff Site ...

SRE Engineer

Bethesda, MD · On-site

$61 - $81/hr

Job Title: SRE Engineer. Location: Bethesda, Maryland. Duration: 12 months, 3 days a week onsite at client in Bethesda, Maryland. Description: You will deliver AWS engineering services to support the ...

Site Reliability Engineer II

McLean, VA · On-site

$57.50 - $76.50/hr

As an SRE II, you will help operate and improve the reliability, scalability, and performance of services running across Kubernetes-based environments in cloud and hybrid infrastructure. You will ...

Staff Site Reliability Engineer

Fairfax, VA · On-site

$58.25 - $77.50/hr

U.S. Citizenship / No clearance needed / 100% remote within the US. Staff Site Reliability Engineer / Cloud SME. Location: 100% remote in the continental US. Type: Long-term contract (3+ years). Role ...

Site Reliability Engineer information

How much do site reliability engineer jobs pay per hour?

As of Jun 23, 2026, the average hourly pay for a site reliability engineer in Springfield, VA is $66.58, according to ZipRecruiter salary data. Most workers in this role earn between $57.26 and $76.06 per hour, depending on experience, location, and employer.

Will SRE be replaced by AI?

Site Reliability Engineers (SREs) focus on maintaining system reliability, automation, and incident response, and AI tools are increasingly used to assist these tasks. While AI can automate routine processes, SREs' expertise in system design, troubleshooting, and decision-making remains essential, making complete replacement unlikely in the near future.

What Is a Site Reliability Engineer?

A site reliability engineer specializes in site reliability engineering (SRE), a branch of operations pioneered by Google. You are responsible for ensuring that when a feature is scaled up to serve more users, it does not break the underlying software or website functions. This means you need analytical problem-solving skills to determine how to make the features in a new software release work on top of the existing source code.

What engineers make $300,000 a year?

Senior-level engineers such as Site Reliability Engineers, Software Engineers, and Cloud Infrastructure Engineers can earn $300,000 or more annually, especially with extensive experience, specialized skills, and working at large tech companies or in high-cost-of-living areas. Compensation often includes base salary, bonuses, and stock options, with expertise in automation, cloud platforms, and monitoring tools being highly valued.

What are the key skills and qualifications needed to thrive as a Site Reliability Engineer, and why are they important?

To thrive as a Site Reliability Engineer, you need a strong background in computer science, systems administration, and software engineering, often supported by a degree in a technical field. Familiarity with cloud platforms (like AWS or GCP), container orchestration (such as Kubernetes), infrastructure as code (Terraform or Ansible), and monitoring tools (Prometheus, Grafana) is typically expected. Strong problem-solving skills, effective communication, and a proactive mindset help SREs excel at incident management and cross-functional collaboration. These skills are crucial for maintaining system reliability, minimizing downtime, and driving continuous improvement in complex technical environments.

Is SRE a stressful job?

Site Reliability Engineers (SREs) often work in high-pressure environments where they monitor system performance, troubleshoot outages, and ensure uptime. The role can involve on-call duties and incident response, which may contribute to stress, but it also offers opportunities for automation and process improvements to reduce workload. Overall, stress levels vary depending on the organization, team culture, and individual skills.

What are some of the most common challenges Site Reliability Engineers face when balancing system reliability with rapid software delivery?

Site Reliability Engineers (SREs) often navigate the challenge of maintaining highly reliable systems while supporting fast-paced software releases. This involves managing incidents, automating processes to reduce manual toil, and working closely with development teams to embed reliability into the software development lifecycle. SREs must carefully prioritize their efforts between proactive improvements and urgent, reactive fire-fighting. Effective communication and collaboration with both operations and development teams are crucial to ensuring service uptime without slowing down innovation.

What does a Site Reliability Engineer do?

A Site Reliability Engineer (SRE) is responsible for maintaining and improving the reliability, availability, and performance of software systems. They use automation, monitoring tools, and scripting to prevent outages and resolve issues quickly, often working closely with development teams to ensure scalable infrastructure. SREs typically have skills in systems engineering, coding, and cloud platforms, and may hold certifications like those in cloud services or DevOps practices.

What is the difference between Site Reliability Engineer vs DevOps Engineer?

Aspect | Site Reliability Engineer | DevOps Engineer
Credentials | Typically requires a computer science degree and certifications such as AWS, Google Cloud, or Kubernetes | Similar credentials, often with cloud certifications and scripting skills
Work Environment | Focuses on maintaining and improving system reliability, often in large-scale production environments | Works on automation, CI/CD pipelines, and deployment processes across development and operations teams
Industry Usage | Common in tech, cloud services, and large-scale enterprise companies | Widely used in software development, cloud, and IT organizations

Both roles require strong technical skills and cloud knowledge, but SREs focus more on system reliability and uptime, while DevOps engineers emphasize automation and deployment processes. They often collaborate but have distinct primary responsibilities.

What is a Site Reliability Engineer?

A Site Reliability Engineer (SRE) is a professional who applies software engineering principles to infrastructure and operations problems. Their primary goal is to create scalable and highly reliable software systems, often bridging the gap between development and IT operations. SREs automate tasks, monitor system health, respond to incidents, and work to improve system reliability and performance. They also help define service level objectives (SLOs) and ensure systems meet customer expectations for uptime and availability.

Systems Engineer - Site Reliability Engineering

Marriott

Bethesda, MD • On-site

$60.75 - $80.75/hr

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 28 days ago


Marriott International rating

Company rating: 6.4 out of 10

Based on 1,146 frontline employees who took The Breakroom Quiz

50th of 105 rated hotels


Job description

JOB SUMMARY: 

The Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally. This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives. The engineer partners closely with Product Teams, Applications teams, Infrastructure, and the broader Applications and Infrastructure Delivery teams to develop key metrics and KPIs that improve application stability, availability, and performance. The ideal candidate will bring strong communication skills, collaborating with key stakeholders across the company to optimize cloud infrastructure and uphold the highest standards of operational excellence in a dynamic, fast-paced environment.

CANDIDATE PROFILE:  

Required

  • Undergraduate degree in an engineering or computer science discipline and/or equivalent experience/certification 

  • 5+ years of hands-on experience in designing, building and operating production grade systems including:  

      • 2+ years of experience as a Site Reliability Engineer (SRE), building and managing highly available and mission-critical systems

  • Deep understanding of SRE practices, such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, Capacity Planning 

  • Expertise in AWS services including designing highly available, multi-AZ and multi-region architectures, for example:  

      • Compute: EC2, Auto Scaling, Lambda

      • Containers: EKS (Mandatory), ECS (good to have)

      • Networking: VPC, subnets, route tables, NAT gateways, Transit Gateway

      • Security: IAM roles/policies, KMS, Secrets Manager

      • Storage and Databases: S3, EBS, EFS, RDS, DocumentDB

  • Proven automation and programming experience in one or more of the following languages: Python, PowerShell 

  • Experience using modern, continuous development techniques and pipelines (e.g. Agile, Kanban, Jira, CI/CD, Helm, Harness, Jenkins, Git, Artifactory, Vault) 

  • Experience designing and implementing end-to-end observability solutions across metrics, logs, and traces using tools like Prometheus, Grafana, ELK Stack, and OpenTelemetry. 

  • Hands-on experience with Linux administration (RHEL, Ubuntu, CentOS, AWS Linux) 

  • Experience troubleshooting API-related issues in distributed systems, including latency, authentication/authorization failures, rate limiting, and upstream/downstream dependency failures. 

  • Experience with container orchestration engines such as Kubernetes (EKS, AKS, ACK)

  • Familiarity with service mesh technologies to enable secure and resilient service communication, including mTLS, traffic shaping, and policy enforcement. 

  • Familiarity with Infrastructure as Code (IaC) tools like Terraform and CloudFormation.

  • Familiarity with configuration management and automation tools such as Ansible.  

  • Familiarity with vulnerability management, OS hardening, patching, security compliance of infrastructure, applications and databases 

  • Understanding of basic networking fundamentals  

Preferred: 

  • Experience driving cloud cost optimization initiatives (rightsizing, reserved instances, autoscaling strategies, cost observability) 

  • Networking expertise including Load Balancing, Firewalls, Security Groups, NACLs, TCP/IP, DNS, HTTP/HTTPS, SSL/TLS, etc.

CORE WORK ACTIVITIES

  • Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management. 

  • Oversee the management of high-severity incidents, driving quick resolution and post-incident analysis to identify root causes and prevent recurrence. 

  • Drive the automation of operational processes and ensure systems can scale effectively to support growing user demand, optimizing cloud and on-prem infrastructure and resource usage. 

  • Develop and execute the SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders 

Drive Applications Performance Management and Monitoring: 

  • Assess application architectures to identify key monitoring points 

  • Identify Key Performance Indicators, apply monitoring, and report out on compliance. 

  • Gather information to develop reporting metrics and KPIs 

  • Ensure that all applications adhere to appropriate monitoring standards based on their technology/business process  

  • Determine forums and cadence to provide regular monitoring updates 

Building Successful Relationships: 

  • Collaborates with Enterprise Application and Architecture and Infrastructure teams to continuously improve processes and procedures.  

  • Liaises with vendors and Service Providers to select services and tools that best meet company goals 

Managing Projects and Priorities: 

  • Develops specific goals and plans to prioritize, organize, and accomplish work. 

  • Champions leaders' vision for product and service delivery. 

  • Executes the necessary decisions to keep moving forward toward achievement of goals. 

  • Determines priorities, schedules, plans and necessary resources to promote completion of any projects on schedule. 

Delivering on the Needs of Key Stakeholders

  • Understands and meets the needs of key stakeholders. 

  • Communicates concepts in a clear and persuasive manner that is easy to understand. 

  • Demonstrates an understanding of business priorities. 

  • Supports achievement of performance goals, budget goals, team goals, etc. 

Providing Technical Support and Consultation

  • Provides technical expertise within own and other teams. 

  • Provides recommendations to improve the effectiveness of processes and programs. 

  • Demonstrates advanced knowledge of job-relevant issues, products, systems, and processes.  

  • Keeps up-to-date technically and applies new knowledge to job. 

  • Performs other reasonable duties as required for this position. 

At Marriott International, we are dedicated to being an equal opportunity employer, welcoming all and providing access to opportunity. We actively foster an environment where the unique backgrounds of our associates are valued and celebrated. Our greatest strength lies in the rich blend of culture, talent, and experiences of our associates. We are committed to non-discrimination on any protected basis, including disability, veteran status, or other basis protected by applicable law.

All positions offer a 401(k) plan, stock purchase plan, discounts at Marriott properties, commuter benefits, employee assistance plan, and childcare discounts. Benefits are subject to terms and conditions, which may include rules regarding eligibility, enrollment, waiting period, contribution, benefit limits, election changes, benefit exclusions, and others.

Full-time positions also offer coverage for medical, dental, vision, health care flexible spending account, dependent care flexible spending account, life insurance, disability insurance, accident insurance, adoption expense reimbursements, paid parental leave and educational assistance. 

Washington Applicants Only: Employees will accrue paid sick leave, 0.077 PTO balance for every hour worked and be eligible to receive a minimum of 9 holidays annually.

Marriott HQ is committed to a hybrid work environment that enables associates to Be connected. Headquarters-based positions are considered hybrid, for candidates within a commuting distance to Bethesda, MD; candidates outside of commuting distance to Bethesda, MD will be considered for Remote positions.

Marriott International is the world's largest hotel company, with more brands, more hotels and more opportunities for associates to grow and succeed. Be where you can do your best work, begin your purpose, belong to an amazing global team, and become the best version of you.
