Customer Reliability Engineer Jobs in Virginia (NOW HIRING)

Site Reliability Engineer - CTJ - POLY

Reston, VA · On-site

$59.25 - $78.75/hr

Candidates must be able to meet Microsoft, customer and/or government security screening ... Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119 ...

ServiceNow SRE Engineering Manager

McLean, VA · On-site

$57.50 - $76.50/hr

As a Manager, ServiceNow SRE Engineer, you will actively engage in your engineering craft, taking ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

ServiceNow SRE Engineering Manager

Richmond, VA · On-site

$56.50 - $75/hr

As a Manager, ServiceNow SRE Engineer, you will actively engage in your engineering craft, taking ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

Manager, SRE Engineer - PxE ERM

Rosslyn, VA

$65 - $86.25/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

Manager, SRE Engineer - PxE ERM

Arlington, VA · On-site

$65.50 - $87.25/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

ServiceNow SRE Engineering Manager

Rosslyn, VA · On-site

$65 - $86.25/hr

As a Manager, ServiceNow SRE Engineer, you will actively engage in your engineering craft, taking ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

Manager, SRE Engineer - PxE ERM

Richmond, VA · On-site

$56.50 - $75/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

Manager, SRE Engineer - PxE ERM

McLean, VA · On-site

$57.50 - $76.50/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

ServiceNow SRE Engineering Manager

Arlington, VA · On-site

$65.50 - $87.25/hr

As a Manager, ServiceNow SRE Engineer, you will actively engage in your engineering craft, taking ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

SRE Engineering Manager - PxE ERM

Richmond, VA · On-site

$56.50 - $75/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

SRE Engineering Manager - PxE ERM

Rosslyn, VA · On-site

$65 - $86.25/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

SRE Engineering Manager - PxE ERM

McLean, VA · On-site

$57.50 - $76.50/hr

As a Manager, SRE Engineer, you will actively engage in your engineering craft, taking a hands-on ... Customer-Centric Engineering: Develop lean engineering solutions through rapid, inexpensive ...

Customer Reliability Engineer information

$60.5K

$117K

$139.8K

How much do customer reliability engineer jobs pay per year?

As of Jun 9, 2026, the average yearly pay for a customer reliability engineer in Virginia is $116,961, according to ZipRecruiter salary data. Most workers in this role earn between $101,600 and $127,900 per year, depending on experience, location, and employer.

How does a Customer Reliability Engineer typically interact with both clients and internal engineering teams?

Customer Reliability Engineers serve as a vital bridge between clients and internal technical teams. They regularly communicate with customers to understand their needs, troubleshoot issues, and provide technical guidance. Internally, they collaborate closely with product, support, and development teams to relay customer feedback, help prioritize reliability improvements, and ensure seamless incident resolution. This cross-functional role requires strong communication skills and the ability to translate technical information for different audiences, making every day varied and impactful.

What is the difference between Customer Reliability Engineer vs Site Reliability Engineer?

Aspect | Customer Reliability Engineer | Site Reliability Engineer
Credentials | Typically requires engineering degrees, certifications in cloud platforms (AWS, Azure), and knowledge of customer support | Requires engineering degrees, certifications in cloud and systems management, with a focus on infrastructure
Work Environment | Customer-facing, involves direct interaction with clients to resolve issues and improve reliability | Primarily internal, focused on maintaining and improving system reliability and scalability
Employer & Industry Usage | Used by cloud service providers and tech companies with a customer support component | Common in large tech companies managing large-scale infrastructure and services

The main difference is that Customer Reliability Engineers focus on ensuring customer satisfaction and resolving client-specific issues, while Site Reliability Engineers concentrate on internal system stability and scalability. Both roles require technical expertise and cloud knowledge but serve different operational needs.

What is a Customer Reliability Engineer?

A Customer Reliability Engineer (CRE) is a technical professional who works closely with customers to ensure the reliability, performance, and uptime of software products and services. CREs act as a bridge between customers and engineering teams, helping to identify, troubleshoot, and resolve reliability issues. They often collaborate with multiple departments to implement best practices, monitor systems, and proactively address potential problems, ultimately aiming to improve the overall customer experience.

What are the key skills and qualifications needed to thrive as a Customer Reliability Engineer, and why are they important?

To thrive as a Customer Reliability Engineer, you need a solid background in systems engineering, incident management, and troubleshooting, often supported by a degree in computer science or related field. Familiarity with cloud platforms (such as AWS or GCP), monitoring tools (like Datadog or Prometheus), and automation scripts is typically required. Exceptional communication, problem-solving abilities, and a customer-centric mindset are vital soft skills for this role. These skills ensure efficient incident resolution, strong client relationships, and reliable system performance under pressure.
Site Reliability Engineer - CTJ - POLY

Microsoft

Reston, VA • On-site

$59.25 - $78.75/hr

Full-time

Posted 4 days ago


Microsoft rating

Company rating: 8.6 out of 10

Based on 125 frontline employees who took The Breakroom Quiz

47th of 186 rated software companies


Job description

Overview
Microsoft has an exciting opportunity for a Senior Site Reliability Engineer (SRE) to join the Azure Silver and Sovereign Team as part of the Azure Data Transfer (ADT) team. Azure Data Transfer enables secure access and data transfer between enclaves and supports multiple transfer and access patterns for highly regulated industries. In this role, you will apply SRE principles (availability, latency, performance, efficiency, change management, and incident response) to help ensure ADT is dependable at scale.
We are looking for engineers to join a fast-paced team and solve complex reliability challenges in mission-critical distributed systems spanning data transmission across clouds. Our team works across all facets of isolated system engineering and is deeply involved in defining and improving service health through SLIs/SLOs and error budgets, building automation to reduce toil, strengthening observability (logs, metrics, traces), reducing systemic latency, validating and transforming data, and optimizing throughput and capacity. You will build, deploy, and operate systems that enable a broad set of Azure services to be consumed by customers in highly secured and regulated environments, meeting strict security policy and assurance requirements for public and private sector customers.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
  • Owns reliability architecture and end-to-end service understanding (dependencies, failure modes, and customer journeys) for distributed systems at scale. Defines and improves service health via SLIs/SLOs, error budgets, and well-defined operational readiness criteria. Drives cross-team reliability reviews and recommends design changes, runbooks, and safe rollout/rollback strategies that improve availability, latency, performance, and efficiency while managing cost.
  • Maintains deep, current expertise in cloud reliability practices and the evolving technology landscape. Drives adoption of new platform capabilities and operational patterns (e.g., progressive delivery, resilience testing, chaos engineering where appropriate). Mentors engineers through design reviews, incident walkthroughs, and knowledge sharing to raise the reliability bar across related services.
  • Implements reliable, scalable, and high-performance changes using SRE practices (progressive delivery, feature flags where applicable, safe rollouts/rollbacks). Owns implementation and rollback plans, validates operational readiness, and reduces toil through automation, self-healing, and standardized playbooks.
  • Leverages telemetry and production signals to identify reliability risks and recurring failure patterns, then ships configuration changes, code fixes, or automation to address root causes. Expands infrastructure-as-code and operational tooling so teams can manage platforms and services safely and repeatably through code and policy.
  • Builds and improves observability (metrics, logs, traces, dashboards, alerts) and uses it to detect, diagnose, and prevent incidents. Defines actionable alerting, reduces noise, and ensures instrumentation supports SLO reporting and rapid troubleshooting. Develops automation to validate telemetry pipelines and to enable automated mitigation and safer incident response.
  • Participates in on-call rotations and leads response for complex, high-impact incidents by establishing incident command, assessing impact, coordinating responders, and driving mitigations to restore service within SLOs. Produces and contributes to blameless postmortems with corrective and preventative actions (CPAs), tracks them to completion, and implements automation and guardrails to prevent recurrence.
  • Applies secure-by-design and compliance requirements to operations, monitoring, and automation (least privilege, auditability, change control, and data handling). Partners with security, privacy, and compliance teams to identify gaps, prioritize fixes, and implement automated controls and detection to prevent repeated violations.
  • Embody our culture and values
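
The responsibilities above refer to SLIs/SLOs and error budgets. As a rough illustration only (not taken from the posting, and not Microsoft's tooling), the sketch below shows the usual arithmetic for a request-based availability SLO. The function name `error_budget`, its parameters, and the sample numbers are all hypothetical.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget use for a request-based availability SLO.

    All names and inputs here are illustrative assumptions.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: fraction of requests that succeeded in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean it is exhausted).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% availability target.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"SLI: {report['sli']:.5f}, budget consumed: {report['budget_consumed']:.1%}")
```

In this framing, a team would typically slow risky rollouts as the consumed fraction approaches 1.0; the exact policy is a team decision and is not specified in the posting.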

Qualifications
Required / Minimum Qualifications:
  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.

Other Requirements:
Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:
  • The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph. The ability to meet Microsoft, customer, and/or government security screening requirements is required pre-offer and post-hire for this role. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
  • Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.

Preferred Qualifications:
  • Bachelor's Degree in Computer Science, Information Technology, or related field AND 8+ years technical experience in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience.
  • 3+ years technical experience working with large-scale cloud or distributed systems
  • Experience building automation with Ansible and developing/operating CI/CD pipelines (e.g., Azure DevOps, GitHub Actions) to deliver reliable, repeatable deployments.
  • Expertise in problem solving and analyzing distributed systems and critical production service environments
  • Expertise in Linux (specifically Rocky 9, Red Hat, Mariner, or similar), including throughput management, troubleshooting, and security hardening

Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

About Microsoft

Sourced by ZipRecruiter

Our infrastructure is comprised of a large global portfolio of more than 100 datacenters and 1 million servers. Our foundation is built upon and managed by a team of subject matter experts working to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide. With environmental sustainability and optimization at the forefront of our datacenter design and operations, we continue to grow and evolve as we meet the ever-changing business demands that hold Microsoft as a world-class cloud provider.

Industry

Computer and computer peripheral equipment and software wholesalers

Company size

10,000+ Employees

Headquarters location

Redmond, WA, US

Year founded

1975
