Site Reliability Engineer Jobs in Springfield, VA

Site Reliability Engineer - Hybrid

Reston, VA · On-site

$59.25 - $78.75/hr

Title: Site Reliability Engineer V Location: Reston, VA (Hybrid onsite - 3 days a week from day 1) Assignment duration: 24 months with possibility of extension Interview process: 2 rounds. First ...

Site Reliability Engineer

Washington, DC · On-site

$112K - $179K/yr

The SRE will drive automation initiatives, observability improvements, and incident response operations. Site Reliability Engineer responsibilities: * Design and implement automation solutions for ...

Site Reliability Engineer

Chantilly, VA · On-site

$62K - $141K/yr

Site Reliability Engineer The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network ...

Site Reliability Engineer

Washington, DC · On-site

$112K - $179K/yr

The SRE will drive automation initiatives, observability improvements, and incident response operations. Site Reliability Engineer responsibilities: * Design and implement automation solutions for ...

Site Reliability Engineer

Chantilly, VA · On-site

$62K - $141K/yr

Site Reliability Engineer The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network ...

Site Reliability Engineer

Herndon, VA · On-site

$86K - $198K/yr

Site Reliability Engineer The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network ...

Staff Cyber Site Reliability Engineer (SRE)

Bethesda, MD · On-site

$61 - $81/hr

Position Description As a Staff Cyber SRE, you will be embedded in the Cybersecurity Engineering & Analytics team, partnering directly with software developers and infrastructure engineers to improve ...

Site Reliability Engineer

Herndon, VA · On-site

$86K - $198K/yr

Site Reliability Engineer The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network ...

Site Reliability Engineer information

How much do site reliability engineer jobs pay per hour?

As of Jun 22, 2026, the average hourly pay for a site reliability engineer in Springfield, VA is $66.58, according to ZipRecruiter salary data. Most workers in this role earn between $57.26 and $76.06 per hour, depending on experience, location, and employer.

Will SRE be replaced by AI?

Site Reliability Engineers (SREs) focus on maintaining system reliability, automation, and incident response, and AI tools are increasingly used to assist these tasks. While AI can automate routine processes, SREs' expertise in system design, troubleshooting, and decision-making remains essential, making complete replacement unlikely in the near future.

What Is a Site Reliability Engineer?

A site reliability engineer specializes in site reliability engineering (SRE), a branch of operations pioneered by Google. In this role, you are responsible for ensuring that when a feature is scaled up for more users, it does not break the underlying software or website functions. This means you need analytical problem-solving skills to determine how to make features in a new software release work on top of existing source code.

What engineers make $300,000 a year?

Senior-level engineers such as Site Reliability Engineers, Software Engineers, and Cloud Infrastructure Engineers can earn $300,000 or more annually, especially with extensive experience, specialized skills, and working at large tech companies or in high-cost-of-living areas. Compensation often includes base salary, bonuses, and stock options, with expertise in automation, cloud platforms, and monitoring tools being highly valued.

What are the key skills and qualifications needed to thrive as a Site Reliability Engineer, and why are they important?

To thrive as a Site Reliability Engineer, you need a strong background in computer science, systems administration, and software engineering, often supported by a degree in a technical field. Familiarity with cloud platforms (like AWS or GCP), container orchestration (such as Kubernetes), infrastructure as code (Terraform or Ansible), and monitoring tools (Prometheus, Grafana) is typically expected. Strong problem-solving skills, effective communication, and a proactive mindset help SREs excel at incident management and cross-functional collaboration. These skills are crucial for maintaining system reliability, minimizing downtime, and driving continuous improvement in complex technical environments.

Is SRE a stressful job?

Site Reliability Engineers (SREs) often work in high-pressure environments where they monitor system performance, troubleshoot outages, and ensure uptime. The role can involve on-call duties and incident response, which may contribute to stress, but it also offers opportunities for automation and process improvements to reduce workload. Overall, stress levels vary depending on the organization, team culture, and individual skills.

What are some of the most common challenges Site Reliability Engineers face when balancing system reliability with rapid software delivery?

Site Reliability Engineers (SREs) often navigate the challenge of maintaining highly reliable systems while supporting fast-paced software releases. This involves managing incidents, automating processes to reduce manual toil, and working closely with development teams to embed reliability into the software development lifecycle. SREs must carefully prioritize their efforts between proactive improvements and urgent, reactive fire-fighting. Effective communication and collaboration with both operations and development teams are crucial to ensuring service uptime without slowing down innovation.

What does a Site Reliability Engineer do?

A Site Reliability Engineer (SRE) is responsible for maintaining and improving the reliability, availability, and performance of software systems. They use automation, monitoring tools, and scripting to prevent outages and resolve issues quickly, often working closely with development teams to ensure scalable infrastructure. SREs typically have skills in systems engineering, coding, and cloud platforms, and may hold certifications like those in cloud services or DevOps practices.

What is the difference between Site Reliability Engineer vs DevOps Engineer?

Aspect | Site Reliability Engineer | DevOps Engineer
Credentials | Typically requires a computer science degree, certifications like AWS, Google Cloud, or Kubernetes | Similar credentials, often with cloud certifications and scripting skills
Work Environment | Focuses on maintaining and improving system reliability, often in large-scale production environments | Works on automation, CI/CD pipelines, and deployment processes across development and operations teams
Industry Usage | Common in tech, cloud services, and large-scale enterprise companies | Widely used in software development, cloud, and IT organizations

Both roles require strong technical skills and cloud knowledge, but SREs focus more on system reliability and uptime, while DevOps engineers emphasize automation and deployment processes. They often collaborate but have distinct primary responsibilities.

What is a Site Reliability Engineer?

A Site Reliability Engineer (SRE) is a professional who applies software engineering principles to infrastructure and operations problems. Their primary goal is to create scalable and highly reliable software systems, often bridging the gap between development and IT operations. SREs automate tasks, monitor system health, respond to incidents, and work to improve system reliability and performance. They also help define service level objectives (SLOs) and ensure systems meet customer expectations for uptime and availability.

Site Reliability Engineer - Hybrid

Volitiion IIT

Reston, VA • On-site

$59.25 - $78.75/hr

Other

Posted 7 days ago


Job description

Title: Site Reliability Engineer V
Location: Reston, VA (Hybrid onsite - 3 days a week from day 1)
Assignment duration: 24 months with possibility of extension
Interview process: 2 rounds. First round would be a video interview. Second round would be an in-person interview
Manager's call notes
  • This is an SRE role. The SRE sits on a shared services team within Fannie Mae that works with different application teams, so multitasking is required.
  • In technical terms, we need expertise with AWS ECS, EC2, RDS, RedShift, EMR, Lambda, Route53, Step Functions etc.
  • Programming experience in Java or Python is required. We are not looking for a full-fledged developer but someone who can take the code and modify it as needed to create some small automations.
  • Exposure to DevOps is required. GitLab, Terraform and Jenkins would be preferred.
  • Experience with Observability using tools such as AWS CloudWatch, Splunk/SignalFX, Dynatrace, and OpenTelemetry would be helpful.
  • Experience in release engineering, production support, or performance engineering would be a bonus. This is not a showstopper, though.
  • AWS, programming and DevOps are must haves.
  • The candidate has to come to the office 3 days a week in Reston, VA.
  • The SRE at Fannie Mae doesn't work 24/7. They are scheduled on a rotation basis. 20% of the job is production support activities. The other 80% of the time, they work with application teams: studying the applications, understanding the architecture, suggesting how an application can be made better, and checking whether its resiliency patterns make it resilient enough. Observability too. They identify gaps and weak points and work with the architecture team to resolve them, and they review code scans and alarms to see if they are good enough.
  • The SRE may not get 100% access to all the applications, but the expectation is to identify the gaps and weak points and tell the application team to fix them.
  • On call rotation schedule: One day a week every week.
  • AI/ML: We have certain machine learning projects that the SRE interacts with, so AI/ML experience is a plus.
  • Previous Fannie Mae experience is a plus.
Job Description
Overall years of experience:
8+ years of related experience in their specific area with experience leading teams on projects with similar scope and complexity.
Bachelor's or master's degree in computer science or equivalent.
Certifications: AWS Solutions Architect, Agile Certified Practitioner (ACP), or relevant cloud certifications.
We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a strong background in cloud platforms, DevOps practices, and modern software development frameworks. The SRE will play a critical role in designing, building, and maintaining highly scalable, fault-tolerant, and secure cloud infrastructure while ensuring operational excellence, high availability, and reliability.
Key Responsibilities:
1. Cloud Infrastructure & Automation:
Design, implement, and manage cloud-based infrastructure using platforms like AWS, Azure, or GCP.
Utilize Infrastructure-as-Code (IaC) tools such as Terraform, CloudFormation, and Ansible to automate deployments and configurations.
Create robust automation targeted at anomaly detection, toil reduction, recovery processes, and self-healing mechanisms, and optimize cloud costs.
2. DevSecOps & CI/CD:
Deep understanding of DevSecOps principles and CI/CD pipelines using tools like GitLab, Jenkins, SonarQube, Nexus/Artifactory, and Docker.
Implement security best practices, including IAM roles, RBAC, vulnerability remediation, and SAST/DAST/SCA tools.
3. Observability & Incident Management:
Design and implement monitoring, logging, and distributed tracing solutions using tools like AWS CloudWatch, Splunk/SignalFX, Dynatrace, and OpenTelemetry.
Lead root cause analysis, blameless postmortems, and proactive incident management to minimize MTTR and MTTD.
Define and monitor SLOs, SLIs, and error budgets to ensure system reliability.
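As a rough illustration of the SLO and error-budget item above, the following Python sketch computes how much downtime an availability target allows and how much of that budget is left. The 99.9% target, the 30-day window, and the function names are assumptions chosen for the example; none of them come from the posting.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window.

    slo_target is a fraction, e.g. 0.999 for a hypothetical 99.9% target.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent.

    A negative result means the downtime already exceeds the budget.
    """
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget
```

Under these assumptions, a 99.9% target over 30 days allows about 43.2 minutes of downtime, so 21.6 minutes of recorded downtime would leave roughly half the budget.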
4. Microservices & API Management:
Architect and manage microservices, serverless computing, and RESTful APIs.
Ensure fault tolerance and resilience using design patterns like Circuit Breaker, Retry, Timeout, and Bulkhead.
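Two of the patterns named above, Retry and Circuit Breaker, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production library: the names `retry` and `CircuitBreaker`, the default thresholds, and the injectable `sleep` and `clock` parameters are all hypothetical choices made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


class CircuitBreaker:
    """Opens after max_failures consecutive failures.

    While open, calls are rejected immediately. Once reset_after seconds
    have passed, one trial call is let through (the half-open state).
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            # Cool-down elapsed: fall through and allow a trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The `sleep` and `clock` parameters exist so the behavior can be exercised without real waiting; in ordinary use the defaults apply.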
5. Chaos Engineering & Resiliency:
Conduct chaos engineering experiments using tools like AWS FIS and Chaos Toolkit.
Perform resiliency assessments using Resilience Hub and implement self-healing solutions.
6. Database & Application Support:
Manage and optimize database technologies such as PostgreSQL, MongoDB, DynamoDB, Oracle, and Redshift.
Provide production support, including incident response, problem management, and runbook creation. Participate in on-call rotations.
7. Collaboration & Communication:
Collaborate with cross-functional teams to implement shift-left testing practices (BDD, TDD, Unit, Regression).
Create and maintain architecture diagrams, knowledge articles, and disaster recovery plans.
Communicate effectively with stakeholders and demonstrate strong relationship management skills.
Required Skills & Qualifications:
Expertise in cloud platforms (AWS, Azure, or GCP) and container orchestration.
Proficiency in programming/scripting languages such as Python, Java, Node.js, Bash, and PowerShell.
Strong knowledge of database technologies (e.g., PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift).
Experience with DevOps tools (Jenkins, Docker, Nexus/Artifactory) and build tools (Maven, Gradle).
Familiarity with AI/ML integrations, event-driven architectures, and distributed systems.
Expertise in observability, logging, and monitoring tools (AWS CloudWatch, Splunk, Dynatrace, OpenTelemetry).
Strong understanding of security practices, including IAM, RBAC, and vulnerability management.
Experience with chaos engineering, resiliency assessments, and disaster recovery planning.
Proficiency in performance testing tools (JMeter, LoadRunner) and capacity planning.
Excellent verbal and written communication skills, with the ability to collaborate across teams.
Preferred Qualifications:
Experience with AI/ML libraries (e.g., NLTK, Transformers, Spacy, SciPy), Amazon SageMaker, and GenAI tools.
Familiarity with project management tools like JIRA, Confluence, and ServiceNow.
Knowledge of utilities like AWS CLI, POSTMAN, and curl.

4 Reasons to Join Volitiion IIT, Inc.:
1. Our Commitment to You - We offer competitive pay, multi-year projects, and a list of exciting clients.
2. Work-Life Balance - We work hard; we work smart and have quality time for family and "life."
3. Our Mantra - We treat our consultants the way we want to be treated: with integrity, professionalism, and trust.
4. Career Development - We help you meet your career goals and continuously support your efforts to build your skillset.
Check out our Referral Program!
Volitiion IIT Inc will pay you up to $1000 for every qualified professional that you refer and we place. If you see a position posted by Volitiion IIT Inc. and know the perfect person for the job, please send us your referral.

Volitiion IIT Inc. is an Equal Opportunity/Affirmative Action Employer.