Director Site Reliability Engineering Jobs in Georgia

Are you excited to lead Site Reliability Engineering teams that keep mission-critical, 24/7 services running reliably and securely? Do you enjoy building automated cloud platforms, hardening security ...

SRE Lead

Alpharetta, GA · On-site

$55.75 - $74/hr

Senior / Lead Role Overview We are seeking an experienced Site Reliability Engineering (SRE) Lead to own and drive the reliability, scalability, and operational excellence of cloud-native platforms.

Site Reliability Engineer - SRE

Atlanta, GA

$54.25 - $72/hr

Hands-on experience in Site Reliability Engineering and solving problems through automation and instrumentation * Experience with Jenkins for CI/CD pipeline creation and CI/CD automation * Experience ...

Site Reliability Engineer - SRE

Atlanta, GA · On-site +1

$54.25 - $72/hr

Hands-on experience in Site Reliability Engineering and solving problems through automation and instrumentation * Experience with Jenkins for CI/CD pipeline creation and CI/CD automation * Experience ...

Site Reliability engineer (SRE)

Atlanta, GA · Hybrid

$54.75 - $72.75/hr

Site Reliability Engineer (SRE) Location: Atlanta, GA (Hybrid: 3 days office, 2 days WFH) Duration: C2H : Dynatrace, AppDynamics ACI (Advanced Computing International) is a Global Technology ...

Site Reliability Engineer

Atlanta, GA · On-site +1

$100K - $120K/yr

Overview The Site Reliability Engineer is a key force behind improving Origami's time to resolution ... Provides an actionable feedback loop to Observability and Engineering teams toward improving MELT ...


Director Site Reliability Engineering information

What is a Director Site Reliability Engineering job?

A Director of Site Reliability Engineering (SRE) leads teams responsible for ensuring the availability, performance, and scalability of software systems. They define reliability best practices, drive automation, and collaborate with engineering and product teams to improve system resilience. This role requires strong leadership, technical expertise, and a focus on balancing innovation with operational stability.

What are the key skills and qualifications needed to thrive in the Director Site Reliability Engineering position, and why are they important?

To thrive as a Director of Site Reliability Engineering, you need extensive experience in software engineering, infrastructure management, incident response, and people leadership, often supported by a degree in computer science or a related field. Familiarity with cloud platforms (such as AWS, GCP, or Azure), automation tools (Terraform, Ansible), and monitoring systems (Prometheus, Datadog) is valued, as are relevant certifications like CKA or AWS Solutions Architect. Outstanding communication, stakeholder management, and strategic vision are the soft skills that set leaders apart in this role. Together, these abilities help ensure the reliability, scalability, and efficiency of critical systems while effectively guiding and motivating technical teams.

What are the main challenges faced by a Director of Site Reliability Engineering, and how can I prepare for them?

A Director of Site Reliability Engineering often encounters challenges such as balancing rapid feature delivery with system stability, managing complex incident responses, and fostering a culture of continuous improvement. Additionally, aligning reliability goals with business objectives and securing cross-functional buy-in can be demanding. To prepare, it is helpful to gain experience in high-scale system management, develop strong leadership and communication abilities, and cultivate a proactive approach to risk management and automation. Staying up to date with the latest SRE practices and building relationships with both engineering and business teams will also support your success in this pivotal role.
Infographic showing various Director Site Reliability Engineering job openings in Georgia as of May 2026, with employment types broken down into 89% Full Time, 7% Part Time, and 4% Contract. Highlights a 92% Physical, 4% Hybrid, and 4% Remote job distribution.
Site Reliability Engineering (SRE) Architect

AceStack LLC

Atlanta, GA • On-site

$54.75 - $72.75/hr

Full-time

This job post has expired today. Applications are no longer accepted.


Job description

Role: Site Reliability Engineering (SRE) Architect
Location: Atlanta, GA (Hybrid on-site)
Contract
Role Summary:
As an SRE Architect, you will be a pivotal technical leader responsible for designing, building, and evolving the foundational systems and practices that ensure the reliability, scalability, performance, and efficiency of our critical services. Moving beyond day-to-day operations, you will focus on the strategic architectural direction of the SRE function, defining standards, blueprints, and frameworks that enable development teams and the SRE operations team to build and operate highly resilient systems. You will leverage deep expertise in software engineering, distributed systems, cloud infrastructure, and SRE principles to influence technology choices, establish best practices, and foster a proactive culture of reliability across the organization, extending well beyond the observability pillar.
Key Responsibilities:
  • Reliability Strategy & Design:
    • Architect and design highly available, scalable, secure, and cost-effective infrastructure and application patterns on AWS
    • Define and evangelize SRE best practices, standards, and blueprints for service design, deployment, monitoring, and operational readiness across the engineering organization
    • Review the current observability implementation to identify gaps and define the steps needed to reach the next level of observability maturity, providing deep insight into system health and behaviour
    • As overall maturity improves, lead the definition and implementation strategy for Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets for critical services
  • Platform Architecture & Automation:
    • Design solutions to systematically reduce operational toil through automation and improved system design
    • Evaluate current SRE tools and automation frameworks (e.g., CI/CD pipelines, Infrastructure as Code modules, automated incident remediation, chaos engineering platforms) and suggest enhancements that improve overall capability
    • Evaluate, prototype, and recommend new technologies, tools, and methodologies to enhance system reliability, developer productivity, and operational efficiency
  • Technical Leadership & Consultation:
    • Act as a senior technical advisor and subject matter expert on reliability, scalability, and performance for development and platform teams
    • Provide architectural guidance during the design phase of new services and features to ensure reliability principles are embedded early (shift-left)
    • Mentor and coach other SREs and engineers, fostering technical excellence and adherence to SRE principles
    • Lead architectural reviews and production readiness assessments for critical systems
  • Resilience:
    • Lead blameless postmortems for significant incidents, ensuring root causes are identified and systemic architectural improvements are prioritized and implemented
    • Architect and advocate for resilience patterns (e.g., circuit breaking, rate limiting, graceful degradation, chaos engineering) within applications and infrastructure

Required Qualifications:
  • Proven experience in an architectural role, designing solutions for reliability, scalability, and performance
  • Deep understanding and practical application of SRE principles (SLIs/SLOs, error budgets, toil reduction, automation, incident management, postmortems)
  • Expertise in cloud computing platforms (e.g., AWS) including infrastructure, networking, and security services
  • Strong experience with containerization and orchestration technologies (Kubernetes, Docker, serverless computing)
  • Solid experience designing and implementing observability solutions (e.g., Dynatrace, Prometheus, Grafana, ELK/EFK Stack, Jaeger, OpenTelemetry)
  • Strong programming/scripting skills (e.g., Python, Go, Bash) for automation and tool development
  • Excellent analytical, problem-solving, and strategic thinking skills
  • Strong communication, collaboration, and leadership skills with the ability to influence technical direction across teams
Preferred Qualifications:
  • Experience designing and implementing chaos engineering practices and platforms

About AceStack

Sourced by ZipRecruiter

AceStack is a global IT consulting and staffing agency. We deal in healthcare (nursing, allied, clinical/non-clinical) staffing, engineering staffing, and IT staffing. Founded in 2017 in New Jersey, AceStack has reported consistent growth and profit every year and carries zero debt. AceStack consultants are placed across the USA, Canada, Mexico, and Asia. In addition to our headquarters in New Jersey, USA, we also have offices in Canada, Noida, and Ahmedabad. AceStack’s exceptionally high-touch service keeps our clients satisfied and our Consultants/Travelers engaged. We believe in investing in our Consultants/Clients in a variety of ways. We employ an AceStack ambassador who helps guide Consultants through the onboarding process and ensures that the transition into their new role with our Client is seamless. We also have dedicated Consultant care representatives located throughout our organization who provide the same level of attention throughout our Consultant’s tenure. Due to this level of attention and care, AceStack enjoys not only one of the highest retention rates in the staffing industry but also one of the highest redeployment rates.

Company size

51 - 200 Employees

Headquarters location

NJ, US

Year founded

2017
