Site Reliability Engineer Manager Jobs (NOW HIRING)

Site Reliability Engineer (SRE)

Austin, TX · On-site

$56.50 - $75/hr

Site Reliability Engineer (SRE) Location: Austin, TX Job Type: Full Time Job Summary - Seasoned ... Highly skilled in managing production failures, conducting root cause analysis, and driving ...

Site Reliability Engineer

Chandler, AZ · Hybrid

$60.24 - $68.24/hr

This role combines senior hands-on engineering with leadership across incident management, problem ... Lead execution of SRE workstreams for canary automation, platform dashboards, alert tuning ...

Site Reliability Engineer (SRE)

Austin, TX · On-site

$56.50 - $75/hr

Austin, TX Job Type: Full Time Technical Skills: * 6+ years of professional engineering experience developing, managing, or supporting distributed systems * 4+ SRE experience managing multi-cloud ...

Site Reliability Engineer

Chicago, IL · On-site

$58.75 - $78/hr

W (flexible on other 2 days) Site Reliability Engineer - Northern Trust, Goals Driven Wealth Management We are searching for a candidate who has extensive experience in Site Reliability Engineering ...

Site Reliability Engineer

Charlotte, NC · On-site

$55.75 - $74/hr

This role sits within an established SRE community of practice (15+ engineers, mix of onsite and ... Hands-on experience with incident management and postmortems * Strong understanding of application ...

About the Role As a Site Reliability Engineer (SRE) at Mercor, you'll own production reliability across our most critical systems, partnering directly with infrastructure leadership. You'll play a ...

Key Responsibilities Observability Engineering • Design, scale, optimize, and manage Prometheus ... Site Reliability Engineering • Apply and evolve an SRE Maturity Model to help teams mature ...

Site Reliability Engineer

Houston, TX · On-site

$54.50 - $72.25/hr

Site Reliability Engineer (SRE) Role Overview Looking for a Site Reliability Engineer (SRE) to ... Manage Dynatrace monitoring, including RUM and synthetic monitoring * Configure Adobe Analytics for ...

Site Reliability Engineer (SRE)

Omaha, NE · On-site

$54.50 - $72.50/hr

Site Reliability Engineer (SRE) Location: Omaha, NE / Dallas, TX Job Type: Full Time Job Summary ... Highly skilled in managing production failures, conducting root cause analysis, and driving ...

Site Reliability Engineer

Frederick, MD · On-site

$56.75 - $75.25/hr

With experts in biomedical science, software engineering, and program management, we focus on ... Transportation Reimbursement Account (TRN) The Site Reliability Engineer role centers on ...

Site Reliability Engineer

Irondale, AL · On-site

$48.25 - $64/hr

Site Reliability Engineer Site Reliability Engineer (SRE) Hybrid Opportunity | Enterprise Cloud ... Participate in system architecture, platform management, and capacity planning activities * Drive ...

Site Reliability Engineer Manager information

[Salary chart: hourly pay scale marked at $10, $63, and $91]

How much do site reliability engineer manager jobs pay per hour?

As of Jun 15, 2026, the average hourly pay for a site reliability engineer manager in the United States is $63.74, according to ZipRecruiter salary data. Most workers in this role earn between $54.81 and $72.84 per hour, depending on experience, location, and employer.
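
To compare these hourly figures with annual salaries, one common approach is to assume a 2,080-hour work year (40 hours × 52 weeks). That assumption and the helper name `hourly_to_annual` are illustrative only and are not part of the ZipRecruiter data. A rough sketch:

```python
HOURS_PER_YEAR = 40 * 52  # assumption: full-time, 2,080 paid hours per year


def hourly_to_annual(hourly_rate: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly rate to an approximate annual figure, rounded to cents."""
    return round(hourly_rate * hours_per_year, 2)


if __name__ == "__main__":
    # Hourly figures quoted in the answer above.
    for label, rate in [("low", 54.81), ("average", 63.74), ("high", 72.84)]:
        print(f"{label}: ${hourly_to_annual(rate):,.2f} per year")
```

Under this assumption, the $63.74 average works out to roughly $132,579 per year. Actual annual pay depends on hours worked, bonuses, and benefits.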

Will AI replace SRE jobs?

AI is expected to augment Site Reliability Engineer (SRE) roles by automating routine tasks such as monitoring, incident response, and data analysis. However, SREs will continue to be essential for designing systems, managing complex issues, and making strategic decisions that require human judgment and expertise. The role is likely to evolve with AI tools rather than be fully replaced.

What is a Site Reliability Engineer Manager?

A Site Reliability Engineer (SRE) Manager oversees a team of site reliability engineers tasked with maintaining the reliability, scalability, and performance of software systems. Their role combines leadership and technical expertise, focusing on automating operations, managing incidents, and ensuring high availability of services. They work closely with engineering and operations teams to implement best practices in monitoring, incident response, and system design. SRE Managers also mentor their teams, set reliability goals, and help drive a culture of continuous improvement within the organization.

What engineers make $500,000?

Senior-level Site Reliability Engineers (SREs) with extensive experience and advanced skills in cloud infrastructure, automation, and monitoring tools can earn $500,000 or more annually, especially in high-cost-of-living areas or at large tech companies. Reaching this level often requires specialized certifications, leadership responsibilities, and a strong track record of system reliability improvements.

How much do SRE managers make in the US?

Site Reliability Engineering (SRE) managers in the US typically earn between $130,000 and $180,000 annually, with senior roles and large tech companies offering higher compensation. Salaries can vary based on experience, location, and company size, and often include bonuses and stock options.

What is the role of site reliability engineer manager?

A Site Reliability Engineer Manager oversees a team responsible for maintaining the availability, performance, and reliability of large-scale systems and services. They coordinate incident response, implement automation, and collaborate with development teams to improve system resilience, often using monitoring and alerting platforms. Strong leadership, technical expertise, and an understanding of cloud infrastructure are essential for this role.

What is the difference between a Site Reliability Engineer Manager and a Site Reliability Engineer?

Aspect | Site Reliability Engineer (SRE) | Site Reliability Engineer Manager
Responsibilities | Focuses on designing, implementing, and maintaining reliable systems and automation | Oversees SRE teams, manages projects, and aligns reliability goals with business objectives
Required Skills | Strong coding, system design, and troubleshooting skills | Leadership, team management, strategic planning
Certifications | Google Cloud, AWS certifications, Linux, scripting | Same as SRE, plus management certifications (e.g., PMP) often preferred
Work Environment | Technical, hands-on with systems and automation | Managerial, coordinating teams and projects

The main difference is that a Site Reliability Engineer focuses on technical system reliability, while a Site Reliability Engineer Manager oversees teams and strategic initiatives to ensure reliability goals are met across projects.

How does a Site Reliability Engineer Manager typically balance technical leadership with team management responsibilities?

A Site Reliability Engineer Manager often splits their time between overseeing technical projects, such as system reliability improvements and incident response strategies, and managing the growth and well-being of their engineering team. This includes mentoring SREs, facilitating communication between teams, setting priorities, and ensuring that operational goals align with business objectives. Balancing these responsibilities requires strong organizational skills and a proactive approach to both technical challenges and people management. Successful managers regularly engage in hands-on problem-solving while also fostering a collaborative team environment.

What are the key skills and qualifications needed to thrive as a Site Reliability Engineer Manager, and why are they important?

To thrive as a Site Reliability Engineer Manager, you need expertise in systems engineering, incident management, and a strong background in software development or computer science, often supported by a bachelor’s degree or equivalent experience. Familiarity with cloud platforms (like AWS, GCP, or Azure), infrastructure as code tools (such as Terraform), monitoring systems (like Prometheus), and certifications in cloud or DevOps practices are highly valued. Strong leadership, effective communication, and problem-solving abilities help you guide teams and foster collaboration across departments. These skills and qualities ensure the stability, scalability, and reliability of critical systems while enabling teams to respond effectively to complex technical challenges.
Infographic showing various Site Reliability Engineer Manager job openings in the United States as of June 2026, with employment types broken down into 1% Locum Tenens, 95% Full Time, 1% Part Time, and 3% Contract. Highlights an 87% Physical, 5% Hybrid, and 8% Remote job distribution, with an average salary of $132,583 per year, or $63.7 per hour.

Principal Site Reliability Engineer (SRE)

INFINITE CHOICE LLC

Dallas, TX • On-site

$180K - $210K/yr

Full-time

Posted 21 days ago


Job description

About the Role

We're seeking an exceptional Principal Site Reliability Engineer to architect, design, and build our SRE foundation from the ground up at InfiniteChoice. This is a rare greenfield opportunity to establish SRE practices, develop custom tooling, and create the reliability culture that will support our platform serving millions of users and billions in transaction volume.

As our Principal SRE, you'll combine deep technical expertise with strategic vision to build world-class monitoring, observability, and automation systems. You'll have the autonomy to define our SRE processes, select technologies, and create the framework that ensures our systems are reliable, scalable, and performant.

Location: Remote - US based

What You Will Do

SRE Foundation & Process Development
  • Build SRE practices from scratch - define SLIs, SLOs, error budgets, and reliability metrics

  • Establish incident response procedures, on-call rotations, and post-mortem processes

  • Create reliability engineering standards and best practices across all engineering teams

  • Develop disaster recovery and business continuity strategies

  • Design and implement capacity planning and performance optimization frameworks
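
As a rough illustration of the SLO and error-budget terms in the first bullet above: an availability SLO implies a tolerated amount of failure over a window, and the error budget is whatever part of that allowance is still unspent. The 99.9% target, the 30-day window, and every name below are hypothetical and do not reflect InfiniteChoice's actual targets or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    target: float          # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int = 30  # hypothetical rolling window

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no budget at all, so it is rejected.
        if not 0 < self.target < 1:
            raise ValueError("target must be strictly between 0 and 1")

    def allowed_downtime_minutes(self) -> float:
        """Minutes of full outage the SLO tolerates over the window."""
        return (1 - self.target) * self.window_days * 24 * 60

    def budget_remaining(self, total_requests: int, failed_requests: int) -> float:
        """Fraction of the request-based error budget still unspent.

        The result is negative once the budget has been overspent.
        """
        if total_requests <= 0:
            raise ValueError("total_requests must be positive")
        allowed_failures = (1 - self.target) * total_requests
        return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    slo = Slo(target=0.999)
    print(f"Allowed downtime: {slo.allowed_downtime_minutes():.1f} minutes")
    print(f"Budget remaining: {slo.budget_remaining(1_000_000, 250):.0%}")
```

With a 99.9% target over 30 days, the tolerated full outage is about 43.2 minutes. Teams commonly use the remaining budget to decide whether to keep shipping changes or to prioritize reliability work.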

Architecture & Tool Development
  • Drive architecture decisions for comprehensive application and infrastructure monitoring solutions

  • Design and develop custom SRE tools for automated monitoring, alerting, and remediation

  • Build observability platforms that provide deep insights into system performance and user experience

  • Create automation frameworks for deployment, scaling, and incident response

  • Architect logging, metrics, and tracing systems for distributed microservices environments

Google Cloud Infrastructure Excellence
  • Leverage Google Cloud Platform services to build resilient, scalable infrastructure

  • Implement cloud-native monitoring using Cloud Monitoring and Cloud Logging (formerly Stackdriver)

  • Design auto-scaling and self-healing systems using GKE, Cloud Functions, and managed services

  • Optimize cloud costs while maintaining high availability and performance standards

  • Establish security and compliance frameworks within GCP environments

Innovation & Continuous Improvement
  • Research and implement cutting-edge SRE tools and methodologies

  • Leverage AI and machine learning for predictive analytics, anomaly detection, and automated remediation

  • Create dashboards and reporting systems that provide actionable insights to engineering and business teams

  • Establish feedback loops for continuous improvement of reliability and performance

  • Stay current with industry best practices and emerging technologies in the SRE space

What You Must Have

SRE & Infrastructure Expertise
  • 12+ years of experience in Site Reliability Engineering or Infrastructure Engineering

  • 5+ years in lead SRE roles building and scaling SRE teams and processes

  • Proven track record designing and implementing monitoring and observability solutions at scale

  • Deep understanding of distributed systems, microservices architectures, and cloud-native patterns

  • Experience with infrastructure as code, configuration management, and deployment automation

Google Cloud Platform Proficiency
  • Hands-on experience with Google Cloud Platform is required

  • Expertise with GCP monitoring and observability stack (Cloud Monitoring, Cloud Logging, Cloud Trace)

  • Experience with GKE, Compute Engine, Cloud Functions, and other core GCP services

  • Knowledge of GCP networking, security, and compliance capabilities

  • Understanding of GCP cost optimization and resource management

Technical Skills
  • Strong programming skills in Python, Go, Java, or similar languages

  • Experience with monitoring tools (Prometheus, Grafana, Datadog, New Relic, or similar)

  • Proficiency with containerization (Docker, Kubernetes) and orchestration platforms

  • Knowledge of CI/CD pipelines, automated testing, and deployment strategies

  • Understanding of database performance tuning and optimization (SQL and NoSQL)

AI & Automation
  • Familiarity with AI-driven development tools and methodologies is a huge plus

  • Experience with machine learning for operations (AIOps), anomaly detection, or predictive analytics

  • Knowledge of automated incident response and self-healing systems

  • Understanding of AI/ML tools for log analysis, pattern recognition, and intelligent alerting

Problem-Solving & Mindset
  • Strong analytical and troubleshooting skills for complex distributed systems

  • Experience with high-pressure incident response and crisis management

  • Detail-oriented with commitment to operational excellence and continuous improvement

  • Comfortable with ambiguity and building processes in a fast-growing environment

  • Passion for reliability, automation, and engineering best practices

  • Demonstrated experience building SRE programs and processes from the ground up is a HUGE plus

Education
  • Bachelor's degree in Computer Science, Engineering, or equivalent professional experience

  • Industry certifications preferred (Google Cloud Professional, SRE, or related)

What We Offer
  • Ground-floor opportunity to build SRE practices and culture from scratch

  • Full autonomy to define processes, select technologies, and establish best practices

  • Direct impact on platform reliability serving millions of users

  • Opportunity to create lasting engineering culture and operational excellence

  • Remote-first culture with in-person meetings in Dallas, TX as needed

  • Collaborative environment with smart, passionate engineers and cross-functional teams

  • Access to cutting-edge technologies and AI-driven development tools

  • Competitive compensation, equity participation, and comprehensive benefits

Ready to Build World-Class Reliability?

Join us in creating the SRE foundation that will power InfiniteChoice's next phase of growth. If you're passionate about reliability engineering, love building systems from scratch, and want to establish the operational excellence that scales with our business, we'd love to hear from you.

About InfiniteChoice

InfiniteChoice was founded to help people find the experiences they want simply and effortlessly. We leverage a new type of business model and platform that uniquely applies automation and technology to solve the challenges of scale and complexity in experience discovery.


Existing business and marketing technologies can no longer handle the demands of connecting millions of consumers with vast inventories of experiences across a fragmented, global marketplace of people, partners, and providers.


Our mission is to disrupt this status quo by creating seamless connections between consumers and experiences. We're just at the beginning of this journey, but our approach is working: we've helped over 275 million visitors connect to millions of experiences, generating over $2 billion in revenue for our brands and partners.