Site Reliability Engineer Manager Jobs in Bothell, WA

We're looking for an experienced Site Reliability Engineer (SRE) to join our infrastructure team ... Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless ...

Senior Site Reliability Engineer I

Seattle, WA · On-site

$64.75 - $86.25/hr

Your Impact As a Senior SRE on the APX SRE CloudOps team, you will design and build the cloud ... Proficiency in one or more managed languages (Go, Python, Java, C#) in the context of building ...

Work in close collaboration with SRE team members and Engineering organizations based in California, Paris, Nantong, Singapore, Australia and others. About you: * Academic experience or projects in ...

SRE Architect, AI-Powered Reliability

Seattle, WA · On-site

$64.75 - $86.25/hr

Our SRE practice is in its early stages, and the decisions made now will define how we build ... Incident Management * Establish the enterprise incident management framework: severity definitions ...

Sr. Site Reliability Engineer

Seattle, WA · On-site

$64.75 - $86.25/hr

PitchBook, a Morningstar company, is looking for a Sr. Site Reliability Engineer to join their Product and Engineering team. The role involves creating and evolving systems to ensure the reliability ...

Sr. Site Reliability Engineer

Seattle, WA · On-site

$64.75 - $86.25/hr

PitchBook, a Morningstar company, is seeking a Sr. Site Reliability Engineer to enhance their ... in available cloud-managed services (PaaS/SaaS/IaaS), libraries, frameworks, and platforms ...

Site Reliability Engineer Manager information

Bothell, WA hourly pay scale (chart): $12, $72, $104

How much do site reliability engineer manager jobs pay per hour?

As of Jun 11, 2026, the average hourly pay for a site reliability engineer manager in Bothell, WA is $72.46, according to ZipRecruiter salary data. Most workers in this role earn between $62.31 and $82.79 per hour, depending on experience, location, and employer.

Will AI replace SRE jobs?

AI is expected to augment Site Reliability Engineer (SRE) roles by automating routine tasks such as monitoring, incident response, and data analysis. However, SREs will continue to be essential for designing systems, managing complex issues, and making strategic decisions that require human judgment and expertise. The role is likely to evolve with AI tools rather than be fully replaced.

What is a Site Reliability Engineer Manager?

A Site Reliability Engineer (SRE) Manager oversees a team of site reliability engineers tasked with maintaining the reliability, scalability, and performance of software systems. Their role combines leadership and technical expertise, focusing on automating operations, managing incidents, and ensuring high availability of services. They work closely with engineering and operations teams to implement best practices in monitoring, incident response, and system design. SRE Managers also mentor their teams, set reliability goals, and help drive a culture of continuous improvement within the organization.

What engineers make $500,000?

Senior-level Site Reliability Engineers (SREs) with extensive experience and advanced skills in cloud infrastructure, automation, and monitoring tools can earn $500,000 or more annually, especially in high-cost-of-living areas or at large tech companies. Reaching this level often requires specialized certifications, leadership responsibilities, and a strong track record of system reliability improvements.

How much do SRE managers make in the US?

Site Reliability Engineering (SRE) managers in the US typically earn between $130,000 and $180,000 annually, with senior roles and large tech companies offering higher compensation. Salaries can vary based on experience, location, and company size, and often include bonuses and stock options.

What is the role of site reliability engineer manager?

A Site Reliability Engineer Manager oversees a team responsible for maintaining the availability, performance, and reliability of large-scale systems and services. They coordinate incident response, implement automation, and collaborate with development teams to improve system resilience, often using monitoring and alerting platforms. Strong leadership, technical expertise, and an understanding of cloud infrastructure are essential for this role.

What is the difference between a Site Reliability Engineer Manager and a Site Reliability Engineer?

Aspect | Site Reliability Engineer (SRE) | Site Reliability Engineer Manager
Responsibilities | Focuses on designing, implementing, and maintaining reliable systems and automation | Oversees SRE teams, manages projects, and aligns reliability goals with business objectives
Required Skills | Strong coding, system design, and troubleshooting skills | Leadership, team management, strategic planning
Certifications | Google Cloud, AWS certifications, Linux, scripting | Same as SRE, plus management certifications (e.g., PMP) often preferred
Work Environment | Technical, hands-on with systems and automation | Managerial, coordinating teams and projects

The main difference is that a Site Reliability Engineer focuses on technical system reliability, while a Site Reliability Engineer Manager oversees teams and strategic initiatives to ensure reliability goals are met across projects.

How does a Site Reliability Engineer Manager typically balance technical leadership with team management responsibilities?

A Site Reliability Engineer Manager often splits their time between overseeing technical projects, such as system reliability improvements and incident response strategies, and managing the growth and well-being of their engineering team. This includes mentoring SREs, facilitating communication between teams, setting priorities, and ensuring that operational goals align with business objectives. Balancing these responsibilities requires strong organizational skills and a proactive approach to both technical challenges and people management. Successful managers regularly engage in hands-on problem-solving while also fostering a collaborative team environment.

What are the key skills and qualifications needed to thrive as a Site Reliability Engineer Manager, and why are they important?

To thrive as a Site Reliability Engineer Manager, you need expertise in systems engineering, incident management, and a strong background in software development or computer science, often supported by a bachelor’s degree or equivalent experience. Familiarity with cloud platforms (like AWS, GCP, or Azure), infrastructure as code tools (such as Terraform), monitoring systems (like Prometheus), and certifications in cloud or DevOps practices are highly valued. Strong leadership, effective communication, and problem-solving abilities help you guide teams and foster collaboration across departments. These skills and qualities ensure the stability, scalability, and reliability of critical systems while enabling teams to respond effectively to complex technical challenges.

MTS - Site Reliability Engineer

Microsoft

Redmond, WA • On-site

$188K - $304K/yr

Full-time

Posted 24 days ago


Microsoft rating

Company rating: 8.6 out of 10

Based on 125 frontline employees who took The Breakroom Quiz

48th of 188 rated software companies


Job description

Overview
As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad - to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It's also inclusive: we aim to make AI accessible to all - consumers, businesses, developers - so that everyone can realize its benefits.
We're looking for an experienced Site Reliability Engineer (SRE) to join our infrastructure team. In this role, you'll blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient. You'll work closely with ML researchers, data engineers, and product developers to design and operate the platforms that power training, fine-tuning, and serving generative AI models.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
  • Reliability & Availability: Ensure uptime, resiliency, and fault tolerance of AI model training and inference systems.
  • Observability: Design and maintain monitoring, alerting, and logging systems to provide real-time visibility into model serving pipelines and infrastructure.
  • Performance Optimization: Analyze system performance and scalability, and optimize resource utilization (compute, GPU clusters, storage, networking).
  • Automation & Tooling: Build automation for deployments, incident response, scaling, and failover in hybrid cloud/on-prem CPU+GPU environments.
  • Incident Management: Lead on-call rotations, troubleshoot production issues, conduct blameless postmortems, and drive continuous improvements.
  • Security & Compliance: Ensure data privacy, compliance, and secure operations across model training and serving environments.
  • Collaboration: Partner with ML engineers and platform teams to improve developer experience and accelerate research-to-production workflows.

Qualifications
Required Qualifications
  • 4+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.

Preferred Qualifications
  • Strong proficiency in Kubernetes, Docker, and container orchestration.
  • Knowledge of CI/CD pipelines for inference and ML model deployment.
  • Hands-on experience with public cloud platforms like Azure/AWS/GCP and infrastructure-as-code.
  • Expertise in monitoring & observability tools (Grafana, Datadog, OpenTelemetry, etc.).
  • Strong programming/scripting skills in Python, Go, or Bash.
  • Solid knowledge of distributed systems, networking, and storage.
  • Experience running large-scale GPU clusters for ML/AI workloads.
  • Familiarity with ML training/inference pipelines.
  • Experience with high-performance computing (HPC) and workload schedulers (Kubernetes operators).
  • Background in capacity planning & cost optimization for GPU-heavy environments.
  • Work on cutting-edge infrastructure that powers the future of Generative AI.
  • Collaborate with world-class researchers and engineers.
  • Impact millions of users through reliable and responsible AI deployments.
  • Competitive compensation, equity options, and comprehensive benefits.

Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800.00 - $234,700.00 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $160,200.00 - $261,000.00 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $142,800.00 - $274,800.00 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000.00 - $304,200.00 per year.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

About Microsoft

Sourced by ZipRecruiter

Our infrastructure comprises a large global portfolio of more than 100 datacenters and 1 million servers. Our foundation is built upon and managed by a team of subject matter experts working to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide. With environmental sustainability and optimization at the forefront of our datacenter design and operations, we continue to grow and evolve as we meet the ever-changing business demands that hold Microsoft as a world-class cloud provider.

Industry

Computer and computer peripheral equipment and software wholesalers

Company size

10,000+ Employees

Headquarters location

Redmond, WA, US

Year founded

1975
