Site Reliability Engineer Manager Jobs in Raleigh, NC

Use IaC (Infrastructure as Code) and config management to standardize and automate provisioning ... HPC/SRE problems and their solutions. * Maintainer or co-maintainer responsibilities for an open ...

Associate Site Reliability Engineer

Raleigh, NC · On-site +1

$92K - $147K/yr

Work within a small agile team to develop and improve SRE software, support peers, plan and self-improve. What You Will Bring: * Bachelor's degree (U.S. or foreign equivalent) in Computer Science ...

Associate Software Engineer, Site Reliability

Raleigh, NC · Hybrid

$55.50 - $73.75/hr

Associate Software Engineer, Site Reliability Engineering Relay is the Intelligent System of Action ... Manage and evolve infrastructure-as-code using Terraform, Ansible, and Packer on AWS. * Support and ...

Associate Software Engineer, Site Reliability

Raleigh, NC · On-site

$55.50 - $73.75/hr

Associate Software Engineer, Site Reliability Engineering Relay is the Intelligent System of Action ... Manage and evolve infrastructure-as-code using Terraform, Ansible, and Packer on AWS. * Support and ...

... SRE best practices for effective resolution. * Document system knowledge as you acquire it, create runbooks, and ensure critical system information is readily accessible. Security Management: Stay ...

Lead DevOps Engineer

Raleigh, NC · On-site

$51.25 - $70.25/hr

... DevOps and SRE practices across the organization. Key Responsibilities Technical Leadership ... management, and vulnerability scanning. Well experienced with all kinds of scanning: SCA, DAST, SAST ...

Reliability Engineer

Morrisville, NC · On-site

$95K - $120K/yr

The site offers comprehensive nasal development capabilities, including analytical support, device ... Create and manage detailed project plans to ensure timely implementation of equipment and fixtures ...

Site Reliability Engineer Manager information

How much do site reliability engineer manager jobs pay per hour?

As of Jun 16, 2026, the average hourly pay for a Site Reliability Engineer Manager in Raleigh, NC is $61.96, according to ZipRecruiter salary data. Most workers in this role earn between $53.27 and $70.82 per hour, depending on experience, location, and employer.
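
To put the hourly figures above in annual terms, here is a minimal sketch of the conversion. It assumes a full-time schedule of 40 hours per week for 52 weeks (2,080 hours per year). That schedule is an assumption of this example, not something the salary data states, and the function name `hourly_to_annual` is purely illustrative.

```python
# Assumption: full-time work at 40 hours/week for 52 weeks (2,080 hours/year).
HOURS_PER_YEAR = 40 * 52


def hourly_to_annual(hourly_rate: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly rate to an approximate annual figure, rounded to cents."""
    return round(hourly_rate * hours_per_year, 2)


# Hourly figures quoted in the answer above.
quoted_rates = [("low", 53.27), ("average", 61.96), ("high", 70.82)]

for label, rate in quoted_rates:
    print(f"{label}: ${rate:.2f}/hr is about ${hourly_to_annual(rate):,.2f}/yr")
```

Under that assumption, the $61.96 average works out to roughly $128,877 per year. A different hours-per-year figure (for example, one that excludes unpaid leave) would shift the result proportionally.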

Will AI replace SRE jobs?

AI is expected to augment Site Reliability Engineer (SRE) roles by automating routine tasks such as monitoring, incident response, and data analysis. However, SREs will continue to be essential for designing systems, managing complex issues, and making strategic decisions that require human judgment and expertise. The role is likely to evolve with AI tools rather than be fully replaced.

What is a Site Reliability Engineer Manager?

A Site Reliability Engineer (SRE) Manager oversees a team of site reliability engineers tasked with maintaining the reliability, scalability, and performance of software systems. Their role combines leadership and technical expertise, focusing on automating operations, managing incidents, and ensuring high availability of services. They work closely with engineering and operations teams to implement best practices in monitoring, incident response, and system design. SRE Managers also mentor their teams, set reliability goals, and help drive a culture of continuous improvement within the organization.

What engineers make $500,000?

Senior-level Site Reliability Engineers (SREs) with extensive experience and advanced skills in cloud infrastructure, automation, and monitoring tools can earn $500,000 or more annually, especially in high-cost-of-living areas or at large tech companies. Achieving this level often requires specialized certifications, leadership responsibilities, and a strong track record of system reliability improvements.

How much do SRE managers make in the US?

Site Reliability Engineering (SRE) managers in the US typically earn between $130,000 and $180,000 annually, with senior roles and large tech companies offering higher compensation. Salaries can vary based on experience, location, and company size, and often include bonuses and stock options.

What is the role of site reliability engineer manager?

A Site Reliability Engineer Manager oversees a team responsible for maintaining the availability, performance, and reliability of large-scale systems and services. They coordinate incident response, implement automation, and collaborate with development teams to improve system resilience, often using tools like monitoring and alerting platforms. Strong leadership, technical expertise, and understanding of cloud infrastructure are essential for this role.

What is the difference between a Site Reliability Engineer Manager and a Site Reliability Engineer?

| Aspect | Site Reliability Engineer (SRE) | Site Reliability Engineer Manager |
| --- | --- | --- |
| Responsibilities | Focuses on designing, implementing, and maintaining reliable systems and automation | Oversees SRE teams, manages projects, and aligns reliability goals with business objectives |
| Required Skills | Strong coding, system design, and troubleshooting skills | Leadership, team management, strategic planning |
| Certifications | Google Cloud, AWS certifications, Linux, scripting | Same as SRE, plus management certifications (e.g., PMP) often preferred |
| Work Environment | Technical, hands-on with systems and automation | Managerial, coordinating teams and projects |

The main difference is that a Site Reliability Engineer focuses on technical system reliability, while a Site Reliability Engineer Manager oversees teams and strategic initiatives to ensure reliability goals are met across projects.

How does a Site Reliability Engineer Manager typically balance technical leadership with team management responsibilities?

A Site Reliability Engineer Manager often splits their time between overseeing technical projects, such as system reliability improvements and incident response strategies, and managing the growth and well-being of their engineering team. This includes mentoring SREs, facilitating communication between teams, setting priorities, and ensuring that operational goals align with business objectives. Balancing these responsibilities requires strong organizational skills and a proactive approach to both technical challenges and people management. Successful managers regularly engage in hands-on problem-solving while also fostering a collaborative team environment.

What are the key skills and qualifications needed to thrive as a Site Reliability Engineer Manager, and why are they important?

To thrive as a Site Reliability Engineer Manager, you need expertise in systems engineering, incident management, and a strong background in software development or computer science, often supported by a bachelor’s degree or equivalent experience. Familiarity with cloud platforms (like AWS, GCP, or Azure), infrastructure as code tools (such as Terraform), monitoring systems (like Prometheus), and certifications in cloud or DevOps practices are highly valued. Strong leadership, effective communication, and problem-solving abilities help you guide teams and foster collaboration across departments. These skills and qualities ensure the stability, scalability, and reliability of critical systems while enabling teams to respond effectively to complex technical challenges.
Infographic: Site Reliability Engineer Manager job openings in Raleigh, NC as of June 2026. Employment types: 75% full-time, 25% contract. Work arrangement: 87% in-person, 13% remote. Average salary: $128,882 per year, or $62 per hour.
Senior Site Reliability Engineer - HPC

Nvidia

Durham, NC

$55 - $73.25/hr

Full-time

Posted 19 days ago


Job description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology, and amazing people. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars. NVIDIA is looking for phenomenal people like you to help us accelerate the next wave of artificial intelligence.

We're looking for a Senior SRE to join our Compute Farm team and help build the next generation of our global services platform. At NVIDIA, you'll keep critically important systems running while working on the technologies that are redefining computing. You'll harness the power of AI to deliver groundbreaking solutions to some of the world's toughest problems, and see your work have real, lasting impact!

What you'll be doing:

  • Own SRE solutions end-to-end, from design and implementation to operation and continuous improvement, ensuring they integrate cleanly with HPC schedulers, storage, and network fabrics.

  • Use IaC (Infrastructure as Code) and config management to standardize and automate provisioning everywhere.

  • Deliver solutions in a globally distributed, multi-cloud hybrid environment: on-prem, AWS, GCP, and OCI.

  • Design for failure with redundancy, failure domains, progressive delivery, and strict change control.

  • Ensure the highest level of uptime and Quality of Service (QoS) for internal customers through operational excellence.

  • Conduct capacity management and planning to meet ongoing operational needs.

  • Detect performance issues and recommend solutions to maintain world-class service quality.

  • Collaborate with various teams in a fast-paced environment to ensure seamless project completion.

  • Participate in on-call and incident reviews, assist in root cause identification, and produce high-quality RCA reports.

What we need to see:

  • B.S. degree in Computer Science or related technical field (or equivalent experience) with 5+ years professional experience building and supporting critical services.

  • Experience supporting large-scale HPC clusters using Slurm, LSF, or Kubernetes, including setup, tuning, and troubleshooting.

  • Proficiency in modern CI/CD techniques, and Infrastructure as Code (IaC) for managing services.

  • Strong experience crafting large-scale infrastructure platforms for automated host lifecycle management, fleet reliability/auto-healing, E2E observability or data-driven operations (AIOps/ML-driven signals) that materially reduce manual intervention.

  • Proficient in monitoring, metrics, container management, and log collection tools.

  • 5+ years of coding/scripting experience in at least two high-level programming languages such as Python, Go, Perl, or Ruby.

  • Mentored other engineers and influenced technical direction through design reviews, architecture documents, and strong partnership with product and leadership.

  • Creative problem solver with excellent debugging skills and strong communication and documentation abilities.

Ways to stand out from the crowd:

  • Published technical write-ups or talks (conference presentations, meetups, engineering blogs) that deep-dive into real-world reliability, observability, or large-scale HPC/SRE problems and their solutions.

  • Maintainer or co-maintainer responsibilities for an open source component used in production (plugins, operators, exporters, controllers, or SDKs) at large scale.

Widely considered to be one of the technology world's most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most brilliant and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 19, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology, and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993