Sr Reliability Engineer Jobs in Raleigh, NC (NOW HIRING)

Senior ML Platform Engineer

Durham, NC · On-site

$101K - $138K/yr

They are seeking a Senior ML Platform Engineer to architect, build, and scale high-performance ML ... E principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack ...

Senior ML Platform Engineer

Durham, NC · On-site

$101K - $138K/yr

Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack, ensuring high availability and performance for critical AI workloads. * Develop robust ...

Senior Platform Engineer

Apex, NC · On-site +1

$80K - $109K/yr

Senior Platform Engineer (AWS / Kubernetes) Remote (United States) Security Journey is hiring a ... Improve reliability, scalability, and performance of systems supporting our learning platform.

Senior Software Engineer

Raleigh, NC · On-site

$119K - $157K/yr

Red Hat is seeking a Senior Software Engineer to join the CI/CD and delivery engineering team ... You are accountable for the health and reliability of the delivery path the same way a platform ...

Senior Database Engineer

Raleigh, NC · Remote

$130K - $155K/yr

About the Opportunity: We are looking for a highly skilled Senior Database Reliability Engineer ... You'll work closely with SRE, Platform, and Engineering teams to ensure performance, reliability ...

Sr Reliability Engineer information

How much do Sr Reliability Engineer jobs pay per hour?

As of Jun 11, 2026, the average hourly pay for a Sr Reliability Engineer in Raleigh, NC is $62.61, according to ZipRecruiter salary data. Most workers in this role earn between $51.63 and $75.00 per hour, depending on experience, location, and employer.

What is the difference between a Sr Reliability Engineer and a Reliability Engineer?

Aspect | Sr Reliability Engineer | Reliability Engineer
Credentials | Bachelor's or higher in engineering, certifications like CRE or Six Sigma often preferred | Bachelor's degree in engineering or related field, similar certifications
Work Environment | Typically in manufacturing, energy, or tech industries focusing on system reliability and failure analysis | Similar industries, focusing on designing, testing, and improving product or system reliability
Employer Usage | Used in companies seeking experienced engineers to lead reliability projects | Used for entry to mid-level roles focused on reliability assessments

The main difference is experience level and responsibility. Sr Reliability Engineers often lead projects and have more advanced certifications, while Reliability Engineers focus on supporting reliability tasks. Both roles require similar credentials and work in comparable environments, but the senior role involves more leadership and strategic planning.

How does a Sr Reliability Engineer typically collaborate with cross-functional teams to improve system reliability?

As a Sr Reliability Engineer, you will regularly work alongside operations, development, and QA teams to identify potential reliability risks and implement solutions. This often involves facilitating root cause analyses after incidents, sharing best practices, and leading reliability-focused design reviews. Effective communication and the ability to translate complex technical findings into actionable recommendations are key, as your insights directly influence infrastructure and product decisions. This collaborative approach helps foster a culture of reliability across the organization.

What are the key skills and qualifications needed to thrive as a Sr Reliability Engineer, and why are they important?

To thrive as a Sr Reliability Engineer, you need expertise in reliability engineering principles, root cause analysis, and a relevant engineering degree, often with several years of industry experience. Familiarity with reliability software (such as ReliaSoft or Minitab), maintenance management systems, and certifications like Certified Reliability Engineer (CRE) are commonly required. Strong problem-solving abilities, proactive communication, and leadership skills help you drive reliability initiatives and collaborate across teams. These competencies are vital to ensure equipment uptime, reduce failures, and improve operational efficiency in complex industrial environments.

What does a Sr Reliability Engineer do?

A Sr Reliability Engineer is responsible for ensuring that products, systems, or processes operate reliably and efficiently over their expected lifecycle. They analyze failure data, develop reliability test plans, and implement strategies to predict and prevent failures. Their role often involves collaborating with design, manufacturing, and maintenance teams to improve product quality and reduce downtime. Additionally, they may use reliability modeling tools and statistical techniques to assess risk and recommend improvements. This position typically requires advanced engineering knowledge and experience in reliability engineering principles.
Infographic: Sr Reliability Engineer job openings in Raleigh, NC as of June 2026. Employment types: 95% Full Time, 3% Contract, 1% Part Time, 1% As Needed. Work location: 87% Physical, 5% Hybrid, 8% Remote. Average salary: $130,233 per year, or $62.6 per hour.
Senior ML Platform Engineer

NVIDIA

Durham, NC • On-site

$101K - $138K/yr

Full-time

Posted 5 days ago


Job description

Job Summary:
NVIDIA is at the forefront of innovations in Artificial Intelligence, High-Performance Computing, and Visualization. They are seeking a Senior ML Platform Engineer to architect, build, and scale high-performance ML infrastructure, ensuring reliable platforms for scientists and engineers to train and deploy advanced ML models.
Responsibilities:
• Design, build, and maintain our core ML platform infrastructure as code, primarily using Ansible and Terraform, ensuring reproducibility and scalability across large-scale, distributed GPU clusters.
• Apply SRE principles to diagnose, troubleshoot, and resolve complex system issues across the entire stack, ensuring high availability and performance for critical AI workloads.
• Develop robust internal automation and tooling for ML workflow orchestration, resource scheduling, and platform operations, with a strong focus on software engineering best practices.
• Collaborate with ML researchers and applied scientists to understand infrastructure needs and build solutions that streamline their end-to-end experimentation.
• Evolve and operate our multi-cloud and hybrid (on-prem + cloud) environments, implementing monitoring, alerting, and incident response protocols.
• Participate in on-call rotation to provide support for platform services and infrastructure running critical ML jobs, driving root cause analysis and implementing preventative measures.
• Write high-quality, maintainable code (Python, Go) to contribute to the core orchestration platform and automate manual processes.
• Drive the adoption of modern GPU technologies and ensure smooth integration of next-generation hardware into ML pipelines (e.g., GB200, NVLink, etc.).
Qualifications:
Required:
• BS/MS in Computer Science, Engineering, or equivalent experience.
• 5+ years in software/platform engineering or SRE roles, including 3+ years focused on ML infrastructure or distributed compute systems.
• Strong proficiency in Infrastructure-as-Code (IaC) tools, specifically Ansible and Terraform, with a proven track record of building and managing production infrastructure.
• SRE-oriented mindset with extensive experience in diagnosing system-level issues, performance tuning, and ensuring platform reliability.
• Solid understanding of ML workflows and lifecycle, from data preprocessing to deployment.
• Proficiency in operating containerized workloads with Kubernetes and Docker.
• Strong software engineering skills in languages such as Python or Go, with a focus on automation, tooling, and writing production-grade code.
• Experience with Linux systems internals, networking, and performance tuning at scale.
Preferred:
• Experience building or operating ML platforms supporting frameworks like PyTorch or TensorFlow at scale.
• Deep understanding of distributed training techniques (e.g., data/model parallelism, Horovod, NCCL).
• Expertise with modern CI/CD methodologies and GitOps practices.
• Passion for building developer-centric platforms with great UX and strong operational reliability.
• Proven ability to contribute code to complex orchestration or automation platforms.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, CA, USA, and has more than 10,000 employees. The company is currently at a late stage.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993