$57 - $75.75/hr
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and ... Much of our software development focuses on eliminating manual work through automation, performance ...
$57 - $75.75/hr
Hands-on experience building the SRE function from scratch with complete ownership. * Experience with a multi-site company where PaaS or microservices are required. * CI/CD pipeline ownership in ...
OR · Remote
$57 - $75.75/hr
... central Site Reliability Engineering team. You will be responsible for building and leading ... You will be leveraging your software engineering expertise to develop software platforms and tools ...
$57 - $75.75/hr
Applicants with SRE or equivalent experience are encouraged. What you will be doing: You will build and deploy sophisticated AI-powered tools and products. These tools support the operation and ...
$101K - $161K/yr
We leverage the latest advancements in cloud computing, artificial intelligence, and software ... Who You'll Work With We're looking for Site Reliability Engineers to join Arista's FedRAMP ...
$57 - $75.75/hr
POSITION SUMMARY The ideal candidate will have 7+ years of experience in Linux systems and software ... Measuring and achieving reliability through engineering and operations automation. * Monitoring and ...
OR · On-site
$89K - $148K/yr
Collaborate with team members and cross-departmental partners to establish and maintain SRE ... Minimum four (4) to eight (8) years of experience in IT administration, software engineering, or ...
New
While experience in roles like Software Engineer, SRE, Systems Engineer, or DevOps is valuable, we care most about your problem-solving skills and mindset. If you enjoy tackling complex challenges ...
$57 - $75.75/hr
Bachelor's degree in Computer Science, Software Engineering, or a related field. * Minimum of 2 years of professional experience in DevOps, Site Reliability Engineering (SRE), or a related role ...
OR · Hybrid
Zocdoc is looking for a Senior Site Reliability Engineer to help develop, monitor, and maintain our distributed production systems. You'll be challenged with building frameworks and processes for ...
$57 - $75.75/hr
We're looking for a Lead SRE to own reliability outcomes for a modern split-plane, multi-region ... This position involves access to software/technology that is subject to U.S. export controls. Any ...
... (SRE) practices. What you'll be doing: * Develop and manage software for hands-off datacenter ... provisioning and lifecycle management, including rack installation, bare-metal networking ...
OR · On-site
$57 - $75.75/hr
Establish and drive adoption of SRE best practices (SLOs, SLIs, error budgets, reliability engineering standards) Cross-Team Leadership & Influence * Serve as a technical leader and advisor across ...
OR · On-site +1
$108.40K - $147.40K/yr
What we are looking for: * 5+ years experience in DevOps, Site Reliability Engineering, Production ... Production experience with database software such as PostgreSQL * Experience with GitOps tooling ...
OR · Remote
A Software Engineer in Platform Operations is responsible for helping design, build, and operate ... , or Site Reliability Engineering (SRE) role. * Deep understanding of Kubernetes, underlying ...
$158.20K - $200.70K/yr
Our CRE team adapts the best practices of Site Reliability Engineering (SRE) and applies them to our customers. This role is focused on bringing this practice to the Hypershield software suite ...
New
$132.40K - $220.60K/yr
As a Site Reliability Engineer at CoverMyMeds, you support engineers by creating highly automated services to make building, shipping, and running software as efficient as possible. You will build ...
| Hourly wage range | Share of jobs |
|---|---|
| $11.44 - $19.22 | 1% |
| $19.22 - $27.01 | 0% |
| $27.01 - $34.80 | 0% |
| $34.80 - $42.58 | 2% |
| $42.58 - $50.37 | 4% |
| $50.37 - $58.16 | 17% |
| $58.16 - $65.94 | 27% |
| $65.94 - $73.73 | 20% |
| $73.73 - $81.51 | 17% |
| $81.51 - $89.30 | 6% |
| $89.30 - $97.09 | 4% |

$58.23 is the 25th percentile. Wages below this are outliers.
$75.07 is the 75th percentile. Wages above this are outliers.
Chart axis markers: $11, $67, $97.

| Aspect | Software Engineer Site Reliability Engineer | DevOps Engineer |
|---|---|---|
| Credentials | Bachelor's in CS or related, sometimes certifications in cloud or SRE practices | Bachelor's in CS, IT, or related, with certifications in cloud, automation, or CI/CD tools |
| Work Environment | Focus on reliability, scalability, and automation within software development teams | Bridge between development and operations, emphasizing automation, deployment, and infrastructure |
| Employer & Industry Usage | Tech companies, cloud providers, large enterprises | Startups, tech firms, organizations adopting DevOps practices |

While both roles focus on automation and system stability, Software Engineer Site Reliability Engineers primarily ensure system reliability and performance, whereas DevOps Engineers focus on streamlining development and deployment processes. The roles often overlap but differ in their core focus areas and daily responsibilities.

$57 - $75.75/hr
Full-time
Posted 21 days ago
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large-scale production systems with high efficiency and availability using a combination of software and systems engineering practices. This is a highly specialized discipline which demands knowledge across different systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open-source cloud-enabling technologies like Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal and external facing GPU cloud services run with the maximum reliability and uptime promised to users, while enabling developers to make changes to existing systems through careful preparation and planning, keeping an eye on capacity, latency and performance. SRE is also a mindset and a set of engineering approaches to running better production systems and optimizations. Much of our software development focuses on eliminating manual work through automation, performance tuning and growing the efficiency of production systems.
As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting dynamic day-to-day work. SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.
What you'll be doing:
Design, implement and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform with a focus on performance at scale, real-time monitoring, logging and alerting
Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews
Maintain services once they are live by measuring and monitoring availability, latency and overall system health
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Practice sustainable incident response and blameless postmortems
Be part of an on call rotation to support production systems
What we need to see:
BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
8+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production
5+ years experience delivering foundational infrastructure and observability platforms.
Experience in one or more of the following: Python, Go, Perl or Ruby
In-depth knowledge of Linux, networking and containers
Ways to stand out from the crowd:
Interest in crafting, analyzing and fixing large-scale distributed systems
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. Ability to debug and optimize code and automate routine tasks
Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack and Docker. Experience running Grafana, OpenTelemetry, Prometheus, and similar observability focused tools
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Computer and electronic product manufacturing
10,000+ Employees
Santa Clara, CA, US
1993