60 OpenAI Software Reliability Engineer Jobs Hiring Near You
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role of Software Engineer, Reliability involves ...
Software Engineer, Reliability
San Francisco, CA · On-site
$230K - $490K/yr
About the Role As OpenAI continues to grow, we are looking for experienced, problem-solving ... You will work closely with cross-functional teams, including software engineers, product managers ...
Software Engineer, Infrastructure Reliability
San Francisco, CA · On-site
$255K - $405K/yr
About the Team We're hiring software engineers to join our broader Infrastructure organization ... About OpenAI OpenAI is an AI research and deployment company dedicated to ensuring that general ...
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose ... reliability, and cost across Codex's production fleet. • Support model rollouts, capacity ...
Software Engineer, Infrastructure Reliability
San Francisco, CA · On-site
$67.25 - $89.25/hr
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role of Software Engineer, Infrastructure Reliability ...
Security Reliability Engineer
San Francisco, CA · On-site
$67.25 - $89.25/hr
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. They are seeking a Security Reliability Engineer to design ...
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose ... The role involves improving Codex agents' performance in real software engineering tasks and ...
Site Reliability Engineer, Frontier Systems Infrastructure
San Francisco, CA · On-site
$67.25 - $89.25/hr
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose ... Build software abstractions that unify multiple clusters and present a seamless interface to ...
Reliability/DFX Engineer
San Francisco, CA · On-site
$225K - $445K/yr
About the Team OpenAI's Hardware organization develops silicon and system-level solutions designed ... with software and research partners to co-design hardware tightly integrated with AI models. In ...
Site Reliability Engineer, Infrastructure - Analytics Platform
San Francisco, CA · On-site
$67.25 - $89.25/hr
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose ... day with software engineers, embedding reliability into design, implementation, and release ...
Site Reliability Engineer, Frontier Systems Infrastructure
San Francisco, CA · On-site
$255K - $490K/yr
About the Team The Frontier Systems team at OpenAI builds, launches, and supports the largest ... Build software abstractions that unify multiple clusters and present a seamless interface to ...
Site Reliability Engineer, Frontier Systems Infrastructure
San Francisco, CA · On-site
$67.25 - $89.25/hr
About the Team The Frontier Systems team at OpenAI builds, launches, and supports the largest ... Build software abstractions that unify multiple clusters and present a seamless interface to ...
Software Engineer, Security Observability
New York, NY · On-site +1
$260K - $385K/yr
The Security team protects OpenAI's technology, people, and products. We are technical in what we ... Proactively improve the resilience and reliability of data systems to ensure high platform ...
Software Engineer, Security Observability
New York, NY · On-site +1
The Security team protects OpenAI's technology, people, and products. We are technical in what we ... Proactively improve the resilience and reliability of data systems to ensure high platform ...
Security Reliability Engineer
San Francisco, CA · Hybrid
$67.25 - $89.25/hr
... as OpenAI scales. About the Role We are looking for a Security Reliability Engineer to design ... build, and operate reliable, secure, and scalable infrastructure that underpins identity, access ...
OpenAI Jobs Information
What are the key skills and qualifications needed to thrive as a Software Reliability Engineer, and why are they important?
How does a Software Reliability Engineer typically interact with development and operations teams to improve system stability?
What are Software Reliability Engineers?
What is the difference between a Software Reliability Engineer and a Software Test Engineer?
| Aspect | Software Reliability Engineer | Software Test Engineer |
|---|---|---|
| Primary Focus | Ensuring software reliability, stability, and performance over time | Designing and executing tests to identify bugs and verify functionality |
| Skills & Certifications | Knowledge of reliability engineering, scripting, monitoring tools | Testing methodologies, automation tools, scripting |
| Work Environment | Collaborates with development and operations teams, often in DevOps | Works primarily in QA/testing teams, often in dedicated testing phases |
| Industry Usage | Common in software companies focusing on product stability | Widely used in software development and QA departments |
The main difference is that Software Reliability Engineers focus on maintaining long-term software stability and performance, while Software Test Engineers concentrate on identifying bugs through testing. Both roles require technical skills and often collaborate, but their core objectives differ: reliability versus defect detection.

Full-time
This job post expired today. Applications are no longer accepted.
Job description
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role of Software Engineer, Reliability involves ensuring the reliability, scalability, and performance of OpenAI's systems while collaborating with cross-functional teams to build resilient infrastructure that can handle a growing user base.
Responsibilities:
• Design and implement solutions to ensure the scalability of our infrastructure to meet rapidly increasing demands.
• Build and maintain the load, chaos, and synthetic testing software that development teams use to make the systems they design and operate more reliable.
• Build and maintain automation tools to streamline repetitive tasks and improve system reliability.
• Build and maintain the platform for CPU/storage, GPU, and network lifecycle management to drive efficiency and accountability and to support dynamic optimization of our resources.
• Implement fault-tolerant and resilient design patterns to minimize service disruptions.
• Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure and ensure system reliability.
• Partner with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
• Participate in an on-call rotation to respond to critical incidents and ensure 24/7 system availability.
Qualifications:
Required:
• Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
• Proven experience as an SWE focused on reliability or a similar role in a fast-paced, rapidly scaling company.
• Strong proficiency in cloud infrastructure.
• Proficiency in programming languages.
• Experience with containerization technologies and container orchestration platforms like Kubernetes.
• Knowledge of IaC tools such as Terraform or CloudFormation.
• Excellent problem-solving and troubleshooting skills.
• Strong communication and collaboration skills.
• Experience with observability tools such as Datadog, Prometheus, Grafana, and Splunk.
• Experience with microservices architecture and service mesh technologies.
• Knowledge of security best practices in cloud environments.
Company:
OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of the OpenAI Foundation. Founded in 2015, the company is headquartered in San Francisco, USA, with a team of 1,001-5,000 employees. The company is currently late stage.
About OpenAI
Sourced by ZipRecruiter
Industry: Scientific research and development services
Company size: 201 - 500 Employees
Headquarters location: San Francisco, CA, US
Year founded: 2015