Systems Reliability Engineer Jobs (NOW HIRING)

You'll architect the systems and strategies that allow SimSpace to deliver software seamlessly ... What will you be doing as a Staff SRE at SimSpace? * Technical Strategy & Architecture: Design and ...

Reliability Engineer

Duluth, GA · On-site

$176K - $264K/yr

Lead all reliability activities of government mobility products, antenna systems, and crypto products. * Build and maintain a strong working relationship with Viasat Government business, engineering ...

SRE Engineer

Arlington, VA · On-site

$65.75 - $87.25/hr

The role focuses on improving the reliability and performance of mission-critical systems in a complex multi-environment ecosystem while collaborating with various engineering teams. Responsibilities ...

Senior SRE Engineer

New York, NY · On-site

$62.25 - $82.75/hr

Lead and manage the SRE team to uphold system reliability, availability, and performance standards. * Design, implement, and optimize scalable infrastructure and automation solutions to support ...

Reliability Engineer

Gurabo, PR · On-site

$98K - $124K/yr

Acquires and analyzes data from connected systems and continually improves maintenance program ... Reliability Engineering * Equipment preventative maintenance (PM) task creation and management for ...

GCP Site Reliability Engineer Interview Mode: candidates local to Parsippany, NJ who can attend an ... System Reliability: Ensure the reliability and uptime of critical services and infrastructure.

Site Reliability Engineer (SRE)

Decatur, TX · On-site

$129K - $160K/yr

Join Our Team as a Site Reliability Engineer (SRE)! About Us At Energy Worldnet, Inc. (EWN), we ... If you're driven by operational excellence, system reliability, and continuous improvement, we'd ...

Site Reliability Engineer (SRE) - II

Columbus, OH · On-site +1

$53.50 - $71.25/hr

Analyze system performance and recommend optimizations for scalability and reliability. Support ... Collaborate with software engineering teams to influence the design of new services and ...

Systems Reliability Engineer information

Salary chart markers (annual pay): $61K, $118K, $141K

How much do systems reliability engineer jobs pay per year?

As of Jun 22, 2026, the average yearly pay for a systems reliability engineer in the United States is $117,973.00, according to ZipRecruiter salary data. Most workers in this role earn between $102,500.00 and $129,000.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Systems Reliability Engineer, and why are they important?

To thrive as a Systems Reliability Engineer, you need expertise in infrastructure management, automation, and software engineering, often supported by a degree in computer science or a related field. Familiarity with tools like Kubernetes, Docker, CI/CD pipelines, monitoring systems (e.g., Prometheus, Grafana), and relevant cloud certifications (AWS, GCP, or Azure) is typically required. Strong problem-solving abilities, communication skills, and a proactive mindset help you prevent and resolve incidents efficiently. These skills ensure systems remain robust, scalable, and highly available, which is critical for maintaining business continuity and user trust.

What are Systems Reliability Engineers?

Systems Reliability Engineers (SREs) are IT professionals responsible for ensuring the reliability, availability, and performance of software systems and infrastructure. They combine software engineering and systems administration skills to automate processes, monitor system health, respond to incidents, and improve system resilience. SREs work closely with development and operations teams to optimize deployment pipelines, manage outages, and implement best practices for scalability and reliability. Their goal is to minimize downtime and ensure a seamless user experience.

How does a Systems Reliability Engineer typically collaborate with development and operations teams to ensure system stability?

Systems Reliability Engineers (SREs) work closely with both development and operations teams to bridge the gap between software engineering and IT operations. They participate in design reviews to ensure reliability is built into new features, coordinate with developers to automate deployments, and work with operations to monitor system health and respond to incidents. By fostering a culture of shared responsibility for uptime and performance, SREs help streamline troubleshooting and drive improvements across the organization. Regular communication and joint post-incident reviews are key practices in this collaborative environment.

What is the difference between Systems Reliability Engineer vs DevOps Engineer?

Aspect | Systems Reliability Engineer | DevOps Engineer
Primary Focus | Ensuring system reliability, availability, and performance | Automating deployment, integration, and continuous delivery
Skills & Certifications | SRE certifications, Linux, scripting, monitoring tools | CI/CD tools, cloud platforms, scripting, automation
Work Environment | Operations, infrastructure, and reliability teams | Development and operations collaboration
Industry Usage | Tech, finance, e-commerce | Tech, startups, cloud services

While both roles focus on improving system performance, Systems Reliability Engineers primarily concentrate on maintaining system uptime and reliability, whereas DevOps Engineers focus on streamlining development and deployment processes. Both roles often collaborate but serve different core functions within an organization.

Infographic showing various Systems Reliability Engineer job openings in the United States as of June 2026, with employment types broken down into 3% As Needed, 50% Full Time, 31% Part Time, 15% Contract, and 1% Nights. Highlights an 87% Physical, 5% Hybrid, and 8% Remote job distribution, with an average salary of $117,973 per year, or $56.7 per hour.
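
The yearly and hourly averages quoted above can be cross-checked with a short conversion. This is a minimal sketch that assumes a full-time year of 2,080 hours (40 hours a week for 52 weeks); the page does not state which convention it uses, and the function names are illustrative only.

```python
# Assumption: a full-time year is 40 hours/week * 52 weeks = 2,080 hours.
# The listing page does not say which convention it uses.
HOURS_PER_YEAR = 40 * 52


def annual_to_hourly(annual: float) -> float:
    """Convert a yearly salary to an hourly rate under the 2,080-hour assumption."""
    return annual / HOURS_PER_YEAR


def hourly_to_annual(hourly: float) -> float:
    """Convert an hourly rate to a yearly salary under the 2,080-hour assumption."""
    return hourly * HOURS_PER_YEAR


if __name__ == "__main__":
    # Average yearly pay quoted on the page.
    print(f"${annual_to_hourly(117_973):.2f}/hr")  # prints $56.72/hr

    # One of the hourly ranges from the listings, expressed per year.
    low, high = 65.75, 87.25
    print(f"${hourly_to_annual(low):,.0f} - ${hourly_to_annual(high):,.0f}/yr")
    # prints $136,760 - $181,480/yr
```

Under that assumption the $117,973 average works out to roughly $56.72 an hour, in line with the $56.7 figure in the infographic.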
Staff Site Reliability Engineer

$165K - $230K/yr

Full-time

Medical, Dental, Vision, Retirement, PTO

Posted 19 hours ago


Job description

SimSpace serves as an AI Proving Ground where organizations can confidently train, test, and outmaneuver adversaries in any environment. Trusted by allied governments, militaries, enterprises, and research institutions worldwide, SimSpace enables adaptive, AI-ready defenses that stay ahead of evolving threats. Founded in 2015 by experts from U.S. Cyber Command and MIT Lincoln Laboratory, the platform unifies training, testing, and validation in a realistic, live-fire simulation, helping teams evaluate security investments, optimize performance, and compress cyber readiness cycles from months to days.
Why join SimSpace?
We are an organization that is focused on building our culture and mindfully enhancing our atmosphere every day, which is why we have collaborated on an integral value system. Our governing philosophy of being Human Centered is deeply embedded within our value system. We apply this philosophy to every one of our internal team members, external clients, and their customers.
How Do We Work?
We believe that people are at the center of everything we do. SimSpace fosters a culture of continuous learning, curiosity, and professional growth. That belief shows up in action: in-house training, internal and external learning platforms, cyber conferences, industry events, and dedicated time for skill development. Our people are empowered to shape their careers, and it shows. Year over year, SimSpace consistently outperforms industry benchmarks in internal mobility, promotions, and total rewards growth.
Who Thrives Here?
We are a team of innovators, protectors, and problem-solvers. We believe diversity of thought and experience fuels better solutions, and we're committed to building teams that reflect the communities we serve. Whether you're remote or office-based, you'll collaborate with talented colleagues across departments and time zones, united by the mission to create a safer digital world.
We invite you to apply today!
About the Role
We are looking for a Staff Site Reliability Engineer to define the technical vision, lead the architecture, and secure the infrastructure that powers the SimSpace cyber range platform. The ideal candidate is a deeply experienced SRE and exceptional software engineer who thinks strategically about distributed systems, reliability, and operability at a global scale. At the Staff level, you will act as a force multiplier, architecting resilient systems, driving engineering standards, and solving our most complex infrastructure challenges rather than relying on manual processes or localized fixes.
In this position, you'll provide overarching technical leadership across our SRE practice, bridging traditional site reliability, DevOps, and DevSecOps. You'll architect the systems and strategies that allow SimSpace to deliver software seamlessly across our own data centers, to customers who bring their own hardware, and as pre-packaged appliances with bundled hardware and software. As our on-premises product matures and scales, you will design the long-term automation frameworks that make these varied deployments robust, secure, and repeatable.
What will you be doing as a Staff SRE at SimSpace?
  • Technical Strategy & Architecture: Design and architect the overarching infrastructure strategy that enables consistent, repeatable, and secure deployments across SimSpace-hosted data centers, customer-provided hardware, and highly restricted air-gapped environments.
  • Platform Evolution & Configuration Management: Lead the evolution of our CI/CD and Kubernetes platforms. Drive advanced application packaging, templating, and configuration management strategies using Jsonnet and Grafana Tanka (alongside Kustomize). Move beyond maintaining pipelines to architecting multi-cluster, multi-environment deployment frameworks that drastically improve developer velocity.
  • Reliability Leadership: Define, measure, and govern Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across the engineering organization. Partner with product and engineering leadership to balance feature delivery with platform stability.
  • Advanced Observability: Architect our enterprise observability strategy using the Grafana stack. Design frameworks for proactive monitoring, complex anomaly detection, and distributed tracing that give teams unparalleled visibility into system health, pod scaling, and latency bottlenecks.
  • Security & Compliance Architecture: Drive the infrastructure security posture at an architectural level. Embed advanced container security, zero-trust network segmentation, and automated compliance policies directly into our deployment pipelines and runtime environments.
  • Cross-Functional Enablement: Serve as a strategic partner and consultant to development teams. Advocate for an "SRE culture" by designing self-service tooling, establishing "paved roads" for developers, and reducing operational toil across the entire engineering org.
  • Incident Command: Act as an Incident Commander during complex, high-severity outages. Drive blameless post-mortems and engineer long-term, systemic, and architectural fixes to ensure classes of failures never repeat.
  • Mentorship & Multiplier: Act as a technical mentor to senior and mid-level engineers. Raise the baseline of engineering excellence across the company by coaching, documenting best practices, and leading by example.
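
The Reliability Leadership item above refers to SLIs, SLOs, and error budgets. As a rough illustration of how those three quantities relate, here is a minimal sketch for a request-based availability SLO. The class name, field names, and sample numbers are hypothetical and are not taken from SimSpace's tooling; real error-budget policies often use time windows or burn rates instead.

```python
from dataclasses import dataclass


@dataclass
class SLOWindow:
    """Request counts for one measurement window (hypothetical structure)."""

    slo_target: float      # objective as a fraction, e.g. 0.999 for 99.9%
    total_requests: int    # all requests served in the window
    failed_requests: int   # requests that counted as failures

    @property
    def sli(self) -> float:
        """Measured success ratio (the SLI) for the window."""
        if self.total_requests == 0:
            return 1.0  # no traffic, so nothing failed
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Failures the SLO tolerates in this window (the error budget)."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left; negative means it is overspent."""
        allowed = self.allowed_failures
        if allowed == 0:
            return 1.0 if self.failed_requests == 0 else 0.0
        return 1 - self.failed_requests / allowed


if __name__ == "__main__":
    # Illustrative numbers only: 250 failures out of 1,000,000 requests
    # against a 99.9% objective.
    window = SLOWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {window.sli:.5f}")
    print(f"Error budget remaining: {window.budget_remaining:.0%}")
```

With a 99.9% objective, one million requests allow about 1,000 failures, so 250 failures leave roughly three quarters of the budget. A negative remainder is the usual signal to favor stability work over feature delivery.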

Who you are:
  • Experience: 8+ years of experience in Site Reliability, Platform, or DevOps engineering, with a proven track record of operating at a Staff, Principal, or Lead level to drive organization-wide infrastructure initiatives.
  • Expert Software Engineering: You possess deep software engineering skills (beyond scripting) and can architect complex, production-quality systems. You design clean interfaces, build maintainable tooling, and can dictate the technical direction of our internal toolchain. Language agnostic, but highly proficient in at least one modern language (e.g., Go, Python).
  • Advanced Kubernetes & Configuration Mastery: Deep, architectural understanding of Kubernetes in multi-tenant and multi-cluster production environments. You possess expert-level knowledge of Jsonnet and Grafana Tanka for managing complex, scalable Kubernetes configurations and application packaging.
  • GitOps & IaC Expertise: Extensive experience architecting sophisticated CI/CD pipelines and GitOps workflows using GitHub Actions, ArgoCD, and infrastructure-as-code principles at an enterprise scale.
  • Complex Deployments: Systems-level thinking with the ability to design architectures that span self-hosted, on-premises, VMware-based, and air-gapped deployment models.
  • Observability Expert: Deep expertise with observability platforms (Grafana stack preferred) and a proven ability to design alerting and monitoring strategies for complex distributed systems.
  • Security Mindset: Strong background in infrastructure security architecture, including container hardening, network security, vulnerability management, and delivering software to heavily regulated or customer-managed environments.
  • Influential Communicator: Exceptional communication and stakeholder management skills. You have a service-oriented mindset, but you also have the ability to influence cross-functional leadership, negotiate reliability tradeoffs, and align engineering teams behind a unified technical vision.

We're proud to offer a competitive and comprehensive package designed to support your well-being, growth, and success:
  • Compensation. Base salary range: $165,000 - $230,000 reflecting our confidence in your expertise and impact, with the opportunity for bonuses tied to company performance and individual contributions.
  • Health & Wellness. Comprehensive medical, dental, and vision benefits, plus savings plans. Coverage starts on day one!
  • Mental Health Support. Access to company-paid counseling, coaching, and resources for you and your family through Spring Health.
  • Financial Well-Being. Plan for your future with a 401(k) retirement savings plan featuring a company match.
  • Flexible Time Off. Take the time you need with unlimited vacation and dedicated health & wellness days. SimSpace provides flexible solutions to meet the diverse work-life needs of team members.
  • Parental Leave. Paid leave plans to support you and your loved ones during life's most important moments.
  • Ownership Opportunities. Equity stock options at hire, with annual performance-based grants. Become an invested stakeholder in our shared success.
  • Referral Rewards. Earn $1,500-$3,500 for every qualified hire through our employee referral program.
  • Peloton Interactive Wellness Program. Fully and partially subsidized membership plans and equipment discounts to help you reach your personalized fitness goals.
  • Continuous Learning. Access a LinkedIn Learning membership to prioritize your personal and professional development.
  • Social Connections. Monthly reimbursements for meaningful connections with teammates through our SocialSpace Community.
  • Extra Perks. Legal plan coverage, pet insurance, wellness reimbursements, and more to simplify life's details.

Join SimSpace and enjoy benefits that enhance your career, health, and happiness!
SimSpace is an Equal Opportunity Employer:
In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification document form upon hire.
SimSpace is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, pregnancy, genetic information, disability, status as a protected veteran, or any other protected category under applicable federal, state, and local laws. We are committed to providing an inclusive and welcoming environment for all members of our staff, clients, volunteers, subcontractors, and vendors.
Research shows that women and people from underrepresented groups only apply to jobs if they meet all of the qualifications. However, no one ever meets 100% of the qualifications. SimSpace encourages you to break that statistic and to apply. We look forward to your application!
We also consider qualified applicants regardless of criminal histories, in accordance with applicable law. We are committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. If you need assistance or accommodation due to a disability, please contact careers@simspace.com.
SimSpace does not accept unsolicited resumes from employment agencies.
Actual compensation for the position is based on a variety of factors, including, but not limited to affordability, skills, qualifications and experience, and may vary from the range.