Remote Chaos Engineering Jobs (NOW HIRING)

Site Reliability Engineer

San Francisco, CA · Remote

$67.25 - $89.25/hr

Remote (US) Department: Cloud Platform Engineering / SRE/Reliability Position summary The Site ... Drive chaos engineering, game days, and reliability testing programs * Produce SLA performance ...

Lead the organization from reactive firefighting to a predictive, self-healing culture through the aggressive adoption of Chaos Engineering and AIOps Job Designation Remote: Employee is not required ...

Remote Chaos Engineering information

Salary details: $73K · $194.7K (average) · $254K

How much do remote chaos engineering jobs pay per year?

As of Jun 30, 2026, the average yearly pay for remote chaos engineering in the United States is $194,709.00, according to ZipRecruiter salary data. Most workers in this role earn between $141,500.00 and $253,000.00 per year, depending on experience, location, and employer.

What is Remote Chaos Engineering?

Remote Chaos Engineering is the practice of testing distributed systems' resilience by intentionally introducing failures and disruptions in remote or cloud environments. The goal is to identify weaknesses and improve system reliability by simulating real-world incidents, such as network outages or server crashes, in a controlled manner. This approach helps teams understand how their applications behave under stress and develop strategies to mitigate future incidents. Remote Chaos Engineering is particularly valuable for organizations leveraging cloud infrastructure and remote services, ensuring robust performance even under unexpected conditions.
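
As a rough illustration of introducing failures "in a controlled manner," the sketch below wraps a function so that a configurable fraction of calls fail. All names here (`inject_faults`, `failure_rate`, `latency_s`, `fetch_profile`, `fetch_with_fallback`) are hypothetical and are not taken from any chaos engineering tool; it is a toy example, not a description of how a particular platform works.

```python
import random
import time
from functools import wraps


class InjectedFault(RuntimeError):
    """Raised in place of a real failure during an experiment."""


def inject_faults(failure_rate=0.0, latency_s=0.0, rng=None):
    """Decorator: fail a fraction of calls and optionally delay every call."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                time.sleep(latency_s)  # simulate a slow dependency
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_faults(failure_rate=0.3, rng=random.Random(42))
def fetch_profile(user_id):
    # Hypothetical stand-in for a call to a remote service.
    return {"id": user_id, "name": "example"}


def fetch_with_fallback(user_id):
    """The behavior under test: degrade gracefully when the call fails."""
    try:
        return fetch_profile(user_id)
    except InjectedFault:
        return {"id": user_id, "name": None}  # default response


results = [fetch_with_fallback(i) for i in range(100)]
degraded = sum(1 for r in results if r["name"] is None)
print(f"{degraded} of {len(results)} calls used the fallback")
```

The point of the experiment is the fallback path: every call still returns a response, and the count of degraded responses shows how often the system had to absorb a failure.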

What are some common challenges faced by professionals working in remote chaos engineering roles?

Professionals in remote chaos engineering often encounter challenges such as coordinating experiments across distributed teams, ensuring clear communication about system vulnerabilities, and managing the complexity of large-scale systems without direct, on-site access. Establishing robust monitoring and rollback procedures is essential to minimize risk during remote testing. Additionally, building trust with development and operations teams is key, as chaos engineering often involves intentionally introducing failures to improve system resilience.

What is the least stressful remote job?

Remote Chaos Engineering roles typically involve analyzing system resilience and running experiments to improve infrastructure stability, often requiring strong problem-solving skills and familiarity with cloud platforms. These positions can be less stressful when they have clear protocols, manageable workloads, and minimal on-call responsibilities. Overall, jobs with predictable schedules and low urgency tend to be less stressful in remote technical roles.

What are the key skills and qualifications needed to thrive as a Remote Chaos Engineer, and why are they important?

To thrive as a Remote Chaos Engineer, you need a strong background in software engineering, systems architecture, and site reliability, often supported by a degree in computer science or a related field. Familiarity with chaos engineering platforms (such as Gremlin or Chaos Monkey), cloud environments (AWS, Azure, GCP), and automation tools is typically required. Strong problem-solving abilities, clear communication, and a collaborative mindset help you effectively identify weaknesses and drive reliability improvements across distributed teams. These skills are crucial for proactively uncovering system vulnerabilities, ensuring system resilience, and maintaining high availability in complex, remote-first infrastructures.

Is it possible to work remotely as an engineer?

Remote chaos engineering roles are common in the tech industry, allowing engineers to work from anywhere with a reliable internet connection. These positions often require familiarity with cloud platforms, scripting, and monitoring tools, and may involve collaboration across time zones.

What engineers make $500,000?

Senior engineers in specialized fields such as software engineering, cloud infrastructure, or cybersecurity can earn $500,000 or more annually, especially with extensive experience, advanced skills, and leadership roles. High compensation often includes base salary, bonuses, and stock options, particularly in large tech companies or startups with significant growth potential.

What is the difference between Remote Chaos Engineering vs Remote Site Reliability Engineer?

| Aspect | Remote Chaos Engineering | Remote Site Reliability Engineer |
| --- | --- | --- |
| Primary Focus | Designing and executing chaos experiments to improve system resilience | Ensuring system reliability, availability, and performance through monitoring and automation |
| Skills & Certifications | Knowledge of chaos engineering tools, scripting, cloud platforms | Monitoring tools, scripting, cloud infrastructure, SRE certifications |
| Work Environment | Collaborates with development and operations teams, often in DevOps culture | Works closely with engineering teams to maintain system health and SLAs |

While both roles focus on system stability, Remote Chaos Engineering specializes in testing system resilience through chaos experiments, whereas Remote Site Reliability Engineers focus on maintaining overall system reliability and performance. Both roles require scripting skills and cloud knowledge, but their core objectives differ: one proactively tests, the other maintains system health.

Is chaos engineering still used today?

Chaos engineering is actively used in modern IT environments to improve system resilience by intentionally introducing failures and testing responses. Professionals in roles like remote chaos engineering often utilize tools such as Chaos Monkey and conduct experiments in cloud or distributed systems to identify weaknesses before outages occur.
More about Remote Chaos Engineering jobs
Infographic: Remote Chaos Engineering job openings in the United States as of June 2026. Employment types are 99% full time and 1% part time. Job distribution is 37% physical, 3% hybrid, and 60% remote. The average salary is $194,709 per year, or $93.6 per hour.
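
The hourly figure is consistent with dividing the annual average by a 2,080-hour work year (40 hours × 52 weeks). That divisor is an assumption; the source does not state how the hourly rate was derived. The same conversion applied to the $67.25 - $89.25/hr listing above gives its rough annual equivalent.

```python
HOURS_PER_YEAR = 40 * 52  # assumed full-time year: 2,080 hours


def annual_to_hourly(annual):
    """Convert an annual salary to an hourly rate under the assumed work year."""
    return annual / HOURS_PER_YEAR


def hourly_to_annual(hourly):
    """Convert an hourly rate to an annual salary under the assumed work year."""
    return hourly * HOURS_PER_YEAR


print(round(annual_to_hourly(194_709), 2))                # 93.61
print(hourly_to_annual(67.25), hourly_to_annual(89.25))   # 139880.0 185640.0
```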
Manager, Software Engineering (Resilience Engineering)

Affirm

Remote

Full-time

Medical, Dental, Vision

Posted 23 hours ago


Job description

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest.
We are seeking a seasoned Engineering Manager to lead our Resilience Engineering team. This role is critical in ensuring the safety and reliability of our production systems through proactive validation techniques, including production load testing and chaos engineering.
You will lead the development of systems and practices that allow engineers to safely test system behavior under stress and failure conditions in production, ensuring issues are discovered and mitigated before they impact real users.
What you'll do
Leadership & Strategy
  • Define and drive the vision for resilience engineering at Affirm, with a focus on production load testing and chaos engineering as first-class engineering practices.
  • Lead and mentor a team of engineers building platforms and tooling for safe production experimentation.
  • Partner with infrastructure, product, and security leadership to embed resilience validation into the software development lifecycle.
  • Establish best practices for safely testing system limits and failure scenarios in production.

Systems & Operations
  • Own the design and evolution of platforms that enable safe, controlled production load testing and fault injection.
  • Ensure strong safeguards are in place, including isolation boundaries, approval workflows, and automated rollback mechanisms to protect real users.
  • Build systems that provide end-to-end observability, traceability, and auditability for all resilience experiments.
  • Drive reliability improvements by systematically identifying weaknesses through load testing and chaos experiments.
  • Establish monitoring, alerting, and incident response practices tailored to proactive resilience validation.
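
The safeguards named in this list (approval workflows, automated rollback) can be sketched as a small experiment runner. Everything here is a hypothetical illustration: `run_experiment`, `max_error_rate`, `checks`, and the callback names are assumptions, not Affirm's tooling or any vendor's API.

```python
def run_experiment(inject, rollback, read_error_rate, *,
                   approved, max_error_rate=0.05, checks=5):
    """Run a fault-injection experiment with an approval gate and auto-rollback.

    inject / rollback: callables that start and undo the fault.
    read_error_rate: callable returning the current error rate (0.0 to 1.0).
    """
    if not approved:
        return "rejected"          # approval gate: never start unapproved
    inject()
    try:
        # A real runner would wait between checks; omitted to keep this short.
        for _ in range(checks):
            if read_error_rate() > max_error_rate:
                return "aborted"   # guardrail tripped; finally still rolls back
        return "completed"
    finally:
        rollback()                 # always undo the fault, even on exceptions


# Usage with stand-in callables and a scripted sequence of metric readings.
log = []
readings = iter([0.01, 0.02, 0.09, 0.01, 0.01])
status = run_experiment(
    inject=lambda: log.append("inject"),
    rollback=lambda: log.append("rollback"),
    read_error_rate=lambda: next(readings),
    approved=True,
)
print(status, log)  # aborted ['inject', 'rollback']
```

Putting the rollback in a `finally` block means the fault is removed whether the experiment completes, trips the guardrail, or raises an unexpected error.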

Collaboration & Enablement
  • Work closely with engineering teams to design and execute production load tests and chaos experiments safely.
  • Partner with infrastructure teams to build guardrails around tests and experiments.
  • Enable teams to adopt resilience practices by providing reusable tooling, frameworks, and standardized workflows.
  • Identify systemic weaknesses and lead cross-functional efforts to improve reliability and fault tolerance.
  • Evangelize a culture of "test failure before failure tests you" across the organization.

What we look for
  • Proven experience leading engineering teams in reliability, infrastructure, or distributed systems.
  • Hands-on experience with production load testing, chaos engineering, or large-scale system validation.
  • Experience using a chaos engineering vendor platform such as Gremlin or Harness, or a similar tool.
  • Strong understanding of failure modes in distributed systems, including latency, partial failure, and cascading outages.
  • Experience building or operating systems with strong safety guarantees (isolation, rate limiting, guardrails, auditability).
  • Familiarity with cloud-native environments (AWS, Kubernetes) and observability tooling.
  • Strong programming background (e.g., Python, Kotlin, Java, or similar).
  • Excellent problem-solving skills and the ability to balance long-term resilience investments with immediate business needs.
  • Strong communication and leadership skills, with a track record of influencing engineering practices across teams.
  • This position requires either equivalent practical experience or a Bachelor's degree in a related field.

Base Pay Grade - P
Equity Grade - 13
Employees new to Affirm typically come in at the start of the pay range. Affirm focuses on providing a simple and transparent pay structure which is based on a variety of factors, including location, experience and job-related skills.
Base pay is part of a total compensation package that may include equity rewards, monthly stipends for health, wellness and tech spending, and benefits (including 100% subsidized medical coverage, dental and vision for you and your dependents).
USA base pay range (CA, WA, NY, NJ, CT) per year: $230,000 - $290,000
USA base pay range (all other U.S. states) per year: $204,000 - $264,000
Affirm is proud to be a remote-first company! The majority of our roles are remote and you can work almost anywhere within the country of employment. Affirmers in proximal roles have the flexibility to work remotely, but will occasionally be required to work out of their assigned Affirm office. A limited number of roles remain office-based due to the nature of their job responsibilities.
We're extremely proud to offer competitive benefits that are anchored to our core value of people come first. Some key highlights of our benefits package include:
  • Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
  • Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
  • Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
  • ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount

We believe It's On Us to provide an inclusive interview experience for all, including people with disabilities. We are happy to provide reasonable accommodations to candidates in need of individualized support during the hiring process.
[For U.S. positions that could be performed in Los Angeles or San Francisco] Pursuant to the San Francisco Fair Chance Ordinance and Los Angeles Fair Chance Initiative for Hiring Ordinance, Affirm will consider for employment qualified applicants with arrest and conviction records.