Remote Chaos Engineering Jobs (NOW HIRING)

Sr Site Reliability Engineer

Leesburg, VA · Remote

$57.75 - $76.50/hr

Familiarity with chaos engineering practices and tooling. Experience with data pipeline reliability ... This is a remote position. While performing the duties of this job, the employee regularly works in ...

This role is based in Toronto, Canada, with flexibility for remote work across Canada. The Team ... Expertise with observability and reliability tooling, plus resiliency/chaos testing. * Success ...

REMOTE Recruiting Coordinator

Seattle, WA · Remote

$45 - $57.50/hr

Support large-scale Engineering, Product, Business, GTM, and non-technical hiring * Partner with ... Strong organizational skills with the ability to manage "organized chaos" * Exceptional written and ...

Sr Java SDET - 100% Remote

San Antonio, TX · Remote

$51.75 - $65.75/hr

... Remote) Work closely with software engineers to build quality by ensuring proper test and ... in Chaos Testing Exposure to Software Engineering Principles. Technology Skills Selenium RS ...

Sr Java SDET - 100% Remote

San Antonio, TX · On-site +1

$51.75 - $65.75/hr

... Remote) Work closely with software engineers to build quality by ensuring proper test and ... in Chaos Testing Exposure to Software Engineering Principles. Technology Skills Selenium RS ...

Senior Software Engineer - USA

Manhattan, NY · Remote

$135K - $178K/yr

We're a 55-person, fully remote team (40 in engineering) across North and South America, operating ... Why Join Canals Bootstrapped and profitable: stability without the chaos of venture pivots. Real ...

Senior Software Engineer - USA

Richmond, VA · Remote

$121K - $159K/yr

We're a 55-person, fully remote team (40 in engineering) across North and South America, operating ... Why Join Canals Bootstrapped and profitable: stability without the chaos of venture pivots. Real ...

Remote Chaos Engineering information

Salary chart values: $73K, $194.7K (average), $254K

How much do remote chaos engineering jobs pay per year?

As of Jun 6, 2026, the average yearly pay for remote chaos engineering in the United States is $194,709.00, according to ZipRecruiter salary data. Most workers in this role earn between $141,500.00 and $253,000.00 per year, depending on experience, location, and employer.

What is Remote Chaos Engineering?

Remote Chaos Engineering is the practice of testing distributed systems' resilience by intentionally introducing failures and disruptions in remote or cloud environments. The goal is to identify weaknesses and improve system reliability by simulating real-world incidents, such as network outages or server crashes, in a controlled manner. This approach helps teams understand how their applications behave under stress and develop strategies to mitigate future incidents. Remote Chaos Engineering is particularly valuable for organizations leveraging cloud infrastructure and remote services, ensuring robust performance even under unexpected conditions.
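The idea of introducing failures "in a controlled manner" can be illustrated with a small simulation. This is a minimal sketch, not a real chaos tool: `fetch_profile`, `fetch_with_retry`, and the 30% failure rate are hypothetical names and values chosen for illustration, and the "outage" is just an exception raised by a wrapper. It compares how often a client gets a full answer with and without injected faults.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real outage during the experiment."""


def flaky(call, failure_rate, rng):
    """Wrap `call` so a fraction of invocations fail, simulating an outage."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated network outage")
        return call(*args, **kwargs)
    return wrapped


def fetch_profile(user_id):
    """Hypothetical dependency; stands in for a remote service call."""
    return {"id": user_id, "name": f"user-{user_id}"}


def fetch_with_retry(call, user_id, attempts=3):
    """System under test: retries, then degrades to a fallback value."""
    for _ in range(attempts):
        try:
            return call(user_id)
        except InjectedFault:
            continue
    return {"id": user_id, "name": "unknown"}  # graceful degradation


def run_experiment(failure_rate, requests=1000, seed=0):
    """Return the fraction of requests that got a full (non-fallback) answer."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    faulty_call = flaky(fetch_profile, failure_rate, rng)
    ok = sum(
        fetch_with_retry(faulty_call, i)["name"] != "unknown"
        for i in range(requests)
    )
    return ok / requests


if __name__ == "__main__":
    baseline = run_experiment(failure_rate=0.0)
    under_fault = run_experiment(failure_rate=0.3)
    print(f"baseline success rate: {baseline:.3f}")
    print(f"success rate with 30% injected faults: {under_fault:.3f}")
```

In a real experiment the fault would be injected into live infrastructure by a dedicated platform, and the success rate would come from production monitoring rather than a local loop; the structure (baseline, injected fault, comparison) is the part this sketch is meant to show.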

What are some common challenges faced by professionals working in remote chaos engineering roles?

Professionals in remote chaos engineering often encounter challenges such as coordinating experiments across distributed teams, ensuring clear communication about system vulnerabilities, and managing the complexity of large-scale systems without direct, on-site access. Establishing robust monitoring and rollback procedures is essential to minimize risk during remote testing. Additionally, building trust with development and operations teams is key, as chaos engineering often involves intentionally introducing failures to improve system resilience.

What are the key skills and qualifications needed to thrive as a Remote Chaos Engineer, and why are they important?

To thrive as a Remote Chaos Engineer, you need a strong background in software engineering, systems architecture, and site reliability, often supported by a degree in computer science or a related field. Familiarity with chaos engineering platforms (such as Gremlin or Chaos Monkey), cloud environments (AWS, Azure, GCP), and automation tools is typically required. Strong problem-solving abilities, clear communication, and a collaborative mindset help you effectively identify weaknesses and drive reliability improvements across distributed teams. These skills are crucial for proactively uncovering system vulnerabilities, ensuring system resilience, and maintaining high availability in complex, remote-first infrastructures.

What is the difference between Remote Chaos Engineering and Remote Site Reliability Engineering?

| Aspect | Remote Chaos Engineering | Remote Site Reliability Engineer |
| --- | --- | --- |
| Primary Focus | Designing and executing chaos experiments to improve system resilience | Ensuring system reliability, availability, and performance through monitoring and automation |
| Skills & Certifications | Knowledge of chaos engineering tools, scripting, cloud platforms | Monitoring tools, scripting, cloud infrastructure, SRE certifications |
| Work Environment | Collaborates with development and operations teams, often in DevOps culture | Works closely with engineering teams to maintain system health and SLAs |

While both roles focus on system stability, Remote Chaos Engineering specializes in testing system resilience through chaos experiments, whereas Remote Site Reliability Engineers focus on maintaining overall system reliability and performance. Both roles require scripting skills and cloud knowledge, but their core objectives differ: one proactively tests, the other maintains system health.

More about Remote Chaos Engineering jobs
Infographic showing various Remote Chaos Engineering job openings in the United States as of May 2026, with employment types broken down into 90% Full Time, 6% Part Time, 3% Contract, and 1% Nights. Highlights an 89% Physical, 3% Hybrid, and 8% Remote job distribution, with an average salary of $194,709 per year, or $93.6 per hour.
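The yearly and hourly figures above can be related with simple arithmetic. The sketch below assumes a full-time schedule of 40 hours a week for 52 weeks (2,080 paid hours per year); that convention is an assumption, not something the page states, and the function names are illustrative.

```python
HOURS_PER_YEAR = 40 * 52  # assumption: full-time, 2,080 paid hours per year


def annual_to_hourly(annual):
    """Convert a yearly salary to an hourly rate under the assumption above."""
    return annual / HOURS_PER_YEAR


def hourly_to_annual(hourly):
    """Convert an hourly rate to a yearly salary under the assumption above."""
    return hourly * HOURS_PER_YEAR


if __name__ == "__main__":
    # Average yearly pay quoted on this page
    print(f"${annual_to_hourly(194_709):.2f}/hr")
    # Hourly range from the first listing on this page
    print(f"${hourly_to_annual(57.75):,.0f} - ${hourly_to_annual(76.50):,.0f}/yr")
```

Under this assumption the quoted average of $194,709 per year works out to roughly the $93.6 per hour shown in the infographic, which suggests a similar convention was used, though the page does not say so.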

RELIABILITY ENGINEER with Security Clearance

Blue Obsidian Solutions

Tampa, FL • On-site, Remote

$96K - $121K/yr

Other

Medical, Dental, Vision

Posted 7 days ago


Job description

We are seeking a skilled and proactive Reliability Engineer to join our team. Reliability engineers are responsible for identifying potential issues or areas for improvement by analyzing data and recognizing patterns within it. Once problems are detected, the reliability engineer develops and implements solutions to prevent them, ultimately enhancing the reliability of systems, equipment, and processes.
Responsibilities:
  • Analyzing equipment failure data to detect patterns and trends.
  • Conducting root cause analysis to identify the underlying causes of issues.
  • Creating and implementing new maintenance procedures.
  • Designing and establishing new protocols for monitoring and testing equipment.
  • Exploring new technologies and processes to enhance equipment performance and reliability.
  • Developing and executing training programs for employees.
  • Collaborating with other departments to ensure reliability is incorporated into all areas of the organization.
  • System Reliability: Design and implement strategies to improve the availability, reliability, and performance of critical systems and applications.
  • Incident Management: Lead root cause analysis for major incidents, identify systemic issues, and implement long-term solutions to prevent recurrences.
  • Monitoring and Alerting: Develop and maintain robust monitoring systems to detect issues proactively and optimize alerting mechanisms to ensure timely response.
  • Capacity Planning: Analyze system usage patterns to predict future growth, optimize capacity, and ensure scalability.
  • Failure Analysis: Conduct thorough failure analysis and implement fault-tolerant systems to minimize the impact of potential failures.
  • Collaboration: Work closely with software engineering, DevOps, and infrastructure teams to design reliable architecture and improve operational workflows.
  • Documentation: Create and maintain comprehensive documentation of reliability practices, system designs, and incident reports.
  • Continuous Improvement: Regularly evaluate current processes and systems, identifying areas for improvement and implementing enhancements.
Required Skills/Qualifications/Duty Experience
Essential:
  • The ability to think critically and logically, and to work with large datasets to draw meaningful conclusions.
  • A solid understanding of the systems, equipment, and processes involved, including knowledge of engineering principles and specific organizational systems.
  • The capacity to think creatively and develop innovative solutions for complex issues.
  • The ability to clearly explain technical concepts and collaborate effectively with others.
  • The ability to recognize potential hazards and take appropriate actions to mitigate risks.
  • Hands-on experience with cloud platforms (AWS, Azure, GCP) and securing cloud environments.
  • Strong understanding of containerization technologies (Docker, Kubernetes) and their security.
  • Knowledge of security tools like SAST, DAST, vulnerability scanners, and SIEM solutions.
  • Strong experience in system reliability, site reliability engineering (SRE), or a similar role.
  • Proficiency in cloud platforms (AWS, Azure, GCP) and associated reliability tools.
  • Hands-on experience with monitoring and logging tools such as Prometheus, Grafana, Datadog, Splunk, or ELK stack.
  • Proficiency in scripting languages like Python, Bash, or Go for automation.
  • Familiarity with containerization and orchestration tools (Docker, Kubernetes).
  • Strong understanding of distributed systems, fault-tolerant design, and high-availability architectures.
  • Experience in root cause analysis and implementing systemic improvements.
Preferred:
  • Certifications in cloud platforms (e.g., AWS Certified Solutions Architect, Google Cloud Engineer).
  • Certifications in Security+, CCNA/CCNP, Linux+.
  • Experience in capacity planning and performance tuning of large-scale systems.
  • Familiarity with chaos engineering practices.
  • Strong communication skills and the ability to work collaboratively with cross-functional teams.
Security Requirements: Must possess and maintain a TS/SCI clearance at time of hire.
Education/Certification Requirements: Bachelor's degree in program or project management, information technology, or a related technical discipline; or the equivalent.
Travel: Ability to travel as needed, estimated at less than 25%.
What We Offer:
  • Competitive salary and performance-based incentives
  • Comprehensive health, dental, and vision benefits
  • Opportunities for professional development and certifications
  • Flexible work environment with hybrid or remote options
  • A supportive, innovative, and growth-oriented culture
How to Apply: If you are passionate about building and maintaining reliable systems and thrive in a fast-paced environment, we want to hear from you! Please submit your resume and a cover letter detailing your experience and enthusiasm for the role to .