
Chaos Engineering Jobs (NOW HIRING)

Cloud Solution Architect

$65 - $89/hr

Chaos engineering experience. * Strong understanding of virtual networks and general network management functions. * Solid knowledge of concepts of designing and developing dynamic cloud solutions.

Preferred Qualifications: - Experience in reliability/chaos engineering or disaster recovery planning. - Familiarity with AWS Well-Architected Framework and resilient design patterns. - AWS Certified ...

... Chaos Engineering - Tools (Chaos Monkey, Gremlin), Performance Testing - Emerging Tools (K6, Gatling), Performance Testing - Execution (Baseline, Load, Endurance, Stress, Volume, Network, DR, Failover ...

Site Reliability Engineer - Hybrid

Reston, VA · On-site

$59.25 - $78.75/hr

Conduct chaos engineering experiments using tools like AWS FIS and Chaos Toolkit. Perform resiliency assessments using Resilience Hub and implement self-healing solutions. 6. Database & Application ...

$61.50 - $81.75/hr

Drive continuous improvement through automation, self-healing systems, chaos engineering, and capacity planning. * Maintain runbooks, playbooks, and knowledge repositories, linking documentation to ...

SRE ENGINEER/ MANAGER

Reston, VA · On-site

$59.25 - $78.75/hr

... chaos engineering, resiliency assessments, and implement self-healing architectures. - Manage and optimize databases (PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift) and provide production support ...

Chaos Labs - AI Engineer

New York, NY · On-site +1

$130K - $150K/yr

Chaos Labs builds technology that powers safer, more accessible financial markets. Our risk ... engineering; 2+ years building production AI/ML systems * Hands-on experience with agentic ...

MLOps Lead Engineer

Plano, TX

$95.80K - $126.20K/yr

Experience with Chaos Engineering and proactive resilience testing * AWS certifications (e.g., Solutions Architect, Machine Learning Specialty) or equivalent cloud certifications Chase is a leading ...

CI/CD, high code coverage, chaos engineering - Understand, retain, and perform complex procedures - Be proficient with git or other version control - Develop strong UNIX debugging skills - Communicate ...


Chaos Engineering information

Salary scale: $46.5K · $146.9K (average) · $174K

How much do chaos engineering jobs pay per year?

As of May 29, 2026, the average yearly pay for chaos engineering jobs in the United States is $146,868, according to ZipRecruiter salary data. Most workers in this role earn between $116,500 and $173,000 per year, depending on experience, location, and employer.

What is a Chaos Engineering job?

A Chaos Engineering job involves proactively identifying weaknesses in complex systems by intentionally injecting failures and observing how the systems respond. Professionals in this role design and execute controlled experiments to improve system resilience, ensuring that services remain reliable under unexpected conditions. They work closely with development, operations, and security teams to enhance fault tolerance and incident response strategies.

What are the key skills and qualifications needed to thrive in the Chaos Engineering position, and why are they important?

To thrive in Chaos Engineering, a strong background in software engineering, distributed systems, and reliability testing is essential, often supported by a degree in computer science or a related field. Familiarity with chaos engineering tools like Gremlin or Chaos Monkey and experience with cloud platforms, container orchestration, and monitoring systems are highly valued. Excellent problem-solving abilities, communication skills, and a mindset oriented toward experimentation help engineers collaborate effectively and analyze complex failure modes. These skills are crucial for proactively identifying system weaknesses and ensuring the resilience of large-scale technology infrastructures.

What are some typical challenges a Chaos Engineer faces, and how do they overcome them?

Chaos Engineers often face the challenge of designing effective experiments that simulate real-world failures without disrupting production systems. Balancing the need to discover vulnerabilities with maintaining uptime requires careful planning, communication, and coordination with development and operations teams. They address these challenges by thoroughly testing in controlled environments, documenting procedures, and establishing clear rollback strategies. Continuous learning and cross-functional collaboration are also key to staying ahead of new complexities in evolving systems.
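The experiment loop described above (check steady state, inject a failure, observe, roll back) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular tool works: `FakeService`, `run_experiment`, and the 1% error threshold are hypothetical names and values invented for the example, and the service is simulated in memory rather than being a real system.

```python
class FakeService:
    """Hypothetical stand-in for a real system; not any real tool's API."""

    def __init__(self, replicas: int) -> None:
        self.healthy = [True] * replicas

    def kill_replica(self, index: int) -> None:
        self.healthy[index] = False

    def restore_all(self) -> None:
        self.healthy = [True] * len(self.healthy)

    def error_rate(self) -> float:
        # Simplifying assumption: requests succeed while any replica is up.
        return 0.0 if any(self.healthy) else 1.0


def run_experiment(service: FakeService, threshold: float = 0.01) -> str:
    """Kill one replica and check that the error rate stays under threshold."""
    # 1. Confirm steady state before injecting anything.
    if service.error_rate() > threshold:
        return "aborted: not in steady state"
    try:
        # 2. Inject a single, controlled failure.
        service.kill_replica(0)
        # 3. Observe the system and compare against the hypothesis.
        observed = service.error_rate()
        return "hypothesis held" if observed <= threshold else "weakness found"
    finally:
        # 4. Always roll back, even if the observation step raises.
        service.restore_all()
```

In this sketch, `run_experiment(FakeService(replicas=3))` reports that the hypothesis held, while a single-replica service reports a weakness. Putting the rollback in a `finally` block is one way to make sure the injected fault never outlives the experiment.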
Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Seattle, WA • On-site

$64.75 - $86.25/hr

Full-time

Posted 3 days ago


Anduril rating: 9.4 out of 10

Based on 7 frontline employees who took The Breakroom Quiz


Job description

Job Summary:
Anduril Industries is a defense technology company focused on transforming military capabilities with advanced technology. They are seeking a Senior Site Reliability Engineer to ensure the reliability, performance, and scalability of their mission-critical systems, particularly those supporting the Lattice platform. The role involves building resilient systems, implementing monitoring and incident response strategies, and collaborating with engineering teams to enhance operational excellence.
Responsibilities:
• Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform
• Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues
• Build and maintain infrastructure automation using tools like Terraform, Kubernetes operators, and custom tooling to manage large-scale distributed systems
• Establish and track Service Level Objectives (SLOs) and Error Budgets to balance feature velocity with system reliability
• Partner with software engineering teams to improve system architecture for reliability, implementing patterns like circuit breakers, graceful degradation, and chaos engineering
• Develop capacity planning models and performance testing frameworks to ensure systems can handle growth and peak operational demands
• Create runbooks, documentation, and training materials to enable teams to operate production systems effectively
• Lead cross-functional efforts to improve deployment safety through progressive rollouts, automated testing, and rollback capabilities
• Implement security best practices and compliance controls for production environments handling sensitive defense data
• Build tooling and automation to reduce toil and improve operational efficiency for the engineering organization
• Participate in on-call rotations and serve as an escalation point for critical production incidents
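As a rough illustration of the SLO and error budget item in the list above, the arithmetic behind an availability error budget can be sketched as follows. The function names, the 30-day window, and the 99.9% target are assumptions chosen for the example; none of them come from the posting.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 10.8 minutes of downtime, about 75% of the budget is left.
    print(round(budget_remaining(0.999, 10.8), 2))
```

A team might slow feature rollouts as the remaining fraction approaches zero, which is the sense in which an error budget balances velocity against reliability.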
Qualifications:
Required:
• 7+ years of engineering experience, with at least 3 years focused on SRE, production operations, or infrastructure engineering
• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
• Deep expertise with Kubernetes in production environments, including operational challenges at scale (100+ nodes)
• Strong programming skills in one or more languages such as Go, Python, Rust, or Java with ability to build production-grade tooling
• Proven experience designing and implementing observability stacks (metrics, logging, tracing) using tools like Prometheus, Grafana, ELK/EFK, or equivalent
• Hands-on experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code practices
• Demonstrated ability to debug complex distributed systems issues across multiple layers of the stack
• Track record of improving system reliability through architectural changes, not just operational band-aids
• Strong incident management and communication skills, with experience leading responses to critical outages
• Must be a U.S. Person due to required access to U.S. export controlled information or facilities
• Eligible to obtain and maintain an active U.S. Secret security clearance
Preferred:
• Experience with defense, aerospace, or other mission-critical systems where downtime has severe consequences
• Expertise in performance optimization and capacity planning for high-throughput, low-latency systems
• Knowledge of chaos engineering principles and experience implementing resilience testing frameworks
• Experience with service mesh technologies (Istio, Linkerd) and advanced traffic management patterns
• Background in database operations and optimization (PostgreSQL, Cassandra, or similar at scale)
• Familiarity with CI/CD platforms and deployment automation (ArgoCD, FluxCD, Spinnaker, Jenkins)
• Understanding of networking fundamentals including load balancing, DNS, TLS/SSL, and network security
• Experience with configuration management and secrets management solutions (Vault, Sealed Secrets, SOPS)
• Strong written and verbal communication skills with ability to explain technical concepts to non-technical stakeholders
• Active Secret or higher security clearance
Company:
Anduril Industries is a defense technology company that specializes in developing advanced autonomous systems to enhance national security. Founded in 2017, the company is headquartered in Costa Mesa, CA, USA, with a team of 1,001-5,000 employees. The company is currently late stage.


About Anduril Industries

Sourced by ZipRecruiter

Anduril Industries is a trailblazer in the technology industry based in Costa Mesa, CA, US. Founded in 2017 by Palmer Luckey, the creator of Oculus VR, the company focuses on developing innovative technology to equip and empower those in the defense sector. Its primary products include cutting-edge autonomous systems and AI software that assist in combating threats to national and global security. The mission of Anduril Industries is to integrate technology and defense by building transformative, scalable solutions that ensure a safer world.

Industry

Guided missile and space vehicle manufacturing

Company size

501 - 1,000 Employees

Headquarters location

Costa Mesa, CA, US

Year founded

2017
