Chaos Engineering Jobs (NOW HIRING)

Staff Engineer

Englewood, CO · On-site

$164K/yr

Mentoring development teams on CI/CD best practices, Chaos Engineering principles, and application performance tuning. Skills, Experience and Requirements: Requires Master's degree (or foreign ...

Chaos Testing * Define and drive the enterprise performance engineering strategy to ensure applications are scalable, resilient, and optimized for high availability. * Architect and lead end-to-end ...

Cloud Solution Architect

$65 - $89/hr

Chaos engineering experience. * Strong understanding of virtual networks and general network management functions. * Solid knowledge of concepts of designing and developing dynamic cloud solutions.

... Chaos Engineering - Tools (Chaos Monkey, Gremlin), Performance Testing - Emerging Tools (K6, Gatling), Performance Testing - Execution (Baseline, Load, Endurance, Stress, Volume, Network, DR, Failover ...

Preferred Qualifications: - Experience in reliability/chaos engineering or disaster recovery planning. - Familiarity with AWS Well-Architected Framework and resilient design patterns. - AWS Certified ...

Chaos Engineering information

Salary figures shown: $46.5K, $146.9K (average), $174K

How much do chaos engineering jobs pay per year?

As of Jun 25, 2026, the average yearly pay for chaos engineering in the United States is $146,868.00, according to ZipRecruiter salary data. Most workers in this role earn between $116,500.00 and $173,000.00 per year, depending on experience, location, and employer.

What engineers make $500,000?

Senior engineers in specialized fields such as software engineering, cloud architecture, or cybersecurity can earn $500,000 or more annually, especially with extensive experience, advanced skills, and leadership roles. High compensation often includes bonuses, stock options, or profit sharing, particularly in large tech companies or startups with significant growth potential.

What does a chaos engineer do?

A chaos engineer designs and executes experiments to intentionally disrupt systems in order to identify vulnerabilities and improve resilience. They use tools like chaos engineering frameworks to simulate failures in production environments, helping teams build more reliable and fault-tolerant systems. Strong knowledge of system architecture, scripting, and monitoring is essential for this role.

What is a Chaos Engineering job?

A Chaos Engineering job involves proactively identifying weaknesses in complex systems by intentionally injecting failures and observing how they respond. Professionals in this role design and execute controlled experiments to improve system resilience, ensuring that services remain reliable under unexpected conditions. They work closely with development, operations, and security teams to enhance fault tolerance and incident response strategies.
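
As a rough illustration of what such a controlled experiment can look like, the Python sketch below injects failures into a stand-in dependency and checks whether a steady-state hypothesis (a minimum success rate) still holds. Every name here (`with_faults`, `call_with_retries`, `run_experiment`) is hypothetical and invented for this example; it is not the API of Chaos Monkey, Gremlin, or any other tool, and a real experiment would target actual services rather than an in-process function.

```python
import random
from dataclasses import dataclass
from typing import Callable


class InjectedFault(Exception):
    """Stands in for a real dependency failure (hypothetical)."""


def with_faults(call: Callable[[], str], failure_rate: float,
                rng: random.Random) -> Callable[[], str]:
    """Wrap `call` so that it fails with probability `failure_rate`."""
    def wrapped() -> str:
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return call()
    return wrapped


def call_with_retries(call: Callable[[], str], attempts: int) -> bool:
    """Return True if any of `attempts` tries succeeds."""
    for _ in range(attempts):
        try:
            call()
            return True
        except InjectedFault:
            continue
    return False


@dataclass
class ExperimentResult:
    success_rate: float
    hypothesis_held: bool


def run_experiment(failure_rate: float, attempts: int,
                   min_success_rate: float, requests: int = 1000,
                   seed: int = 0) -> ExperimentResult:
    """Send `requests` calls through the faulty dependency and compare
    the observed success rate with the steady-state hypothesis."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    flaky = with_faults(lambda: "ok", failure_rate, rng)
    successes = sum(call_with_retries(flaky, attempts) for _ in range(requests))
    rate = successes / requests
    return ExperimentResult(rate, rate >= min_success_rate)


if __name__ == "__main__":
    # Hypothesis: with 3 attempts per request, at least 95% of requests
    # succeed even when 30% of individual calls fail.
    print(run_experiment(failure_rate=0.3, attempts=3, min_success_rate=0.95))
```

The point of the sketch is the shape of the experiment: state a hypothesis up front, inject a fault at a known rate, and let the measurement confirm or refute it.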

What engineers make $300,000 a year?

Senior engineers in specialized fields such as software engineering, cloud engineering, or cybersecurity can earn $300,000 or more annually, especially with extensive experience, advanced skills, and certifications like AWS or CISSP. Roles in high-demand industries or leadership positions may also reach this compensation level.

What are the key skills and qualifications needed to thrive in the Chaos Engineering position, and why are they important?

To thrive in Chaos Engineering, a strong background in software engineering, distributed systems, and reliability testing is essential, often supported by a degree in computer science or a related field. Familiarity with chaos engineering tools like Gremlin or Chaos Monkey and experience with cloud platforms, container orchestration, and monitoring systems are highly valued. Excellent problem-solving abilities, communication skills, and a mindset oriented toward experimentation help engineers collaborate effectively and analyze complex failure modes. These skills are crucial for proactively identifying system weaknesses and ensuring the resilience of large-scale technology infrastructures.

What are some typical challenges a Chaos Engineer faces, and how do they overcome them?

Chaos Engineers often face the challenge of designing effective experiments that simulate real-world failures without disrupting production systems. Balancing the need to discover vulnerabilities with maintaining uptime requires careful planning, communication, and coordination with development and operations teams. They address these challenges by thoroughly testing in controlled environments, documenting procedures, and establishing clear rollback strategies. Continuous learning and cross-functional collaboration are also key to staying ahead of new complexities in evolving systems.
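
One way to picture the rollback strategy mentioned above is a guard that always undoes the injected fault and stops the experiment early when an error-rate limit is crossed. The Python sketch below is a minimal illustration under stated assumptions: `injected_fault`, `run_guarded`, and `ExperimentAborted` are hypothetical names, and the `inject` and `rollback` callables stand in for whatever a team actually uses to start and stop a fault.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator


class ExperimentAborted(Exception):
    """Raised when the abort condition is met (hypothetical)."""


@contextmanager
def injected_fault(inject: Callable[[], None],
                   rollback: Callable[[], None]) -> Iterator[None]:
    """Apply a fault and guarantee that it is rolled back afterwards."""
    inject()
    try:
        yield
    finally:
        # Runs on normal completion, on abort, and on unexpected errors.
        rollback()


def run_guarded(error_rates: Iterable[float], abort_threshold: float,
                inject: Callable[[], None],
                rollback: Callable[[], None]) -> int:
    """Observe error-rate samples while the fault is active.

    Returns the number of samples observed, or raises ExperimentAborted
    as soon as one sample exceeds `abort_threshold`.
    """
    observed = 0
    with injected_fault(inject, rollback):
        for rate in error_rates:
            observed += 1
            if rate > abort_threshold:
                raise ExperimentAborted(
                    f"error rate {rate:.0%} exceeded {abort_threshold:.0%}"
                )
    return observed
```

In this sketch the error-rate samples are passed in as plain numbers; in practice they would come from a monitoring system, which is an assumption outside what the text above specifies.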

Is chaos engineering still used today?

Chaos engineering is actively used in the industry to improve system resilience by intentionally introducing failures and testing responses. Professionals in this field often utilize tools like Chaos Monkey and follow practices such as continuous testing to identify vulnerabilities in complex systems.
Infographic showing various Chaos Engineering job openings in the United States as of June 2026, with employment types broken down into 1% Locum Tenens, 16% Full Time, 74% Part Time, and 9% Nights. Highlights an 88% Physical, 3% Hybrid, and 9% Remote job distribution, with an average salary of $146,868 per year, or $70.6 per hour.
Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Seattle, WA • On-site

$64.75 - $86.25/hr

Full-time

This job post has expired today. Applications are no longer accepted.


Company rating: 9.4 out of 10

Based on 7 frontline employees who took The Breakroom Quiz


Job description

Job Summary:
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. They are seeking an experienced Senior Site Reliability Engineer to build resilient, highly available systems that scale to meet the demands of their core systems powering Lattice.
Responsibilities:
• Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform
• Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues
• Build and maintain infrastructure automation using tools like Terraform, Kubernetes operators, and custom tooling to manage large-scale distributed systems
• Establish and track Service Level Objectives (SLOs) and Error Budgets to balance feature velocity with system reliability
• Partner with software engineering teams to improve system architecture for reliability, implementing patterns like circuit breakers, graceful degradation, and chaos engineering
• Develop capacity planning models and performance testing frameworks to ensure systems can handle growth and peak operational demands
• Create runbooks, documentation, and training materials to enable teams to operate production systems effectively
• Lead cross-functional efforts to improve deployment safety through progressive rollouts, automated testing, and rollback capabilities
• Implement security best practices and compliance controls for production environments handling sensitive defense data
• Build tooling and automation to reduce toil and improve operational efficiency for the engineering organization
• Participate in on-call rotations and serve as an escalation point for critical production incidents
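
For readers unfamiliar with the SLO and error budget item in the list above, the arithmetic is simple: the error budget is the share of a time window in which the service is allowed to miss its objective, that is, (1 - SLO) times the window length. The Python sketch below works this out; the function names and the 30-day window are assumptions chosen for illustration, not details from this posting.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO.

    `slo` is a fraction such as 0.999 (99.9%).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 21.6 minutes of downtime, about half of the budget is left.
    print(round(budget_remaining(0.999, 21.6), 2))
```

When the remaining budget approaches zero, teams that follow this practice typically slow feature releases in favor of reliability work, which is the balance the list item describes.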
Qualifications:
Required:
• 7+ years of engineering experience, with at least 3 years focused on SRE, production operations, or infrastructure engineering
• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
• Deep expertise with Kubernetes in production environments, including operational challenges at scale (100+ nodes)
• Strong programming skills in one or more languages such as Go, Python, Rust, or Java with ability to build production-grade tooling
• Proven experience designing and implementing observability stacks (metrics, logging, tracing) using tools like Prometheus, Grafana, ELK/EFK, or equivalent
• Hands-on experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code practices
• Demonstrated ability to debug complex distributed systems issues across multiple layers of the stack
• Track record of improving system reliability through architectural changes, not just operational band-aids
• Strong incident management and communication skills, with experience leading responses to critical outages
• Must be a U.S. Person due to required access to U.S. export controlled information or facilities
• Eligible to obtain and maintain an active U.S. Secret security clearance
Preferred:
• Experience with defense, aerospace, or other mission-critical systems where downtime has severe consequences
• Expertise in performance optimization and capacity planning for high-throughput, low-latency systems
• Knowledge of chaos engineering principles and experience implementing resilience testing frameworks
• Experience with service mesh technologies (Istio, Linkerd) and advanced traffic management patterns
• Background in database operations and optimization (PostgreSQL, Cassandra, or similar at scale)
• Familiarity with CI/CD platforms and deployment automation (ArgoCD, FluxCD, Spinnaker, Jenkins)
• Understanding of networking fundamentals including load balancing, DNS, TLS/SSL, and network security
• Experience with configuration management and secrets management solutions (Vault, Sealed Secrets, SOPS)
• Strong written and verbal communication skills with ability to explain technical concepts to non-technical stakeholders
• Active Secret or higher security clearance
Company:
Anduril Industries is a defense technology company that specializes in developing advanced autonomous systems to enhance national security. Founded in 2017, the company is headquartered in Costa Mesa, CA, with a team of 1,001-5,000 employees. The company is currently late stage.

About Anduril Industries

Sourced by ZipRecruiter

Anduril Industries is a trailblazer in the technology industry based in Costa Mesa, CA, US. Founded in 2017 by Palmer Luckey, the creator of Oculus VR, the company focuses on developing innovative technology to equip and empower those in the defense sector. Its primary products include cutting-edge autonomous systems and AI software that assist in combating threats to national and global security. The mission of Anduril Industries is to integrate technology and defense by building transformative, scalable solutions that ensure a safer world.

Industry

Guided missile and space vehicle manufacturing

Company size

501 - 1,000 Employees

Headquarters location

Costa Mesa, CA, US

Year founded

2017
