Chaos Engineering Jobs (NOW HIRING)

Anduril Industries

Senior Site Reliability Engineer, Production Engineering

Seattle, WA · On-site

$64.75 - $86.25/hr

... chaos engineering • Develop capacity planning models and performance testing frameworks to ensure systems can handle growth and peak operational demands • Create runbooks, documentation, and ...

Anduril Industries

Senior Site Reliability Engineer, Production Engineering

Seattle, WA · On-site

$64.75 - $86.25/hr

Marriott

Director, Architect Enterprise Resilience & Recoverability

Bethesda, MD · Hybrid

The right candidate is fluent in cloud-native resiliency patterns, multi-region architectures, chaos engineering, and modern recovery automation, and is equally comfortable in an architecture review ...

EchoStar

Staff Engineer

Englewood, CO · On-site

$164K/yr

Mentoring development teams on CI/CD best practices, Chaos Engineering principles, and application performance tuning. Skills, Experience and Requirements Requires Master's degree (or foreign ...

Sage IT Inc

Performance Test Lead

Charlotte, NC · On-site

Chaos Testing * Define and drive the enterprise performance engineering strategy to ensure applications are scalable, resilient, and optimized for high availability. * Architect and lead end-to-end ...

Nvidia

Senior Software Engineer, Resilience Engineering - DGX Cloud

Santa Clara, CA

$143K - $189K/yr

Implement chaos engineering, failure injection, and resilience testing to elevate our team's standard practices. * Improve standards by setting an example with your hands-on experience and leadership.

New

Anduril Industries

Senior Site Reliability Engineer, Production Engineering

Seattle, WA · On-site

$64.75 - $86.25/hr

Marriott

Director, Architect Enterprise Resilience & Recoverability

Bethesda, MD · Hybrid

Anduril Industries

Senior Site Reliability Engineer, Production Engineering

Costa Mesa, CA · On-site

$61.25 - $81.25/hr

Partner with software engineering teams to improve system architecture for reliability, implementing patterns like circuit breakers, graceful degradation, and chaos engineering * Develop capacity ...

Anduril Industries

Senior Site Reliability Engineer, Production Engineering

Seattle, WA · On-site

$64.75 - $86.25/hr

Anduril Industries

Senior Site Reliability Engineer, Production Engineering

Seattle, WA · On-site

$64.75 - $86.25/hr

Apex Informatics

Cloud Solution Architect

$65 - $89/hr

Chaos engineering experience. * Strong understanding of virtual networks and general network management functions. * Solid knowledge of concepts of designing and developing dynamic cloud solutions.

Realtech Services

Hiring: Performance Test Lead at Charlotte, NC

Charlotte, NC · On-site

Marriott International

Director, Architect Enterprise Resilience & Recoverability

Bethesda, MD · On-site

Futran Tech Solutions Pvt. Ltd.

QA and Performance Test Engineer

Mooresville, AL · On-site

MySQL, InfluxDB, Oracle Understanding and experience in Chaos Engineering, SRE principles. Retail domain with GCP cloud performance engineering experience. Experience with performance testing/tuning ...

Staffingine LLC

Performance test Lead

Danbury, CT · On-site

... Chaos Engineering - Tools (Chaos Monkey, Gremlin), Performance Testing - Emerging Tools (K6, Gatling), Performance Testing - Execution (Baseline, Load, Endurance, Stress, Volume, Network, DR, Failover ...

Mastech Digital

Senior AWS Cloud Architect

Alpharetta, GA · On-site

$120K/yr

Preferred Qualifications: - Experience in reliability/chaos engineering or disaster recovery planning. - Familiarity with AWS Well-Architected Framework and resilient design patterns. - AWS Certified ...

State Farm

REMOTE - AI Engineering Manager (Databricks)

Chaos engineering. • Communication: Technical teaching. Influence without authority. Clear written communication. Stakeholder management. Preferred : • Developer platforms or CLI tools. • DORA ...

State Farm

REMOTE - AI Engineering Manager (Databricks)

Request Technology, LLC

Lead DevOps Software Engineer

Chicago, IL · On-site

$54.50 - $74.50/hr

... Chaos engineering principles and tooling (e.g., Chaos Monkey, Gremlin, LitmusChaos) · Fluent with different data formats and structures: JSON, Protobuf, Avro · SQL and NoSQL databases, in-memory ...

Showing results 1-20

Chaos Engineering Jobs

Chaos Engineering information

Salary chart: $46.5K, $146.9K, $174K

How much do chaos engineering jobs pay per year?

As of Jun 25, 2026, the average yearly pay for chaos engineering in the United States is $146,868.00, according to ZipRecruiter salary data. Most workers in this role earn between $116,500.00 and $173,000.00 per year, depending on experience, location, and employer.
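Many of the listings on this page quote hourly rates, while the figures above are annual. The sketch below converts between the two so they can be compared. It assumes a 2,080-hour work year (40 hours a week for 52 weeks); that constant and the function names are illustrative assumptions, not something the salary data states.

```python
# Assumption: a full-time year is 40 hours/week * 52 weeks = 2,080 hours.
HOURS_PER_YEAR = 40 * 52


def annual_to_hourly(annual_pay: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert yearly pay to an equivalent hourly rate."""
    return annual_pay / hours_per_year


def hourly_to_annual(hourly_rate: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly rate to equivalent yearly pay."""
    return hourly_rate * hours_per_year


if __name__ == "__main__":
    # The average yearly pay quoted above.
    print(f"${annual_to_hourly(146_868):.2f}/hr")
    # An hourly range taken from one of the listings.
    low, high = hourly_to_annual(64.75), hourly_to_annual(86.25)
    print(f"${low:,.0f} - ${high:,.0f}/yr")
```

Under that assumption, the $146,868 average works out to roughly $70.61 an hour, and a $64.75 - $86.25/hr listing corresponds to about $134,680 - $179,400 a year. Contract roles rarely bill a full 2,080 hours, so treat the annual figures as upper bounds.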

What engineers make $500,000?

Senior engineers in specialized fields such as software engineering, cloud architecture, or cybersecurity can earn $500,000 or more annually, especially with extensive experience, advanced skills, and leadership roles. High compensation often includes bonuses, stock options, or profit sharing, particularly in large tech companies or startups with significant growth potential.

What does a chaos engineer do?

A chaos engineer designs and runs experiments that intentionally disrupt systems in order to identify vulnerabilities and improve resilience. They use chaos engineering frameworks to simulate failures in production environments, helping teams build more reliable and fault-tolerant systems. Strong knowledge of system architecture, scripting, and monitoring is essential for this role.

What is a Chaos Engineering job?

A Chaos Engineering job involves proactively identifying weaknesses in complex systems by intentionally injecting failures and observing how they respond. Professionals in this role design and execute controlled experiments to improve system resilience, ensuring that services remain reliable under unexpected conditions. They work closely with development, operations, and security teams to enhance fault tolerance and incident response strategies.

What engineers make $300,000 a year?

Senior engineers in specialized fields such as software engineering, cloud engineering, or cybersecurity can earn $300,000 or more annually, especially with extensive experience, advanced skills, and certifications like AWS or CISSP. Roles in high-demand industries or leadership positions may also reach this compensation level.

What are the key skills and qualifications needed to thrive in the Chaos Engineering position, and why are they important?

To thrive in Chaos Engineering, a strong background in software engineering, distributed systems, and reliability testing is essential, often supported by a degree in computer science or a related field. Familiarity with chaos engineering tools like Gremlin or Chaos Monkey and experience with cloud platforms, container orchestration, and monitoring systems are highly valued. Excellent problem-solving abilities, communication skills, and a mindset oriented toward experimentation help engineers collaborate effectively and analyze complex failure modes. These skills are crucial for proactively identifying system weaknesses and ensuring the resilience of large-scale technology infrastructures.

What are some typical challenges a Chaos Engineer faces, and how do they overcome them?

Chaos Engineers often face the challenge of designing effective experiments that simulate real-world failures without disrupting production systems. Balancing the need to discover vulnerabilities with maintaining uptime requires careful planning, communication, and coordination with development and operations teams. They address these challenges by thoroughly testing in controlled environments, documenting procedures, and establishing clear rollback strategies. Continuous learning and cross-functional collaboration are also key to staying ahead of new complexities in evolving systems.
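As a rough illustration of the pattern described above (check the steady state, inject a failure, observe, and always roll back), here is a minimal sketch. `FakeService`, `kill_one_replica`, and `restore_replicas` are hypothetical stand-ins invented for this example; a real experiment would use a tool such as Gremlin or Chaos Monkey against actual infrastructure.

```python
from dataclasses import dataclass


@dataclass
class FakeService:
    """Hypothetical stand-in for a real service running several replicas."""
    healthy_replicas: int = 2

    def handle_request(self) -> bool:
        # A request succeeds as long as at least one replica is up.
        return self.healthy_replicas > 0


def steady_state_ok(service, samples=100, min_success_rate=0.99):
    """Steady-state hypothesis: nearly all requests succeed."""
    successes = sum(service.handle_request() for _ in range(samples))
    return successes / samples >= min_success_rate


def run_experiment(service, inject, rollback):
    """Run one controlled experiment; the rollback runs even if observation fails."""
    if not steady_state_ok(service):
        return "aborted: system unhealthy before injection"
    inject(service)
    try:
        held = steady_state_ok(service)
    finally:
        rollback(service)
    return "hypothesis held" if held else "weakness found"


def kill_one_replica(service):
    service.healthy_replicas -= 1


def restore_replicas(service):
    service.healthy_replicas = 2


if __name__ == "__main__":
    print(run_experiment(FakeService(), kill_one_replica, restore_replicas))
```

With two replicas, losing one leaves the service available, so the hypothesis holds. Starting the same experiment with a single replica reports a weakness, which is the kind of finding these experiments are meant to surface before a real outage does.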

Is chaos engineering still used today?

Chaos engineering is actively used in the industry to improve system resilience by intentionally introducing failures and testing responses. Professionals in this field often utilize tools like Chaos Monkey and follow practices such as continuous testing to identify vulnerabilities in complex systems.

Infographic showing various Chaos Engineering job openings in the United States as of June 2026, with employment types broken down into 1% Locum Tenens, 16% Full Time, 74% Part Time, and 9% Nights. Highlights an 88% Physical, 3% Hybrid, and 9% Remote job distribution, with an average salary of $146,868 per year, or $70.6 per hour.

Senior Site Reliability Engineer, Production Engineering

Anduril Industries

Seattle, WA • On-site

$64.75 - $86.25/hr

Full-time

This job post expired today. Applications are no longer accepted.

Anduril rating

9.4

Based on 7 frontline employees who took The Breakroom Quiz

Job description

Job Summary:
Anduril Industries is a defense technology company with a mission to transform U.S. and allied military capabilities with advanced technology. They are seeking an experienced Senior Site Reliability Engineer to build resilient, highly available systems that scale to meet the demands of their core systems powering Lattice.
Responsibilities:
• Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform
• Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues
• Build and maintain infrastructure automation using tools like Terraform, Kubernetes operators, and custom tooling to manage large-scale distributed systems
• Establish and track Service Level Objectives (SLOs) and Error Budgets to balance feature velocity with system reliability
• Partner with software engineering teams to improve system architecture for reliability, implementing patterns like circuit breakers, graceful degradation, and chaos engineering
• Develop capacity planning models and performance testing frameworks to ensure systems can handle growth and peak operational demands
• Create runbooks, documentation, and training materials to enable teams to operate production systems effectively
• Lead cross-functional efforts to improve deployment safety through progressive rollouts, automated testing, and rollback capabilities
• Implement security best practices and compliance controls for production environments handling sensitive defense data
• Build tooling and automation to reduce toil and improve operational efficiency for the engineering organization
• Participate in on-call rotations and serve as an escalation point for critical production incidents
Qualifications:
Required:
• 7+ years of engineering experience, with at least 3 years focused on SRE, production operations, or infrastructure engineering
• Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
• Deep expertise with Kubernetes in production environments, including operational challenges at scale (100+ nodes)
• Strong programming skills in one or more languages such as Go, Python, Rust, or Java with ability to build production-grade tooling
• Proven experience designing and implementing observability stacks (metrics, logging, tracing) using tools like Prometheus, Grafana, ELK/EFK, or equivalent
• Hands-on experience with cloud platforms (AWS, Azure, or GCP) and infrastructure as code practices
• Demonstrated ability to debug complex distributed systems issues across multiple layers of the stack
• Track record of improving system reliability through architectural changes, not just operational band-aids
• Strong incident management and communication skills, with experience leading responses to critical outages
• Must be a U.S. Person due to required access to U.S. export controlled information or facilities
• Eligible to obtain and maintain an active U.S. Secret security clearance
Preferred:
• Experience with defense, aerospace, or other mission-critical systems where downtime has severe consequences
• Expertise in performance optimization and capacity planning for high-throughput, low-latency systems
• Knowledge of chaos engineering principles and experience implementing resilience testing frameworks
• Experience with service mesh technologies (Istio, Linkerd) and advanced traffic management patterns
• Background in database operations and optimization (PostgreSQL, Cassandra, or similar at scale)
• Familiarity with CI/CD platforms and deployment automation (ArgoCD, FluxCD, Spinnaker, Jenkins)
• Understanding of networking fundamentals including load balancing, DNS, TLS/SSL, and network security
• Experience with configuration management and secrets management solutions (Vault, Sealed Secrets, SOPS)
• Strong written and verbal communication skills with ability to explain technical concepts to non-technical stakeholders
• Active Secret or higher security clearance
Company:
Anduril Industries is a defense technology company that specializes in developing advanced autonomous systems to enhance national security. Founded in 2017, the company is headquartered in Costa Mesa, CA, USA, and has 1,001-5,000 employees. The company is currently late stage.

About Anduril Industries

Sourced by ZipRecruiter

Anduril Industries is a trailblazer in the technology industry based in Costa Mesa, CA, US. Founded in 2017 by Palmer Luckey, the creator of Oculus VR, the company focuses on developing innovative technology to equip and empower those in the defense sector. Its primary products include cutting-edge autonomous systems and AI software that assist in combating threats to national and global security. The mission of Anduril Industries is to integrate technology and defense by building transformative, scalable solutions that ensure a safer world.

Industry

Guided missile and space vehicle manufacturing

Company size

501 - 1,000 Employees

Headquarters location

Costa Mesa, CA, US

Year founded

2017

Website

anduril.com
