Remote Chaos Engineering Jobs (NOW HIRING)

Staff AI Architect, Remote

Charleston, WV · Remote

$58.25 - $76.75/hr

Experience establishing SRE practices including SLO definition, error budgets, runbooks, chaos ... Flexible work environment, ability to work remote, hybrid or in-office * Flexible time off ...

Pre/Post Sales Solutions Architect

$64.50 - $85/hr

As the industry leader in Chaos Engineering and reliability testing, we work with hundreds of the ... But as a remote company, teamwork and collaboration won't happen by accident. We approach every ...

Champion proactive reliability - chaos engineering, game days, failure-mode analysis, capacity and ... remote on-call teams. * Demonstrated ownership of reliability outcomes for customer-facing SaaS at ...

... remote global workforce. If you are passionate about working on business problems that can be ... chaos engineering, performance engineering, toil reduction, reliability engineering etc

Chaos engineering and fault injection (AWS FIS or similar); * Observability tooling experience ... Flextime: Flexible schedule with remote and office options. Meet Our Recruitment Process:

DevOps Engineer

$54 - $74/hr

Knowledge of chaos engineering and resilience testing * Exposure to AI/ML services (SageMaker ... Remote work environment with emphasis on team collaboration and hands-on problem solving Pay Range ...

DevOps Engineer

$52.75 - $72.25/hr

Knowledge of chaos engineering and resilience testing * Exposure to AI/ML services (SageMaker ... Remote work environment with emphasis on team collaboration and hands-on problem solving Pay Range ...

Sr. Staff DevOps Engineer - Federal

$133K - $170K/yr

Experience with Chaos Engineering methodologies in a public cloud environment #LI-Remote #LI-YC2 Zscaler's salary ranges are benchmarked and are determined by role and level. The range displayed on ...

... remote state, and GitOps-based plan/apply pipelines - no unmanaged resources * Audit the cloud ... one chaos engineering exercise (AWS FIS or equivalent) * Partner with engineering teams to ...

Remote Chaos Engineering information

Salary chart values: $73K, $194.7K, $254K

How much do remote chaos engineering jobs pay per year?

As of Jun 30, 2026, the average yearly pay for remote chaos engineering in the United States is $194,709.00, according to ZipRecruiter salary data. Most workers in this role earn between $141,500.00 and $253,000.00 per year, depending on experience, location, and employer.

What is Remote Chaos Engineering?

Remote Chaos Engineering is the practice of testing distributed systems' resilience by intentionally introducing failures and disruptions in remote or cloud environments. The goal is to identify weaknesses and improve system reliability by simulating real-world incidents, such as network outages or server crashes, in a controlled manner. This approach helps teams understand how their applications behave under stress and develop strategies to mitigate future incidents. Remote Chaos Engineering is particularly valuable for organizations leveraging cloud infrastructure and remote services, ensuring robust performance even under unexpected conditions.
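As a rough illustration of "intentionally introducing failures in a controlled manner", here is a minimal Python sketch. It is not taken from any of the listings or from a real chaos platform: `fetch_profile`, `InjectedFault`, and the 30% failure rate are all hypothetical names and values chosen for the example. The idea is simply to wrap a call so that a known fraction of invocations fail, then check that the caller degrades gracefully.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real failure during an experiment (hypothetical)."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated outage")
        return func(*args, **kwargs)
    return wrapper


def fetch_profile(user_id):
    # Hypothetical stand-in for a call to a remote service.
    return {"id": user_id, "name": "user-%d" % user_id}


def fetch_with_fallback(fetch, user_id):
    """Behaviour under test: return a degraded answer instead of crashing."""
    try:
        return fetch(user_id)
    except InjectedFault:
        return {"id": user_id, "name": "unknown"}


def run_experiment(calls=1000, failure_rate=0.3, seed=42):
    """Run the experiment and return (total responses, degraded responses)."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky = with_fault_injection(fetch_profile, failure_rate, rng)
    results = [fetch_with_fallback(flaky, i) for i in range(calls)]
    degraded = sum(1 for r in results if r["name"] == "unknown")
    return len(results), degraded


if __name__ == "__main__":
    total, degraded = run_experiment()
    print("responses:", total, "degraded:", degraded)
```

Every call still returns a response, which is the property the experiment is meant to confirm. Production tools inject faults at the network or infrastructure level rather than inside application code, so treat this only as a sketch of the concept.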

What are some common challenges faced by professionals working in remote chaos engineering roles?

Professionals in remote chaos engineering often encounter challenges such as coordinating experiments across distributed teams, ensuring clear communication about system vulnerabilities, and managing the complexity of large-scale systems without direct, on-site access. Establishing robust monitoring and rollback procedures is essential to minimize risk during remote testing. Additionally, building trust with development and operations teams is key, as chaos engineering often involves intentionally introducing failures to improve system resilience.
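The monitoring and rollback procedures mentioned above can be sketched as a small guard around an experiment. This is an assumed structure, not a description of any specific tool: `run_guarded_experiment`, `FakeService`, and the 5% error-rate threshold are hypothetical. The experiment stops as soon as a health metric crosses the threshold, and the fault is rolled back no matter how the run ends.

```python
def run_guarded_experiment(inject, rollback, read_error_rate, max_error_rate, steps):
    """Inject a fault, poll a health metric, and always roll back.

    Returns "completed" if the metric stayed within bounds for every step,
    or "aborted" as soon as it exceeds max_error_rate.
    """
    inject()
    try:
        for _ in range(steps):
            if read_error_rate() > max_error_rate:
                return "aborted"
        return "completed"
    finally:
        # Runs on normal completion, on abort, and on unexpected exceptions.
        rollback()


class FakeService:
    """Hypothetical stand-in for a system under test, fed canned metric readings."""

    def __init__(self, readings):
        self.fault_active = False
        self._readings = iter(readings)

    def inject(self):
        self.fault_active = True

    def rollback(self):
        self.fault_active = False

    def error_rate(self):
        return next(self._readings)


if __name__ == "__main__":
    svc = FakeService([0.01, 0.02, 0.20, 0.01])
    outcome = run_guarded_experiment(
        svc.inject, svc.rollback, svc.error_rate, max_error_rate=0.05, steps=4
    )
    print(outcome, "fault still active:", svc.fault_active)
```

Putting the rollback in a `finally` clause is the design choice that matters here: an operator working remotely cannot rely on being able to clean up by hand if the script fails midway.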

What is the least stressful remote job?

Remote Chaos Engineering roles typically involve analyzing system resilience and running experiments to improve infrastructure stability, often requiring strong problem-solving skills and familiarity with cloud platforms. These positions can be less stressful when they have clear protocols, manageable workloads, and minimal on-call responsibilities. Overall, jobs with predictable schedules and low urgency tend to be less stressful in remote technical roles.

What are the key skills and qualifications needed to thrive as a Remote Chaos Engineer, and why are they important?

To thrive as a Remote Chaos Engineer, you need a strong background in software engineering, systems architecture, and site reliability, often supported by a degree in computer science or a related field. Familiarity with chaos engineering platforms (such as Gremlin or Chaos Monkey), cloud environments (AWS, Azure, GCP), and automation tools is typically required. Strong problem-solving abilities, clear communication, and a collaborative mindset help you effectively identify weaknesses and drive reliability improvements across distributed teams. These skills are crucial for proactively uncovering system vulnerabilities, ensuring system resilience, and maintaining high availability in complex, remote-first infrastructures.

Is it possible to work remotely as an engineer?

Remote chaos engineering roles are common in the tech industry, allowing engineers to work from anywhere with a reliable internet connection. These positions often require familiarity with cloud platforms, scripting, and monitoring tools, and may involve collaboration across time zones.

What engineers make $500,000?

Senior engineers in specialized fields such as software engineering, cloud infrastructure, or cybersecurity can earn $500,000 or more annually, especially with extensive experience, advanced skills, and leadership roles. High compensation often includes base salary, bonuses, and stock options, particularly in large tech companies or startups with significant growth potential.

What is the difference between Remote Chaos Engineering vs Remote Site Reliability Engineer?

Aspect | Remote Chaos Engineering | Remote Site Reliability Engineer
Primary Focus | Designing and executing chaos experiments to improve system resilience | Ensuring system reliability, availability, and performance through monitoring and automation
Skills & Certifications | Knowledge of chaos engineering tools, scripting, cloud platforms | Monitoring tools, scripting, cloud infrastructure, SRE certifications
Work Environment | Collaborates with development and operations teams, often in DevOps culture | Works closely with engineering teams to maintain system health and SLAs

While both roles focus on system stability, Remote Chaos Engineering specializes in testing system resilience through chaos experiments, whereas Remote Site Reliability Engineers focus on maintaining overall system reliability and performance. Both roles require scripting skills and cloud knowledge, but their core objectives differ: one proactively tests, the other maintains system health.

Is chaos engineering still used today?

Chaos engineering is actively used in modern IT environments to improve system resilience by intentionally introducing failures and testing responses. Professionals in roles like remote chaos engineering often utilize tools such as Chaos Monkey and conduct experiments in cloud or distributed systems to identify weaknesses before outages occur.
Infographic: Remote Chaos Engineering job openings in the United States as of June 2026. Employment type: 99% full-time, 1% part-time. Work arrangement: 37% physical (on-site), 3% hybrid, 60% remote. Average salary: $194,709 per year, or $93.6 per hour.
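The page mixes hourly and yearly pay figures, so a short conversion helps when comparing listings. The sketch below assumes a 2,080-hour work year (40 hours times 52 weeks). That convention is not stated on the page, but it reproduces the infographic's pairing of $194,709 per year with roughly $93.6 per hour. The function names are illustrative.

```python
HOURS_PER_YEAR = 40 * 52  # assumption: 2,080 paid hours per year


def annual_to_hourly(annual_pay):
    """Convert yearly pay to an hourly rate under the 2,080-hour assumption."""
    return annual_pay / HOURS_PER_YEAR


def hourly_to_annual(hourly_rate):
    """Convert an hourly rate to yearly pay under the 2,080-hour assumption."""
    return hourly_rate * HOURS_PER_YEAR


if __name__ == "__main__":
    # Average from the salary section above.
    print(round(annual_to_hourly(194709), 2))   # 93.61
    # Hourly range from the first listing on the page.
    print(hourly_to_annual(58.25))              # 121160.0
    print(hourly_to_annual(76.75))              # 159640.0
```

Under this assumption, the $58.25 - $76.75/hr range on the first listing corresponds to about $121K - $160K per year. Actual pay depends on hours worked and any unpaid time off.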

Site Reliability Engineer (SRE)

Bright Vision Technologies

Celina, TX • Remote

$51.25 - $68/hr

Full-time

Posted 6 days ago


Job description

Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We leverage cutting-edge technologies to create scalable, secure, and user-friendly applications.
As we continue to grow, we’re looking for a skilled Site Reliability Engineer (SRE) to join our dynamic team and contribute to our mission of transforming business processes through technology.
This is a fantastic opportunity to join an established and well-respected organization offering tremendous career growth potential.
Job Title: Site Reliability Engineer (SRE)
Location: 100% Remote (Continental United States)
Position Type: In-house Bright Vision Technologies SOW engagement (no third-party client or vendor)
Experience: 5+ years
Salary: $100K - $150K
Sponsorship: No new H1B sponsorship available. H1B transfers welcomed for qualified candidates.
Employment Type: Full-time, direct W2 with Bright Vision Technologies (no C2C, no 1099, no third-party)
Engagement: Long-term, multi-year, aligned to the Bright Vision SOW delivery roadmap
Compensation: Competitive base salary commensurate with experience, plus benefits.
Employment Terms & Visa Policy
This is a 100% remote, full-time, direct W2 position with Bright Vision Technologies.
This role is part of Bright Vision Technologies’ in-house Statement of Work (SOW) engagement. The client, end customer, and employer for this position is Bright Vision Technologies — there is no third-party client, vendor, or implementation partner involved.
We do not engage in C2C, 1099, or third-party arrangements for this role. All our roles are W2; no third-party brokering, please.
Candidates must be willing to work directly as a full-time W2 employee of Bright Vision Technologies and contribute to our in-house SOW deliverables.
No new H1B sponsorship is available for this role.
However, candidates who are currently on a valid H1B visa and require a transfer are welcome to apply. We will support H1B transfers for qualified candidates.
For every role, a technical coding assessment is mandatory. Please apply only if you are confident in your technical abilities and hands-on experience.
Job Summary
We are seeking an experienced Site Reliability Engineer to ensure the availability, performance, and operational excellence of large-scale distributed systems in production. As an SRE you will live at the boundary between development and operations, applying strong software engineering principles to infrastructure and operations problems, and continually pushing the platform toward higher reliability with lower operational toil. The ideal candidate will combine deep systems knowledge with strong programming skills, a measurement-driven mindset, and the discipline to design, automate, and operate complex services so that reliability becomes a first-class engineering deliverable rather than a reactive concern.
Key Responsibilities
  • Define, instrument, and continually refine service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for critical services, and use those measures to drive concrete engineering and prioritization decisions.
  • Lead incident response and resolution for production issues, acting as a calm and effective incident commander when needed, and ensuring high-quality post-incident reviews that drive lasting improvements.
  • Design and implement comprehensive monitoring, logging, and tracing strategies using Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, or similar tooling so that operators have rich, actionable visibility into system behavior.
  • Build and maintain robust on-call processes, runbooks, and escalation paths that reduce mean time to detect and mean time to resolve while protecting the well-being of the engineers on rotation.
  • Automate operational toil aggressively by writing production-grade tooling in Python, Go, Bash, or similar languages, replacing manual workflows with reliable, auditable automation.
  • Architect and operate large-scale Kubernetes clusters and container-based workloads, including autoscaling, capacity planning, network policy, and integration with service meshes.
  • Design CI/CD pipelines that promote safe, frequent, and observable releases, supported by automated testing, canary deployments, feature flags, and progressive rollout strategies.
  • Lead capacity planning and performance engineering activities, building models that predict growth and stress, and validating those models through load testing and chaos experiments.
  • Partner closely with application development teams to embed reliability practices early in design — including failure-mode analyses, graceful degradation patterns, and dependency hardening.
  • Strengthen the platform’s resiliency through chaos engineering, fault injection, dependency isolation, retries, timeouts, circuit breakers, and well-tested failover paths.
  • Drive continuous improvement of security posture in collaboration with security teams, including patch management, vulnerability remediation, and secure-by-default platform defaults.
  • Contribute to the technical roadmap for reliability tooling, observability platforms, and developer-experience improvements that reduce friction and improve outcomes for engineering teams.
  • Mentor engineers across the organization on SRE practices and foster a strong, blameless culture of operational excellence.
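To make the first responsibility above concrete, here is a minimal sketch of how an error budget follows from an SLO target. It is a generic, request-based calculation, not Bright Vision Technologies' method. The function name, the dictionary keys, and the sample numbers are assumptions made for illustration.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarize a request-based error budget for one measurement window.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: the fraction of requests that actually succeeded.
    sli = (total_requests - failed_requests) / total_requests
    # Budget: how many failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
        "exhausted": failed_requests > allowed_failures,
    }


if __name__ == "__main__":
    # A 99.9% target over 1,000,000 requests tolerates about 1,000 failures,
    # so 250 failures consume about a quarter of the budget.
    report = error_budget(0.999, 1_000_000, 250)
    for key, value in report.items():
        print(key, value)
```

A common way to use the result is to slow down risky releases as `budget_consumed` approaches 1, and to schedule chaos experiments only while budget remains.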

Required Qualifications
  • Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
  • Five or more years of SRE, DevOps, or production engineering experience supporting large-scale distributed systems.
  • Strong programming skills in at least one of Python, Go, or Java, with the ability to build robust automation and tooling.
  • Deep, hands-on experience operating Linux at scale, including networking, performance tuning, and systems-level troubleshooting.
  • Production experience operating Kubernetes and container-based workloads.
  • Strong working knowledge of observability tooling such as Prometheus, Grafana, OpenTelemetry, ELK/EFK, or commercial equivalents.
  • Hands-on experience designing and operating CI/CD pipelines for both infrastructure and applications.
  • Solid understanding of distributed system design, including consistency models, partitioning, and failure semantics.
  • Demonstrated experience leading incident response and conducting effective post-incident reviews.
  • Excellent communication and documentation skills.

Preferred Qualifications
  • Experience defining and operationalizing SLOs and error budgets in real production environments.
  • Exposure to chaos engineering practices and tools such as Chaos Monkey, Gremlin, or Litmus.
  • Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
  • Background in capacity planning, performance engineering, or large-scale load testing.
  • Familiarity with service mesh technologies such as Istio, Linkerd, or Consul.

How to Apply
Would you like to know more about this opportunity?
For immediate consideration, please send your resume to harry@bvteck.com or contact us at (908) 676-4399. Learn more about Bright Vision Technologies at www.bvteck.com.
We recognize that our people are our strength, and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company.
We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs.
Bright Vision Technologies is an Equal Opportunity Employer, including Disability/Veterans.
Position offered by “No Fee Agency.”
 

Equal Employment Opportunity (EEO) Statement

Bright Vision Technologies (BV Teck) is committed to equal employment opportunity (EEO) for all employees and applicants without regard to race, color, religion, sex, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, veteran status, or any other protected status as defined by applicable federal, state, or local laws. This commitment extends to all aspects of employment, including recruitment, hiring, training, compensation, promotion, transfer, leaves of absence, termination, layoffs, and recall.

BV Teck expressly prohibits any form of workplace harassment or discrimination. Any improper interference with employees' ability to perform their job duties may result in disciplinary action up to and including termination of employment.