Design, deploy, and manage network infrastructure using Terraform or Ansible, moving the firm away ... direct hands-on competence where it isn't. * Incident Response: Serve as an escalation point for ...
The Associate Director leads Maintenance Managers at each site and has direct responsibility for Reliability Engineers. This role works in close partnership with Instrumentation and Control Systems ...
Senior Reliability Coordinator
Carmel, IN · On-site
$138K - $148K/yr
Senior Reliability Coordinator Direct Hire Carmel, IN RELOCATION IS AVAILABLE PLEASE NOTE VISA ... Our client manages the electricity superhighway in the Central U.S. Through use of groundbreaking ...
... Manager, this position is responsible for leading and directing maintenance resources to the ... Identify, implement, and document maintenance and reliability best practices. * Identify and ...
Director of Maintenance
Terre Haute, IN · On-site
The Director of Maintenance is responsible for developing, leading, and executing the plant ... equipment reliability and reduce downtime. * Manage maintenance budgets, capital projects ...
... and reliability * Manage freight forwarders, carriers, brokers, and FTZ operators, including ... Four employee-led and self-directed Business Resource Groups; Paid volunteer day annually;
... reliability. • Develop, manage, and continuously improve dashboards, KPIs, metrics, and reports ... direct supervision. Company : We're a medicine company turning science into healing to make life ...
The Patient Safety Dir will develop a comprehensive patient safety plan which will include ... The program will include the promotion of standard leader reliability skills and employee universal ...
$170K - $200K/yr
... manage massive ecosystems of digital evidence, process electronic records, and drive real-time ... Lives depend on the reliability of our systems. Your primary mandate is uncompromising engineering ...
... directed Preferred Background and Experience: * Career experience with knowledge of (management of ... Basic understanding of the concepts of "reliability minded" approaches to asset uptime and ...
... the reliability, scalability, and continuous improvement of Nexstar's 24/7 master control ... Direct all employee workflow management activities within Broadcast Hub Services, including hiring ...
The Moove Reliability Services Department is aligned with the goals and objectives of our sales and ... directed Preferred Background and Experience: * Career experience with knowledge of (management of ...
Director Reliability Manager information
What is the difference between Director Reliability Manager vs Reliability Engineer?
| Aspect | Director Reliability Manager | Reliability Engineer |
|---|---|---|
| Credentials | Bachelor's or Master's in Engineering, certifications like CRC, CMRP | Bachelor's in Engineering or related field, certifications like CRC, CMRP |
| Work Environment | Leadership roles overseeing teams, strategic planning | Technical roles focused on analysis, testing, and troubleshooting |
| Industry Usage | Used in manufacturing, energy, aerospace for high-level reliability strategies | Common in manufacturing, maintenance, and engineering teams |
| Search & Comparison Intent | Understanding leadership responsibilities, strategic focus | Technical skills, daily tasks, and hands-on work |
The Director Reliability Manager typically oversees reliability strategies and manages teams, focusing on high-level planning and decision-making. Reliability Engineers are more involved in technical analysis, testing, and implementing reliability improvements. Both roles require similar credentials but differ in scope and responsibilities within organizations.

Full-time
Medical, Dental, Vision, Life, Retirement
Posted 17 days ago
Group1001 rating
9.5
Based on 8 frontline employees who took The Breakroom Quiz
9th of 261 rated insurance
Job description
Why This Role Matters:
The Platform Engineering Services team at Group 1001 is building a Site Reliability Engineering practice with a network scope. We're hiring a Sr. Network Reliability Engineer who embodies Innovation and Excellence, and will apply SRE principles - code-as-source-of-truth, SLOs and error budgets, alerting on symptoms rather than causes, failure-mode-first design, and the elimination of toil - to the firm's network platform from carrier edge through cloud fabric to Kubernetes pod boundary. This is not a "keep the lights on" role. You will systematically engineer the lights-on work out of existence, build the abstractions that let other engineering teams express network intent in code, and treat the network as a single engineered system rather than a collection of vendor consoles. You will operate inside a DevSecOps practice spanning multi-cloud, multi-region environments, and you will partner closely with Cloud and Data Platforms, the NOC/SOC, and Cyber Security to extend reliability practice across the firm.
How You'll Contribute:
- Treat reliability as an engineered property. Define SLOs and error budgets for the network platform - DNS resolution, edge availability, mesh ingress success, cross-region path health - and use them to gate changes, not just to color dashboards. Lead postmortems with a focus on permanent remediation, not pattern-recognition. Alert on symptoms users feel, not on causes that may or may not produce impact.
- Move network state into code. Use Terraform (or Pulumi), Ansible, and Python to replace CLI-driven configuration with declarative, version-controlled, peer-reviewed change running through Infra CI/CD. This applies equally to the edge tier (Cloudflare), security platforms (Zscaler ZIA/ZPA, ZTNA policies, next-gen firewalls), the cloud network fabric (Transit Gateway, Cloud WAN, VPCs, Route53, IPAM), and increasingly the Kubernetes and service-mesh layer.
- Build network policy as intent, not rule lists. Express what flows are permitted, what segments are isolated, what egress is inspected, what zones share DNS - and engineer the compilers that turn that intent into per-vendor configuration. Use Policy as Code (OPA/Rego, Sentinel, Cilium NetworkPolicy) to catch invariant violations at plan time, not apply time.
- Infrastructure as Code (IaC): Design, deploy, and manage network infrastructure using Terraform or Ansible, moving the firm away from manual configuration to a code-first approach.
- Engineer the cloud network platform. Operate and extend our multi-account AWS Landing Zone - Cloud WAN segmentation, Transit Gateway peering, IPAM-driven CIDR allocation, shared private DNS, cross-account telemetry pipelines. Build the platform abstractions that make a new account or service land correctly with policy and connectivity composed from declarative inputs.
- Extend platform thinking into the container tier. Kubernetes networking, service mesh (Istio, Linkerd, Consul Connect), eBPF-based observability and policy (Cilium, Hubble), and the integration points where mesh-level authz meets cloud-tier identity. Recognize that an "internal" service is one logical hop on a chain of policy enforcement points and engineer for that explicitly.
- Improve telemetry and observability with intent. Build alerts as structured payloads with runbook links, suspected blast radius, and dependency-aware suppression. Author both system-health dashboards for operators and end-user monitoring dashboards that reflect actual user experience. Use Grafana, Elastic, and OpenTelemetry where each fits.
- Mentor and grow the team. Provide technical guidance to junior engineers, foster a culture of learning, and work out loud across Platform Engineering so the patterns you build cross-pollinate to adjacent domains.
- Handle hardware when required. Provide maintenance and configuration support for routers, switches, and firewalls at data centers and offices when needed - bringing code-first practices to physical hardware where possible (templating, change validation, zero-touch provisioning) and direct hands-on competence where it isn't.
- Incident Response: Serve as an escalation point for network issues, some complex and some basic but not yet covered by runbooks. Troubleshoot with a focus on root cause analysis and permanent remediation, and with a documentation-first mindset.
- Reduce toil and hand off cleanly. Repetitive operational tasks are scoped engineering problems with measurable payoff. Author runbooks and SOPs that the NOC can execute confidently; package routine work for L1/L2 handoff so engineering interrupt drops over time. Coordinate across Data Platforms, NOC/SOC, and Cyber Security so reliability practices spread instead of staying siloed.
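As an illustration of what "use SLOs and error budgets to gate changes" can look like in practice, here is a minimal Python sketch. It is not Group 1001's tooling: the `Slo` class, the `dns-resolution` name, the 99.9% target, and the 25% gating threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO measured over a fixed window."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


def error_budget_remaining(slo: Slo, total: int, failed: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    if total <= 0:
        return 1.0  # no traffic observed, so nothing has been spent
    allowed_failures = (1.0 - slo.target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget: any failure overspends it.
        return 1.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


def change_allowed(slo: Slo, total: int, failed: int,
                   min_remaining: float = 0.25) -> bool:
    """Gate a change: allow it only if enough budget is left (assumed threshold)."""
    return error_budget_remaining(slo, total, failed) >= min_remaining


if __name__ == "__main__":
    dns = Slo(name="dns-resolution", target=0.999)
    # 1,000,000 queries at a 99.9% target allow about 1,000 failures.
    print(round(error_budget_remaining(dns, 1_000_000, 400), 3))
    print(change_allowed(dns, 1_000_000, 400))
    print(change_allowed(dns, 1_000_000, 900))
```

In a pipeline, a check like `change_allowed` would typically run before a plan is applied, so a nearly exhausted budget blocks risky changes rather than only turning a dashboard red.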
What We're Looking For:
- Network Engineering: Deep understanding of TCP/IP, BGP, OSPF, VPNs, and SD-WAN architecture.
- Automation: Proven experience with Terraform (state management, modules) and Ansible (playbooks, roles) - or similar - in a production environment. Proficiency in Python for automation and API interaction, or similar.
- Security Platforms: Hands-on experience with Cloudflare, Zscaler, and/or enterprise firewalls.
- Observability: Experience configuring monitoring tools (e.g., Datadog, Prometheus, Grafana) to create meaningful alerts and dashboards.
Nice to Have
- Service mesh experience (Istio, Linkerd, Consul Connect, Cilium).
- eBPF-based observability (Hubble, Pixie).
- AWS Multi-account landing zone tooling experience (AFT, Control Tower, or equivalent).
- Policy as Code experience (OPA/Rego, Sentinel, Cilium NetworkPolicy).
Professional Attributes
- Documentation First: A strong belief that a job isn't done until the documentation is written.
- Toil Reduction: A mindset that actively seeks to automate repetitive tasks.
- Hybrid Capability: Willingness to handle physical hardware tasks when required while maintaining a software-centric engineering mindset.
Compensation:
Our compensation reflects the cost of labor across several U.S. geographic markets. The base pay for this position ranges from $135,000/year in our lowest geographic market up to $190,000/year in our highest geographic market. Pay is based on factors such as market location, job-related skills, and experience.
Benefits Highlights:
Employees who meet benefit eligibility guidelines and work 30 hours or more weekly can enroll in Group 1001's benefits package. Employees (and their families) are eligible to participate in the Company's comprehensive health, dental, and vision insurance plan options. Employees are also eligible for Basic and Supplemental Life Insurance and Short- and Long-Term Disability. All employees (regardless of hours worked) have immediate access to the Company's Employee Assistance Program and wellness programs; no enrollment is required. Employees may also participate in the Company's 401(k) plan, with matching contributions by the Company.
Group 1001, and its affiliated companies, is strongly committed to providing a supportive work environment where employee differences are valued. Diversity is an essential ingredient in making Group 1001 a welcoming place to work and is fundamental in building a high-performance team. Diversity embodies all the differences that make us unique individuals. All employees share the responsibility for maintaining a workplace culture of dignity, respect, understanding and appreciation of individual and group differences.
About Group1001
Sourced by ZipRecruiter
Company size
201 - 500 Employees
Headquarters location
Indianapolis, IN, US
Year founded
2013