Senior DevOps SRE
Location: Burbank or Glendale, California; Orlando, Florida; or Seattle, Washington
Work Arrangement (Remote, Onsite or Hybrid): Onsite
Length Of Contract: 9 months
Responsibilities:
• Build safe and secure automation for infrastructure and developer enablement, following the Disney Security Configuration Standards while seeking best practices from other teams;
• Develop useful telemetry, alerts, and response to reduce Mean Time To Repair (MTTR);
• Collaborate and provide technical excellence within and across teams;
• Consult on best practices and develop tools to enable smooth adoption of good service reliability practices and methods;
• Identify areas of improvement in reliability, efficiency, and operations;
• Build tools to help your SRE team quickly pinpoint, isolate and resolve issues related to infrastructure, platform services and applications;
• Continuously refine monitoring processes, configurations, and thresholds;
• Develop runbooks and tools to streamline processes and shorten problem resolution time;
• Write code that improves scalability, performance, maintainability, and security;
• Add, tune and maintain alert configurations and documentation as needed;
• Cultivate full-team participation in building high-quality, thoughtful software;
• Develop and improve CI/CD processes to improve release cadence and success;
• Use Chaos Engineering principles and methodologies to test what you build under real-world conditions;
• Mentor SREs in technical and non-technical SRE responsibilities;
• Take primary responsibility for large (multi-person) efforts, including planning, execution, and training
Basic Qualifications:
• Creative and innovative outside-the-box thinking
• 5+ years of experience in SRE, DevOps, technical operations, systems engineering, software engineering or related discipline
• Excellent communication skills, both verbal and written
• Passionate and curious about ways to leverage technology while continually learning
• Ability to identify root-cause sources of instability in high-traffic, large-scale distributed systems
• Experience in designing, building, and operating large-scale production systems
• Proficient in using containers in enterprise production environments (e.g. Docker, Kubernetes, LXC, AWS ECS and EKS)
• Experience with configuration management and orchestration tools (e.g. Terraform, CloudFormation, Ansible)
• Comfortable in one or more of the following languages (Python, Java, Scala, Go, Rust, Ruby, or similar)
• Experience with scripting languages such as Ruby, Bash, PowerShell or Python
• Skilled in Cloud/PaaS/SaaS environments (e.g. AWS, Azure, Google Cloud)
• Hands-on experience using source control (Git, GitHub) and feature branching strategies
• Experience with continuous integration tools (e.g. Jenkins, GitLab CI/CD, AWS CodeBuild, CodeDeploy, CodePipeline, Azure DevOps, Spinnaker)
• Knowledge of best practices and IT operations for an always-up, always-available service
• Expertise in scalable testing, automation, continuous integration frameworks and best practices
• Experience in SDLC, distributed systems, networking, hardware, logistics and operations or capacity planning
• UNIX/Linux administration, troubleshooting, performance tuning, and security
• Must be detail-oriented, self-organized, committed to quality, and capable of tracking multiple issues simultaneously
Preferred Education:
• BS Degree in Computer Science, Electrical & Computer Engineering or Mathematics; or equivalent experience.