Job Summary:
Scalence L.L.C., a company focused on cloud environments, is seeking a Site Reliability Engineer to lead application reliability, resiliency, security, and incident management. The role involves driving automation, monitoring, performance optimization, and recovery solutions for large-scale systems.
Responsibilities:
• Lead incident response, recovery, and post-mortem analysis
• Design and automate failure detection, recovery, and resiliency solutions
• Define and manage Service Level Objectives (SLOs)
• Improve application performance, security, quality, and cost efficiency
• Develop monitoring, metrics, dashboards, and operational playbooks
• Collaborate with engineering teams to ensure reliable production systems
• Act as a technical advisor for complex reliability and infrastructure challenges
Qualifications:
Required:
• Strong Site Reliability Engineering (SRE) or DevOps experience
• Experience with AWS cloud services, Kubernetes, Terraform, and Datadog
• Incident management, root cause analysis (RCA), and SLO/SLI implementation
• Automation and Infrastructure as Code (IaC)
• Programming experience in one or more: Java, Scala, JavaScript, .NET, Go, or Python
• Experience with application monitoring, observability, security, and performance tuning
Company:
In today’s dynamic and competitive market, success hinges on mastering three key areas: Data Intelligence, Business Resilience, and Digital Experience. The company is headquartered in Morristown, New Jersey, US, with a team of 501-1000 employees. It is currently a late-stage company.