Must Have Technical/Functional Skills
Cloud & Platform Engineering (Expert Level)
- Deep expertise in Microsoft Azure, including:
o Compute (VMs, App Services, Azure Container Apps)
o Containers & Orchestration (AKS, Docker)
o Networking (VNETs, Private Endpoints, Application Gateway, Load Balancers)
o Storage, Azure Key Vault, Azure Monitor, Log Analytics
- Proven experience designing enterprise-grade, highly available cloud platforms
- Strong understanding of hybrid and multi-cloud architectures (AWS / GCP exposure preferred)
DevOps & Engineering Excellence
- Advanced experience with Azure DevOps and CI/CD pipeline architecture
- Infrastructure automation using Terraform (modules, state management, governance)
- Strong scripting skills (PowerShell, Bash)
- GitOps concepts, branching strategies, release orchestration
Site Reliability Engineering (Leadership Level)
- Ownership of platform reliability, resiliency, and performance
- Definition and governance of:
o SLIs, SLOs, SLAs
o Error budgets and reliability metrics
- Advanced observability strategy, design, and implementation:
o Metrics, logs, traces, alerts, dashboards using Dynatrace
- Incident response leadership, RCA facilitation, and long-term remediation planning
- Experience operating systems with 99.9% to 99.99% availability
Containers, APIs & Integration
- Leadership-level experience with AKS-based platforms, ingress, and scaling strategies
- Understanding of microservices, API-led and event-driven architectures
- Familiarity with Azure Integration Services (Service Bus, Event Hub, API Management)
Security, Compliance & Cost
- Secure cloud design using Key Vault, managed identities, RBAC
- Cost optimization (FinOps mindset) across cloud infrastructure
Roles & Responsibilities
- Act as Lead SRE for the client's retail platforms, owning reliability and stability outcomes
- Define and enforce SRE standards, best practices, and operating models
- Architect and govern highly available, scalable cloud platforms
- Lead the design and implementation of CI/CD and IaC strategies
- Establish proactive monitoring, alerting, and incident prevention mechanisms
- Own major incident leadership, RCA execution, and corrective action tracking
- Partner with application, security, and architecture teams to build reliability by design
- Drive automation to reduce toil and improve operational efficiency
- Mentor and coach SRE and DevOps engineers across teams
- Influence roadmap decisions with a reliability, scalability, and cost lens
TCS Employee Benefits Summary:
- Discretionary Annual Incentive.
- Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
- Family Support: Maternal & Parental Leave.
- Insurance Options: Auto & Home Insurance, Identity Theft Protection.
- Convenience & Professional Growth: Commuter Benefits, Certification & Training Reimbursement.
- Time Off: Vacation, Time Off, Sick Leave & Holidays.
- Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.
Salary Range: $120,000 to $160,000 a year