Data Center Operations Lead
The Data Center Operations Lead / Architect is primarily responsible for leading 24/7 data center operations, ensuring high availability, reliability, security, and cost-effective delivery of infrastructure services. This role focuses heavily on operational excellence, incident management, and service delivery, while providing practical architectural inputs to improve performance, scalability, and stability.
Key Responsibilities
Operations Management (Primary Focus)
• Lead and manage day-to-day data center operations across servers, storage, networking, and facilities
• Ensure maximum uptime, system availability, and SLA compliance
• Oversee monitoring, alerting, and proactive issue resolution
• Manage operational activities such as backups, patching, batch jobs, and scheduled maintenance
• Ensure reliable, secure, and cost-effective delivery of infrastructure services
Incident, Problem & Change Management
• Act as the primary escalation point for critical incidents and outages
• Drive resolution of incidents in collaboration with cross-functional teams
• Lead root cause analysis (RCA) and implement preventive measures
• Ensure strict adherence to ITIL processes (Incident, Problem, Change, Service Request Management)
Service Delivery & Process Governance
• Lead ITIL-aligned service delivery processes, ensuring consistent execution
• Drive operational excellence, SLA adherence, and continuous improvement initiatives
• Ensure high-quality handling of incidents, problems, changes, and service requests
Team Leadership
• Lead, mentor, and manage a team of data center engineers and operations staff
• Manage shift schedules, on-call rotations, and 24/7 support coverage
• Drive performance management and team skill development
Infrastructure Operations & Maintenance
• Oversee installation, configuration, maintenance, and lifecycle management of infrastructure
• Manage hardware upgrades, patching, and data center Moves, Adds, and Changes (MACs)
• Ensure backup, disaster recovery (DR), and business continuity readiness
Architecture Support (Secondary Focus)
• Provide operational insights into infrastructure design and architecture decisions
• Support implementation of new technologies, ensuring operational readiness and stability
• Recommend improvements for performance, capacity, and resilience
Capacity & Performance Management
• Monitor infrastructure utilization and performance metrics
• Plan and manage capacity and scalability requirements
• Optimize resource usage to improve efficiency and reduce cost
Vendor & Stakeholder Management
• Coordinate with vendors for support, maintenance, and issue resolution
• Collaborate with Cloud, Network, Security, and Application teams
• Ensure adherence to SLAs and service delivery commitments
Compliance & Security
• Ensure compliance with security standards, policies, and regulatory requirements
• Maintain operational documentation and SOPs
• Support audits, risk assessments, and compliance activities
Monitoring & Reporting
• Manage monitoring tools and implement alert thresholds
• Track and report key metrics such as uptime, MTTR, SLA compliance, and incident trends
• Provide dashboards and reports to leadership
Technical Skills
• Strong hands-on expertise in servers, storage, networking, and virtualization
• Deep understanding of ITIL processes (Incident, Problem, Change, Service Requests)
• Experience with monitoring tools and ITSM platforms (e.g., ServiceNow)
Experience
• Typically requires 10-15+ years of experience in data center operations, infrastructure support, or similar operations-focused leadership roles
• Proven experience managing 24/7 production environments
Process
• Leads ITIL-aligned service delivery processes, ensuring consistent and efficient execution
• Ensures governance across incident, problem, change, and service request management
• Focuses on operational stability, process adherence, and continuous improvement
Preferred Skills
• Experience in the banking or financial services domain
• Exposure to cloud platforms (Azure, AWS) and hybrid environments
• Familiarity with automation tools (Ansible, Terraform)
• Strong troubleshooting and analytical capabilities
Key Competencies
• Strong operations leadership and incident management skills
• Ability to work in high-pressure, critical production environments
• Excellent problem-solving and decision-making abilities
• Effective communication and stakeholder management
Work Environment
• 24/7 support model with on-call responsibilities
• Coordination across multiple data center locations and teams
Salary Range: $85,000-$100,000 a year