Datacenter Operations Manager Jobs in Texas (NOW HIRING)

... managing system state, ensuring seamless operation between hardware and customer workloads ... our Datacenter Operations Engineers to maintain and operate the fleet of AI systems at peak ...

Strong organizational and time management skills in fast-paced operational settings. Preferred ... Familiarity with datacenter equipment, ticketing systems, or asset tracking tools. * Technical ...

VCF Platform Engineer Lead

Dallas, TX · On-site

$101K - $133K/yr

... datacenter operations. • Proven experience leading enterprise-scale platform operations and ... management. • Experience with broader private cloud and hosting service ecosystems, including ...

VCF Platform Engineer Lead

Fort Worth, TX · On-site

$98K - $129K/yr

... datacenter operations. • Proven experience leading enterprise-scale platform operations and ... management. • Experience with broader private cloud and hosting service ecosystems, including ...

VCF Platform Engineer Lead

San Antonio, TX · On-site

$92K - $121K/yr

... datacenter operations. • Proven experience leading enterprise-scale platform operations and ... management. • Experience with broader private cloud and hosting service ecosystems, including ...

Datacenter Operations Manager information

What are some common challenges faced by Datacenter Operations Managers, and how can they be addressed?

Datacenter Operations Managers often encounter challenges such as maintaining uptime during equipment failures, managing rapid scaling demands, and ensuring robust security protocols. To address these, managers typically implement detailed incident response plans, invest in staff cross-training, and coordinate closely with IT and security teams. Proactive monitoring, regular maintenance schedules, and clear communication channels are also essential for minimizing downtime and ensuring compliance with industry standards.

What is the difference between a Datacenter Operations Manager and a Data Center Technician?

| Aspect | Datacenter Operations Manager | Data Center Technician |
| --- | --- | --- |
| Credentials | Typically requires management experience, certifications like Cisco CCNA, CompTIA Server+ | Often requires technical certifications such as CompTIA A+, Network+, or vendor-specific training |
| Work Environment | Oversees data center operations, manages teams, plans infrastructure upgrades | Performs hardware installation, troubleshooting, and maintenance tasks |
| Employer & Industry Usage | Used by data center and IT service providers for operational oversight | Commonly employed in data centers, telecom, and enterprise IT facilities for technical support |

The Datacenter Operations Manager focuses on overseeing overall data center operations, managing teams, and strategic planning. In contrast, the Data Center Technician handles hands-on technical tasks like hardware setup and troubleshooting. Both roles are essential but differ in scope and responsibilities within the data center environment.

What does a Datacenter Operations Manager do?

A Datacenter Operations Manager is responsible for overseeing the daily operations and maintenance of data center facilities. This includes ensuring the reliability, security, and efficiency of all hardware, software, and network systems within the data center. They manage staff, coordinate with other IT teams, handle incident response, and ensure compliance with industry standards and regulations. Their role is crucial for minimizing downtime, optimizing performance, and supporting the business's IT infrastructure needs.

What are the key skills and qualifications needed to thrive as a Datacenter Operations Manager, and why are they important?

To thrive as a Datacenter Operations Manager, you need a strong background in IT infrastructure, systems administration, and facility management, often supported by a degree in computer science or a related field. Familiarity with data center management tools, monitoring systems, and certifications such as ITIL or Data Center Certified Associate (DCCA) is commonly required. Strong leadership, problem-solving, and communication skills are crucial for managing teams, coordinating with stakeholders, and ensuring operational continuity. These competencies are vital for maintaining uptime, optimizing performance, and safeguarding critical business data and services.

Staff Engineer

Graphcore

Austin, TX • On-site

Full-time

Posted 6 days ago


Job description

Job Summary:
Graphcore is one of the world’s leading innovators in Artificial Intelligence compute, developing hardware, software and systems infrastructure for AI breakthroughs. The Staff Engineer will join the System Management team to develop critical interfaces for managing system state, ensuring seamless operation between hardware and customer workloads.
Responsibilities:
• Ownership of software engineering efforts across the full SDLC, including implementation, automated testing, integration, and production readiness for the rack management solution.
• Ownership of critical infrastructure with the need to drive issues to resolution while collaborating effectively across teams.
• Configure and test new Graphcore AI hardware and systems using Continuous Deployment and Infrastructure-as-code in internal and external datacentres.
• Work with our Datacenter Operations Engineers to maintain and operate the fleet of AI systems at peak performance.
• Drive corrective actions for systems that are not operating correctly, working with DC operations and Graphcore Engineering as required.
Qualifications:
Required:
• Bachelor's degree or equivalent practical experience in a relevant subject.
• Experience with RESTful API development.
• Experience building, deploying, and operating containerized workloads using Kubernetes and container runtimes such as Docker or Podman.
• Experience with managing production Kubernetes clusters and workloads.
• Programming experience with Go.
• Hands-on experience deploying and operating infrastructure using Infrastructure-as-Code, source code version control, and CI/CD automation tools (e.g. Terraform/OpenTofu, Ansible, GitLab, GitHub Actions, Git version control).
• Experience with Redfish for datacenter hardware management, telemetry, provisioning, and control.
• Experience specifying, scoping, estimating and detailing work plans in an Agile and Scrum framework, including priorities, risks, issues, impacts and constraints.
• Strong Linux systems engineering experience, including administration, automation, and scripting with Bash and Python.
Preferred:
• Experience with AI coding assistants (Codex, Claude, etc.).
• Experience with Kubernetes operator development (Custom resources).
• Experience with High Performance Computing (HPC) environments using SLURM or similar batch workload solutions.
• Experience with virtualized deployments and the technologies they rely on (e.g. Open vSwitch, KVM, QEMU).
• Experience with distributed object, block, and file storage (e.g., Ceph).
• Experience in end-to-end deployment automation and CI of containerized services. Complete automation of pipelines for build, test, deploy, manage, alert, destroy, rebuild.
• Experience with solutions for monitoring and observability (e.g. Grafana, Prometheus, OpenSearch/Elasticsearch, Loki, Mimir, OpenTelemetry, Fluentd, Kafka).
• Experience with managed switch configuration (e.g. EOS, SONiC, DNOS).
• Experience with PyTorch for AI workloads.
• Solid understanding of cloud and infrastructure technologies, including APIs, virtualization, networking, block storage, resource management, and monitoring systems.
Company:
Graphcore develops a microprocessor designed for AI and machine learning applications. It is a sub-organization of SoftBank. Founded in 2016, the company is headquartered in Bristol, GBR, with a team of 501-1000 employees. The company is currently Late Stage.