We want people who are passionate about troubleshooting complex problems with systems, networking, and storage at scale. We are looking for a seasoned system administrator to help us keep the cloud running smoothly. Reporting to the manager of Cloud Operations, the GPU Operations Engineer monitors and provides first response to all cloud health issues that impact, or could potentially impact, customer experience, whether internal or external. You will interface with teams across the organization to research and troubleshoot issues ranging from single droplets to cloud-wide disturbances. Our workweek spans five days, which may include weekends.
What You'll Be Doing:
- Ensuring maximum uptime for our global infrastructure
- Automating processes and building tools to improve operational efficiency
- Coordinating operational work across teams to improve the platform with minimal impact
What You'll Add to DigitalOcean:
- Solid experience with Linux operating systems or networking, including day-to-day upkeep
- Familiarity with virtualization technologies and troubleshooting virtual machine instances
- Familiarity with containerization technologies and troubleshooting containers
- Familiarity with IPv4 networking and troubleshooting (CCNA equivalent)
- Understanding of basic storage concepts and technologies
- Experience with monitoring systems and incident management
- Experience scripting in one or more of the following languages: Bash, Python, or Go
- Experience with GPU hardware or AI/ML, and Kubernetes
- A passion for good documentation and open communication
- Proven ability to learn!
Compensation Range:
*This is a hybrid role
JR: 2026-7423