ClickHouse is expanding its cloud data platform across AWS, GCP, and Azure - adding new capabilities that connect and extend Postgres and ClickHouse for modern data workloads. We're hiring a Senior SRE / Senior Infrastructure Engineer to own reliability, automation, and operations as these services scale globally.
You'll be at the center of how we run and evolve our next-generation data platform - building the automation, observability, and operational rigor that ensure a fast, secure, and dependable customer experience. This is a hands-on, high-impact role where you'll write code, shape architecture, and enable the broader engineering team to deliver with confidence and velocity.
What You'll Do
- Lead reliability and operations for ClickHouse's Postgres integration - upgrades, patching, maintenance, and scaling.
- Design and implement automation for provisioning, deployments, and service lifecycle management across AWS, GCP, and Azure.
- Develop infrastructure-as-code using Terraform and modern CI/CD tooling to ensure consistent, repeatable deployments.
- Contribute Go-based tooling and services that improve automation, observability, and developer experience.
- Own observability and monitoring, ensuring robust alerting, metrics, and tracing across environments.
- Drive incident management and postmortem practices that strengthen reliability and learning loops.
- Collaborate cross-functionally with platform, networking, and product teams to improve service operability.
- Mentor and enable engineers, helping the team scale effectively as customer adoption grows.
About You
- Experience: 7+ years in SRE, DevOps, or infrastructure engineering, with a track record of running distributed, production-grade systems.
- Database Operations: Solid understanding of Postgres operations, scaling, and performance tuning.
- Cloud Expertise: Deep hands-on experience with AWS, plus exposure to GCP and Azure; comfortable navigating multi-cloud topologies.
- Automation Skills: Proficient with Terraform, Kubernetes, and container-based infrastructure.
- Programming: Strong Go development skills (or willingness to write and own production Go code).
- Observability: Familiar with tools like Prometheus, Grafana, Loki, OpenTelemetry, or equivalents.
- Reliability Focus: Deep understanding of SLOs, incident response, and continuous improvement in service reliability.
- Mindset: You operate with a founder's mentality - hands-on, resourceful, and willing to dive deep to get things done. You take pride in hard work, autonomy, and shipping impactful systems.
Why Join Us
- Build and operate the foundation for ClickHouse's Postgres integration / extension, spanning multiple CSPs.
- Work on a small, elite team where ownership, reliability, and speed are core values.
- Shape the operational practices and automation that will define how ClickHouse runs databases in the cloud.
- Write production code, influence architecture, and mentor others - all while scaling a high-impact platform from the ground up.
- Be part of a company redefining performance, simplicity, and developer experience in data infrastructure.