Job Summary:
Decagon is the leading conversational AI platform empowering brands to deliver concierge customer experiences. The Senior Infrastructure Engineer will design, build, and operate production infrastructure for high-scale, low-latency systems, ensuring reliability and performance while supporting various deployment architectures.
Responsibilities:
• Design and implement critical infrastructure services with strong SLOs, clear runbooks, and actionable telemetry.
• Partner with research and product teams to architect solutions, set up prototypes, evaluate performance, and scale new features.
• Tune service latencies: optimize networking paths, apply smart caching/queuing, and tune CPU/memory/I/O for tight p95/p99s.
• Evolve CI/CD, golden paths, and self-service tooling to improve developer velocity and safety.
• Support various deployment architectures for customers with robust observability and upgrade paths.
• Lead infrastructure-as-code (Terraform) and GitOps practices; reduce drift with reusable modules and policy-as-code.
• Participate in on-call and drive down toil through automation and elimination of recurring issues.
Qualifications:
Required:
• 5+ years building and operating production infrastructure at scale.
• Depth in at least one area across Core/Data/AI-ML/Platform/Voice, with curiosity to learn the rest.
• Proven track record meeting high availability and low latency targets (owning SLOs, p95/p99, and load testing).
• Excellent observability chops (OpenTelemetry, Prometheus/Grafana, Datadog) and incident response (PagerDuty, SLO/error budgets).
• Clear written communication and the ability to turn ambiguous requirements into simple, reliable designs.
Preferred:
• Experience being an early backend/platform/infrastructure engineer at another company.
• Strong Kubernetes experience (GKE/EKS/AKS) and experience across multiple cloud providers (GCP, AWS, and Azure).
• Experience with customer-managed deployments.
Company:
Decagon provides a conversational AI platform for automating customer support across multiple channels. Founded in 2023, the company is headquartered in San Francisco, USA, with a team of 201-500 employees. The company is currently at the growth stage.