Job Summary:
Decagon is the leading conversational AI platform empowering brands to deliver concierge customer experiences. The Senior Infrastructure Engineer will design, build, and operate production infrastructure for high-scale, low-latency systems, ensuring reliability and performance while supporting various deployment architectures.
Responsibilities:
• Design and implement critical infrastructure services with strong SLOs, clear runbooks, and actionable telemetry.
• Partner with research and product teams to architect solutions, set up prototypes, evaluate performance, and scale new features.
• Reduce service latency: optimize networking paths, apply caching and queuing where they help, and tune CPU, memory, and I/O for tight p95/p99s.
• Evolve CI/CD, golden paths, and self‑service tooling to improve developer velocity and safety.
• Support various deployment architectures for customers with robust observability and upgrade paths.
• Lead infrastructure‑as‑code (Terraform) and GitOps practices; reduce drift with reusable modules and policy‑as‑code.
• Participate in on‑call and drive down toil through automation and elimination of recurring issues.
Qualifications:
Required:
• 5+ years building and operating production infrastructure at scale.
• Depth in at least one of Core, Data, AI/ML, Platform, or Voice infrastructure, with curiosity to learn the rest.
• Proven track record of meeting high-availability and low-latency targets (owning SLOs, p95/p99, and load testing).
• Excellent observability skills (OpenTelemetry, Prometheus/Grafana, Datadog) and incident response experience (PagerDuty, SLOs and error budgets).
• Clear written communication and the ability to turn ambiguous requirements into simple, reliable designs.
Preferred:
• Experience as an early backend, platform, or infrastructure engineer at another company.
• Strong Kubernetes experience (GKE/EKS/AKS) and experience across multiple cloud providers (GCP, AWS, and Azure).
• Experience with customer-managed deployments.
Company:
Decagon provides a conversational AI platform for automating customer support across multiple channels. Founded in 2023, the company is headquartered in San Francisco, USA, and has a team of 201-500 employees. It is currently a growth-stage company.