General Information
Ref #51550
Department: Data / technology
Job Site: Mission Pet Health
Date Published: 06-10-2026
Pay Class: Full-Time
Job Description
About the Role
We're looking for a Senior DevOps Engineer to own our cloud infrastructure end-to-end - from operating a large multi-tenant Kubernetes environment to building CI/CD pipelines that teams actually trust. You'll work across AWS, drive infrastructure-as-code standards, and lead our migration toward GitLab CI and a Grafana-based observability stack while keeping production environments stable.
What You'll Do
- Operate and scale a multi-tenant AWS EKS cluster where each client runs an isolated set of application services - owning tooling to onboard, scale, and observe hundreds of service instances reliably
- Build and improve CI/CD pipelines in GitLab CI and GitHub Actions with automated testing, static analysis, and build-gated releases; maintain ArgoCD GitOps workflows for production deployments
- Lead the migration from Datadog to a self-managed Grafana observability stack (Grafana, Loki, Mimir/Prometheus, Tempo) - dashboards, SLOs, alert routing, and on-call integration
- Manage secrets, IAM, and security scanning pipelines using AWS KMS, Secrets Manager, external-secrets operator, and Auth0/Dex OIDC - enforcing least-privilege across all environments
- Own and evolve the Redpanda (Kafka-compatible) streaming layer and its integrations with application workers
- Drive cloud cost optimization through right-sizing, autoscaling, and shared infrastructure patterns on EKS
- Document infrastructure with automated tooling (terraform-docs) and maintain standards that scale across teams
- Automate operational toil - certificate renewal, clinic environment provisioning, deployment validation, runbook automation
Responsibilities and Benefits
What We're Looking For
Required
- 5+ years in DevOps or infrastructure engineering
- 3+ years operating Kubernetes in production - AWS EKS preferred - including CSI drivers, cluster autoscaling, network policy (Calico), and pod identity
- 3+ years hands-on with AWS core services (IAM, S3, KMS, Secrets Manager, STS, EKS, Load Balancer Controller, ECR)
- Strong Terraform experience; GitOps experience with ArgoCD or Flux
- Hands-on experience with GitLab CI and/or GitHub Actions
- Scripting proficiency in Python and Bash
- Experience with IAM design and security best practices (SAST/DAST, secret scanning, OIDC federation)
- Familiarity with streaming or message-queue infrastructure (Redpanda, Kafka, or equivalent)
Nice to Have
- Experience migrating from a SaaS observability tool (Datadog, New Relic) to a self-hosted Grafana stack
- Grafana stack depth - Loki for logs, Mimir or Thanos for metrics, Tempo for traces, Alertmanager for routing
- Experience with Redpanda specifically, or deep Kafka operations knowledge
- Background in multi-tenant SaaS platforms or per-customer service isolation patterns
- AWS certification
- Familiarity with chaos engineering tooling (chaos-mesh or LitmusChaos)
- Background in software engineering or scripting-heavy roles
Tech Stack
Current production: AWS (EKS, S3, KMS, Secrets Manager, STS, Load Balancer) • Terraform • GitHub Actions • ArgoCD • Kubernetes • Traefik • Coraza WAF • Redis HA • MongoDB • Auth0 • Dex • external-secrets • Datadog • Docker • Python • Bash • Linux
Where we're going: GitLab CI • Redpanda • Grafana • Loki • Prometheus/Mimir • Tempo • Alertmanager
Platform components you'll operate: ArgoCD • Traefik • Coraza WAF • Auth0 • Dex • Redis HA • MongoDB • API servers • client-facing portals • internal tooling
Why Join Us
- Own infrastructure across a real multi-tenant platform serving production clinic environments
- Lead the observability and streaming migrations - greenfield decisions with lasting impact
- Collaborative engineering culture with high trust and low bureaucracy
- Competitive salary, benefits, and flexible work arrangements