Job Summary:
Okta is a company focused on securing identities in the era of AI. They are seeking a highly technical Senior Observability Site Reliability Engineer to own and evolve their Splunk ecosystem and deliver a comprehensive, scalable Observability Platform.
Responsibilities:
• Design, build, and maintain scalable observability infrastructure using tools like Terraform.
• Optimize the collection, processing, and storage of log data to ensure high reliability and low latency of our Splunk services.
• Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and "observability-driven development."
• Eliminate "toil" by automating the deployment and scaling of observability agents and collectors.
Qualifications:
Required:
• 5+ years of experience managing Splunk Cloud at scale (1000+ SVCs), including Workload Management (WLM) and HEC optimization.
• Expertise in creating intuitive, actionable Splunk dashboards that correlate data across multiple sources.
• 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems.
• Strong coding skills in SPL and Go for building internal tools and automating workflows.
• Deep understanding of Linux internals, networking (TCP/IP, DNS, Load Balancing), and container orchestration (Kubernetes/EKS).
• A data-driven approach to debugging complex, cross-service performance bottlenecks.
• This position requires the ability to access federal environments and/or have access to protected federal data.
• The successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire.
• The successful candidate must attend in-person onboarding in our San Francisco office during the first week of employment.
Preferred:
• Hands-on experience with OpenTelemetry (OTel), Vector, or similar frameworks for instrumenting applications.
• Experience implementing a Splunk chargeback app for usage reporting.
• Experience managing native observability tools within AWS or GCP.
Company:
Okta is a management platform that secures critical resources from cloud to ground for workforce and customers. Founded in 2009, the company is headquartered in San Francisco, USA, and has 5,001 to 10,000 employees. It is currently a late-stage company.