Job Summary:
Alpha Consulting Corp. is seeking a Senior Observability Engineer who will elevate customer observability maturity across infrastructure, applications, and business transactions. The role involves designing and optimizing telemetry architecture and ensuring effective observability practices are implemented throughout the organization.
Responsibilities:
• Analyze and optimize our Alert-to-Incident noise ratio (targeting a baseline better than 10:1). Drive the evolution from chaotic alerting to high-fidelity, actionable incident creation.
• Shift the paradigm away from rigid static thresholds. Implement dynamic baselines that account for time-of-day, day-of-week, and seasonal traffic patterns.
• Drive the maturity of our telemetry infrastructure by ensuring all dashboards, alerts, SLOs, and monitor configurations are defined, versioned, and deployed as code.
• Establish and enforce automated instrumentation compliance gates within our deployment pipelines to ensure code is observable before it hits production.
• Centrally manage, version, and monitor the health of our OpenTelemetry (OTel) collectors and agent fleets.
• Implement platform capabilities that automatically surface the probable root cause the moment an incident fires.
• Ensure all deployments, configuration changes, feature flag toggles, and database migrations are automatically annotated on dashboards and correlated to active incident timelines.
• Evaluate and adopt GenAI/LLM capabilities for advanced log pattern explanation and accelerated incident troubleshooting.
• Ensure deep telemetry integration across cloud-managed services (AWS/Azure/GCP, EKS/AKS, Lambda, RDS) and critical third-party SaaS dependencies (e.g., Guidewire, Salesforce, Earnix, Uniphore, payment gateways).
• Architect pipelines to export raw telemetry data to our data lakehouse (S3/ADLS) to power advanced ML pipelines and predictive analytics.
• Leverage the observability platform for capacity forecasting: predict utilization trends for CPU, memory, queue depth, and storage before saturation occurs.
• Drive org-wide standards for log structure and serialization to ensure seamless cross-platform parsing and querying.
• Map and trace complex, multi-service customer journeys (e.g., policy quote → bind → pay) to provide full-context business transaction visibility.
• Define, implement, and track Service Level Objectives (SLOs) across all production services.
• Democratize observability by fostering a proactive culture where developers instrument their own services during active development, backed by standardized, self-service health dashboards.
Qualifications:
Required:
• 10+ years of experience in observability and telemetry architecture
• Strong understanding of AWS / Azure / GCP environments
• Expertise in microservices architecture
• Experience with distributed systems & event-driven systems
• Knowledge of high availability & scalability patterns
• Experience with CI/CD pipelines (GitLab, Jenkins)
• Familiarity with Infrastructure as Code (Terraform, CloudFormation)
• Experience with containerization (Docker, Kubernetes troubleshooting)
• Knowledge of release observability & rollback readiness
• Experience with AIOps / AI-driven observability
• Experience with predictive alerting / anomaly detection
• Knowledge of observability cost optimization
• Familiarity with chaos engineering basics
• Experience with API & integration observability
• Monitoring, logging, and tracing design (metrics, logs, traces)
• Dashboarding, alerting, and telemetry pipelines
• Observability platform design & optimization
• Root cause analysis (RCA) and incident analysis
• SLO / SLI / SLA definition and error budgets
• Ability to drive the maturity of telemetry infrastructure
• Experience with dynamic baselining & anomaly detection
• Ability to implement automated root cause analysis
• Experience with end-to-end business transaction tracing
• Ability to define, implement, and track Service Level Objectives (SLOs)
• Ability to foster a proactive culture for developer empowerment & self-service
Company:
Alpha Consulting Corp. has been exceeding expectations in the IT, pharmaceutical, and clinical staffing business since its founding in 1994. The company is headquartered in East Brunswick, USA, has a team of 201-500 employees, and is currently in the growth stage.