Overview:
We are looking for a Mid-Level Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms.
This role focuses on hands-on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.
The engineer will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production-ready.
Responsibilities:
Implement and maintain metrics, logs, and traces for applications and infrastructure.
Assist with onboarding applications into observability platforms such as Dynatrace, ELK, and Datadog.
Configure dashboards, alerts, and anomaly detection.
Work with development teams to enable structured logging, distributed tracing, and core metrics.
Validate observability requirements during Production Readiness Reviews.
Troubleshoot telemetry issues.
Configure alerts based on golden signals (latency, traffic, errors, and saturation) and reduce alert noise.
Support incident response and root cause analysis.
Maintain dashboards and documentation.
Participate in on-call rotations.
Automate onboarding and validation tasks.
Follow observability standards and best practices.
Required qualifications:
2-4 years of experience in Observability or SRE.
Working knowledge of metrics, logs, and tracing concepts.
Hands-on experience with observability platforms.
Understanding of SLIs, SLOs, and service health indicators.
Experience with cloud or hybrid environments.
Scripting skills in Python, Bash, or PowerShell.
Preferred qualifications:
Experience with OpenTelemetry, APM agents, Kubernetes, and incident management tools.
Experience in regulated or enterprise environments.