Job Title: Senior Observability Engineer
Location: Holmdel, NJ (2–3 days onsite weekly). We will only consider local candidates. No remote or relocation candidates. Current location must be noted on the resume.
Main Skills: Observability engineering with expert-level Splunk, plus Cloud and DevOps experience
Work Authorization: Candidates must be legally authorized to work in the United States without employer sponsorship, now or in the future. This is a contract-to-hire role.
· Seeking Observability Engineer
· They have been receiving Cloud Engineer and DevOps Engineer candidates.
· Observability is needed first, but with cloud and DevOps experience.
· Main Skill is Splunk. Must be a Splunk expert. SME level.
· Will mostly work with Splunk.
· Will also work with AppDynamics, Zenoss, and OpenTelemetry. However, they can be trained on those tools.
· Job is 3 days onsite
· Will implement, manage, and develop observability solutions
· Must know Bash and Python scripting
· 1st interview virtual, 2nd interview onsite
Job Description:
· We are seeking a dedicated and detail-oriented Senior Observability Engineer to join our Enterprise Observability Engineering team. The ideal candidate will bring deep expertise in AppDynamics, Splunk, OpenTelemetry, and AWS-native services, along with strong DevOps experience.
· This role is responsible for the administration, configuration, implementation, and ongoing optimization of observability platforms that enable end-to-end visibility across applications, infrastructure, and cloud-native workloads. You will play a critical role in ensuring platform reliability, performance, and actionable insights to support engineering and business teams.
Job Responsibilities:
Observability Platform Administration & Implementation
· Administer, configure, and support AppDynamics, Splunk, and OpenTelemetry (OTel) platforms to meet enterprise monitoring and observability needs.
· Design and implement observability solutions aligned to MELT (Metrics, Events, Logs, Traces) best practices.
· Perform regular upgrades, patching, and security hardening of observability platforms.
Monitoring, Reliability & Maintenance
· Continuously monitor the health, availability, and performance of observability platforms.
· Ensure data integrity, retention, and availability across metrics, logs, and traces.
· Proactively identify and remediate platform performance, scalability, and reliability issues.
Cloud & Full-Stack Observability
· Implement and support observability for AWS services, including:
· EKS, ECS, Lambda Functions
· SNS/SQS, S3, CloudWatch
· Deliver full-stack observability, including:
· Kubernetes cluster and workload metrics
· Service discovery, events, and application performance data
· Leverage OpenTelemetry for instrumentation, context propagation, collectors, and sampling strategies.
Dashboard, Alerting & Reporting
· Create and maintain dashboards, reports, and alerts in AppDynamics and Splunk.
· Collaborate with application, platform, and DevOps teams to define meaningful monitoring and alerting standards.
· Reduce noise through alert tuning and promote actionable signal over raw data.
DevOps & Automation
· Integrate observability into CI/CD pipelines using GitHub, Jenkins, Argo CD, and automation frameworks.
· Develop scripts and automation using Python, JavaScript, or Bash to streamline onboarding, configuration, and maintenance activities.
User Support, Enablement & Training
· Provide tier-2/3 support for observability-related issues.
· Assist internal teams with troubleshooting, root cause analysis, and performance investigations.
· Develop and deliver training materials and knowledge sessions to improve tool adoption and effective usage.
Documentation & Best Practices
· Maintain comprehensive documentation for:
· platform configurations
· onboarding procedures
· operational runbooks and standards
· Define and enforce observability best practices across the organization.
Incident Response & Collaboration
· Partner with IT, SRE, and DevOps teams to ensure comprehensive monitoring coverage.
· Participate in incident response efforts, leveraging observability data to accelerate detection, diagnosis, and resolution.
Required Qualifications:
Education:
· Bachelor’s degree in Computer Science, Information Technology, or a related field.
Experience:
· 5–7+ years of experience in Observability, Monitoring, SRE, or Platform Engineering roles.
· Proven hands-on experience implementing, managing, and maintaining AppDynamics, Splunk, and OpenTelemetry in enterprise environments.
Technical Skills:
Observability Platforms:
· AppDynamics (APM, dashboards, alerts)
· Splunk (configuration, administration, data onboarding)
· OpenTelemetry (instrumentation, collectors, sampling)
AWS Cloud Services:
· EKS, ECS, Lambda
· SNS/SQS, S3, CloudWatch
DevOps & CI/CD:
· GitHub, Jenkins, Argo CD
· CI/CD pipelines and GitOps practices
Observability Concepts:
· Strong expertise in Metrics, Events, Logs, and Traces (MELT)
· Full-stack and cloud-native observability
Automation & Scripting:
· Python, JavaScript, Bash
Infrastructure Knowledge:
· Strong understanding of IT infrastructure, applications, and networking
Soft Skills:
· Excellent problem-solving and analytical skills
· Strong communication and collaboration abilities
· Ability to work independently as well as in a team-oriented environment
· Detail-oriented with a strong focus on operational excellence