Senior Observability Engineer Jobs (NOW HIRING)

Senior ITSMA Observability Engineer

Raleigh, NC · On-site +1

$101.60K - $139.50K/yr

The Senior ITSMA Observability Engineer is responsible for the design and development of the Elastic and Prometheus Stack, as well as AWS Observability tools that monitor and manage critical ...

Senior ML Observability Engineer

Fairfax, VA · On-site

$107K - $146.90K/yr

They are seeking a Senior ML Observability Engineer to architect and govern the instrumentation and telemetry infrastructure for AI and machine learning models within the War Data Platform, ensuring ...

Observability SW Senior Engineer

Hopkinton, MA · On-site

$140.80K - $196.90K/yr

Senior Software Engineer. We are looking for a highly skilled Senior Observability Software Engineer to design, implement, and maintain large-scale observability platforms that provide deep visibility ...

DevOps/Observability Engineer

$54 - $74/hr

We are seeking a highly experienced Senior DevOps/Observability Engineer with over 8 years of experience to lead the design and implementation of our next-generation, unified observability platform.

DevOps/Observability Engineer

OR · Remote

$52.75 - $72.25/hr

We are seeking a highly experienced Senior DevOps/Observability Engineer with over 8 years of experience to lead the design and implementation of our next-generation, unified observability platform.

Senior Observability Engineer information

Salary range: $59.5K (low), $126.6K (average), $183.5K (high)

How much do senior observability engineer jobs pay per year?

As of May 31, 2026, the average yearly pay for a senior observability engineer in the United States is $126,557, according to ZipRecruiter salary data. Most workers in this role earn between $104,500 and $143,500 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Senior Observability Engineer, and why are they important?

To thrive as a Senior Observability Engineer, you need expertise in monitoring, logging, and tracing systems, with a solid background in computer science or a related field. Familiarity with tools like Prometheus, Grafana, the ELK stack, and cloud platforms is typically required, as are certifications such as AWS Certified DevOps Engineer. Strong problem-solving, collaboration, and communication skills are critical for effectively diagnosing and resolving complex infrastructure issues. These skills ensure reliable system performance, rapid incident response, and continuous improvement of the technology environment.

How does a Senior Observability Engineer typically collaborate with development and operations teams?

A Senior Observability Engineer works closely with both development and operations teams to ensure robust monitoring, logging, and tracing solutions are in place across all applications and infrastructure. They often participate in architecture discussions to advise on best practices for instrumenting code and systems for observability. By analyzing metrics and alerting patterns, they help teams proactively resolve issues and optimize system performance. This role also involves mentoring engineers on observability tools and fostering a culture of transparency and accountability in incident response.

What is a Senior Observability Engineer?

A Senior Observability Engineer is a seasoned IT professional responsible for designing, implementing, and maintaining systems that monitor and provide insights into the performance, health, and reliability of software applications and infrastructure. They utilize tools for logging, monitoring, tracing, and alerting to ensure that systems are observable and any issues can be quickly detected and resolved. In addition to technical expertise, they often collaborate with development and operations teams to establish best practices, improve incident response, and optimize system performance. Their work is crucial for maintaining uptime, enhancing customer experiences, and supporting the scalability of technology platforms.

What is the difference between a Senior Observability Engineer and a Site Reliability Engineer?

| Aspect | Senior Observability Engineer | Site Reliability Engineer |
| --- | --- | --- |
| Credentials | Experience with monitoring tools, scripting, cloud platforms | Same as Senior Observability Engineer, often with SRE certifications |
| Work Environment | Focus on monitoring, logging, and tracing systems | Focus on system reliability, automation, and incident response |
| Industry Usage | Used in tech companies emphasizing system observability | Common in large-scale tech and cloud services |
| Search/Comparison Intent | Often compared for monitoring roles | Compared for reliability and system stability roles |

While both roles require expertise in cloud platforms and scripting, the Senior Observability Engineer primarily focuses on designing and maintaining monitoring, logging, and tracing systems to ensure system visibility. In contrast, a Site Reliability Engineer emphasizes system reliability, automation, and incident management to maintain service uptime. Both roles are vital in tech environments but serve different core functions related to system health and stability.

More about Senior Observability Engineer jobs
Infographic showing various Senior Observability Engineer job openings in the United States as of May 2026, with employment types broken down into 94% Full Time and 6% Contract. Highlights a 90% Physical, 5% Hybrid, and 5% Remote job distribution, with an average salary of $126,557 per year, or $60.8 per hour.

Senior ITSMA Observability Engineer

HedgeServ

Raleigh, NC • On-site, Remote

$101.60K - $139.50K/yr

Other

Posted 3 days ago


Job description

At HedgeServ, we're redefining what's possible in fund administration. With more than $700 billion in assets under administration, we partner with the world's most forward-thinking investment managers - across private equity, private credit, endowments, hedge funds and more - to deliver seamless, tech-enabled solutions that drive performance.

Our proprietary platform, enhanced by machine learning and robotic process automation, gives clients real-time insights and unmatched control over their operations. Alongside our technology, we offer award-winning service through our team-based approach -- led by a deeply experienced team of industry experts. Our solutions span the full investment lifecycle, including fund accounting, middle office, risk, compliance, tax, and investor services.

We're a future-focused company, empowering our people through a robust career development framework, clear career trajectories with structured learning paths, training, and progression plans. We invest in leadership development and in our collaborative culture, creating space for talent to grow. Our corporate values - Relationships, Support, Innovation, and Expertise - create a sense of shared purpose and belonging, and we recognize our employees sit at the core of our success. We continue to innovate and evolve through our employees, working together to achieve our shared vision and mission.

HedgeServ supports employees through a variety of offerings, including remote and hybrid working arrangements, and fully paid comprehensive health and well-being benefits. We've been recognized as an employer of choice, earning a top 100 workplaces designation.

Founded in 2008, HedgeServ has grown into a global organization with over 2,000 experts across the globe, with offices in the United States, Grand Cayman, Ireland, Poland, Bulgaria, Luxembourg, the Philippines, and Australia. We've earned numerous accolades, including Top Overall Administrator, along with #1 rankings for providing alternative asset services in Accounting, Technology, Client Service, Investor Services, Alternative Fund Expertise, Reporting, and Regulatory Expertise.

Job Description

The Senior ITSMA Observability Engineer is responsible for the design and development of the Elastic and Prometheus Stack, as well as AWS Observability tools that monitor and manage critical applications and infrastructure at HedgeServ. As an important member of the ITSMA Monitoring and Analytics Team, the Senior Engineer will be responsible for the operation and design of the portfolio of tools, which include alerting mechanisms and escalation, dashboards, and the overall framework to support the management of HedgeServ's infrastructure, systems, and applications. Additionally, this role entails leading IT infrastructure monitoring projects, managing vendors, and handling daily operations with SME (Subject Matter Expert) escalation support as needed. The successful applicant should be able to collaborate with various IT teams to gather requirements and develop solutions using existing monitoring capabilities or customized monitors (scripts).

Role Responsibilities

The Senior ITSMA Observability Engineer will collaborate with the ITSMA Monitoring and Analytics Team to design, build, secure, maintain, optimize, and document solutions utilizing Elastic Cloud Stack and AWS-managed Prometheus.

  • Proficiency with Elasticsearch, Logstash, Kibana, Beats, APM with X-Pack, Prometheus, Grafana, AWS CloudWatch, and other observability tools
  • Experience with OTEL Collectors
  • Engage closely with application owners, engineers, and development teams to evaluate requirements, architect, and support an Elasticsearch Stack solution, as well as structure queries to enhance system performance and efficiency
  • Design and configure ETL data pipelines using Elastic Common Schema for onboarding application logs and metrics
  • Configure index templates and manage data lifecycle (ILM) for effective data retention
  • Develop Ansible playbooks for automated deployment of Beat agents across on-premises and AWS systems; utilize Terraform for safe management of production infrastructure, employing methodologies such as Infrastructure as Code within AWS environments
  • Create Elastic alerting solutions via Watcher and Kibana Alerts integrated with existing ticketing tools and MS Teams
  • Develop Machine Learning jobs to dynamically monitor and provide alerts based on specific metrics and KPIs
  • Build Elastic and AWS observability AI solutions that enable infrastructure engineering and operations teams to address production issues efficiently
  • Adhere to lifecycle processes for transitioning solutions from Development to QA to Production
  • Actively participate in collaborative group sessions, attend agile sprint daily meetings, and share progress to ensure solution development aligns with organizational requirements

Pre-Requisite Knowledge, Skills and Experience

  • Technical Degree in Information Technology
  • Experience with Elastic Cloud and AWS Managed Prometheus
  • Knowledge of installation, system tasks, data collection, network troubleshooting, data pipelines, and cluster administration
  • Proficient in Python, Bash, PowerShell, Painless, and other scripting languages
  • Extensive ELK Stack expertise, including Elasticsearch, Logstash, Kibana, Beats, Machine Learning, APM, X-Pack, and REST API integration
  • Skilled in evaluating and tuning Elastic clusters, configurations, indexing, search performance, security, and administration
  • Proficient with Prometheus, Grafana, AWS observability tools, and their performance, security, and management
  • Experienced with security integrations (Windows SAML, LDAP, Kerberos) in Elasticsearch
  • Adept with AWS services: CloudWatch, CloudTrail, Kubernetes, Docker, Lambda
  • Experienced in integrating Elastic alerting with third-party ticketing tools
  • Experienced in implementing and integrating observability AI agents and frameworks for automated analysis, incident detection, and proactive resolution across complex systems