
Observability Site Reliability Engineer Jobs (NOW HIRING)

SITE RELIABILITY ENGINEER

Camden, NJ · On-site

$130K - $150K/yr

Site Reliability Engineer (SRE) Engineer Reliability into the Systems That Move the Nation's Food ... Observability across the full stack, correlating cloud services, APIs, and on-premise facility ...

Site Reliability Engineer

Plano, TX · On-site

$54.50 - $72.50/hr

Site Reliability Engineering (SRE): • Ensure high availability, scalability, and reliability of distributed systems. • Implement observability, logging, and monitoring using tools like Prometheus ...

Site Reliability Engineer

Beaverton, OR · Hybrid

$59.25 - $78.75/hr

Overview As our Site Reliability Engineer, you'll help drive Concora Credit's Mission to enable ... Contribute to cloud reliability through automation, observability, incident reduction, capacity ...

Site Reliability Engineer (SRE)

Plano, TX · On-site

$54.50 - $72.50/hr

Site Reliability Engineer (SRE) Location: Richmond, VA or Plano, TX Work Model: Hybrid - 3 days ... Observability tools (Prometheus, Grafana, Datadog, CloudWatch) * Experience working in highly ...

Site Reliability Engineer

Beaverton, OR · Hybrid

$59.25 - $78.75/hr

As our Site Reliability Engineer, you'll help drive Concora Credit's Mission to enable customers to ... Automation, Observability, and Continuous Improvement: • Improve operational efficiency through ...

Site Reliability Engineer

Beaverton, OR · Hybrid

$59.25 - $78.75/hr

Overview As our Site Reliability Engineer, you'll help drive Concora Credit's Mission to enable ... Contribute to cloud reliability through automation, observability, incident reduction, capacity ...

Site Reliability Engineer

Beaverton, OR · On-site

$59.25 - $78.75/hr

Overview As our Site Reliability Engineer, you'll help drive Concora Credit's Mission to enable ... Automation, Observability, and Continuous Improvement: • Improve operational efficiency through ...

Site Reliability Engineer

Birmingham, AL · Hybrid

$53.50 - $71/hr

Direct Hire -Site Reliability Engineer. This is a hybrid (possible remote) opportunity, working ... Observability & Performance: Leverage monitoring platforms to proactively identify risks, resolve ...

Observability adoption (OpenTelemetry, Dynatrace) * Reliability engineering practices * Platform standardization and automation The ideal candidate combines software engineering expertise with SRE ...

This role focuses on automation, observability, and incident response while upholding strict Service Level Objectives (SLOs). The SRE will help build resilient systems that scale, automate manual ...

Site Reliability Engineer

Frederick, MD · On-site

$56.75 - $75.25/hr

Design and implement enterprise-grade monitoring and observability frameworks (metrics, logs ... Champion DevOps and SRE practices including Infrastructure as Code, CI/CD, observability, and ...

Observability Site Reliability Engineer information

[Salary chart: hourly pay scale with markers at $10, $63, and $91]

How much do observability site reliability engineer jobs pay per hour?

As of Jun 29, 2026, the average hourly pay for an observability site reliability engineer in the United States is $63.74, according to ZipRecruiter salary data. Most workers in this role earn between $54.81 and $72.84 per hour, depending on experience, location, and employer.

What is the difference between an Observability Site Reliability Engineer and a Monitoring Engineer?

Aspect | Observability Site Reliability Engineer | Monitoring Engineer
Focus | Ensuring system reliability through observability, automation, and incident response | Implementing and managing monitoring tools and dashboards
Skills | Cloud platforms, scripting, incident management, observability tools | Monitoring tools, alerting systems, data analysis
Work Environment | DevOps teams, cloud infrastructure, large-scale systems | Operations teams, infrastructure monitoring

While both roles involve system health, the Observability Site Reliability Engineer focuses on comprehensive system reliability using observability practices, whereas Monitoring Engineers primarily manage monitoring tools and alerts. The SRE role emphasizes automation, incident response, and system resilience, making it broader in scope.

SITE RELIABILITY ENGINEER

United States Cold Storage Inc

Camden, NJ

$130K - $150K/yr

Full-time

Posted 12 days ago


United States Cold Storage rating

Company rating: 7.8 out of 10

Based on 49 frontline employees who took The Breakroom Quiz

85th of 345 rated logistics


Job description

Site Reliability Engineer (SRE)
Engineer Reliability into the Systems That Move the Nation’s Food Supply
Who We Are
US Cold owns and operates one of the most complex temperature-controlled logistics networks in North America. Every day, our systems coordinate the storage and movement of food at national scale across a network of state-of-the-art distribution centers, including multiple highly automated warehouse facilities.
We continue to advance our core warehouse and logistics platforms. Our current focus is on modular, event-driven, API-first and cloud architectures. We continue to enhance reliability and accelerate engineering productivity by strengthening our SRE and AI practices. This is a large investment in innovation to continue to drive operational excellence at our facilities.
If you want to build durable systems that operate in the physical world at scale, this is that opportunity.
The Role
The Site Reliability Engineer is a founding member of US Cold’s SRE practice.
This role exists to move the organization from reactive operations to engineered reliability. You will study how our most critical systems fail, particularly our Phenix WMS and facility automation interfaces, and design controls, automation, and observability that reduce incidents over time.
Success in this role means fewer false alerts, faster recovery, less manual intervention, and systems that heal themselves when possible.
You will work closely with application, infrastructure, and operations teams and participate directly in on‑call and incident response.
What You Will Own
  • Reliability of the Phenix WMS and its integration with facility automation systems (robotics, conveyors, and control interfaces)
  • Definition and implementation of SLIs and SLOs that measure meaningful system health, not just availability
  • Observability across the full stack, correlating cloud services, APIs, and on‑premise facility operations
  • Automation to eliminate operational toil, including patching, data corrections, restarts, and recovery tasks
  • Development of self‑healing behaviors for common failure modes
  • Participation in on‑call rotations and leadership of blameless post‑incident reviews
  • Design and execution of disaster recovery tests across SaaS, cloud, and on‑premise environments
This is hands‑on reliability engineering. The systems you improve will directly impact daily warehouse operations.
Technical Environment
  • Hybrid environments spanning cloud and on‑premise infrastructure
  • Azure cloud services
  • Warehouse Management Systems (Phenix WMS) and facility automation interfaces
  • Observability tooling across logs, metrics, and alerting
  • Automation using Python, PowerShell, Bash, or Ansible
  • CI/CD tools and modern deployment practices
  • Exposure to containerized and distributed systems environments
What We’re Looking For
  • 3+ years of experience in SRE, DevOps, Systems Engineering, or related roles
  • Strong Linux and Windows systems administration and troubleshooting skills
  • Hands‑on experience with automation and scripting
  • Experience designing and operating monitoring, alerting, and observability solutions
  • Practical experience working in Azure environments
  • Strong analytical skills and a bias toward eliminating root causes, not symptoms
  • Ability to collaborate across application, infrastructure, and operations teams
  • Experience supporting warehouse management systems or industrial automation platforms
  • Exposure to Kubernetes, microservices, or container orchestration
  • Hands‑on experience with infrastructure‑as‑code tools such as Terraform or Ansible
  • Understanding of distributed systems and high‑availability design
  • Experience with SRE practices such as SLO‑based operations, runbook automation, or chaos testing
Why This Role Is Different
This is not an inherited SRE function. There is no mature framework to maintain.
You will:
  • Help define what reliability means at US Cold
  • Work on systems that operate in the physical world
  • Engineer solutions that reduce toil and operational load
  • See the direct impact of your work on warehouse uptime and performance
  • Build practices that scale as the platform modernizes
This is an opportunity to grow as an SRE while helping establish the reliability foundation of a mission‑critical platform.
Compensation & Structure
  • Location: Hybrid – Camden, NJ
  • Reports to: IT – Site Reliability Engineering Manager
  • Salary Range: $130,000 - $150,000
Operational Context
  • Systems operate continuously across warehouse facilities
  • Reliability failures have physical and operational consequences
  • On‑call participation is part of the role
  • Work occurs across cloud, SaaS, and on‑premise environments
