Remote Observability Engineer Jobs in Arizona (NOW HIRING)

Posting Type Hybrid/Remote Job Overview WHO WE ARE Relativity is a leading legal data intelligence ... Improve system observability through metrics, structured logging, dashboards, and alerting

Senior Software Engineer II

Phoenix, AZ · On-site +1

$197K - $232K/yr

Remote Department Engineering Compensation: $197.4K - $232K • Offers Equity At Confluent, we are ... Improve service reliability and operations by defining SLOs/SLAs, strengthening observability, and ...

Senior AI/ML Engineer

Phoenix, AZ · On-site +1

$103K - $142K/yr

Remote/Hybrid: This role is based remotely but if you live within a 50-mile radius of Sunnyvale, CA ... Experience with A/B testing and telemetry/observability systems to measure impact and reliability.

Senior Software Engineer

Phoenix, AZ · On-site +1

$121K - $160K/yr

Posting Type Hybrid/Remote Job Overview Who We Are Relativity is a leading legal data intelligence ... Solid understanding of CI/CD, observability, and incident response. * Experience guiding and ...

Senior Software Engineer - GCP

Phoenix, AZ · Remote

$121K - $160K/yr

Build and maintain the observability, deployment pipelines, and automation that support multi ... For positions with Remote-US locations, the actual salary range for the position may differ based ...

Senior IT Infrastructure Engineer

Phoenix, AZ · Remote

$107K - $146K/yr

... observability. For more information, visit www.enterprisedb.com Candidate Note ... This role is 100% remote for candidates based in EST or CST only We are looking for a confident ...

Remote Observability Engineer information

What are the typical collaboration patterns for a Remote Observability Engineer working with distributed teams?

Remote Observability Engineers frequently collaborate with software developers, DevOps teams, and IT operations to ensure systems are monitored effectively and issues are detected early. Working remotely, you'll often use communication tools like Slack, Jira, and video conferencing to coordinate incident response, discuss monitoring strategies, and review system health dashboards. Regular sync meetings and asynchronous updates are common, and you'll likely contribute to documentation and knowledge sharing to keep all stakeholders informed. Building strong communication habits is important, as much of the troubleshooting and improvement work hinges on clear coordination with multiple teams.

What are the key skills and qualifications needed to thrive as a Remote Observability Engineer, and why are they important?

To thrive as a Remote Observability Engineer, you need strong expertise in monitoring, logging, and tracing systems, along with a background in computer science or related technical fields. Familiarity with tools like Prometheus, Grafana, ELK Stack, Datadog, and cloud platforms is typically required, as well as relevant certifications such as AWS Certified Cloud Practitioner or Google Cloud Professional DevOps Engineer. Excellent problem-solving abilities, communication skills, and a proactive mindset help you detect and resolve issues before they impact users. These competencies ensure system reliability, enable rapid incident response, and support seamless collaboration in distributed environments.

What is the difference between Remote Observability Engineer vs Site Reliability Engineer?

Aspect | Remote Observability Engineer | Site Reliability Engineer
Credentials | Knowledge of monitoring tools, scripting, cloud platforms | Same as Observability Engineer, plus SRE certifications often preferred
Work Environment | Focus on monitoring, logging, and tracing systems remotely | Broader scope including system reliability, incident response, and automation
Industry Usage | Primarily in tech, SaaS, cloud services | Widely in tech, finance, and large-scale online services

The Remote Observability Engineer specializes in monitoring and analyzing system performance remotely, focusing on tools like logs and metrics. In contrast, the Site Reliability Engineer has a broader role, ensuring overall system reliability, automation, and incident management. While both roles require similar technical skills, SREs often have additional responsibilities related to system resilience and scalability.

What is a Remote Observability Engineer?

A Remote Observability Engineer is a professional responsible for designing, implementing, and maintaining systems that monitor the health, performance, and reliability of software applications and infrastructure from a remote location. They use observability tools to collect and analyze logs, metrics, and traces, helping organizations quickly detect and resolve issues. Their work ensures that distributed systems are transparent, reliable, and efficient, often collaborating with development, operations, and security teams. Remote Observability Engineers often work from anywhere, leveraging cloud-based tools and platforms to manage complex IT environments.

Principal DevOps Engineer

Iridium Satellite, LLC

Tempe, AZ • Remote

Full-time

Posted 10 days ago


Job description

Company Overview

Iridium is an award-winning and innovative satellite communications company with bragging rights to the only network that offers voice and data connectivity anywhere in the world. For over 20 years, Iridium's unique network and services have supported critical communications needs for individuals, businesses, and the evolving Internet of Things.

At Iridium, we understand the importance of staying connected and the limitations of traditional communications networks. People across the globe, including first responders, humanitarians, global militaries, scientific researchers, and lone workers, as well as ships, aircraft and remote operations all rely on Iridium to stay connected. We take our responsibility for providing these essential communications very seriously and pride ourselves on offering a reliable lifeline when needed. Likewise, Iridium is committed to providing an exciting and innovative workplace, where employees are challenged to think outside the box and collaborate on new, bold ideas and solutions. Our talented teams are passionate about their work and the impact our company makes around the world. Iridium fosters an empowering and inclusive culture that allows employees to genuinely be their best selves. We are looking for others who want to join this truly unique company that celebrates our employees and provides the opportunity to truly make a difference in the world.

What We're Looking For:

We are seeking a highly skilled Principal DevOps Engineer to lead the strategy, design, and evolution of DevOps practices supporting our cloud-native Open RAN and 4G/5G Core network. In this role, you will set the technical direction for CI/CD, infrastructure-as-code, automation, and observability frameworks that enable reliable, scalable operations across Core, RAN, Transport, and Cloud domains. You will define and implement greenfield CI/CD pipelines, establish standardized automation and monitoring approaches, and create advanced telemetry, alerting, and automated remediation capabilities. Through close partnership with NOC Operations, Engineering, Cloud, Development, and Test teams, you will help drive operational excellence, reduce Mean Time to Repair (MTTR), and minimize alert fatigue. As a technical leader within the Gateway organization, you will provide governance, best practices, and hands-on expertise to teams across global time zones. The ideal candidate brings deep experience with cloud-native architectures, Kubernetes, CI/CD, telemetry pipelines, and infrastructure-as-code, along with familiarity with telecom network environments and Agile practices.

What You'll Do:

Cloud & CI/CD Enablement

  • Lead the design and implementation of CI/CD pipelines supporting cloud-native and G-RAN deployments
  • Manage Kubernetes environments (EKS and on-prem) by:
    • Monitoring CNF health
    • Automating scaling policies
    • Optimizing resource allocation
  • Implement Infrastructure-as-Code solutions using Terraform and Ansible to deploy and maintain monitoring and observability stacks
  • Integrate observability platforms and tools into operational workflows to strengthen visibility and diagnostic capabilities

Observability & Monitoring Architecture

  • Design and enhance observability frameworks using:
    • Grafana dashboards and alert correlation
    • Health checks, backups, etc.
    • Core CDR dashboards (IMS & Packet Core)
    • Viavi probe integrations
    • SolarWinds telemetry feeds
  • Build unified dashboards that provide national-level visibility and real-time health insights
  • Optimize alarm thresholds and event correlation to reduce false positives and alert storms
  • Implement structured logging, metrics, and distributed tracing for cloud-native network functions

Automation & Self-Healing Engineering

  • Develop automation using Python, Bash, or Go to:
    • Auto-triage common alarms
    • Perform health validations
    • Trigger corrective actions and workflows
  • Build event-driven automation using Kafka feeds from Mavenir and Gatehouse OSS systems
  • Implement automated remediation for common failure scenarios (e.g., pod restarts, resource exhaustion, signaling retries) to reduce manual NOC intervention
  • Reduce manual NOC intervention through closed-loop automation
  • Implement Infrastructure as Code (Terraform/Ansible) for monitoring stack deployments
  • Integrate observability tools into DevSecOps workflows

Incident & Reliability Engineering

  • Support Major Incident Management by providing telemetry insights, automated diagnostics, and post-incident analyses
  • Perform post-incident analysis using logs, traces, and performance metrics
  • Drive improvements that reduce MTTD and MTTR
  • Partner with Core, RAN, Transport, and Cloud engineering teams to prevent recurring issues through root-cause analysis

Leadership & Continuous Improvement

  • Mentor junior DevOps and NOC engineers in automation, observability, and DevOps best practices
  • Develop reusable automation frameworks and operational standards
  • Document playbooks, reference architectures, and best-practice patterns to mature operations from reactive to predictive
What You'll Need to Succeed:
  • Bachelor's degree in Engineering, Computer Science, Telecommunications, or related field
  • 10+ years of experience in DevOps, Site Reliability Engineering, or network automation roles supporting cloud-native environments
  • Strong proficiency with CI/CD pipeline management, Infrastructure-as-Code frameworks, and containerized deployments
  • Hands-on experience with Kubernetes (EKS and on-prem K8s) and Docker-based cloud-native network functions (CNFs)
  • Proficiency with AWS cloud services
  • Advanced Python scripting skills, with additional experience in Bash or Go
  • Experience building Grafana dashboards, alerting logic, and observability workflows
  • Familiarity with Kafka-based event streaming architectures
  • Strong Linux system administration skills
  • Strong understanding of telecom architecture, including 4G EPC, 5G Core, IMS, Open RAN
  • Experience integrating and operationalizing probe-based observability solutions (e.g., Viavi)
  • Deep understanding of monitoring concepts, including metrics, logs, traces, and APM
  • Excellent communication skills, with the ability to convey products, deliverables, analyses, and/or issues clearly and confidently, and recognize and adapt to different communication techniques
  • Ability to analyze a situation or problem, generate effective solutions, and see those solutions through to completion
  • Creativity and resourcefulness to make reliable decisions and determine methods on new assignments
  • Ability to thrive in a dynamic environment by handling multiple tasks and managing shifting priorities
  • Proactive approach to sharing what you've learned with others
Things That Would Be Great if You Brought to the Table:
  • Experience supporting Mavenir 4G/5G Core in production
  • Knowledge of SIP, Diameter, GTP, HTTP/2, PFCP protocols
  • Experience with Prometheus, ELK stack, or OpenTelemetry
  • CI/CD experience (GitLab, Jenkins, ArgoCD)
  • Kubernetes certification (CKA/CKAD)
  • AWS certifications
  • Experience building closed-loop automation for telecom NOCs
We'll also need you to:
  • Participate in on-call rotations for automation platform support
  • Support major incidents requiring automation troubleshooting
  • Travel up to 10% if needed
Work Environment:

This position primarily works in an office setting and is largely sedentary with the majority of the position working with a computer. The role typically requires the use of basic office equipment such as a phone, video, computer, keyboard, mouse, and printer.

We believe in-person connection drives innovation, strengthens mentorship, and builds culture, while flexibility enables employees to do their best work. Under Iridium's Hybrid Work Policy, employees are expected to work at least three days per week (approximately 60%) in an Iridium office to support collaboration, relationship-building, and professional growth.

Additional Information

This job description outlines the general nature and level of work for this role and is not a comprehensive list of duties, responsibilities, or qualifications. Employees may be assigned additional responsibilities as needed.

Iridium is an Equal Opportunity Employer, including individuals with disabilities and protected veterans.

Employment Type: Full-time