Observability Manager Jobs in Arizona (NOW HIRING)

GCP Kubernetes SRE

Scottsdale, AZ · On-site

$57.50 - $76.25/hr

Implement and manage infrastructure using IaC principles with Terraform, Helm, and GitHub Actions. * Monitor system performance and health using Prometheus and Grafana observability tools. * Apply AI ...

Observability Mastery: Deep knowledge of observability concepts (APM, Infrastructure, Log Management) and the ability to articulate technical value to engineers and leadership. * Consultative Influence:

... observability frameworks for LLMs using tools such as LangGraph, MLflow, Azure ML, and Databricks. • Collaborate with business domain managers to define product requirements and identify AI ...

... observability, or AI platform readiness. Familiarity with SRE practices including SLIs, SLOs, incident response, problem management, post-incident reviews, and toil reduction. Exposure to IAM, cloud ...

Data Engineer

Phoenix, AZ

$113K - $136K/yr

Enforce data quality through automated testing, validation, and observability frameworks. Manage version control using Git and maintain deployment pipelines. Ensure data security and governance ...

... observability. Implement built-in resiliency and observability, and enable FinOps as part of ... management and operations, to deliver the engineering roadmap in an Agile model. Partner with ...

DevOps Engineer

Scottsdale, AZ · On-site

$53.25 - $72.75/hr

SigNoz as our primary observability platform * Datastores: MySQL, PostgreSQL or similar relational databases, plus other managed services as required Responsibilities: * Design, maintain, and ...

Implement defense-in-depth security controls such as multi-factor authentication (MFA), token lifecycle management, and least-privilege access * Improve system observability through metrics, structured ...

Lead Data Engineer

Phoenix, AZ

$113K - $136K/yr

Review, approve, and manage code deployments across environments including production and DR * Enforce testing, automation, monitoring, and observability best practices * Mentor junior engineers and ...


Observability Manager information

What is the difference between an Observability Manager and a Site Reliability Engineer?

| Aspect | Observability Manager | Site Reliability Engineer |
|---|---|---|
| Credentials | Typically requires experience in monitoring, logging, and cloud tools; certifications like AWS, Google Cloud, or Kubernetes are common | Requires a strong background in systems engineering, scripting, and cloud platforms; certifications like AWS, GCP, or Linux are often preferred |
| Work Environment | Focuses on overseeing observability tools, data analysis, and team coordination in tech environments | Hands-on role involving system automation, incident response, and infrastructure reliability |
| Industry Usage | Used across tech companies to improve system visibility and performance | Common in DevOps and SRE teams to ensure system reliability and uptime |

The Observability Manager primarily oversees monitoring and logging strategies, ensuring system visibility, while the Site Reliability Engineer is more hands-on, focusing on automating infrastructure and maintaining system reliability. Both roles require technical expertise and often collaborate closely but differ in scope and daily responsibilities.


GCP Kubernetes SRE

Prophecy Technologies

Scottsdale, AZ • On-site

$57.50 - $76.25/hr

Full-time

Posted 29 days ago


Job description

Role Overview:
This role is for a highly skilled Site Reliability Engineer with strong expertise in Kubernetes and Google Cloud Platform (GCP), specifically GKE. The position requires a deep understanding of infrastructure as code (IaC) using Terraform, Helm, and GitHub Actions, alongside proficiency in Python, Ansible, and Node.js. The engineer will play a crucial role in maintaining and enhancing observability stacks built on Prometheus and Grafana, will bring solid Linux systems and networking fundamentals, and will contribute to automation and CI/CD pipelines. A significant aspect of the role involves applying AI/ML concepts and AIOps practices to improve system reliability and incident management.
Key Responsibilities:
  • Manage incidents, provide on-call support, and perform production triage to ensure system stability.
  • Develop and maintain automation scripts and CI/CD pipelines for efficient software delivery and infrastructure management.
  • Implement and manage infrastructure using IaC principles with Terraform, Helm, and GitHub Actions.
  • Monitor system performance and health using Prometheus and Grafana observability tools.
  • Apply AI/ML concepts and AIOps practices, including model lifecycle management, monitoring, and AI-driven alerting, to enhance operational efficiency.
  • Support and operate ML/AI platforms or pipelines (MLOps) and integrate AI-driven automation into monitoring and incident response.

Required Skills:
  • Strong experience with Kubernetes and GCP (GKE).
  • Strong experience in IaC (Terraform), Helm, and GitHub Actions.
  • Proficiency in Python, Ansible, and Node.js.
  • Strong experience with Prometheus and Grafana observability stack.
  • Solid understanding of Linux systems and networking fundamentals.
  • Experience in incident management, on-call support, and production triage.
  • Hands-on experience with automation and CI/CD pipelines.
  • Strong understanding of AI/ML concepts and AIOps practices (model lifecycle, monitoring, or AI-driven alerting).

Qualifications:
  • 10+ years of experience in Site Reliability Engineering or a related field.
  • Google Cloud Architect Certification (Preferred).
  • Certified Kubernetes Administrator (CKA) (Preferred).

Preferred Skills:
  • Experience in Java/J2EE, Spring Boot.
  • Experience supporting or operating ML/AI platforms or pipelines (MLOps).
  • Exposure to AIOps tools, anomaly detection, or predictive analytics systems.
  • Experience with large-scale distributed systems and microservices architecture.
  • Experience with GPU-based workloads or ML infrastructure on GCP.
  • Knowledge of Kubeflow, Vertex AI, or ML pipelines.
  • Experience integrating AI-driven automation into monitoring and incident response.