Observability Manager Jobs in Texas (NOW HIRING)

Senior Software Engineer - Observability

Austin, TX · On-site

$121K - $160K/yr

In this role, you will design and develop the observability platforms, reliability tooling, and AI ... management, and post-incident review. Strong software design skills - you write clean, well-tested ...

Lead Observability Engineer (Grafana Cloud)

Coppell, TX · Hybrid

$95K - $125K/yr

Incident Management: Responding to and managing incidents, performing root cause analysis, and ... Participate in user training to increase awareness of observability solutions * Ensuring incident ...

SolarWinds is seeking a strategic and data-driven Senior Manager of SEO to drive organic search performance across our Monitoring and Observability Business Unit, and the SolarWinds website as a ...

Lead Observability Engineer (Grafana Cloud)

Coppell, TX · On-site

$95K - $125K/yr

Incident Management: Responding to and managing incidents, performing root cause analysis, and ... Participate in user training to increase awareness of observability solutions * Ensuring incident ...


Observability Manager information

What is the difference between an Observability Manager and a Site Reliability Engineer?

Aspect | Observability Manager | Site Reliability Engineer
Credentials | Typically requires experience in monitoring, logging, and cloud tools; certifications like AWS, Google Cloud, or Kubernetes are common | Requires strong background in systems engineering, scripting, and cloud platforms; certifications like AWS, GCP, or Linux are often preferred
Work Environment | Focuses on overseeing observability tools, data analysis, and team coordination in tech environments | Hands-on role involving system automation, incident response, and infrastructure reliability
Industry Usage | Used across tech companies to improve system visibility and performance | Common in DevOps and SRE teams to ensure system reliability and uptime

The Observability Manager primarily oversees monitoring and logging strategies, ensuring system visibility, while the Site Reliability Engineer is more hands-on, focusing on automating infrastructure and maintaining system reliability. Both roles require technical expertise and often collaborate closely but differ in scope and daily responsibilities.

Principal Engineer - Observability Telemetry Client Infrastructure

Cloudera, Inc.

Austin, TX • On-site, Remote

Full-time

PTO

Posted 16 days ago


Job description

Business Area:
Engineering
Seniority Level:
Director
Job Description:
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world's largest enterprises.
Cloudera is seeking a Principal Engineer to serve as the primary architect and visionary for our Observability Telemetry client interactions framework as we build a multi-tenant, high-throughput telemetry fabric to support the world's largest data estates. In this role, you will lead the evolution of Cloudera's Observability product by designing and building a "self-service" OpenTelemetry-based ecosystem that allows internal clients to integrate telemetry data seamlessly for multiple downstream consumers.
You will architect the self-service interfaces that allow thousands of distributed components to emit high-cardinality logs, metrics, and traces that are automatically correlated, context-aware, and ready for downstream analysis in ClickHouse and other massive-scale engines. You will be Cloudera's voice in the CNCF/OTel community, influencing the direction of open-source observability to meet the needs of hybrid-cloud data platforms.
As a technical leader, you will be responsible for defining the semantic conventions that enable log events, metrics, spans, and traces from diverse, multi-language clients to be correlated into a unified, actionable view for customers, Support, and Cloudera product engineering. This data is the foundation for troubleshooting, forecasting, workload analysis, financial governance, and other administrative functions. You will also work closely with open source products on integration and can influence the direction of OTel integration for these components in the open source ecosystem.
This position is a high-visibility role requiring a blend of deep systems architecture, hands-on implementation, and cross-organizational influence to ensure our telemetry infrastructure scales with the world's most complex data workloads.
This role is not eligible for immigration sponsorship.
As an Observability Telemetry Principal Engineer you will:
  • Architect and drive the implementation of automated "on-ramps" for observability clients that handle the complexity of multi-cloud, hybrid environments without sacrificing performance, ensuring teams can integrate their services with minimal friction.
  • Establish and enforce the semantic conventions needed to ensure telemetry data carries the appropriate context for easy correlation across the entire Cloudera stack.
  • Develop and support high-performance interfaces and SDKs for clients across various languages (Java, Go, Python, etc.) to contribute high-fidelity signals.
  • Build the logic to stitch together disparate signals into a unified trace, enabling deep-dive workload analysis and financial governance across massive distributed systems.
  • Work alongside engineering teams to turn architectural blueprints into production reality, conducting deep-dive code reviews and resolving complex systemic bottlenecks.
  • Serve as the "go-to" expert for observability, resolving technical disagreements and making high-stakes decisions on the future of our telemetry platform.

We are excited if you have: (Required Experience)
  • 10+ years of experience (or equivalent advanced degree + experience) designing and maintaining large-scale distributed systems and observability platforms.
  • A proven track record of designing and shipping complex, critical features that serve as foundational infrastructure for other engineering teams.
  • Deep, hands-on experience with the OpenTelemetry Collector architecture, custom processors, and the challenges of high-cardinality data.
  • Experience with high-volume OLAP engines (e.g., ClickHouse, StarRocks) and an understanding of how to structure telemetry data for sub-second queries at large scale.
  • Excellent communication and collaboration skills and the ability to build relationships across the company to drive adoption of new standards and remove technical roadblocks.
  • The ability to map business requirements to technical roadmaps, ensuring our observability tools support Cloudera's long-term strategic goals.
  • Experience coaching senior and staff-level engineers, acting as a "force multiplier" for a technical organization.
  • BSc/MSc in a related field or equivalent experience.

You might have:
  • Significant contributions to major observability or data projects (e.g., CNCF or Apache projects). Bonus points if you're already a CNCF OTel maintainer.
  • Deep experience with Kubernetes-native observability and managing telemetry at scale in hybrid-cloud environments.
  • Experience representing technical initiatives at industry conferences or internal company-wide summits.
  • Experience using machine learning or advanced analytics to derive "AIOps" insights from raw telemetry data.

Why this role matters:
At Cloudera, our customers manage some of the largest and most complex data estates in the world. Without robust, correlated observability, managing these environments is an impossible task. This role is the linchpin of our visibility strategy.
By building a self-service, standardized telemetry framework, you are not just helping one team; you are empowering every developer at Cloudera and every one of our customers to understand their data life cycle. Your work ensures that when a performance bottleneck occurs or a system fails, the path to resolution is visible, traceable, and immediate. You are building the "nervous system" of the Cloudera Data Platform.
What you can expect from us:
  • Generous PTO Policy
  • Support for work-life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups

EEO/VEVRAA