
Customer Reliability Engineer Jobs in California

Coach other Reliability engineers and work with functional leadership in identifying and ... Customer focus and collaboration skills. * Excellent written and oral communication skills ...

Site Reliability Engineer

Mountain View, CA · Hybrid

$67.25 - $89.25/hr

Note: This is an internal-facing role -- no customer interaction. Must-Have: * 4+ years in SRE, DevOps, or Infrastructure Engineering * Solid experience with GCP or AWS (hybrid/on-prem a plus)

Reliability Engineer

Goleta, CA · On-site

$96K - $128K/yr

Job Summary: Responsible for Quality & Reliability Engineering activities supporting the ... Travels as required to suppliers, customers and/or outside labs as needed. * Member of cross ...

We are seeking a hands-on Reliability Engineering leader to establish, own, and execute reliability ... Support customer qualification activities and audits * Build, present, and defend reliability and ...

Site Reliability Engineer

San Francisco, CA · On-site +1

$67.25 - $89.25/hr

Our fast-growing customer base includes hundreds of modern software companies building the next generation of enterprise-ready products. About the Site Reliability Engineering Team The Site ...

Staff Reliability Engineer

San Jose, CA · On-site

$120K - $151K/yr

... customers remain ahead in a world where soaring energy demand and intensifying energy scarcity are ... We are looking for a Staff Reliability Engineer to join our team in one of today's most ...

Job Summary: Responsible for Quality & Reliability Engineering activities supporting the ... Travels as required to suppliers, customers and/or outside labs as needed. * Member of cross ...

Site Reliability Engineer

San Diego, CA · On-site

$60.50 - $80.50/hr

Our work focuses on helping customers reduce operational friction, improve resilience, and make ... The Site Reliability Engineer will focus on the execution and maintenance of reliability ...

Site Reliability Engineer

San Diego, CA · Remote

$60.50 - $80.50/hr

Our work focuses on helping customers reduce operational friction, improve resilience, and make ... The Site Reliability Engineer will focus on the execution and maintenance of reliability ...

Staff Reliability Engineer

San Jose, CA · On-site

$120K - $151K/yr

... customers remain ahead in a world where soaring energy demand and intensifying energy scarcity are ... We are looking for a Staff Reliability Engineer to join our team in one of today's most exciting ...

We are seeking a hands-on Reliability Engineering leader to establish, own, and execute reliability ... Support customer qualification activities and audits * Build, present, and defend reliability and ...

Senior SRE

San Francisco, CA · On-site

$127K - $191K/yr

... connected customer view with unmatched clarity and context while protecting precious brand and ... The Global SRE team is responsible for owning and supporting deployments of global products, and ...

Hardware Reliability Engineer

San Diego, CA · On-site

$108K - $137K/yr

Job Title: Hardware Reliability Engineer Job location: San Diego, CA Job Duration: 3 Months ... customer usage, identify high-risk failure modes, and determine the best mitigation * strategies ...


Customer Reliability Engineer information


$60.2K

$116.4K

$139.2K

How much do customer reliability engineer jobs pay per year?

As of Jun 15, 2026, the average yearly pay for a customer reliability engineer in California is $116,428.00, according to ZipRecruiter salary data. Most workers in this role earn between $101,200.00 and $127,300.00 per year, depending on experience, location, and employer.

How does a Customer Reliability Engineer typically interact with both clients and internal engineering teams?

Customer Reliability Engineers serve as a vital bridge between clients and internal technical teams. They regularly communicate with customers to understand their needs, troubleshoot issues, and provide technical guidance. Internally, they collaborate closely with product, support, and development teams to relay customer feedback, help prioritize reliability improvements, and ensure seamless incident resolution. This cross-functional role requires strong communication skills and the ability to translate technical information for different audiences, making every day varied and impactful.

What does a customer reliability engineer do?

A customer reliability engineer (CRE) works with clients to ensure the reliability, performance, and availability of products or services. They analyze system issues, develop solutions, and collaborate with engineering teams to improve customer experience, often using monitoring tools and technical expertise. CREs typically have strong problem-solving skills and may hold certifications related to systems or cloud platforms.

What is the difference between Customer Reliability Engineer vs Site Reliability Engineer?

| Aspect | Customer Reliability Engineer | Site Reliability Engineer |
| --- | --- | --- |
| Credentials | Typically requires engineering degrees, certifications in cloud platforms (AWS, Azure), and knowledge of customer support | Requires engineering degrees, certifications in cloud and systems management, with a focus on infrastructure |
| Work Environment | Customer-facing, involves direct interaction with clients to resolve issues and improve reliability | Primarily internal, focused on maintaining and improving system reliability and scalability |
| Employer & Industry Usage | Used by cloud service providers and tech companies with a customer support component | Common in large tech companies managing large-scale infrastructure and services |

The main difference is that Customer Reliability Engineers focus on ensuring customer satisfaction and resolving client-specific issues, while Site Reliability Engineers concentrate on internal system stability and scalability. Both roles require technical expertise and cloud knowledge but serve different operational needs.

How much does an SRE get paid?

SREs (Site Reliability Engineers) typically earn a median salary ranging from $100,000 to $150,000 annually, depending on experience, location, and company size. Senior SREs with specialized skills in automation, cloud platforms, and monitoring tools can earn higher compensation, often exceeding $180,000 per year.

What is a Customer Reliability Engineer?

A Customer Reliability Engineer (CRE) is a technical professional who works closely with customers to ensure the reliability, performance, and uptime of software products and services. CREs act as a bridge between customers and engineering teams, helping to identify, troubleshoot, and resolve reliability issues. They often collaborate with multiple departments to implement best practices, monitor systems, and proactively address potential problems, ultimately aiming to improve the overall customer experience.

What engineers make $500,000?

Senior-level engineers in fields such as software engineering, data engineering, and cloud infrastructure can earn $500,000 or more annually, especially with extensive experience, specialized skills, and in high-demand industries. Roles like Principal Engineer, Staff Engineer, or Engineering Manager often reach this compensation level, particularly in large tech companies or organizations with competitive benefits and stock options.

What are the key skills and qualifications needed to thrive as a Customer Reliability Engineer, and why are they important?

To thrive as a Customer Reliability Engineer, you need a solid background in systems engineering, incident management, and troubleshooting, often supported by a degree in computer science or related field. Familiarity with cloud platforms (such as AWS or GCP), monitoring tools (like Datadog or Prometheus), and automation scripts is typically required. Exceptional communication, problem-solving abilities, and a customer-centric mindset are vital soft skills for this role. These skills ensure efficient incident resolution, strong client relationships, and reliable system performance under pressure.

Will AI replace SRE jobs?

AI is expected to augment the work of Site Reliability Engineers (SREs) and Customer Reliability Engineers (CREs) by automating routine tasks such as monitoring, incident detection, and data analysis. However, these engineers will continue to play a critical role in designing systems, troubleshooting complex issues, and making strategic decisions that require human judgment. AI tools are seen as complementary to, not a replacement for, the skills and expertise of SREs and related roles.

Staff Software Engineer - Reliability

Rubrik Job Board

Palo Alto, CA

$67 - $89/hr

Other

Posted 16 days ago


Job description

Job Description - Staff Site Reliability Engineer

About Team & About Role

The Site Reliability Engineering (SRE) team at Rubrik ensures the absolute reliability, availability, performance, and security of our enterprise infrastructure services, spanning both global SaaS platforms and government-compliant environments. We operate at the intersection of software development and systems engineering, prioritizing hyperscale platform automation, self-healing architectures, and structural resiliency. As a Staff Site Reliability Engineer, you will serve as a primary technical leader and architect across our broader distributed cloud systems. You will drive long-term technical roadmaps, establish cross-organizational reliability standards, and solve complex distributed systems challenges that safeguard both enterprise and public sector environments. 

Beyond the core SRE charter, this Staff role also leads the Application-SRE team - a US-based group that partners closely with engineering, Sales, and Support to unblock POCs, drive complex customer escalations to resolution, and convert recurring field signals into engineering and reliability roadmap items. You will be the technical leader and project owner for Application-SRE: setting direction, tracking commitments, and ensuring the team operates as a high-leverage bridge between the field and the broader engineering org.

What You'll Do

As a Staff Site Reliability Engineer, you will possess engineering-wide influence and take ownership of the following critical areas:

  • Infrastructure Strategy & Architecture: Formulate and execute the architectural vision for Rubrik's Cloud Platform, optimizing backend infrastructure systems like Kubernetes, MySQL, and cloud-native services for performance, security, and multi-region scale.
  • Hyperscale Automation & Platform Tooling: Build, scale, and maintain sophisticated custom internal tools, platform controllers, and automation frameworks in Go or Python to systematically eliminate operational toil.
  • AI Infrastructure for SaaS: Deploy, scale, and operate the AI infrastructure that powers Rubrik's SaaS offerings, owning the reliability, performance, cost, and security controls required to run AI workloads in multi-tenant, compliance-bound environments.
  • AI for SRE & Engineering Productivity: Drive the adoption of AI-driven solutions across the SRE charter to compress toil and multiply the org - applying agentic and LLM-based approaches to automated triage, incident response, operational analysis, and developer productivity.
  • AI Adoption Guardrails for SaaS Reliability: Build the guardrails, controls, and platform patterns that keep Rubrik's SaaS reliable as AI adoption accelerates across product and engineering, ensuring new AI capabilities ship without eroding availability, performance, security, or cost posture.
  • Cross-Functional Leadership: Wield engineering-wide influence to create technical consensus among component, platform, and security engineering teams, effectively "shifting left" to embed structural resilience, capacity guards, and compliance from initial feature designs.
  • Reliability Governance: Define, audit, and enforce robust Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets across all critical enterprise platform services, translating telemetry insights into actionable product roadmaps during executive reviews.
  • Incident Command & Operations Review: Serve as a primary Incident Commander for high-severity cloud outages, establishing roles, directing mitigation vectors under pressure, and orchestrating comprehensive, blameless post-mortems that drive durable systemic fixes.
  • Cost Governance & Capacity Modeling: Architect cost-observability tools and attribution frameworks, leading cloud infrastructure capacity forecasting, resource quota optimization, and vendor SLA management.
  • Application-SRE Leadership: Set the technical direction for the Application-SRE team, raising the bar on how the team diagnoses, mitigates, and durably resolves the most complex customer-impacting issues across our platform.
  • Technical Multiplier & Mentorship: Champion SRE best practices, mentoring senior and junior individual contributors across the organization, participating in interview frameworks, and actively raising the collective technical bar.
  • On-Call Rotations: Participate in on-call rotations.

Experience You'll Need
  • Citizenship & Residency: Must be a US Citizen currently residing on CONUS soil (strict regulatory requirement to enable support for federal and FedRAMP environments when required).
  • Education: BS, MS, or PhD in Computer Science, Computer Engineering, or a highly related technical discipline.
  • Industry Experience: A minimum of 8-12+ years of software engineering and production cloud infrastructure experience, with at least 5 years dedicated to a formal SRE, DevOps, or Platform engineering role operating hyperscale SaaS products.
  • Technical Depth: Comprehensive, hands-on programming expertise in Golang, Python, or Java with a deep grasp of concurrency models, data structures, and test-driven software design patterns.
  • Distributed Systems Expertise: Proven proficiency designing, deploying, analyzing, and auditing complex, large-scale distributed systems, database topologies, and high-availability public cloud meshes.
  • Systems Internals: Authoritative operational command of Unix/Linux operating system environments (process models, file systems, kernels), systems administration, and advanced L4/L7 networking protocols.
  • AI Systems Fluency: Working knowledge of operating AI systems in production - including model serving, cost trade-offs, and the reliability and safety considerations of LLM- and agent-based workloads. Practical judgment on when AI is the right tool versus deterministic automation.
  • Field-to-Product Feedback Loop: Institutionalize the channel that converts patterns from customer escalations and POCs into prioritized product and reliability feedback, partnering directly with Product, Sales Engineering and Support leadership.
  • Customer & Field Fluency: Track record of partnering directly with Sales, Support, and customers on escalations and POCs, and translating field signals into engineering action.
  • Leadership Capability: Demonstrated history of technical leadership, mapping architectural dependencies, managing multi-team technical projects, and guiding organizations through critical platform shifts with high technical judgment.

Preferred Qualifications
  • Extensive production experience provisioning, lifecycle-managing, and recovering enterprise-scale Kubernetes (GKE, EKS) deployments and large-scale relational/non-relational databases (MySQL).
  • Prior experience building, certifying, or auditing infrastructure environments under compliance structures such as FedRAMP (High/Moderate), SOC 2, ISO 27001, or CJIS.
  • Fluency in Infrastructure-as-Code (Terraform, Pulumi) module design, multi-tenant state isolation, and enterprise observability fabrics (Prometheus, Grafana, OpenTelemetry).
  • Exposure to building AI- or LLM-powered internal tooling and applying it to SRE, operations, or engineering productivity use cases.
  • Familiarity with the operational considerations of running AI workloads on cloud and Kubernetes platforms.