Role Overview
We are building a high-scale infrastructure team responsible for owning environments with thousands of nodes, and the Staff Site Reliability Engineer, Infrastructure will help lead it. Candidates must have experience operating at this scale and leading infrastructure through significant transformation, especially the move from monoliths to scalable microservices and platform/API architectures.
Our current team is shifting from reactive support to proactive engineering. Our product demands reliability, and we need someone who understands what that truly means. You'll help unify a complex infrastructure stack, build systems that guide application teams toward best practices, and drive cross-functional efforts across product, application, and integration teams.
If you've led large-scale growth, know how to turn infrastructure chaos into scalable systems, and can show others what "good" looks like, we want to talk.
This position is based on-site in our Mountain View, CA office five days per week.
Responsibilities
- Execute on the transformation from monolith to scalable microservices (API/Platform focus).
- Drive initiatives to continually improve reliability, with a deep understanding of what each additional "9" of availability implies.
- Architect systems and write code that enable application teams to adopt best practices by default, not by instruction.
- Integrate and unify diverse infrastructure components into a cohesive, scalable platform within a massive tech stack.
- Design observability, reliability, and CI/CD frameworks to support growth and operational excellence at scale.
- Collaborate cross-functionally with product, application, and integration teams to align infrastructure direction with business goals.
- Provide technical leadership to shift the team from reactive support to a proactive, strategic function.
- Mentor and guide a team of six engineers while shaping the direction of infrastructure engineering.
Minimum Qualifications
- Bachelor's degree in Computer Science or a related field of study.
- At least 10 years of hands-on coding experience building internal platforms/tools that support developer experience and operational best practices.
- At least 5 years of experience with cloud platforms (GCP preferred, AWS acceptable); a cloud engineering background is required.
Preferred Qualifications
- Proven experience scaling infrastructure in environments with many thousands of nodes.
- Track record of leading architectural shifts from monolithic systems to microservices in large-scale environments.
- Deep knowledge of reliability engineering and high-availability systems; able to articulate the impact of increasing the number of 9s.
- Strong understanding of first-party infrastructure integration and unifying disparate systems.
- Familiarity with observability, CI/CD tooling, and infrastructure automation.
- Experience at large-scale tech companies (Google, Meta, Amazon, etc.) or equivalent environments highly preferred.
- Strong cross-functional collaboration skills and the ability to drive infrastructure alignment across engineering orgs.