Job Summary:
Ticketmaster is a global leader in live event ticketing, dedicated to connecting fans with the events they love. As a Lead Engineer in the Platform Enablement team, you will mentor engineers, improve operational practices, and enhance the developer platform for better team performance and resilience.
Responsibilities:
• Help engineering teams get more out of the developer platform by finding where they are stuck or underusing its capabilities, and working with them to fix the underlying problems.
• Coach teams on resilience and operational practice, including SLOs, error budgets, alerting philosophy, and production readiness, in the context of their own services.
• Build self-service workflows, templates, components, and documentation so common problems stop being tickets and start being solved by the platform.
• Improve how teams use observability and logging products by raising signal quality, tightening alerting, and helping them build dashboards that answer the questions they actually ask during incidents.
• Help teams improve their CI/CD and safely shorten the path from commit to production.
• Support teams adopting LLM and AI-assisted workflows in their daily engineering work, sharing patterns that work and building organizational skill as the landscape changes.
• Pair with teams on resilience-focused design and code reviews, guiding them toward simpler, safer architectures.
• Support incident analyses with partner teams, focusing on reducing the impact of contributing factors and implementing durable fixes.
• Mentor engineers through pairing, reviews, and coaching.
• Bring actionable feedback to the CSRE Platform to improve our products and integrations.
• Improve Enablement's own procedures and operating practices based on lessons learned.
Qualifications:
Required:
• Deep practical understanding of SRE principles, including defining SLIs and SLOs and applying error budgets in practice.
• Proven ability to lead cross-team technical work and influence with situational authority.
• Strong experience designing and troubleshooting distributed systems with cross-service failure modes.
• Experience improving observability and alerting in production, including signal quality and useful dashboards.
• Comfortable working with systems running in on-premises data centers.
• Strong cloud-native experience, including governance and cost trade-offs.
• Ability to design reusable resilience and operational automation and tooling that multiple teams adopt.
• Experience with production readiness and resilience practices, including DR validation and controlled testing.
• Strong software engineering fundamentals with the ability to deliver and review high-quality changes in enterprise codebases.
• Strong incident analysis skills focused on contributing factors and impact reduction.
• Experience working with LLM and AI-assisted tooling for engineering work, with judgment about where it helps and where it does not.
• Excellent written and verbal communication, including clear procedures, useful design docs, and exec-ready summaries.
Company:
Ticketmaster is a ticket sales and distribution company that sells tickets for concerts, sports, and other events. Founded in 1976, the company is headquartered in West Hollywood, USA, and has more than 10,000 employees. It is a late-stage company.