The Systems Engineer - Site Reliability Engineering (SRE) is responsible for the reliability, scalability, and performance of mission-critical cloud and on-prem services that support millions of Marriott customers globally. This role involves overseeing incident management, driving automation efforts, and working closely with cross-functional teams to ensure alignment between SRE strategy and business objectives. It partners with Product teams, Applications teams, Infrastructure, and the broader Applications and Infrastructure Delivery teams to develop key metrics and KPIs that improve application stability, availability, and performance. The ideal candidate will bring strong communication skills, collaborating with key stakeholders across the company to optimize cloud infrastructure and uphold the highest standards of operational excellence in a dynamic, fast-paced environment.
- Deep understanding of SRE practices, such as Service Level Objectives, Error Budgets, Toil Management, Observability & Monitoring, Blameless Postmortems, Incident Response Process, and Capacity Planning.
- Networking: VPC, subnets, route tables, NAT gateways, Transit Gateway.
- Storage and Databases: S3, EBS, EFS, RDS, DocumentDB.
- Experience using modern, continuous development techniques and pipelines (e.g. Agile, Kanban, Jira, CI/CD, Helm, Harness, Jenkins, Git, Artifactory, Vault).
- Experience designing and implementing end-to-end observability solutions across metrics, logs, and traces using tools like Prometheus, Grafana, ELK Stack, and OpenTelemetry.
- Experience troubleshooting API-related issues in distributed systems, including latency, authentication/authorization failures, rate limiting, and upstream/downstream dependency failures.
- Familiarity with service mesh technologies to enable secure and resilient service communication, including mTLS, traffic shaping, and policy enforcement.
- Familiarity with vulnerability management, OS hardening, patching, and security compliance for infrastructure, applications, and databases.
- Experience driving cloud cost optimization initiatives (rightsizing, reserved instances, autoscaling strategies, cost observability).
- Networking expertise including Load Balancing, Firewalls, Security Groups, NACLs, TCP/IP, DNS, HTTP/HTTPS, SSL/TLS, etc.
- Ensure the reliability, availability, and performance of mission-critical cloud services, implementing best practices for monitoring, alerting, and incident management.
- Develop and execute the SRE strategy aligned with business goals, and communicate service health, reliability, and performance metrics to senior leadership and stakeholders.
Drive Application Performance Management and Monitoring:
Building Successful Relationships:
Managing Projects and Priorities:
Delivering on the Needs of Key Stakeholders:
Providing Technical Support and Consultation:
At Marriott International, we are dedicated to being an equal opportunity employer, welcoming all and providing access to opportunity. We actively foster an environment where the unique backgrounds of our associates are valued and celebrated. Our greatest strength lies in the rich blend of culture, talent, and experiences of our associates. We are committed to non-discrimination on any protected basis, including disability, veteran status, or other basis protected by applicable law.
All positions offer a 401(k) plan, stock purchase plan, discounts at Marriott properties, commuter benefits, employee assistance plan, and childcare discounts. Benefits are subject to terms and conditions, which may include rules regarding eligibility, enrollment, waiting period, contribution, benefit limits, election changes, benefit exclusions, and others.
Full-time positions also offer coverage for medical, dental, vision, health care flexible spending account, dependent care flexible spending account, life insurance, disability insurance, accident insurance, adoption expense reimbursements, paid parental leave, and educational assistance.
Washington Applicants Only: Employees will accrue paid sick leave, earning 0.077 hours of PTO balance for every hour worked, and will be eligible to receive a minimum of 9 holidays annually.
Marriott HQ is committed to a hybrid work environment that enables associates to Be connected. Headquarters-based positions are hybrid for candidates within commuting distance of Bethesda, MD; candidates outside commuting distance of Bethesda, MD will be considered for remote positions.
Marriott International is the world's largest hotel company, with more brands, more hotels and more opportunities for associates to grow and succeed.
Be where you can do your best work,
begin your purpose,
belong to an amazing global team, and
become the best version of you.