Job Summary:
Accenture Federal Services is dedicated to enhancing the US federal government's capabilities through technology and innovation. The Site Reliability Engineer will focus on ensuring the reliability, scalability, and continuous monitoring of enterprise AI systems that support mission-critical applications.
Responsibilities:
• Ensure the reliability, scalability, and performance of enterprise AI systems within a modern Hub-and-Spoke architecture
• Lead incident response efforts to minimize downtime and maintain service continuity
• Implement and manage SLOs/SLAs, capacity planning, and performance optimization strategies
• Operate and enhance observability platforms using OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
• Drive FinOps practices to optimize operational costs and resource utilization
• Collaborate with cross-functional teams in AI, DevSecOps, data engineering, platform engineering, and cybersecurity
• Integrate monitoring and continuous feedback mechanisms for mission applications and agentic AI systems
• Support enterprise AI governance and scalable software delivery through robust operational workflows
• Proactively identify and resolve reliability and performance issues in production environments
• Own incident response, performance optimization, and capacity planning, working closely with cross-functional teams to integrate AI, DevSecOps, data engineering, and cybersecurity into seamless operational workflows
• Apply your expertise to maintain robust observability operations and support scalable software delivery for agentic AI systems
Qualifications:
Required:
• Experience with OpenTelemetry, Prometheus, Grafana, Loki, and Tempo to enhance system observability and performance
• Hands-on experience with SLO/SLA management, FinOps practices, and advanced monitoring techniques to proactively identify and resolve issues before they impact mission outcomes
• Exposure to complex integration efforts, continuous delivery pipelines, and mission-focused operational environments
• Experience with reliability engineering, incident response, and FinOps
• Must be a U.S. citizen
• An active TS/SCI clearance is required
Company:
Accenture Federal Services is a leading US federal services company and a subsidiary of Accenture. Founded in 1989, the company is headquartered in Arlington, Virginia, and has more than 10,000 employees. The company is currently late stage.