Job Summary:
Accenture Federal Services is a technology company dedicated to strengthening and securing the US federal government. As a Site Reliability Engineer, you will ensure the reliability and scalability of enterprise AI systems while leading incident response efforts and collaborating with cross-functional teams to enhance operational workflows.
Responsibilities:
• Ensure the reliability, scalability, and performance of enterprise AI systems within a modern Hub-and-Spoke architecture
• Lead incident response efforts to minimize downtime and maintain service continuity
• Implement and manage SLOs/SLAs, capacity planning, and performance optimization strategies
• Operate and enhance observability platforms using OpenTelemetry, Prometheus, Grafana, Loki, and Tempo
• Drive FinOps practices to optimize operational costs and resource utilization
• Collaborate with cross-functional teams in AI, DevSecOps, data engineering, platform engineering, and cybersecurity
• Integrate monitoring and continuous feedback mechanisms for mission applications and agentic AI systems
• Support enterprise AI governance and scalable software delivery through robust operational workflows
• Proactively identify and resolve reliability and performance issues in production environments
• Work closely with cross-functional teams to integrate AI, DevSecOps, data engineering, and cybersecurity into seamless operational workflows
• Maintain robust observability operations that support scalable software delivery for agentic AI systems
Qualifications:
Required:
• Experience with OpenTelemetry, Prometheus, Grafana, Loki, and Tempo to enhance system observability and performance
• Hands-on experience with SLO/SLA management, FinOps practices, and advanced monitoring techniques to proactively identify and resolve issues before they impact mission outcomes
• Exposure to complex integration efforts, continuous delivery pipelines, and mission-focused operational environments
• Experience with reliability engineering, incident response, and FinOps
• Must be a U.S. citizen
• An active TS/SCI clearance is required
Company:
Accenture Federal Services is a leading US federal services company and a subsidiary of Accenture. Founded in 1989, the company is headquartered in Arlington, USA, and has more than 10,000 employees. The company is currently in its late stage.