Overview: Senior Site Reliability Engineer (SRE)
Location: Chicago, IL (Onsite)
Type: Contract
Role Overview:
We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in AWS infrastructure, automation, observability, and production support. The ideal candidate will bring a blend of DevOps and SRE practices, ensuring our systems remain resilient, scalable, and cost-efficient. This role requires hands-on technical depth, proactive problem-solving, and the ability to embed reliability engineering across development teams.
Key Responsibilities:
- Design, implement, and maintain secure, scalable, and highly available AWS infrastructure.
- Build and enhance CI/CD pipelines and Infrastructure as Code (IaC) solutions using Terraform and Harness.
- Implement and manage monitoring, logging, alerting, and distributed tracing with tools like Dynatrace and Datadog.
- Troubleshoot production incidents, conduct blameless postmortems, and strengthen incident response processes.
- Optimize systems for performance, cost efficiency, and reliability.
- Drive chaos engineering and resilience testing initiatives.
- Collaborate with developers to implement SLAs, SLOs, and error budgets.
- Mentor junior SREs and promote DevOps/SRE best practices across the organization.
Required Skills & Experience:
- 8+ years of experience in DevOps/SRE roles with a strong focus on AWS.
- Proven expertise in AWS services and infrastructure automation.
- Strong hands-on experience with Terraform, Harness, or similar IaC and CI/CD tools.
- Advanced knowledge of monitoring & observability platforms (Dynatrace, Datadog, Prometheus, Grafana, etc.).
- Deep understanding of incident response, disaster recovery, and reliability frameworks.
- Solid coding/scripting skills in Python, Bash, or similar languages.
- Experience with chaos engineering, resilience testing, and fault tolerance design.
- Strong collaboration, leadership, and mentoring capabilities.
Preferred Qualifications:
- Familiarity with Kubernetes, Docker, and container orchestration.
- Experience with FinOps practices (cloud cost optimization).
- Background in distributed systems, scalability, and fault-tolerant architectures.