Arobas Personnel is currently seeking a Site Reliability Engineer (SRE) for a contract position with one of its partners.
*Remote*
In this role, you will work closely with development and business teams to design and implement monitoring, alerting, and observability solutions that enhance system performance and visibility. You will support production environments, troubleshoot complex issues, and contribute to long-term system stability through proactive incident management and automation.
You will also play a key role in designing secure, scalable, and cost-efficient cloud infrastructure.
Key Responsibilities
Reliability Engineering & System Operations
- Design, implement, and maintain scalable and reliable production systems
- Investigate and resolve complex application and infrastructure issues
- Collaborate with development teams to build features with a strong focus on reliability, performance, and observability
- Apply SRE best practices, including SLIs, SLOs, and SLAs
Monitoring & Observability
- Develop and maintain monitoring, alerting, dashboards, and synthetic testing solutions
- Configure metrics and log collection agents and manage incident notification channels
- Analyze trends and recurring issues to drive continuous improvement
Cloud Infrastructure Management
- Manage and optimize AWS and/or Azure environments across staging and production
- Collaborate with architecture, development, and finance teams to build secure and cost-effective cloud solutions
Incident & Problem Management
- Participate in a 24/7 on-call rotation and respond rapidly to production incidents
- Identify root causes, lead post-mortems, and implement long-term fixes
- Ensure proper escalation and communication of issues
Automation & Tooling
- Automate repetitive operational tasks to improve efficiency and reliability
- Build and maintain deployment and configuration tools
- Work with CI/CD pipelines, including tools such as GitHub Actions
Collaboration & Customer Focus
- Partner with product and development teams to prioritize and resolve production issues
- Equip internal teams with self-service tools and insights
- Ensure timely ticket resolution and clear stakeholder communication
Architecture & Documentation
- Review technical documentation (HLDs/FRDs) to proactively identify risks and gaps
- Maintain strong knowledge of platforms, systems, and usage patterns
Employment Type: Full-Time