About the Role:
We're seeking a highly skilled Site Reliability Engineer (SRE) to join our IT Operations team. You'll be responsible for ensuring the reliability, availability, and performance of our applications and services, with a specific focus on IBM WebSphere Commerce Suite environments. This role combines systems engineering with software development to build and maintain large-scale, distributed systems.
What You'll Do:
- Implement and maintain SRE best practices across application and service infrastructure
- Develop automation tools and scripts to improve system resilience and operational efficiency
- Monitor system health and performance metrics, and implement proactive alerting
- Collaborate with development and operations teams to enhance system reliability
- Design and execute chaos engineering experiments to test system resilience
- Manage and optimize IBM WebSphere Commerce Suite environments
- Participate in incident response, root cause analysis, and post-mortem processes
- Build and maintain CI/CD pipelines and DevOps automation workflows
- Establish and track Service Level Objectives (SLOs) and error budgets
What We're Looking For:
- Strong experience with Site Reliability Engineering (SRE) practices and methodologies
- Expert-level knowledge of IBM WebSphere Commerce Suite (WCS)
- Proficiency in DevOps tools and practices (CI/CD, automation, infrastructure as code)
- Experience with cloud platforms and distributed systems architecture
- Strong background in monitoring tools and observability platforms
- Knowledge of chaos engineering principles and fault tolerance design
- Scripting and programming skills (Python, Shell, etc.)
- Understanding of containerization and orchestration technologies
- Experience with incident management and on-call responsibilities
- Strong analytical and problem-solving abilities
Required Technical Skills:
- BOTH SRE and IBM WebSphere Commerce Suite experience (mandatory)
- Linux/Unix system administration
- Database management and optimization
- Network troubleshooting and performance tuning
- Security best practices and compliance
Preferred Qualifications:
- SRE or DevOps certifications
- Experience with microservices architecture
- Knowledge of load balancing and high availability systems
- Familiarity with APM tools and log analysis platforms
Application Process:
We're looking for an experienced professional who can bridge the gap between development and operations while maintaining our critical WebSphere Commerce systems. Candidates must have hands-on experience with both SRE practices and IBM WebSphere Commerce Suite.
We are an equal opportunity employer committed to diversity and inclusion.