$75,000 - $120,000 / year + Bonus
The insurance industry runs on Vertafore. We equip agencies, MGAs, and carriers with the core digital systems, specialized AI, and data-driven foundation to eliminate distribution drag across the insurance lifecycle, spanning sales, servicing, and back-office operations.
Underpinned by unmatched speed and performance, we are the trusted backbone taking the insurance industry from friction to flow with Distribution Velocity (speed, performance, and trust) to drive growth at scale.
With over 95% of the top agencies and insurers and 50% of industry compliance transactions running through Vertafore, we lead at the intersection of innovation and trust, giving insurance professionals the confidence to transform and win in the AI era.
Our reach is global, with headquarters in Denver, Colorado, and offices across the U.S., Canada, and India.
Role Summary
We are seeking a Site Reliability Engineer II to support the reliability, scalability, and performance of critical production services. This role contributes to the full service lifecycle, helping to transition services from deployment readiness into stable production operations. At Vertafore, we treat operations as a software problem; you will work alongside Senior SREs to apply engineering rigor to our AWS and hybrid environments.
Key Responsibilities
Reliability and Observability Support
- Service Maintenance: Contribute to the operational health and performance of assigned production services.
- Observability Implementation: Assist in building and maintaining observability frameworks. Help track the Four Golden Signals (latency, traffic, errors, and saturation) to ensure service health is visible.
- SLO Contribution: Participate in monitoring SLIs and SLOs, providing data to help the team manage error budgets effectively.
Engineering and Automation
- Toil Reduction and Guided Debugging: Work on projects to automate manual and repetitive tasks using scripting, programming, or AI tools. Troubleshoot production issues across infrastructure and application code, implementing durable solutions instead of quick fixes.
- Deployment Execution: Support production changes such as patching and software releases using established automated pipelines and safety-first practices.
Incident Participation and Learning
- Active Incident Response: Participate in incident response for production events and join on-call rotations.
- Postmortem Contribution: Assist in root cause analysis and contribute to blameless postmortems to help the team learn from failures.
Qualifications
- Experience: 2 to 3.5 years of hands-on experience in SRE, DevOps, or a software engineering role with a focus on system stability.
- SRE Fundamentals: Understanding of core SRE principles such as SLIs, SLOs, and error budgets.
- Coding Skills: Proficiency in at least one programming language such as C#, Java, or Python, or in related frameworks such as .NET or React.
- Technical Skills: Experience with AWS, CI/CD pipelines (GitLab or Jenkins), and infrastructure as code.
- Systems Knowledge: Working knowledge of Linux and Windows environments and relational databases.
- Education: Bachelor’s degree in Computer Science or a related technical field.