About the Role
As a Production Support Engineer, you will be part of a multi-tiered support team whose primary responsibility is minimizing production downtime. You will be the bridge between code and customer. This is an in-person role in our Chicago office.
What You'll Do
The Production Support Engineer will:
- Act as the primary incident commander for high-severity outages, orchestrating the technical response and maintaining clear lines of communication with stakeholders.
- Lead blameless post-mortem sessions to dissect failures, and translate the technical findings into documented systemic improvements.
- Perform in-depth troubleshooting and analysis to resolve bugs and other functional errors and to identify workflow enhancements.
- Own and evolve the observability strategy. Move us from reactive alerts to predictive insights.
- Dive into Ruby code, review GitHub PRs, manage feature flags, and run production jobs to keep things moving.
- Lead problem management. Identify systemic trends and partner with Product teams to prioritize permanent fixes over temporary band-aids.
- Drive continuous improvement by identifying and implementing enhancements to support tools, workflows, and documentation.
- Drive critical escalations in technically challenging situations in collaboration with engineering, product, and other IT teams.
What We Look For
- 5+ years of experience in a high-stakes Production/SRE role, with deep expertise in AWS (ECS, Lambda, CloudWatch) and SQL.
- Expert-level experience with observability/APM tools.
- A strong development background. You are comfortable reading and debugging Ruby or JavaScript and navigating GitHub workflows.
- Proven experience in incident command. You know how to manage an incident bridge, silence the noise, and drive a team toward resolution under heavy pressure.
- Highly organized with a strong work ethic, sharp attention to detail, and a proactive mindset.
- A "Bias for Action." You can navigate ambiguity, manage your own project timelines, and stay calm while the "house is on fire."
- You can translate a complex system failure into a clear, business-value narrative for non-technical executives.