Site Reliability Engineer
- Full-Time
POSITION SUMMARY
The Site Reliability Engineer is primarily responsible for enhancing and maintaining our setup and application monitoring systems for Service Level Agreements (SLAs). This entails defining our Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
ESSENTIAL FUNCTIONS:
- Work with the Performance Engineering team, which focuses on technology changes and scaling/optimization improvements, to provide guidance on SLA adjustments and high-availability strategies
- Improve and create CI/CD automation tooling in collaboration with DevOps, including Kubernetes scaling and deployment configuration/setup improvements
- Work with feature/product engineering teams to set realistic targets for warning and alert thresholds for SLAs
- Suggest improvements within the source code to help attain better availability/latency
- Apply knowledge and expertise to enhance our load testing efforts
POSITION QUALIFICATIONS
Requirements
Candidates should have a minimum of 2 years of experience in an SRE, DevOps, or performance/scaling-oriented software development role, and should also have experience with:
- monitoring and tracing platforms (e.g. Datadog, Dynatrace)
- Python for software development (backend/web/data pipelines) or for DevOps-oriented work, particularly within automation or monitoring frameworks
- tuning/monitoring for a variety of systems (e.g. relational databases, NoSQL, caching)
Nice-to-have experience:
- OpenTracing/OpenTelemetry
- Elasticsearch/Logstash/Kibana (ELK), Splunk
- Celery/Flower usage/familiarity
- DynamoDB
- Hybrid cloud
- WebSocket
- Load/performance testing/analytics (Locust or other tooling familiarity)
Education:
- BS/BA Degree Required
Experience:
- 2+ years of experience
Skills & Abilities:
- Is sharp and driven, with strong communication skills
- Is attracted to the challenge of building elegant solutions using modern technologies
- Is always looking for ways to improve the current process
- Works well with others and thrives while sharing knowledge and receiving input
Competency Statement(s)
- Analytical Skills - Ability to use thinking and reasoning to solve a problem
- Communication, Oral - Ability to communicate effectively with others using the spoken word
- Communication, Written - Ability to communicate in writing clearly and concisely
- Customer Oriented - Ability to take care of the customers’ needs while following company procedures
- Decision Making - Ability to make critical decisions while following company procedures
- Interpersonal - Ability to get along well with a variety of personalities and individuals
- Management Skills - Ability to organize and direct oneself and effectively supervise others
- Problem Solving - Ability to find a solution for or to deal proactively with work-related problems
- Relationship Building - Ability to effectively build relationships with customers and co-workers
- Working Under Pressure - Ability to complete assigned tasks under stressful situations
- Presentation Skills - Ability to exhibit strong skills in presenting to small or medium-sized groups
- Flexibility - Ability to adapt to new, different, or changing requirements and environment
Address
ModivCare
Denver, CO
Industry
Technology