
Site Reliability Engineer Developer Operations Jobs in Utah

$57.75 - $76.75/hr

Site Reliability Engineer (SRE) Department: Technology Location: Manila Reporting To: Head of Infra ... DevOps, and client success teams to operationalize deployments across on-premise, VPC, and ...

Site Reliability Engineer

Draper, UT

$53.25 - $70.75/hr

The Site Reliability Engineer will work at the intersection of ... SecOps, DevOps, Quality Assurance, and IT operations teams by leveraging technical and ...

Site Reliability Engineer

Draper, UT · On-site

$53.25 - $70.75/hr

... Site Reliability Engineer will work at the intersection of ... SecOps, DevOps, Quality Assurance, and IT operations teams by leveraging technical and ...

Site Reliability Engineer II

Lehi, UT · On-site

$53.50 - $71/hr

WHAT YOU'LL NEED * 2+ years of experience in SRE, ... DevOps, or infrastructure engineering. * Strong understanding of cloud platforms (AWS, GCP, or ...

Site Reliability Engineer

Saint George, UT · On-site

$53.75 - $71.50/hr

The Site Reliability Engineer works as part of a team to analyze, troubleshoot, deploy, monitor ... used for DevOps/Continuous Delivery, including but not limited to Go, Java, and Node.js * ...

Site Reliability Engineer

Saint George, UT · On-site

$50.75 - $67.50/hr

The Site Reliability Engineer works as part of a team to analyze, troubleshoot, deploy, monitor ... used for DevOps/Continuous Delivery, including but not limited to Go, Java, and Node.js * ...

SRE Engineer - PxE Talent

Salt Lake City, UT · On-site

$55.25 - $73.25/hr

As an SRE Engineer you will actively engage in your engineering craft, taking a hands-on approach to ... Develops and iterates operational solutions with customers and product teams; responds to incidents ...

Manager, SRE Engineer - PxE ERM

Salt Lake City, UT · On-site

$55.25 - $73.25/hr

Champions modern SRE practices, ensuring alignment with business goals and operational standards ... Responsible for requirement analysis, automation, integration, monitoring, and ongoing support.



$57.75 - $76.75/hr

Full-time

Posted 22 days ago


Job description

Position Overview

Job Title: Site Reliability Engineer (SRE)
Department: Technology
Location: Manila
Reporting To: Head of Infra

Tookitaki is looking for a Site Reliability Engineer (SRE) with 3–6 years of experience to help maintain and scale the infrastructure that powers our flagship products—FinCense and the AFC Ecosystem. As an SRE, you will work at the intersection of software engineering and infrastructure, ensuring high availability, performance, and scalability of our platforms.

You will collaborate with engineering, DevOps, and client success teams to operationalize deployments across on-premise, VPC, and Compliance as a Service (CaaS) environments while improving monitoring, automation, and incident response.

Position Purpose

The SRE role is responsible for ensuring the reliability and efficiency of Tookitaki’s production systems and environments. This includes building monitoring systems, improving deployment pipelines, automating routine operations, and responding to production incidents. You’ll help build a resilient infrastructure that supports our mission to provide AI-driven solutions that prevent financial crime.

Key Responsibilities
  1. System Monitoring & Incident Management

    • Build and maintain monitoring, alerting, and logging systems using tools like Prometheus, Grafana, and ELK.

    • Respond to incidents and outages, conduct post-mortems, and implement corrective actions.

  2. Infrastructure & Deployment Automation

    • Automate infrastructure provisioning and application deployment using Terraform, Ansible, or Helm.

    • Contribute to CI/CD pipelines and improve the reliability and speed of software delivery (GitLab CI, Jenkins, etc.).

  3. Container & Orchestration Management

    • Manage and troubleshoot Docker containers and Kubernetes clusters, ensuring workload scaling, resource management, and health.

    • Support application updates, rollbacks, and blue-green or canary deployments.

  4. Cloud & Platform Operations

    • Operate within AWS (preferred) or GCP environments (EC2, S3, VPC, IAM).

    • Monitor system availability and resource usage across environments.

  5. Security & Reliability Enhancements

    • Implement and monitor TLS/SSL, RBAC, SSO, and secure API practices.

    • Support compliance and security audit activities by maintaining logs, access controls, and operational hygiene.

  6. Collaboration & Documentation

    • Work closely with developers, infra engineers, and support teams to ensure production readiness.

    • Maintain playbooks, runbooks, and system documentation for reliability engineering activities.

Qualifications and Skills

Education
  • Bachelor’s degree in Computer Science, Engineering, or related technical field.

Experience
  • 3–6 years in Site Reliability Engineering, DevOps, Platform Engineering, or a related role.

  • Experience with production environments and live system debugging.

Technical Skills
  • Kubernetes, Docker, Helm – experience deploying and scaling services.

  • Linux administration and command-line debugging.

  • Hands-on with AWS (preferred) or GCP cloud platforms.

  • Scripting in Bash and Python for automation and monitoring tasks.

  • Experience with monitoring and alerting tools like Prometheus, Grafana, ELK, or Datadog.

  • Familiarity with databases (e.g., MariaDB, ScyllaDB) and SQL/CQL querying.

Soft Skills
  • Strong problem-solving and debugging skills.

  • Ability to work in on-call rotations and high-pressure production environments.

  • Excellent communication and documentation abilities.

Key Competencies
  • Operational Reliability: Ensures system uptime and performance through proactive monitoring and maintenance.

  • Automation Mindset: Reduces manual effort through scripting and tooling.

  • Incident Response: Identifies and resolves issues quickly to minimize downtime.

  • Cross-Functional Collaboration: Works effectively with engineering, support, and infra teams.

  • Security Awareness: Applies best practices in infrastructure and platform security.

Success Metrics
  • Maintain 99.9%+ uptime across production environments.

  • Reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for critical incidents.

  • Increase automation coverage and reduce manual deployment steps.

  • Achieve high internal developer satisfaction with CI/CD and platform reliability.

  • Maintain compliance readiness and security log availability for audits.
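
For candidates who want to see what the uptime and incident metrics above mean in numbers, here is a minimal Python sketch. It assumes a 30-day measurement window and measures both MTTD and MTTR from the start of each incident. The incident timestamps and function names are hypothetical and are not taken from Tookitaki's systems.

```python
from datetime import datetime, timedelta


def downtime_budget(availability: float, window: timedelta) -> timedelta:
    """Maximum downtime allowed in `window` at the given availability target."""
    return window * (1.0 - availability)


def mean_duration(pairs):
    """Average of (end - start) over a list of (start, end) datetime pairs."""
    total = sum((end - start for start, end in pairs), timedelta())
    return total / len(pairs)


# Hypothetical incident records: (started, detected, resolved).
incidents = [
    (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 4), datetime(2024, 1, 3, 10, 34)),
    (datetime(2024, 1, 17, 22, 0), datetime(2024, 1, 17, 22, 6), datetime(2024, 1, 17, 22, 26)),
]

# Assumption: a 30-day window; 99.9% leaves roughly 43 minutes of downtime.
budget_30d = downtime_budget(0.999, timedelta(days=30))

# Assumption: both metrics are measured from the start of the incident.
mttd = mean_duration([(started, detected) for started, detected, _ in incidents])
mttr = mean_duration([(started, resolved) for started, _, resolved in incidents])

print("30-day downtime budget:", budget_30d)
print("MTTD:", mttd)
print("MTTR:", mttr)
```

Teams define these metrics differently (for example, measuring MTTR from detection rather than from incident start), so the definitions here are illustrative only.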

Benefits
  • Competitive compensation

  • Work on a globally recognized RegTech platform transforming financial crime prevention.

  • Exposure to cutting-edge AI and big data infrastructure (Spark, Kafka, ScyllaDB, Flink).

Employment Type: Full-time