Site Reliability Engineer Jobs in Quebec (NOW HIRING)

The SRE will function as part of a special investigations unit that empowers and enables Applicative Support, Infrastructure Support, and the Incident Management team: coaching, guiding, and leading ...

A career as a Site Reliability Engineer Platform ServiceNow (SRE) in the Productivity Tools Management team at National Bank means being an expert responsible for the reliability, availability and ...

As a Senior SRE, you will be responsible for improving and developing multiple systems to make better use of the development and production infrastructure shared by several departments. As one of your ...

To do that, we are eager to add a highly skilled DevOps / SRE Engineer to our incredible team. This is a senior role working alongside the current backend and frontend software engineering ...

The COO/GTE/EPL/SRE team has members in Paris, Bangalore, and Montreal and is responsible for the production, security, performance, and scalability of all capabilities provided by EPL. What will be ...

We are looking for an experienced DevOps/SRE Engineer for our client. This is a permanent position that can be either remote or in-office in Toronto! Our client is a large fintech firm with a ...

Site Reliability Engineer information

Quebec salary range (chart): $62.5K, $130.1K (average), $178K

How much do site reliability engineer jobs pay per year?

As of Jun 26, 2026, the average yearly pay for a site reliability engineer in Quebec is $130,104.00, according to ZipRecruiter salary data. Most workers in this role earn between $109,500.00 and $149,500.00 per year, depending on experience, location, and employer.

Will SRE be replaced by AI?

Site Reliability Engineers (SREs) focus on maintaining system reliability, automation, and incident response, and AI tools are increasingly used to assist these tasks. While AI can automate routine processes, SREs' expertise in system design, troubleshooting, and decision-making remains essential, making complete replacement unlikely in the near future.

What Is a Site Reliability Engineer?

A site reliability engineer specializes in site reliability engineering, or SRE, a branch of operations pioneered by Google. In this role, you are responsible for ensuring that when a website scales a feature so that more users can access it, the underlying software and website functions do not break. This means you need analytical problem-solving skills to determine how to make the features in a new software release work on top of the existing source code.

What engineers make $300,000 a year?

Senior-level engineers such as Site Reliability Engineers, Software Engineers, and Cloud Infrastructure Engineers can earn $300,000 or more annually, especially with extensive experience, specialized skills, and working at large tech companies or in high-cost-of-living areas. Compensation often includes base salary, bonuses, and stock options, with expertise in automation, cloud platforms, and monitoring tools being highly valued.

What are the key skills and qualifications needed to thrive as a Site Reliability Engineer, and why are they important?

To thrive as a Site Reliability Engineer, you need a strong background in computer science, systems administration, and software engineering, often supported by a degree in a technical field. Familiarity with cloud platforms (like AWS or GCP), container orchestration (such as Kubernetes), infrastructure as code (Terraform or Ansible), and monitoring tools (Prometheus, Grafana) is typically expected. Strong problem-solving skills, effective communication, and a proactive mindset help SREs excel at incident management and cross-functional collaboration. These skills are crucial for maintaining system reliability, minimizing downtime, and driving continuous improvement in complex technical environments.

Is SRE a stressful job?

Site Reliability Engineers (SREs) often work in high-pressure environments where they monitor system performance, troubleshoot outages, and ensure uptime. The role can involve on-call duties and incident response, which may contribute to stress, but it also offers opportunities for automation and process improvements to reduce workload. Overall, stress levels vary depending on the organization, team culture, and individual skills.

What are some of the most common challenges Site Reliability Engineers face when balancing system reliability with rapid software delivery?

Site Reliability Engineers (SREs) often navigate the challenge of maintaining highly reliable systems while supporting fast-paced software releases. This involves managing incidents, automating processes to reduce manual toil, and working closely with development teams to embed reliability into the software development lifecycle. SREs must carefully prioritize their efforts between proactive improvements and urgent, reactive fire-fighting. Effective communication and collaboration with both operations and development teams are crucial to ensuring service uptime without slowing down innovation.

What does a Site Reliability Engineer do?

A Site Reliability Engineer (SRE) is responsible for maintaining and improving the reliability, availability, and performance of software systems. They use automation, monitoring tools, and scripting to prevent outages and resolve issues quickly, often working closely with development teams to ensure scalable infrastructure. SREs typically have skills in systems engineering, coding, and cloud platforms, and may hold certifications like those in cloud services or DevOps practices.

What is the difference between Site Reliability Engineer vs DevOps Engineer?

Aspect | Site Reliability Engineer | DevOps Engineer
Credentials | Typically requires a computer science degree and certifications such as AWS, Google Cloud, or Kubernetes | Similar credentials, often with cloud certifications and scripting skills
Work Environment | Focuses on maintaining and improving system reliability, often in large-scale production environments | Works on automation, CI/CD pipelines, and deployment processes across development and operations teams
Industry Usage | Common in tech, cloud services, and large-scale enterprise companies | Widely used in software development, cloud, and IT organizations

Both roles require strong technical skills and cloud knowledge, but SREs focus more on system reliability and uptime, while DevOps engineers emphasize automation and deployment processes. They often collaborate but have distinct primary responsibilities.

What is a Site Reliability Engineer?

A Site Reliability Engineer (SRE) is a professional who applies software engineering principles to infrastructure and operations problems. Their primary goal is to create scalable and highly reliable software systems, often bridging the gap between development and IT operations. SREs automate tasks, monitor system health, respond to incidents, and work to improve system reliability and performance. They also help define service level objectives (SLOs) and ensure systems meet customer expectations for uptime and availability.
Infographic: Site Reliability Engineer job openings in Quebec as of June 2026. Employment types: 92% full time and 8% temporary. Work arrangement: 93% on-site, 2% hybrid, and 5% remote. Average salary: $130,104 per year, or $62.50 per hour.

Director, Infrastructure & SRE

TailorCare

Montreal, QC

Other

Posted 24 days ago


Job description

About the Role

The Director of Infrastructure & SRE owns the function end-to-end: reliability, security, scalability, and operational governance of TailorCare's infrastructure, plus the team that delivers it. You will be a peer to the Director of Software Engineering, Director of Data Engineering, and Director of Data Science, own the Infrastructure & SRE scorecard in front of the executive team, and lead vendor escalations with Salesforce, AWS, and Cresta, among others, at the Director level.

This is a player-coach role. In year one you will spend roughly 60% of your time hands-on (writing Terraform, leading incidents, doing architecture work) and 40% building the team and the practice. As the team scales, that ratio shifts toward leadership, but you will never stop being technical.

This is not a slideware role. We are not hiring a manager who reviews architecture diagrams from a distance. We are hiring an operator who codes, runs incidents, owns the platform, and ships.

Primary Responsibilities

Infrastructure as Code

  • Converge all AWS resources to Terraform; eliminate manual provisioning
  • Establish reproducible environments (dev, staging, production) with proper isolation and parity
  • Standardize CI/CD pipelines across all engineering teams

Site Reliability

  • Define and operate SLOs, SLIs, and error budgets for all production systems (web/mobile applications, Salesforce, data processing, telephony stack)
  • Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations
  • Stand up the infrastructure on-call rotation, incident management, and post-incident review discipline, including RCAs
  • Own uptime, MTTR, and incident-volume trends as published metrics
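
As an illustration of how an SLO turns into an error budget, here is a minimal Python sketch. It assumes a request-based SLI (successful requests divided by total requests over a fixed window); the class name `ErrorBudget` and the sample numbers are hypothetical and do not come from the posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for one SLO over a fixed window (hypothetical helper)."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that violated the SLI

    @property
    def sli(self) -> float:
        """Measured success ratio (the SLI) for the window."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the budget still unspent; negative means overspent."""
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / self.allowed_failures


# Illustrative numbers only: a 99.9% SLO over one million requests.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"SLI {budget.sli:.5f}, budget remaining {budget.budget_remaining:.0%}")
```

A negative `budget_remaining` is the usual signal that the SLO was missed for the window, which teams often use to prioritize reliability work over new releases.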

Disaster Recovery & Business Continuity

  • Design and implement a tested DR strategy with documented RPO/RTO commitments
  • Validate recovery procedures on a recurring cadence
  • Align DR posture with HITRUST and HIPAA expectations
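
To make the RPO/RTO terms concrete, the Python sketch below checks one recovery drill against its commitments. It assumes RPO is measured as the gap between the last restorable backup and the failure, and RTO as the gap between the failure and service restoration; `DrTestResult`, `meets_commitments`, and all timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrTestResult:
    """Timestamps recorded during one recovery drill (hypothetical shape)."""

    last_good_backup: datetime   # newest restorable copy before the failure
    failure_time: datetime       # when the simulated outage began
    service_restored: datetime   # when the service was usable again

    @property
    def data_loss_window(self) -> timedelta:
        """Data written after the last backup is lost; compare with RPO."""
        return self.failure_time - self.last_good_backup

    @property
    def downtime(self) -> timedelta:
        """Time the service was unavailable; compare with RTO."""
        return self.service_restored - self.failure_time


def meets_commitments(result: DrTestResult, rpo: timedelta, rto: timedelta) -> bool:
    """Return True only if both the RPO and the RTO were honored."""
    return result.data_loss_window <= rpo and result.downtime <= rto


# Illustrative drill: backup at 00:00, failure at 00:10, restored at 00:40.
drill = DrTestResult(
    last_good_backup=datetime(2026, 1, 1, 0, 0),
    failure_time=datetime(2026, 1, 1, 0, 10),
    service_restored=datetime(2026, 1, 1, 0, 40),
)
print(meets_commitments(drill, rpo=timedelta(minutes=15), rto=timedelta(hours=1)))
```

Recording these three timestamps in every drill gives a documented trail of whether the commitments held on each run.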

Integration Reliability

  • Stabilize Salesforce, telephony/omni-channel, and Cresta integrations; close persistent gaps in skills-based routing, warm transfers, and telephony data parity
  • Partner with Data Engineering on the reliability of data ingest paths (Fivetran, SFTP, S3) and Salesforce bulk API flows

Security & Compliance Engineering

  • Translate Security & Compliance policy into enforced infrastructure controls: IAM, encryption (at rest and in transit), network segmentation, secrets management, audit logging
  • Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation
  • Own vulnerability management across cloud and application layers

Email & Domain Infrastructure

  • Fix DNS, SPF, DKIM, DMARC, and IP reputation to resolve spam-folder deliverability impacting patient and operational communications
  • Own all TailorCare domain and email infrastructure
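
One small piece of the deliverability work above can be sketched in Python: reading a DMARC TXT record and flagging common weaknesses. This is only a sketch under stated assumptions. It parses a record string that has already been fetched, it does no DNS lookups, and it checks only the `v`, `p`, and `rua` tags. The function names and the `example.com` address are hypothetical.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("missing v=DMARC1 version tag")
    return tags


def dmarc_findings(record: str) -> list[str]:
    """Return human-readable findings for a record; empty means none found."""
    tags = parse_dmarc(record)
    findings: list[str] = []
    policy = tags.get("p", "").lower()
    if policy not in {"none", "quarantine", "reject"}:
        findings.append("missing or invalid p= policy")
    elif policy == "none":
        findings.append("p=none only monitors; failing mail is still delivered")
    if "rua" not in tags:
        findings.append("no rua= address, so aggregate reports are not collected")
    return findings


# Hypothetical record for illustration only.
for finding in dmarc_findings("v=DMARC1; p=none"):
    print(finding)
```

A fuller audit would also need to look at the SPF and DKIM records and at sending IP reputation, which this sketch does not cover.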

Developer Experience

  • Build and maintain test, staging, and ephemeral environments engineers actually use
  • Reduce cycle time and remove infrastructure friction from the SDLC
  • Establish self-service tooling so engineers ship without filing tickets

Team & Function Leadership

  • Hire, level, develop, and retain the Infrastructure & SRE team
  • Own the function's MBR contribution: scorecard, risks, decisions needed
  • Partner with Engineering, Data, Product, and Security & Compliance leadership as a peer

Other duties as assigned

Qualifications

  • 10+ years in Infrastructure Engineering, SRE, or DevOps, with 3+ years in a senior IC or tech lead role and 2+ years directly managing engineers
  • Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response
  • Track record of hiring, leveling, and developing infrastructure or SRE engineers
  • Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Connect)
  • Production Terraform experience at scale (modules, state management, multi-environment)
  • Hands-on with observability stacks (CloudWatch, Datadog, Grafana, or equivalents)
  • Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems
  • Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP)
  • CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent)
  • Ability and willingness to travel up to 10% of the time as needed for onsite meetings, team collaboration, and company events

Preferred Qualifications 

  • Salesforce platform integration and operational experience
  • Amazon Connect or comparable contact center telephony platforms
  • Data platforms (Databricks, Snowflake, Fivetran)
  • HITRUST certification participation (e1 or r2)
  • AI/LLM-assisted operations tooling
  • Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company

Who You Are

  • You own outcomes. When something breaks, you fix it and improve the system so it does not happen again.
  • You write code and ship infrastructure. You lead by doing, not by delegating.
  • You surface risks early. Bad news early is manageable; bad news late is expensive.
  • You build for clarity and simplicity. You distrust complexity that does not earn its keep.
  • You bring calm to incidents and discipline to operations.
  • You grow engineers. You hire well, develop your team, and create the kind of operating environment where senior people want to work.
  • You communicate with executives the way they want to be communicated with: concise, structured, honest, low-drama.

What you will deliver in year one

This role is explicitly hands-on. In year one:

  • You will personally write production Terraform and review infrastructure pull requests
  • You will influence product and engineering roadmaps in order to achieve the operational standards expected of the organization and our clients
  • You will participate in the infrastructure on-call rotation while it is being built
  • You will lead incidents until the team and process are mature enough to do so without you
  • You will pair directly with engineers on critical migrations