About the Role The Director of Infrastructure & SRE owns the function end-to-end: reliability, security, scalability, and operational governance of TailorCare's infrastructure, plus the team that ...
SRE specialist
Montreal, QC · Hybrid
The SRE will function as part of a special investigations unit that empowers and enables Applicative Support, Infrastructure Support, and the Incident Management team: coaching, guiding, and leading ...
As a Site Reliability Engineer, you will have the opportunity to operate and support business-critical cloud services. As part of your daily job, you will proactively monitor service behavior and ...
We are looking for an experienced Site Reliability Engineer or Platform Operations Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or ...
ServiceNow Platform SRE
Montreal, QC · On-site +1
A career as a ServiceNow Platform Site Reliability Engineer (SRE) in the Productivity Tools Management team at National Bank means being an expert responsible for the reliability, availability and ...
ServiceNow Platform SRE
Longueuil, QC · On-site +1
A career as a ServiceNow Platform Site Reliability Engineer (SRE) in the Productivity Tools Management team at National Bank means being an expert responsible for the reliability, availability and ...
ServiceNow Platform SRE
Laval, QC · On-site +1
A career as a ServiceNow Platform Site Reliability Engineer (SRE) in the Productivity Tools Management team at National Bank means being an expert responsible for the reliability, availability and ...
ServiceNow Platform SRE
Montreal, QC · On-site
A career as a ServiceNow Platform Site Reliability Engineer (SRE) in the Productivity Tools Management team at National Bank means being an expert responsible for the reliability, availability and ...
As a Senior SRE, you will be responsible for improving and developing multiple systems to better use our development and production infrastructure used by several departments. As one of your ...
We are growing SRE capabilities within our Reliability & Production Engineering (RPE) organization as part of the transformation of Morgan Stanley's Technology. In the Technology division, we ...
The SRE will work closely with the development and infrastructure teams. The SRE will also work with other teams to respond to incidents and resolve ...
On certain engagements, technical leadership: lead the reliability and infrastructure workstream, guide client teams on SRE practices, and contribute to architectural decisions. * Support for ...
DevOps / SRE Engineer (Remote)
Montreal, QC · On-site +1
To do that we are eager to add a highly skilled DevOps / SRE Engineer to our incredible team. This is a senior role working alongside the current backend and frontend software engineering ...
Site Reliability Specialist
Sorel-Tracy, QC · Hybrid
CA$60K - CA$70K/yr
We're expanding our team and hiring three Site Reliability Engineers to help secure, automate, and optimize our cloud infrastructure. In this role, you'll build reliable, scalable systems, improve ...
Senior DEVOPS/SRE
Montreal, QC · On-site
The COO/GTE/EPL/SRE team has members in Paris, Bangalore, and Montreal and is responsible for the production, security, performance, and scalability of all capabilities provided by EPL. What will be ...
DevOps Engineer
Montreal, QC · On-site +1
We are looking for an experienced DevOps/SRE Engineer for our client. This is a permanent position that can be either remote or in-office in Toronto! Our client is a large fintech firm with a ...
Site Reliability Engineer information
See Quebec salary details
| Salary range (per year) | Share of jobs |
|---|---|
| $62.5K - $73K | 1% |
| $73K - $83.5K | 3% |
| $83.5K - $94K | 5% |
| $94K - $104.5K | 9% |
| $104.5K - $115K | 14% |
| $115K - $125.5K | 15% |
| $125.5K - $136K | 15% |
| $136K - $146.5K | 13% |
| $146.5K - $157K | 13% |
| $157K - $167.5K | 7% |
| $167.5K - $178K | 5% |

$109.8K is the 25th percentile. Wages below this are outliers. The median wage is $127.8K / yr. $146.9K is the 75th percentile. Wages above this are outliers. The chart spans $62.5K to $178K, with $130.1K marked.
What Is a Site Reliability Engineer?
A site reliability engineer specializes in site reliability engineering (SRE), a branch of operations pioneered by Google. You are responsible for ensuring that when a feature is scaled up for more users, it does not break the underlying software or website functions. This means using analytical problem-solving skills to make features in a new software release work on top of the existing source code.
What is the difference between Site Reliability Engineer vs DevOps Engineer?
| Aspect | Site Reliability Engineer | DevOps Engineer |
|---|---|---|
| Credentials | Typically requires a computer science degree and certifications such as AWS, Google Cloud, or Kubernetes | Similar credentials, often with cloud certifications and scripting skills |
| Work Environment | Focuses on maintaining and improving system reliability, often in large-scale production environments | Works on automation, CI/CD pipelines, and deployment processes across development and operations teams |
| Industry Usage | Common in tech, cloud services, and large-scale enterprise companies | Widely used in software development, cloud, and IT organizations |
Both roles require strong technical skills and cloud knowledge, but SREs focus more on system reliability and uptime, while DevOps engineers emphasize automation and deployment processes. They often collaborate but have distinct primary responsibilities.

Other
Posted 24 days ago
Job description
About the Role
The Director of Infrastructure & SRE owns the function end-to-end: reliability, security, scalability, and operational governance of TailorCare's infrastructure, plus the team that delivers it. You will be a peer to the Director of Software Engineering, Director of Data Engineering, and Director of Data Science, own the Infrastructure & SRE scorecard in front of the executive team, and lead vendor escalations with Salesforce, AWS, and Cresta, among others, at the Director level.
This is a player-coach role. In year one you will spend roughly 60% of your time hands-on (writing Terraform, leading incidents, doing architecture work) and 40% building the team and the practice. As the team scales, that ratio shifts toward leadership, but you will never stop being technical.
This is not a slideware role. We are not hiring a manager who reviews architecture diagrams from a distance. We are hiring an operator who codes, runs incidents, owns the platform, and ships.
Primary Responsibilities
Infrastructure as Code
- Converge all AWS resources to Terraform; eliminate manual provisioning
- Establish reproducible environments (dev, staging, production) with proper isolation and parity
- Standardize CI/CD pipelines across all engineering teams
Site Reliability
- Define and operate SLOs, SLIs, and error budgets for all production systems (web/mobile applications, Salesforce, data processing, telephony stack)
- Build observability (metrics, logs, traces, alerting) across AWS, Salesforce, telephony/omni-channel, and Cresta integrations
- Stand up the infrastructure on-call rotation, incident management, and post-incident review discipline, including RCAs
- Own uptime, MTTR, and incident-volume trends as published metrics
Disaster Recovery & Business Continuity
- Design and implement a tested DR strategy with documented RPO/RTO commitments
- Validate recovery procedures on a recurring cadence
- Align DR posture with HITRUST and HIPAA expectations
Integration Reliability
- Stabilize Salesforce, telephony/omni-channel, and Cresta integrations; close persistent gaps in skills-based routing, warm transfers, and telephony data parity
- Partner with Data Engineering on the reliability of data ingest paths (Fivetran, SFTP, S3) and Salesforce bulk API flows
Security & Compliance Engineering
- Translate Security & Compliance policy into enforced infrastructure controls: IAM, encryption (at rest and in transit), network segmentation, secrets management, audit logging
- Partner with Security & Compliance on HITRUST evidence, audit readiness, and remediation
- Own vulnerability management across cloud and application layers
Email & Domain Infrastructure
- Fix DNS, SPF, DKIM, DMARC, and IP reputation to resolve spam-folder deliverability impacting patient and operational communications
- Own all TailorCare domain and email infrastructure
Developer Experience
- Build and maintain test, staging, and ephemeral environments engineers actually use
- Reduce cycle time and remove infrastructure friction from the SDLC
- Establish self-service tooling so engineers ship without filing tickets
Team & Function Leadership
- Hire, level, develop, and retain the Infrastructure & SRE team
- Own the function's MBR contribution: scorecard, risks, decisions needed
- Partner with Engineering, Data, Product, and Security & Compliance leadership as a peer
Other duties as assigned
Qualifications
- 10+ years in Infrastructure Engineering, SRE, or DevOps, with 3+ years in a senior IC or tech lead role and 2+ years directly managing engineers
- Recent hands-on technical work (within the last 12 to 18 months) in Terraform, AWS, and production incident response
- Track record of hiring, leveling, and developing infrastructure or SRE engineers
- Deep AWS expertise (VPC, IAM, ECS/EKS, Lambda, RDS, DynamoDB, S3, API Gateway, WAF, Connect)
- Production Terraform experience at scale (modules, state management, multi-environment)
- Hands-on with observability stacks (CloudWatch, Datadog, Grafana, or equivalents)
- Demonstrated experience standing up SRE practices: SLOs, on-call, incident management, blameless postmortems
- Experience operating in a HIPAA or comparably regulated environment (PCI, SOC 2 Type II, HITRUST, FedRAMP)
- CI/CD pipeline design (GitHub Actions, GitLab CI, or equivalent)
- Ability and willingness to travel up to 10% as needed for onsite meetings, team collaboration, and company events
Preferred Qualifications
- Salesforce platform integration and operational experience
- Amazon Connect or comparable contact center telephony platforms
- Data platforms (Databricks, Snowflake, Fivetran)
- HITRUST certification participation (e1 or r2)
- AI/LLM-assisted operations tooling
- Experience scaling an infrastructure function in a healthcare or other regulated growth-stage company
Who You Are
- You own outcomes. When something breaks, you fix it and improve the system so it does not happen again.
- You write code and ship infrastructure. You lead by doing, not by delegating.
- You surface risks early. Bad news early is manageable; bad news late is expensive.
- You build for clarity and simplicity. You distrust complexity that does not earn its keep.
- You bring calm to incidents and discipline to operations.
- You grow engineers. You hire well, develop your team, and create the kind of operating environment where senior people want to work.
- You communicate with executives the way they want to be communicated with: concise, structured, honest, low-drama.
What you will deliver in year one
This role is explicitly hands-on. In year one:
- You will personally write production Terraform and review infrastructure pull requests
- You will influence product and engineering roadmaps in order to achieve the operational standards expected of the organization and our clients
- You will participate in the infrastructure on-call rotation while it is being built
- You will lead incidents until the team and process are mature enough to do so without you
- You will pair directly with engineers on critical migrations