Remote Reliability Engineer Jobs in New Rochelle, NY

Senior DevOps Engineer

New York, NY · On-site +1

$170K - $190K/yr

... remote/New York setting, with strong ownership of your domain. * Passionate about automation, observability, and building reliable, scalable systems. * Experienced with SRE practices - you think ...

Senior Site Reliability Engineer

New York, NY · Remote

$58.25 - $77.50/hr

Highly collaborative, "Bridge-builder," empathetic to Developer deadlines, and pragmatic. * Acts as the liaison between Project Armor and Product/Dev teams. Translates security mandates into ...

SRE Developer

New York, NY · On-site +1

$62.25 - $82.75/hr

Position Overview - Qualifications & Experience - Pay Range - Apollo Global Management, Inc. (together with its subsidiaries and affiliates) is committed to championing opportunity. The firm and its ...

Senior Azure Cloud Engineer

New York, NY · Remote

$61 - $81.50/hr

We are open to experienced remote candidates based in an ET location that possess a robust ... Partner with the infrastructure team to enhance the reliability and performance of our systems.

DevOps Engineer

Hoboken, NJ · On-site +1

$57.75 - $79/hr

Our culture is people-first, fully remote, and rooted in respect, innovation, and teamwork, because ... Observability and Reliability * Manage monitoring and observability efforts using tools such as ...

DevOps Engineer

Hoboken, NJ · Remote

$57.75 - $79/hr

Our culture is people-first, fully remote, and rooted in respect, innovation, and teamwork, because ... Observability and Reliability * Manage monitoring and observability efforts using tools such as ...

Remote Reliability Engineer information

New Rochelle, NY salary details (chart values): $62.8K, $121.4K, $145.1K

How much do remote reliability engineer jobs pay per year?

As of Jun 15, 2026, the average yearly pay for a remote reliability engineer in New Rochelle, NY is $121,402.00, according to ZipRecruiter salary data. Most workers in this role earn between $105,500.00 and $132,700.00 per year, depending on experience, location, and employer.

What is the difference between Remote Reliability Engineer vs Remote Site Reliability Engineer?

Aspect | Remote Reliability Engineer | Remote Site Reliability Engineer
Credentials | Typically requires certifications like AWS Certified Solutions Architect, Linux Foundation certifications | Similar credentials, often with additional focus on site-specific tools and monitoring
Work Environment | Primarily remote, focusing on cloud infrastructure and system reliability | Remote with some on-site responsibilities, focusing on infrastructure and operational stability
Industry Usage | Used across tech, cloud providers, SaaS companies | Common in data centers, cloud providers, and large enterprise IT
Search & Comparison Intent | Often compared due to overlapping roles in system reliability and cloud infrastructure | Compared for on-site vs remote operational responsibilities

The main difference is that Remote Reliability Engineers focus on cloud and system reliability remotely, while Remote Site Reliability Engineers may have some on-site duties related to infrastructure. Both roles require similar skills and certifications but differ in their work environment and specific responsibilities.

What are the key skills and qualifications needed to thrive as a Remote Reliability Engineer, and why are they important?

To thrive as a Remote Reliability Engineer, you need a strong background in systems engineering, software development, and infrastructure management, often supported by a degree in computer science or a related field. Proficiency with cloud platforms (such as AWS, Azure, or GCP), monitoring tools (like Prometheus, Grafana), and relevant certifications (e.g., AWS Certified DevOps Engineer) is highly valuable. Excellent problem-solving, communication, and collaboration skills are crucial for working effectively across distributed teams and responding to incidents. These abilities ensure system reliability, quick incident resolution, and seamless remote teamwork, which are vital for maintaining high service uptime and user satisfaction.

How do Remote Reliability Engineers typically collaborate with on-site teams to address urgent technical issues?

Remote Reliability Engineers often use a combination of video conferencing, instant messaging, and collaborative monitoring tools to stay closely connected with on-site teams. When urgent technical issues arise, they participate in real-time troubleshooting sessions, analyze system logs remotely, and may guide on-site staff through step-by-step resolution procedures. Strong communication channels and regular check-ins are essential for swift and effective collaboration, even across different time zones. This structure allows Remote Reliability Engineers to contribute significantly to system uptime while working from a distance.

What is a Remote Reliability Engineer?

A Remote Reliability Engineer is a professional who works from a remote location to ensure that systems, applications, or infrastructure are reliable, available, and performing well. Their responsibilities typically include monitoring system health, diagnosing issues, implementing preventative measures, and collaborating with teams to improve system reliability. They often use tools for automation, incident response, and performance monitoring, all while working offsite. This role is critical in minimizing downtime and ensuring a smooth user experience, especially for companies with complex technical environments. Remote Reliability Engineers must have strong problem-solving skills and be proficient in cloud technologies, automation, and incident management.

Senior Cloud Engineer (AWS / Azure / GCP) - VP

Morgan Stanley

New York, NY • Remote

$61 - $81.50/hr

Full-time

Posted 16 days ago


Morgan Stanley rating

Company rating: 8.3 out of 10

Based on 147 frontline employees who took The Breakroom Quiz

40th of 138 rated financial services


Job description

Role Summary

We are seeking a Senior Cloud Engineer / Site Reliability Engineer (SRE) to design, build, and operate secure, scalable cloud platforms across AWS, Azure, and GCP. This role is responsible for configuring, deploying, and maintaining virtual machines and containerized applications, using Terraform to automate infrastructure provisioning and lifecycle management. You will provide specialized support for high-stakes production deployments, lead incident response for technical escalations, and apply SRE principles (SLIs/SLOs, error budgets, automation, and reliability engineering) to improve availability, performance, and operational excellence in a multi-cloud environment.

Key Responsibilities

Cloud Platform Engineering (AWS / Azure / GCP)

  • Architect, implement, and maintain cloud infrastructure across AWS, Azure, and GCP using Terraform (IaC).
  • Design and implement cloud landing zones aligned with best practices:
    • Account/subscription/project structure, environment separation, identity boundaries
    • Baseline guardrails and policy enforcement (Azure Policy, AWS Organizations/SCPs, GCP Org Policies)
    • Centralized audit logging, monitoring, and cost allocation standards
  • Build and operate cloud-native virtual network constructs (cloud-focused only):
    • Azure: VNets, subnets, NSGs, route tables, Private Endpoints, hub/spoke patterns
    • AWS: VPCs, subnets, security groups, NACLs, route tables, VPC endpoints/PrivateLink, multi-account connectivity patterns
    • GCP: VPC networks, subnets, firewall rules, routes, Private Service Connect, Shared VPC patterns
  • Implement private-by-default service access patterns (private endpoints, controlled egress, service-to-service access controls).

Compute, Virtual Machines, and Containers

  • Configure, deploy, and maintain virtual machines and scalable compute patterns:
    • AWS EC2 (Launch Templates, Auto Scaling Groups)
    • Azure Virtual Machines / VM Scale Sets
    • GCP Compute Engine / Managed Instance Groups
  • Own OS hardening, baseline configuration, patching strategies, and instance bootstrapping (cloud-init, image pipelines).
  • Deploy and operate containerized workloads using Kubernetes:
    • EKS / AKS / GKE (cluster design, upgrades, node pools, RBAC, scaling)
    • Container registries (ECR / ACR / Artifact Registry) and artifact promotion strategies
  • Implement workload delivery patterns (Helm/Kustomize), rollout strategies (blue/green, canary), and safe rollbacks.

Infrastructure as Code, Automation & CI/CD (Terraform)

  • Build reusable, versioned Terraform modules with standards for naming, tagging/labels, and secure defaults.
  • Implement Terraform best practices: remote state, locking, environment isolation, secrets handling, and drift detection.
  • Integrate IaC into CI/CD pipelines (e.g., GitHub Actions, Azure DevOps, GitLab CI):
    • Automated validation, linting, security scanning, plan/apply workflows, approvals, and promotions
  • Implement policy-as-code guardrails (OPA/Conftest, Sentinel where applicable) to prevent unsafe changes.
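
To make the guardrail idea concrete, here is a minimal sketch of a tagging check run against a Terraform plan exported with `terraform show -json`. It is written in Python as a stand-in for the OPA/Conftest or Sentinel policies the posting names, not as a replacement for them. The required tag keys, the function names, and the sample plan are hypothetical; only the `resource_changes` / `change.actions` / `change.after` layout comes from Terraform's JSON plan format. Values that are only known after apply are not visible in `after`, so a real policy would need to handle those separately.

```python
import json

# Hypothetical tagging standard; a real one would come from the team's policy.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def load_plan(path: str) -> dict:
    """Load a plan previously exported with `terraform show -json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def missing_tags(plan: dict, required: set = REQUIRED_TAGS) -> dict:
    """Map resource address -> sorted list of required tag keys it lacks."""
    violations = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Only inspect resources that will exist after apply.
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # this resource exposes no tags attribute in the plan
        tags = after.get("tags") or {}
        missing = sorted(required - set(tags))
        if missing:
            violations[rc.get("address", "<unknown>")] = missing
    return violations


# Hypothetical plan fragment used only to exercise the check.
SAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket.logs",
            "change": {"actions": ["create"], "after": {"tags": {"owner": "platform"}}},
        },
        {
            "address": "aws_iam_role.ci",
            "change": {"actions": ["create"], "after": {"name": "ci"}},
        },
        {
            "address": "aws_instance.old",
            "change": {"actions": ["delete"], "after": None},
        },
    ]
}

if __name__ == "__main__":
    for address, tags in missing_tags(SAMPLE_PLAN).items():
        print(f"{address}: missing tags {', '.join(tags)}")
```

In a CI pipeline, a check like this would typically run between the plan and apply steps and fail the job when the returned mapping is non-empty.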

SRE: Reliability Engineering, Observability & Operational Excellence

  • Define, implement, and improve SLIs/SLOs (availability, latency, error rates, saturation) for critical services and platforms.
  • Manage and enforce error budgets to balance reliability with delivery velocity.
  • Establish and continuously improve observability standards:
    • Metrics, logs, traces, dashboards, and alerting across cloud services and Kubernetes
    • Tooling such as CloudWatch, Azure Monitor/Log Analytics, GCP Cloud Monitoring/Logging, OpenTelemetry, Prometheus/Grafana (where used)
  • Improve incident detection quality by reducing alert noise, implementing actionable alerts, and creating clear escalation paths.
  • Drive reliability improvements through:
    • Capacity planning, performance tuning, load testing support
    • Resilience engineering (multi-zone design, graceful degradation, retries/timeouts, backpressure)
    • Continuous automation to eliminate toil (self-healing, auto-remediation runbooks, ChatOps where applicable)
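
As a rough illustration of how an SLO target turns into an error budget, the sketch below computes the allowed failures and the remaining budget for a request-based availability SLI over one window. The names `ErrorBudget` and `error_budget`, the 99.9% target, and the request counts are hypothetical values chosen for illustration, not figures from this posting.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_failures: float    # failures the SLO tolerates in the window
    consumed_fraction: float   # share of the budget already spent (can exceed 1.0)
    remaining_failures: float  # failures left before the SLO is breached


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> ErrorBudget:
    """Compute a request-based error budget for one SLO window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")
    # The budget is the fraction of requests the SLO allows to fail.
    allowed = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed if allowed > 0 else 0.0
    return ErrorBudget(allowed, consumed, allowed - failed_requests)


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
    budget = error_budget(0.999, 1_000_000, 250)
    print(f"allowed failures:   {budget.allowed_failures:.0f}")
    print(f"budget consumed:    {budget.consumed_fraction:.0%}")
    print(f"remaining failures: {budget.remaining_failures:.0f}")
```

A team might treat a consumed fraction near or above 1.0 as the signal to slow feature delivery and prioritize reliability work, which is the trade-off an error budget is meant to make explicit.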

Production Support, Incident Response & Escalations

  • Provide specialized support for high-stakes production deployments (major releases, platform cutovers, migrations).
  • Lead incident response: triage, mitigation, recovery, communication, and post-incident review (PIR/RCA).
  • Troubleshoot escalations across cloud services, Kubernetes, IAM, storage, and CI/CD pipelines using evidence-driven debugging.
  • Build and maintain runbooks, operational playbooks, and postmortem action tracking to prevent repeat incidents.
  • Participate in on-call rotation and continuously improve on-call health through automation and better observability.

Security, Identity, and Governance

  • Implement least-privilege access controls across AWS/Azure/GCP (IAM/RBAC), including role design and permission boundaries.
  • Enforce secure configurations: encryption at rest/in transit, secrets management, key management (KMS/Key Vault/Cloud KMS).
  • Implement compliance-oriented logging and auditing, and partner with security teams to remediate findings and harden platforms.

Required Skills & Experience

  • 10+ years in cloud engineering, platform engineering, DevOps, or SRE roles with significant production ownership.
  • Strong hands-on experience across AWS and Azure, plus practical experience in GCP (production exposure preferred).
  • Expert-level Terraform (modules, state, CI integration, scalable environment patterns).
  • Strong Kubernetes operations experience (EKS/AKS/GKE), including upgrades, scaling, and workload reliability.
  • Experience implementing SRE practices: SLIs/SLOs, alerting strategies, incident response, postmortems, and automation/toil reduction.
  • Strong Linux and scripting (Bash/Python) and ability to debug systems from symptoms to root cause.
  • Strong security fundamentals: IAM/RBAC, encryption, secrets, and auditability in cloud environments.
  • Proven ability to lead technical escalations and coordinate resolution across teams.

WHAT YOU CAN EXPECT FROM MORGAN STANLEY:

At Morgan Stanley, we raise, manage and allocate capital for our clients - helping them reach their goals. We do it in a way that's differentiated - and we've done that for 90 years. Our values - putting clients first, doing the right thing, leading with exceptional ideas, committing to diversity and inclusion, and giving back - aren't just beliefs, they guide the decisions we make every day to do what's best for our clients, communities and more than 80,000 employees in 1,200 offices across 42 countries. At Morgan Stanley, you'll find an opportunity to work alongside the best and the brightest, in an environment where you are supported and empowered. Our teams are relentless collaborators and creative thinkers, fueled by their diverse backgrounds and experiences. We are proud to support our employees and their families at every point along their work-life journey, offering some of the most attractive and comprehensive employee benefits and perks in the industry. There's also ample opportunity to move about the business for those who show passion and grit in their work.

To learn more about our offices across the globe, please copy and paste https://www.morganstanley.com/about-us/global-offices into your browser.

Expected base pay rates for the role will be between $150,000 and $210,000 per year at the commencement of employment. However, base pay if hired will be determined on an individualized basis and is only part of the total compensation package, which, depending on the position, may also include commission earnings, incentive compensation, discretionary bonuses, other short and long-term incentive packages, and other Morgan Stanley-sponsored benefit programs.

Morgan Stanley is an equal opportunity employer committed to building and maintaining a workforce that is diverse in experience and background. Our recruiting efforts reflect our strong commitment to a culture of inclusion, where individuals are hired, developed, and advanced based on their skills and talents.

Our workforce reflects a broad cross-section of the global communities in which we operate, bringing a variety of backgrounds, talents, perspectives, and experiences.

For more information, please visit: https://www.morganstanley.com/people-opportunities/eeo.

