Service Reliability Engineer Jobs in New York (NOW HIRING)

SRE Manager / SRE Architect

Manhattan, NY · On-site

$62.50 - $83/hr

Financial Services. Position Overview: We are seeking a highly experienced and hands-on Site Reliability Engineering (SRE) Manager / SRE Architect to lead reliability, availability, performance, and ...

Reliability Engineer I

Farmingdale, NY · On-site

$104K - $131K/yr

... services enable customers to reduce the time required to develop new products and bring them to ... Under the guidance of a senior reliability engineer, serve as a contributing member of the ...

SRE Engineer

Jersey City, NJ

$59.50 - $79/hr

We are seeking an experienced Site Reliability Engineer (SRE) to design, build, and maintain highly available, scalable, and reliable infrastructure and services. The ideal candidate will have strong ...

Maintenance & Reliability Engineer

Linden, NJ · On-site

$105K - $133K/yr

In addition, we provide an unmatched array of sulfur-based chemicals and related services to a diverse set of industries. Position Purpose: NEXERPA's Maintenance & Reliability Engineer will be ...

Service Reliability Engineer information

New York salary chart values: $66.7K, $129.1K (average), $154.3K

How much do service reliability engineer jobs pay per year?

As of Jun 14, 2026, the average yearly pay for a service reliability engineer in New York is $129,066, according to ZipRecruiter salary data. Most workers in this role earn between $112,100 and $141,100 per year, depending on experience, location, and employer.

What are Service Reliability Engineers?

Service Reliability Engineers (SREs) are IT professionals who apply software engineering principles to infrastructure and operations problems. Their main goal is to keep services reliable, scalable, and highly available by automating processes, monitoring system performance, and responding to incidents. SREs work closely with development and operations teams to design, build, and maintain robust systems, often using code to manage infrastructure, and they use post-incident analysis to improve reliability over time.

How does a Service Reliability Engineer typically collaborate with development and operations teams to improve service uptime?

Service Reliability Engineers (SREs) work closely with both development and operations teams to ensure systems are highly available and resilient. They often participate in incident response, conduct post-incident reviews, and help implement automation to reduce manual intervention. Regular collaboration includes reviewing application changes, contributing to infrastructure design, and sharing best practices for monitoring and alerting. This cross-functional teamwork helps to quickly identify potential issues and proactively enhance system reliability.

Will AI replace SRE jobs?

AI is expected to augment the work of Service Reliability Engineers (SREs) by automating routine tasks such as monitoring, incident response, and data analysis. However, SREs will continue to be essential for designing, managing, and improving complex systems that require human judgment and expertise. The role is likely to evolve with increased use of AI tools, but it is unlikely to be fully replaced.

What engineers make $500,000?

Senior engineers in fields such as software, data engineering, and cloud infrastructure can earn $500,000 or more annually, especially with experience, specialized skills, and stock options. Roles in high-demand industries like technology and finance often offer compensation at this level for top-tier professionals.

What are the key skills and qualifications needed to thrive as a Service Reliability Engineer, and why are they important?

To thrive as a Service Reliability Engineer, you need a solid background in systems administration, networking, coding (often in Python or Go), and cloud infrastructure, typically supported by a degree in computer science or a related field. Familiarity with monitoring tools (like Prometheus), CI/CD pipelines, and automation frameworks is highly valued, as are certifications such as AWS Certified DevOps Engineer. Strong problem-solving abilities, collaboration, and effective communication skills help you proactively address issues and work well within cross-functional teams. Together, these skills support system reliability, quick incident recovery, and the seamless delivery of high-availability services.

What is the difference between a Service Reliability Engineer and a Site Reliability Engineer?

Aspect | Service Reliability Engineer | Site Reliability Engineer
Credentials | Typically requires experience in software engineering, cloud platforms, and monitoring tools | Similar credentials, often with a focus on software development and systems engineering
Work Environment | Works closely with development and operations teams to ensure service reliability | Works on maintaining and improving system reliability, often in cloud or data center environments
Industry Usage | Common in tech companies focusing on service uptime and customer experience | Widely used in tech, especially in cloud and large-scale infrastructure companies

Both roles focus on ensuring system reliability, often requiring similar skills and certifications. The main difference lies in terminology preference and specific organizational focus, but they generally perform comparable functions in maintaining high service availability.

What engineers make $300,000 a year?

Senior-level engineers in fields such as software engineering, data engineering, and site reliability engineering can earn $300,000 or more annually, especially if they have extensive experience and specialized skills and work in high-demand industries or companies. Compensation often includes base salary, bonuses, and stock options, particularly in technology firms or startups with significant growth potential.

What does a service reliability engineer do?

A Service Reliability Engineer (SRE) is responsible for ensuring the availability, performance, and reliability of software services. They monitor systems, automate incident response, implement best practices for system stability, and often use tools like monitoring dashboards and automation scripts to prevent outages and improve service quality.
Infographic: Service Reliability Engineer job openings in New York as of June 2026. Employment type: 100% full-time. Work arrangement: 100% in-person. Average salary: $129,066 per year, or $62.1 per hour.

SRE Manager / SRE Architect

Qode

Manhattan, NY • On-site

$62.50 - $83/hr

Full-time

Posted 12 days ago


Job description

Job Description - SRE Manager / SRE Architect (Hands-on)
Location: New York City, NY / Fort Mill, SC (Hybrid)
Employment Type: Full-Time / Contract
Industry: Financial Services
Position Overview
We are seeking a highly experienced and hands-on Site Reliability Engineering (SRE) Manager / SRE Architect to lead reliability, availability, performance, and release management initiatives across enterprise-scale applications and platforms. This role requires a strong blend of SRE, DevOps, Release Management, Cloud Engineering, Automation, and Production Operations expertise.
The ideal candidate will be deeply involved in designing and implementing reliability strategies, driving release governance, improving deployment processes, and ensuring operational excellence across cloud-native environments.
LaunchDarkly experience is highly preferred but not mandatory.
Key Responsibilities
Site Reliability Engineering (SRE)
  • Design and implement SRE best practices focused on reliability, scalability, performance, and availability.
  • Define and monitor SLIs, SLOs, and error budgets across critical applications and services.
  • Drive proactive monitoring, alerting, observability, and incident management processes.
  • Lead root cause analysis (RCA) efforts and implement preventive measures.
  • Improve system resiliency through automation, self-healing capabilities, and operational excellence.
  • Establish reliability standards across distributed systems and cloud platforms.

Release Management
  • Own and drive end-to-end release management processes across multiple environments.
  • Coordinate application releases across development, QA, UAT, staging, and production environments.
  • Develop release governance, release calendars, deployment strategies, rollback procedures, and change management processes.
  • Partner with development, QA, infrastructure, and business teams to ensure smooth production deployments.
  • Identify and mitigate release risks while minimizing downtime and business impact.
  • Implement deployment automation and continuous delivery best practices.

DevOps & Automation
  • Design and maintain CI/CD pipelines using modern DevOps tools.
  • Automate infrastructure provisioning, deployment, monitoring, and operational workflows.
  • Drive Infrastructure as Code (IaC) adoption using Terraform or similar technologies.
  • Support cloud-native architectures and containerized application deployments.
  • Partner with engineering teams to improve developer productivity and deployment velocity.

Cloud & Platform Engineering
  • Manage and optimize cloud infrastructure on AWS and/or Azure.
  • Support Kubernetes, container orchestration, and cloud-native application platforms.
  • Ensure platform scalability, security, compliance, and operational readiness.
  • Drive platform modernization initiatives and operational transformation efforts.

Required Skills & Experience
Core SRE Skills
  • 15+ years of IT experience with a strong focus on SRE, DevOps, Platform Engineering, or Production Support.
  • Extensive hands-on experience implementing SRE practices in enterprise environments.
  • Strong understanding of:
      • SLI/SLO/Error Budgets
      • Incident Management
      • Problem Management
      • Capacity Planning
      • Reliability Engineering
      • Observability & Monitoring

Release Management
  • Proven experience managing large-scale production releases.
  • Strong expertise in:
      • Release Planning
      • Release Governance
      • Change Management
      • Deployment Automation
      • Rollback Strategies
      • Production Readiness Reviews

DevOps & Cloud
  • Hands-on experience with:
      • AWS and/or Azure
      • Kubernetes (EKS, AKS, OpenShift preferred)
      • Docker
      • Terraform
      • GitHub Actions, Jenkins, Azure DevOps, GitLab CI/CD
  • Experience building and maintaining CI/CD pipelines.

Monitoring & Observability
  • Strong experience with:
      • Dynatrace
      • Datadog
      • Splunk
      • Prometheus
      • Grafana
      • ELK Stack
      • CloudWatch

Scripting & Automation
  • Experience with Python, Bash, PowerShell, or similar scripting languages.
  • Strong automation mindset with a focus on operational efficiency.

Nice to Have
  • LaunchDarkly end-to-end implementation experience.
  • Feature flag management and progressive delivery strategies.
  • Financial Services, Banking, or Wealth Management domain experience.
  • Experience leading SRE or DevOps transformation initiatives.
  • Cloud certifications (AWS, Azure, Kubernetes).

Preferred Candidate Profile
  • Strong hands-on SRE leader, not just a people manager.
  • Deep expertise in Release Management and Production Support.
  • Proven background in DevOps, Cloud Engineering, and Platform Reliability.
  • Ability to work with development, infrastructure, security, and business teams.

Keywords
SRE, Site Reliability Engineering, Release Management, DevOps, Terraform, AWS, Azure, Kubernetes, Dynatrace, CI/CD, LaunchDarkly, Production Support, Incident Management, Reliability Engineering, Observability, Platform Engineering, Infrastructure Automation.