High Availability Jobs (NOW HIRING)

Lansing, MI - Manage and monitor SQL Server 2016+ environments for uptime and performance. - Set up and troubleshoot High Availability (HA) and Disaster Recovery (DR) solutions. - Perform server ...

High Availability information

[Hourly pay chart: markers at $8, $21, and $35]

How much do high availability jobs pay per hour?

As of Jun 7, 2026, the average hourly pay for high availability jobs in the United States is $21.43, according to ZipRecruiter salary data. Most workers in this role earn between $15.14 and $26.92 per hour, depending on experience, location, and employer.

What is high availability in IT and why is it important?

High availability (HA) refers to systems or components that are continuously operational for a long period of time, minimizing downtime and ensuring reliable access to services or applications. It is achieved through redundancy, failover mechanisms, and careful system design to prevent single points of failure. High availability is important because it helps organizations maintain business continuity, meet service level agreements, and provide a seamless experience for users, even during hardware or software failures.
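As a rough illustration of why redundancy helps, the sketch below uses the standard textbook formula for components running in parallel, where the service stays up as long as at least one replica is up. It assumes replica failures are independent, which real systems only approximate, and the 99% per-component figure and the function names are illustrative rather than taken from any listing here.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant components when any one suffices.

    Assumes failures are independent (a simplification).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only when every replica is down at once.
    return 1.0 - (1.0 - component_availability) ** replicas


def downtime_hours_per_year(availability: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime in hours for a 365-day year."""
    return (1.0 - availability) * hours_per_year


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = parallel_availability(0.99, n)
        print(f"{n} replica(s): availability={a:.6f}, "
              f"downtime={downtime_hours_per_year(a):.2f} h/yr")
```

Under these assumptions, each added replica multiplies the expected downtime by the single-component failure probability, which is why removing single points of failure has such a large effect.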

What are common challenges faced by professionals working in High Availability roles, and how can they prepare for them?

Professionals in High Availability roles often face the challenge of ensuring continuous system uptime and rapid recovery from outages, which can involve responding to unexpected incidents at any hour. They must coordinate closely with development, operations, and network teams to implement robust failover strategies and perform regular testing of backup systems. Staying updated on the latest technologies and best practices is essential, as is developing strong problem-solving skills to address complex and high-pressure situations. Proactive communication and thorough documentation are also key to minimizing downtime and ensuring seamless collaboration across teams.
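A failover strategy can be reduced, in its simplest form, to "route to the most preferred node that passes a health check." The sketch below is a minimal, hypothetical version of that idea: the `Node` type, the priority scheme, and the `is_healthy` callback are assumptions made for illustration and do not correspond to any specific clustering product.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    priority: int  # lower number = more preferred (illustrative convention)


def choose_active(nodes: Sequence[Node],
                  is_healthy: Callable[[Node], bool]) -> Optional[Node]:
    """Return the most preferred healthy node, or None if every node fails."""
    for node in sorted(nodes, key=lambda n: n.priority):
        if is_healthy(node):
            return node
    return None


if __name__ == "__main__":
    cluster = [Node("primary", 0), Node("standby", 1)]
    # Simulate a failed primary; a real check would probe the service itself.
    active = choose_active(cluster, lambda n: n.name != "primary")
    print(active.name if active else "no healthy node")
```

Passing the health check in as a function makes it straightforward to exercise the failover path in a test by simulating an outage, which is one way to practice the regular testing described above.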

What is the difference between High Availability and Network Administrator roles?

Aspect | High Availability | Network Administrator
Required Credentials | Certifications like Cisco CCNA, CompTIA Network+ | Certifications like Cisco CCNA, CompTIA Network+
Work Environment | Data centers, enterprise IT, cloud services | Corporate offices, data centers, IT departments
Industry Usage | IT, telecommunications, cloud providers | IT, telecommunications, enterprise sectors
Primary Focus | Ensuring system uptime and redundancy | Managing and maintaining network infrastructure

High Availability specialists focus on designing and implementing systems that minimize downtime through redundancy and failover strategies. Network Administrators manage and maintain network infrastructure, ensuring connectivity and security. While both roles require similar certifications and work in related environments, their core responsibilities differ: High Availability emphasizes system uptime, whereas Network Administrators focus on network performance and management.

What are the key skills and qualifications needed to thrive as a High Availability Engineer, and why are they important?

To thrive as a High Availability Engineer, you need expertise in systems architecture, redundancy strategies, and disaster recovery, typically supported by a degree in computer science or a related field. Proficiency with clustering technologies, load balancers, databases, and monitoring tools, and with platforms such as Linux HA, VMware, or AWS, is often required, along with certifications like AWS Certified Solutions Architect or RHCE. Strong problem-solving skills, attention to detail, and effective communication are crucial soft skills for anticipating issues and collaborating across teams. These skills ensure critical systems remain operational with minimal downtime, directly supporting business continuity and customer satisfaction.
More about High Availability jobs
Infographic showing various High Availability job openings in the United States as of May 2026, with employment types broken down into 81% Full Time and 19% Part Time. Highlights a 95% Physical, 1% Hybrid, and 4% Remote job distribution, with an average salary of $44,572 per year, or $21.4 per hour.
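The annual and hourly figures quoted above line up if one assumes a 2,080-hour work year (40 hours a week for 52 weeks). That convention is an assumption here, since the page does not say how the conversion was done. A minimal sketch of the arithmetic:

```python
# Assumption: 40 hours/week * 52 weeks = 2,080 paid hours per year.
HOURS_PER_YEAR = 40 * 52


def annual_to_hourly(annual_salary: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an annual salary to an hourly rate, rounded to cents."""
    return round(annual_salary / hours_per_year, 2)


if __name__ == "__main__":
    print(annual_to_hourly(44572))
```

Under that assumption, $44,572 per year works out to about $21.43 per hour, which matches the hourly average cited in the pay question above.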
Software Development Engineer, EC2 UltraServer Availability

Amazon

Seattle, WA • On-site

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 2 days ago


Company rating: 7.4 out of 10

Based on 6,820 frontline employees who took The Breakroom Quiz

7th of 39 rated national retailers


Job description

The Software Development Engineer II will design, build, and maintain cloud-based repair and recovery workflows for NVIDIA GB200 / GB300 UltraServers, orchestrating repair and recovery operations from impairment detection through completed recovery. This role requires expertise in AWS services, system architecture, and cross-functional collaboration with Capacity Management, Hardware Engineering, and Datacenter Operations to manage AI/ML infrastructure.
Key job responsibilities
The Software Development Engineer (SDE II) on the EC2 UltraServer Availability team is responsible for ensuring high availability of customer GB200 and GB300 UltraServers by orchestrating complex repair and recovery workflows. The core responsibilities are as follows:
System Design & Architecture
* Design and architect solutions that are cross-functional to Capacity Management, Hardware Engineering, and Datacenter Operations
* Work in environments where the technology strategy is defined but the solution design is not
* Build solutions that are stable, logical, testable, and efficient with the ability to independently make trade-off decisions
* Investigate and develop design concepts to frame solution sets at an application and product level
Software Development
* Build cloud-based solutions using AWS native services for scaling infrastructure frameworks
* Write high-quality, maintainable code with proper testing and code reviews
* Develop and maintain the repair and recovery workflows for GB200 and GB300 UltraServer hosts
* Implement automation for diagnostic triage, hardware testing, cable validation, and testing processes
* Create observable systems with appropriate metrics and alarming
Operational Excellence
* Execute and monitor UltraServer repair workflows
* Troubleshoot workflow failures and coordinate with downstream teams
* Focus on operational excellence by identifying problems and proposing solutions
Hardware & Software Integration
* Work with hardware and software integrations specific to GPU clusters and AI/ML training systems
* Manage network partition configurations for multi-node GPU clusters
* Handle firmware validation and consistency checks across asset groups
Team Collaboration
* Collaborate with customers and stakeholders to convert business needs into technical designs
* Participate in code reviews and technical assessments
A day in the life
This is a hands-on position in which you will own everything from end to end: requirements gathering, designs, design reviews, implementations, code reviews, incremental feature launches, operations, mentoring, and the driving of continuous improvement.
About the team
The EC2 UltraServer Availability team is a high-performing engineering organization responsible for maintaining high availability of NVIDIA-based ML infrastructure at scale. We manage end-to-end repair and recovery workflows for GB200 and GB300 UltraServers, from initial problem detection through repair and recovery. Our team drives operational excellence through continuous improvement of problem detection, repair efficacy, and customer impact mitigation. We work closely with hardware engineering, data center operations, and EC2 service teams to ensure reliable, efficient recovery of critical ML compute capacity. This is a high-impact role leading a two-pizza team of talented engineers solving complex technical challenges in one of Amazon's fastest-growing infrastructure domains.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language
PREFERRED QUALIFICATIONS
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent
- Knowledge of professional software engineering & best practices for full software development life cycle, including coding standards, software architectures, code reviews, source control management, continuous deployments, testing, and operational excellence
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you're applying in isn't listed, please contact your Recruiting Partner.
The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.
USA, WA, Seattle - 143,700.00 - 194,400.00 USD annually


About Amazon

Sourced by ZipRecruiter

Amazon.com, Inc., commonly known as Amazon, is an American multinational technology company. It was founded by Jeff Bezos in 1994 and initially started as an online marketplace for books. Since then, Amazon has expanded its operations and become one of the largest e-commerce companies in the world.

Amazon's primary business is its online retail platform, where customers can purchase a vast array of products, including electronics, clothing, books, home goods, and much more. The company offers a convenient and user-friendly shopping experience, with features such as fast shipping, customer reviews, and personalized recommendations.

In addition to its e-commerce platform, Amazon has diversified its business into various other areas. One of its notable ventures is Amazon Web Services (AWS), a comprehensive cloud computing platform that provides services such as storage, compute power, and database management to individuals and businesses. AWS has become a leader in the cloud computing industry, powering many websites and applications worldwide.

Amazon has also developed its own consumer electronics, including the popular Amazon Kindle e-reader, Fire tablets, Fire TV streaming devices, and the Alexa-powered Echo smart speakers. The Alexa voice assistant, integrated into these devices, allows users to interact with their devices using voice commands, perform tasks, and access information.

Furthermore, Amazon has expanded into media and entertainment. It operates Prime Video, a streaming service that offers a wide range of movies, TV shows, and original content. Amazon Music provides a platform for streaming and purchasing digital music, while Audible offers audiobooks and other audio content.

The company's commitment to customer satisfaction and convenience is demonstrated by its membership program, Amazon Prime. Prime members receive various benefits, including free two-day shipping, access to streaming services, exclusive deals, and more.

Industry

IT services, book publishers, retail, real estate, and computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Seattle, WA, US