HPC System Administrator
$129K - $161K/yr
HPC System Administrator Position Type: Regular Hiring Range: $129,000 - $161,265/annually; Compensation will be based on education, experience, skills relevant to the role, and internal equity. Pay ...
Santa Clara, CA · On-site
$129K - $161K/yr
HPC System Administrator Position Type: Regular Hiring Range: $129,000 - $161,265 /annually; Compensation will be based on education, experience, skills relevant to the role, and internal equity. Pay ...
Saline, MI · On-site
CAE HPC System Administrator Saline, Michigan (Hybrid) Description We are seeking a highly skilled CAE HPC Systems Administrator to manage, optimize, and support enterprise-level High-Performance ...
Houston, TX · On-site
MRI Technologies has an exciting opportunity for an HPC Linux System Administrator on the JETS II contract at NASA Johnson Space Center. You will support the Flight Sciences Laboratory (FSL), one of ...
Annapolis Junction, MD · On-site
$155K - $194K/yr
Description HPC SYSTEM ADMINISTRATOR IV shall have a Bachelor's degree in Computer Science or related field, and have ten years of demonstrable experience in system administration and support of a ...
$180K - $220K/yr
System Administrators (HPC), must provide High Performance Computing (HPC) services in the form of HPC enhanced sustainment capabilities to two geographically dispersed areas. These capabilities ...
The System Administrator will provide High Performance Computing (HPC) services in the form of HPC enhanced sustainment capabilities to two geographically dispersed areas. These capabilities include ...
Annapolis Junction, MD · On-site
$180K - $220K/yr
System Administrators (HPC), must provide High Performance Computing (HPC) services in the form of HPC enhanced sustainment capabilities to two geographically dispersed areas. These capabilities ...
Annapolis, MD · On-site
$210K - $230K/yr
Systems Administrator IV (HPC) Location: Annapolis Junction, MD Clearance: Active TS/SCI w ... SYSTEM ADMINISTRATOR IV SYSTEM ADMINISTRATOR IV shall have a Bachelor's Degree in Computer Science ...
Salt Lake City, UT · On-site
$81K - $110K/yr
Senior HPC Systems Administrator Fuse Engineering LLC Senior HPC Systems Administrator Salt Lake ... System integration and acceptance testing during the purchase process of the HPC systems ...
Hanover, MD · On-site
$110K - $165K/yr
We are seeking a System Administrator I for a funded role to provide High Performance Computing (HPC) services in the forms of HPC enhanced sustainment capabilities. These capabilities include Multi ...
The High Performance Computing (HPC) Systems Administrator will leverage technical expertise to ... Troubleshoot and resolve system, hardware, and application issues across HPC environments.
... Linux Systems Administrator to handle High Performance Computing (HPC) systems to support ... Modify system wide parameters to improve performance and evaluate the effects of those changes.
Hanover, MD · On-site
$110K - $165K/yr
... System Administrator I for a funded role to provide High Performance Computing (HPC) services in the forms of HPC enhanced sustainment capabilities. These capabilities include Multi-vendor HPC ...
Hanover, MD · On-site
$145K - $188K/yr
... a System Administrator for a funded role to provide High Performance Computing (HPC) services in the forms of HPC enhanced sustainment capabilities. These capabilities include Multi-vendor HPC ...
Salary distribution (annual):
$41K - $49.8K: 3% of jobs
$49.8K - $58.5K: 6% of jobs
$58.5K - $67.3K: 11% of jobs
$67.3K - $76.1K: 16% of jobs
$76.1K - $84.9K: 15% of jobs
$84.9K - $93.6K: 14% of jobs
$93.6K - $102.4K: 12% of jobs
$102.4K - $111.2K: 9% of jobs
$111.2K - $120K: 6% of jobs
$120K - $128.7K: 5% of jobs
$128.7K - $137.5K: 3% of jobs
$70K is the 25th percentile. Wages below this are outliers.
The median wage is $84.2K / yr.
$101.2K is the 75th percentile. Wages above this are outliers.
Axis labels: $41K, $88.9K, $137.5K
To thrive as an HPC System Administrator, you need expertise in Linux/UNIX system administration, networking, and high-performance computing architectures, often supported by a degree in computer science or a related field. Familiarity with workload managers (like Slurm), parallel file systems, scripting languages (such as Python or Bash), and relevant certifications (e.g., RHCSA or CompTIA Linux+) are typical requirements. Strong troubleshooting skills, attention to detail, and the ability to communicate complex technical concepts effectively are valuable soft skills. These competencies ensure reliable operation, efficient resource management, and effective support of users in demanding research and enterprise environments.
An HPC (High-Performance Computing) System Administrator is responsible for deploying, managing, and maintaining HPC clusters used for complex computations. Their duties include configuring hardware and software, optimizing system performance, ensuring security, and troubleshooting technical issues. They work with researchers, engineers, and developers to provide a stable computing environment that supports scientific simulations, data analysis, and other resource-intensive tasks.
As an HPC System Administrator, your daily tasks typically include monitoring and maintaining computing clusters, managing user accounts and permissions, deploying software updates, troubleshooting hardware and software issues, and ensuring optimal system performance. You may also assist researchers or engineers with job submissions, resolve resource bottlenecks, and participate in planning for future capacity needs. Regular collaboration with other IT staff and end users is essential to ensure that the computing environment meets evolving project requirements. This role provides a dynamic, team-oriented work environment where ongoing learning and problem-solving are key.

$129K - $161K/yr
Full-time
Posted 23 days ago
$129,000 - $161,265/annually; Compensation will be based on education, experience, skills relevant to the role, and internal equity.
Pay Frequency: Annual
The High-Performance Computing (HPC) System Administrator is an expert, hands-on role responsible for the design, configuration, optimization, and operation of the organization's high-performance computing infrastructure. This individual will focus on advanced system optimization, complex troubleshooting, and strategic planning for future infrastructure enhancements across compute, storage, and high-speed interconnects (InfiniBand). A key responsibility is to mentor and cross-train existing system administrators, building the team's collective HPC expertise, strengthening shared support capabilities, and ensuring long-term operational resilience and efficiency.
The HPC Systems Administrator is a member of the Enterprise Systems team within the Cyberinfrastructure Technologies department. The incumbent works with the other Cyberinfrastructure teams - Network and Telecommunications, Enterprise Applications, and the Information Security Office - and other campus divisions in coordinating services, providing support and providing appropriate guidance. This incumbent will also work with University vendors and partners.
The HPC Systems Administrator will have a passion for providing excellent customer service, and a focus on continual improvement across all units; a commitment to supporting innovative infrastructure technologies; and a desire to identify and deliver the best possible technology resources and services to meet the needs of the campus community.
B. ESSENTIAL DUTIES AND RESPONSIBILITIES
1. HPC Infrastructure Management and Optimization
Compute: Manages the entire lifecycle of all compute nodes, including procuring, installing, configuring, and maintaining hardware, operating systems, and core system software to ensure optimal performance, stability, and resource utilization for scientific workloads.
Storage: Directs the management of the high-performance parallel file systems (e.g., Lustre, GPFS), NAS, and backup solutions, executing capacity planning, performance tuning, and integrity checks to guarantee secure, high-speed, and reliable data access for all users.
InfiniBand: Designs, deploys, and provides expert-level troubleshooting and maintenance for the InfiniBand high-speed interconnect fabric, ensuring low-latency, high-bandwidth inter-node communication essential for scalable HPC application performance.
Slurm: Administers, configures, and tunes the Slurm Workload Manager, actively managing job queues, partitions, and resource allocation policies to enforce fair-share scheduling, maximize cluster utilization, and meet diverse research computational needs.
System Imaging: Develops, maintains, and updates standardized, optimized system images for all compute nodes, utilizing automation tools to facilitate rapid, consistent deployment, efficient patching, and streamlined upgrades across the cluster environment.
Software Licenses: Oversees the administration and compliance of all commercial scientific software licenses, ensuring adherence to vendor agreements and strategically managing license servers and usage policies to optimize utilization and accessibility for the HPC user base.
Knowledge Transfer: Develops and implements a formal cross-training program for existing system administrators by creating documentation and delivering hands-on instruction to enhance the team's collective expertise in HPC-specific technologies (Slurm, InfiniBand, parallel file systems).
Operational Resilience: Ensures robust, shared support capabilities across the IT team by strategically transferring HPC knowledge, actively preventing single points of failure, and improving the overall efficiency and responsiveness of the operational support model.
Strategic Enhancement: Contributes to the strategic planning and roadmap development for future HPC infrastructure and software enhancements by researching emerging technologies, evaluating vendor solutions, and providing expert recommendations to ensure the environment remains cutting-edge and meets long-term organizational goals.
Use broad expertise and unique skills to play an active role as a technical expert during the planning and implementation phases of new technologies, and participate in architecture brainstorming and design discussions with technical team members.
Provide technical guidance on complex infrastructure architecture challenges to IS team members and other solution partners.
Act as a role model for developing and trying different problem-solving approaches and supporting team members to do the same.
Coach and develop new team members on how to provide the best customer service.
Model and support other team members in conducting themselves with openness and honesty to enhance positive relationships based on trust, predictability, and communication.
Provide input on setting Enterprise Systems and CIT goals, objectives, and strategies based on the University's mission, goals, and strategic plan.
Provide input in technology planning processes to develop cost-effective customer-focused solutions.
Uses strong technical and organizational knowledge to plan and lead projects and working groups.
Work closely with the ES Manager in the creation, planning, maintenance, and secure expansion of SCU's computing infrastructure. This includes, but is not limited to, local and hosted servers, virtual appliances and devices, and storage.
Work closely with ES Manager to ensure that architecture principles and standards are consistently applied across the data center compute and storage services.
Collaborate with the Information Security Office (ISO) to ensure a secure and compliant enterprise environment.
Work with the ISO to ensure that systems are secure and to plan for future security needs and threats.
Ensure the appropriate distribution of infrastructure services to faculty, staff, and students.
Create and document standards and practices regarding data center, compute and storage services for use across the University.
Oversee the creation and performance of infrastructure production and test environments.
Create scalable, interoperable, and flexible infrastructure solutions.
Support assigned systems with on-call availability and respond within agreed upon timeframes.
Analyze and evaluate processes to document and implement standard routines and processes for applying patches and updates to operating systems, applications, hardware, and firmware, ensuring that all physical, virtual, and hosted systems are patched to the appropriate security and version levels.
Participate as necessary in backup operations, ensuring all required file systems and system data are successfully backed up to the appropriate media and are available off site.
Participate in disaster recovery and business continuity planning.
Perform daily system monitoring, verifying integrity and availability of all hardware, server resources, systems, and key processes. Check for potential problems, resource availability, capacity, performance and load characteristics, network integrity, and security threats. Monitor systems activity and usage to maintain a secure environment. Develop related solutions as warranted.
Work with the CISO and system stakeholders to establish upgrade and update schedules, and maintenance windows.
Keep abreast of software releases and updates, keeping all systems at current release levels as appropriate for the successful operation of the data center in support of the University.
Serve as the liaison with hosted platform and third party providers to monitor service level agreements and ensure that performance expectations and requirements are met.
Enhance existing architecture frameworks in order to define, design, and implement simplified, standards based system architectures.
Assist in the design, planning, and implementation of infrastructure systems optimization and process improvement projects.
Test and assess existing infrastructure against industry standard internal and external benchmarks to ensure optimal performance and service delivery.
Participate in IT and information security audits and prioritize corrective actions and successful remediation of areas supervised to ensure that continuous improvements are made on an ongoing basis.
Participate in the change management process to ensure all changes to relevant services are documented, tested, deployed, and prepared with back-out strategies if necessary.
Stay aware of industry trends and how to incorporate them into the infrastructure environment to improve services and/or cut costs.
Effectively communicate complex data analyses to provide technical and strategic input during the planning phase of potential projects in the form of technical architecture designs and recommendations.
Regularly communicate with Cyberinfrastructure Technologies colleagues regarding initiatives.
Keep the ES Manager informed of current and potential issues, activities, operational outages, and any other risks that might jeopardize or degrade IT service delivery to the University community.
Suggest operation strategies to accommodate major shifts in customer needs.
Determine procedures and methods for operational tasks required to maintain data center servers and related systems in reliable, stable operation. This person will use their experience and judgment to plan and accomplish goals and objectives, and to identify potential problems and define/implement solutions.
Supports academic programs by providing the necessary expertise and technical support to make faculty and student technology adoption successful, e.g., consulting with faculty launching initiatives, identifying their needs, evaluating solutions, and implementing those solutions.
Support compute and storage needs of institutional programs by providing the necessary expertise and technical support to build a robust and highly available solution to meet their needs.
Empower end users to successfully use the technology.
Interface with vendors and external resources.
Evaluate new software or systems under consideration for adoption.
Ensure asset management procedures are maintained and documented.
Work with the Enterprise Systems team to automate and streamline procedures within the department.
On occasion work beyond and in addition to traditional work schedules/hours. Required to carry a cell phone and be on-call.
Utilize technologies and tools to support the compute and storage infrastructure: programming, scripting, diagnostic tools.
10. Other duties as assigned by the Manager of Enterprise Systems and IS leadership.
C. PROVIDES WORK DIRECTION
May supervise student workers.
D. GENERAL GUIDELINES
Recommends initiatives and implements changes to improve quality and services.
Identifies and determines cause of problems; develops and presents recommendations for improvement of established processes and practices.
Maintains contact with customers and solicits feedback for improved services.
Maximizes productivity through use of appropriate tools; plans training and performance initiatives.
Researches and develops resources that create timely and efficient work flow.
Prepares progress reports; informs supervisor of project status and deviation from goals. Ensures completeness, accuracy and timeliness of all operational functions.
Prepares and submits reports as requested and required.
Develops and implements guidelines to support the functions of the unit.
To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The items below are representative of the knowledge, skills, abilities, education, and experience required or preferred.
This position requires the ability to effectively establish and maintain cooperative working relationships within a diverse multicultural environment.
1. Knowledge, Skills and Abilities
General
Knowledge of information technology, campus technology, and information security issues and trends in higher education, and ability to continually develop new knowledge regarding the same.
Ability to listen and understand customer needs.
Ability to plan, implement, and evaluate customer service initiatives.
Ability to work in a collaborative environment, as either a member or leader of a team, to meet deadlines and achieve goals.
Ability to manage a diverse workforce to provide excellent customer service.
Self-motivated and shows initiative.
Ability to successfully manage multiple projects simultaneously.
Proven track record in project planning and project management.
Ability to exercise independent judgment and engage in critical thinking and problem solving.
Ability to work effectively under pressure in a busy (sometimes chaotic) and demanding information services environment.
Ability to explain technical issues and policies to non-technical people.
Ability to give presentations on technical issues to a broad range of audiences.
Ability to foster and maintain good working relationships with faculty, administrators, students, senior management, and other leaders.
Ability to handle sensitive matters with diplomacy...
Sourced by ZipRecruiter
Education programs administration
1,001 - 5,000 Employees
Santa Clara, CA, US
1851