Senior HPC Systems Engineer

Corvid Technologies LLC
Mooresville, NC
  • Posted: over a month ago
  • Full-Time
Job Description

Corvid Technologies is seeking a Sr. HPC Systems Engineer with a strong background in and enthusiasm for Linux to support our Linux-based high-performance computing system, which consists of 50,000+ processor cores. If you enjoy learning, playing with hardware, and optimizing performance and efficiency, and you spend most of your time on the command line, this is the job for you.

The candidate will be responsible for the following:

  • Compile (with icc optimizations), install, and test HPC software for internal and external customers
  • Manage Linux license servers and licensed applications for internal and external HPC customers
  • Write and troubleshoot custom job scheduler submission scripts for internal and external HPC customers
  • Troubleshoot slow, hanging, or failing HPC jobs on internal or customer HPC clusters
  • Automate repetitive tasks and implement custom solutions using scripting/programming languages such as Bash or Python
  • Configure and troubleshoot a heterogeneous (QDR, FDR, EDR) InfiniBand network and associated subnet manager
  • Provide guidance and support on HPC best practices and solutions for internal and external customers
  • Design, test, and implement an HPC environment consisting of a provisioner (e.g. xCAT, Warewulf), scheduler (e.g. Slurm, SGE, PBS), RDMA connections (e.g. InfiniBand), a subnet manager, and 5+ compute nodes
  • Troubleshoot and monitor resource utilization/availability on Linux servers
  • Configure, maintain, and troubleshoot HPC scheduler issues
  • Install and configure cluster nodes on the internal HPC cluster
  • Troubleshoot hardware and software issues on HPC cluster nodes

Requirements:

  • Bachelor's degree in Engineering (Master's preferred)
  • 5+ yrs scripting experience
  • 5+ yrs professional/personal experience using command line Linux
  • Ability to obtain and maintain a U.S. security clearance

Preferred Skills:

  • Experience installing, configuring, and maintaining job management tools (such as Slurm, Moab, TORQUE, PBS, etc.) required
  • Experience configuring, installing, and troubleshooting MPI and OpenMP applications preferred
  • Experience with operating system deployment tools (e.g. xCAT, Rocks) preferred
  • Hands-on experience with at least one distributed file system (Spectrum Scale/GPFS, Lustre, BeeGFS, Gluster, IBRIX, PVFS, etc.) preferred
  • Direct experience working with InfiniBand
  • Experience configuring, installing, tuning, and maintaining scientific software on large-scale systems preferred
  • Experience supporting HPC compilers and libraries preferred
  • Experience with configuration management tools such as Ansible or Puppet preferred
  • Experience configuring, installing, maintaining, and using performance monitoring and optimization tools preferred

Corvid offers a competitive benefits package that includes BCBS healthcare, dental, a 5.5% 401(k) match, gym membership, paid leave, and paid holidays.


Corvid Technologies LLC

Address

Mooresville, NC
28117 USA

Industry

Technology
