Job Title: Senior HPC Infrastructure Engineer
Primary Location: Chicagoland; hybrid, with a minimum of 2 days in-office.
Position Type: 12-month contract with contract-to-hire potential.
Overview
Our client is casting a line for a Senior HPC Infrastructure Engineer! This is a 12-month contract with the potential to convert to a full-time (FTE) position. The role is central to designing, deploying, and optimizing high-performance computing (HPC) infrastructure, both on-premises and in cloud environments. It combines deep technical systems expertise with hands-on administration to keep environments scalable, reliable, and secure for advanced scientific research and computational workloads.
What You Bring to the Role (Ideal Experience)
• Strong background in Linux/Unix system administration
• Experience designing and supporting HPC clusters in research, academic, or scientific computing environments
• Proficiency with parallel computing frameworks such as MPI and OpenMP
• Familiarity with job scheduling/resource management systems (e.g., Slurm, Torque, PBS)
• Hands-on experience with high-speed interconnects (e.g., InfiniBand, Omni-Path)
• Strong understanding of networking, storage solutions, and system performance tuning
• Experience with backup, disaster recovery, and data integrity solutions in high-performance environments
• Fluency in scripting (e.g., Bash, Python)
• Strong troubleshooting skills and collaborative communication style
• Bachelor's degree in Computer Science, Engineering, or equivalent experience (Master's preferred)
• Relevant technical certifications (e.g., Red Hat, CompTIA Linux+) are a plus
What You'll Do (Skills Used in this Position)
• Design, deploy, and manage scalable HPC systems across both cloud and on-prem environments
• Define system requirements and optimize Linux-based systems for performance, reliability, and scalability
• Maintain, monitor, and patch HPC environments to ensure high availability and security
• Design and manage high-performance storage systems with robust backup, replication, and archival strategies
• Conduct benchmarking and performance tuning, collaborating with HPC operations to resolve bottlenecks
• Partner with cybersecurity teams to ensure compliance and security in HPC environments
• Maintain technical documentation, SOPs, and troubleshooting guides
• Provide end-user training and technical support, and manage on-site computing technologies
• Contribute to overall operational efficiency through team collaboration and continual improvement initiatives
If you are interested, or have any referrals, please send a resume to mukul@brightmindsol.com.