Job Summary:
The Federal Reserve Bank of Philadelphia is one of the 12 regional Reserve Banks that, together with the Board of Governors in Washington, D.C., make up the Federal Reserve System. The Infrastructure & HPC Systems Engineer will ensure the integrity, reliability, and availability of agile research computing environments by managing server infrastructure and HPC clusters, providing advanced technical support to end users, and developing automation tools.
Responsibilities:
• Respond to problems and maintain Windows and Linux server environments in research settings
• Design, deploy, configure, and administer HPC clusters and associated systems
• Monitor system health, performance metrics, and resource utilization to ensure optimal, efficient operation
• Implement robust security protocols and perform regular maintenance including upgrades and patching
• Manage job scheduling and workload optimization using tools like Slurm
• Support and troubleshoot user endpoints, servers, and services in various environments (i.e., cloud, on-premises, colocation)
• Participate in planning, budgeting, and monitoring of various environments
• Develop tools and scripts to automate the management and creation of systems and services in various environments
• Create and maintain automation scripts to streamline system administration tasks
• Optimize scientific applications and computational workflows for performance
• Implement container technologies (Docker) for reproducible research
• Support GPU computing and accelerator technologies for specialized workloads
• Design and implement innovative HPC solutions to address evolving research requirements
• Define and track performance metrics to ensure efficient current and future use of resources
• Respond to research end-user requests to diagnose problems and provide specialized technical support
• Troubleshoot highly complex hardware and software issues in multi-user research environments
• Resolve problems quickly and accurately with thorough follow-up to ensure complete resolution
• Assist staff with IT-related problem resolution and use of facilities
• Partner closely with researchers to understand computational needs and translate them into technical solutions
• Collaborate with network, security, and data teams to ensure integrated operations
• Build and maintain relationships with vendors and technology partners
• Collaborate as technical advisor on infrastructure planning and technology roadmaps
• Participate in product and technology evaluations, testing, and pilot activities to provide sound recommendations
• Engage in Federal Reserve System, academic, and other HPC communities to stay current with emerging technologies and effective practices
• Develop comprehensive documentation for systems, policies, and procedures
• Create user guides and training materials for researchers utilizing HPC resources
• Conduct workshops and training sessions on effective use of HPC resources and research computing tools
Qualifications:
Required:
• Bachelor’s degree in computer science, engineering, mathematics, or related field, or equivalent combination of education and experience.
• Minimum of 5 years of relevant experience in HPC administration and systems engineering.
• Extensive experience with Linux operating systems (Red Hat/CentOS) in an HPC environment.
• Command line skills and proficiency in scripting languages (Python, Bash).
• Experience with job scheduling systems (Slurm) and resource management.
• Knowledge of parallel file systems and storage technologies (e.g., Ceph, GPFS, Lustre, BeeGFS).
• Familiarity with parallel programming models (MPI, OpenMP) and scientific computing frameworks.
• Experience with configuration management and automation tools (Terraform).
• Demonstrated specialized problem-solving abilities and analytical thinking.
• Solid appreciation for research, sound judgment, and healthy professional skepticism; understands sensitivities and considers the big picture in addition to tactical details.
• Ability to communicate effectively with PhD economists as well as with various levels of personnel and different types of specialists; strong interpersonal and listening skills; approachable.
• Agile and comfortable working in evolving, rigorous research environments.
• Oriented toward research support; responsive to time-sensitive matters and custom needs.
Company:
The Federal Reserve Bank of Philadelphia helps formulate and implement monetary policy, supervises banks and bank holding companies, and provides financial services to depository institutions and the federal government. Founded in 1914, the company is headquartered in Philadelphia, PA, US, with a team of 501-1000 employees. The company is currently Late Stage.