Senior Platform Engineer
Knoxville, TN · On-site
$93K - $127K/yr
Manager Supercomputer information
What is the difference between a Manager Supercomputer and a Supercomputing Systems Engineer?
| Aspect | Manager Supercomputer | Supercomputing Systems Engineer |
|---|---|---|
| Required Credentials | Bachelor's or master's in computer science, engineering, or related field; management experience | Bachelor's or master's in computer science, computer engineering, or related field; technical certifications |
| Work Environment | Oversees supercomputing facilities, manages teams, strategic planning | Designs, develops, and maintains supercomputing systems, works hands-on with hardware/software |
| Employer & Industry Usage | Research labs, government agencies, large tech companies | Research institutions, high-performance computing centers, tech firms |
The Manager Supercomputer primarily oversees supercomputing operations and manages teams, focusing on strategic and administrative tasks. In contrast, the Supercomputing Systems Engineer is more technically involved, designing and maintaining supercomputing systems. Both roles require strong technical backgrounds, but their responsibilities differ in scope and focus.
Full-time
Medical, Dental, Vision, Retirement, PTO
Posted 22 days ago
Job description
**Please note: The first step in the interview process requires candidates to join a Microsoft Teams meeting with the video turned on.**
- Working with highly talented team members
- 3 weeks’ vacation
- Excellent medical insurance, including employer-paid benefits
As a Kubernetes Platform Engineer for the HPC Platform teams, you will work to support all activities of our supercomputer center. Our primary platform is the OLCF Slate Service, built on Kubernetes and RKE2, which provides a container orchestration service for running critical operation applications and user-managed persistent applications that run alongside the OLCF Supercomputer systems and other OLCF managed HPC clusters.
Job Responsibilities:
- Work with the team to define and implement best practices and standards within the organization
- Keep the Kubernetes platform reliable, available, and fast
- Architect solutions to problems that improve the reliability, scalability, performance, and efficiency of our services
- Respond to, investigate, and fix service issues all the way from bare metal through the OS to the application code
- Coordinate with vendors to resolve hardware and software problems
- Participate in an on-call rotation providing 24-hour, 7-day support and off-hours maintenance windows
- Work with users to help them use Kubernetes
- Bachelor’s degree in a scientific field and a minimum of 5-8 years of relevant experience. An equivalent combination of education and experience will be considered.
- Experience with Kubernetes as a cluster administrator for on-premises deployments
- Excellent interpersonal/communications skills, and the ability to work as part of a team
- Strong working knowledge of Linux systems fundamentals and networked computing environment concepts
- Experience with code reviews, code quality, CI/CD tooling, GitOps, SCM (e.g. GitLab)
- Ability to identify requirements and to define, plan, and implement requisite solutions for small and medium projects
- Ability to develop and maintain programs and scripts that aid in the operation and automation of tasks using various shell and scripting languages (primarily bash, Python, and Go)
- Experience with on-call rotation
- The ability to obtain and maintain a Department of Energy "Q" clearance is required. This requires US Citizenship.
- Bachelor’s degree in a scientific field and 8-10 years of relevant experience.
- Subject matter expert in Kubernetes as a cluster administrator for bare metal, on-premises deployments
- Excellent interpersonal/communications skills, including the ability to communicate effectively with other teams and organizational leadership and to convey technical details to a non-technical or semi-technical audience.
- Ability to identify requirements and to define, plan, and implement requisite solutions for large, organizationally impactful projects.
- Self-driven with the ability to work in a dynamic, loosely structured research & development environment.
- Experience with RKE2 (nice to have: Red Hat OpenShift and Talos), multi-cluster management tools for Kubernetes (e.g. Fleet), and container security tools (NeuVector, SCC, pod admission control)
- Experience managing image registries such as Quay or Harbor
- Experience using tools such as Prometheus, Nagios, and Grafana to monitor systems and metrics and to create dashboards
- Experience designing and implementing highly-available systems/services
- Experience with Infrastructure-as-Code tooling such as Terraform, Helm, and Puppet
- Experience implementing systems-level security technologies (e.g. SELinux, seccomp, Linux capabilities), experience with DevSecOps, and general security best practices.
- Experience with AIOps and MLOps tooling – e.g. KServe, Kubeflow, vLLM, NVIDIA Enterprise AI, AMD Silo AI, ClearML, MLflow
- Experience using HPC hardware for Kubernetes – e.g. RDMA, DPUs, InfiniBand, many-core CPUs
- Experience with declarative CI/CD tools such as ArgoCD
- Experience with workflow engines such as Apache Airflow or Argo Workflows
- Experience with infrastructure automation
- Cloud engineering experience with at least one cloud service provider
- Experience with reusable, automated workflows such as PagerDuty playbooks
About Cadre5
Sourced by ZipRecruiter
Industry
Software development
Company size
11 - 50 Employees
Headquarters location
Knoxville, TN, US
Year founded
1999