Job Summary:
SpaceX, a pioneering aerospace manufacturer and space transport services company, is seeking a Sr. Kubernetes Engineer to join its Information Technology Linux Infrastructure team. The role involves providing expertise in Kubernetes design, maintenance, scaling, and optimization to support critical business functions.
Responsibilities:
• Build, install, manage, scale and optimize Kubernetes and RKE clusters using Ansible, Terraform and adjacent technologies in production environments.
• Work closely with other SpaceX engineers to gather requirements, research, evaluate, design, plan, deploy, and support software platforms and related technologies running in Kubernetes within a world-class environment that meets the needs of the demanding SpaceX engineering teams. Build highly resilient, high-performance, scalable, and robust systems.
• Exercise a high degree of personal responsibility for the processes, systems, and tools you create and manage, all in support of the goal of making humanity an interplanetary species.
• Recommend, justify, and implement improvements using an accepted change control methodology.
• Work within a diverse group to design and deliver creative solutions and resolve problems in a timely and proactive manner by interacting with internal business units.
• Define, document and follow standards and best practices for systems design, testing, and implementation.
• Foster an environment of collaboration and cross-training, upskilling the team in Kubernetes expertise and ensuring peers are developed into capable engineers.
• Drive scripting, self-service, and automation to reduce administrative overhead and toil.
• Participate in an on-call rotation to handle urgent after-hours work when necessary.
Qualifications:
Required:
• Bachelor’s degree in Computer Science or a STEM discipline and 5+ years of systems engineering experience; OR 7+ years of systems engineering experience in lieu of a degree.
• Experience deploying and supporting Linux servers via automation in physical and virtualized environments (e.g. VMware).
• Experience with the Linux shell as well as configuring and extending Linux instances (e.g. kernel modules, cgroups, PKI, iptables, interfaces).
• Experience supporting and scaling containerized applications in Linux environments.
• Experience using automation frameworks (e.g. Ansible, Terraform) to manage provisioning and post-provisioning lifecycles of infrastructure and Kubernetes installations.
• Must be willing to work extended hours and weekends as needed.
• Ability to pass an Air Force background check for Cape Canaveral.
• To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.
Preferred:
• Expertise in creating repeatable, reliable, scalable systems architectures, with high availability, fault tolerance, performance tuning, monitoring, and statistics/metrics collection.
• Expertise in source code version control tools such as Git and Subversion and collaborating on source code via Pull Requests and other Git-based workflows.
• Strong understanding of Linux container runtimes.
• Experience implementing configuration management, provisioning, and workflow automation solutions via Infrastructure as Code, CI/CD, and GitOps (e.g. Ansible, AWX/Tower, Vagrant, Puppet, Redfish, Jenkins, cloud-init, ArgoCD).
• Experience writing test automation to ensure backwards compatibility of feature and change development for automation processes and Kubernetes deployments.
• Experience with programming and scripting languages such as Python and Golang to develop software solutions and integrate with external systems to implement automation against RESTful API services.
• Experience installing, configuring, and troubleshooting Kubernetes internals, CNI, CRI, and CSI plugins (e.g. Docker, CRI-O, Ceph, Cilium), load balancing (e.g. MetalLB), service mesh (e.g. Istio), and software-defined storage (e.g. rook-ceph) in cloud or on-premises environments.
• Experience developing solutions using Kubernetes patterns to extend system functionality and solve custom use cases (e.g. webhooks, controllers, operators, sidecars).
• Experience implementing proactive alert/monitoring workflows and dashboards for Linux systems and Kubernetes deployments using Prometheus, Grafana, InfluxDB or similar technologies.
• Experience with dynamic system configuration templating using Jinja, Jsonnet, YAML and Helm.
Company:
SpaceX develops and operates rockets, satellite networks, and AI infrastructure, including launch, connectivity, and cloud services. Founded in 2002, the company is headquartered in Hawthorne, USA, with a team of 1001-5000 employees. The company is currently late stage.