Job Summary:
SpaceX was founded with the goal of enabling human life on Mars through innovative technology development. The Site Reliability Engineer will design, develop, and test solutions for SpaceX flight systems, ensuring high-performance and scalable web applications. This role involves collaborating across engineering teams to build robust systems that support critical software applications for future missions.
Responsibilities:
• Develop automation to deploy and manage applications both on-premises and in the cloud
• Deploy and manage core infrastructure technologies such as application servers, databases, messaging queues and storage
• Install, manage, scale and optimize Kubernetes and RKE clusters using Ansible and adjacent technologies in production environments
• Closely collaborate with software engineers to create highly scalable, operable and maintainable products
• Closely collaborate with IT and software engineers to develop a test automation suite that leverages DevOps infrastructure
• Engage in and improve the whole lifecycle of services -- from inception and design, through deployment, operation and refinement
• Work closely with other SpaceX engineers to gather requirements and to research, evaluate, design, plan, deploy, and support software platforms and related technologies running in Kubernetes, providing a world-class environment that meets the needs of demanding SpaceX engineering teams
• Build highly resilient, high-performance, scalable, and robust systems
• Exercise a high degree of personal responsibility for the processes, systems, and tools you create and manage, all in support of the goal of making humanity a multiplanetary species
Qualifications:
Required:
• Bachelor’s degree in computer science, information systems, or an engineering discipline; OR 2+ years of professional experience in software, DevOps, or site reliability engineering in lieu of a degree
• 1+ years of experience with Linux operating systems
• Experience in Bash, Python, or other scripting languages
• Active Top Secret, Top Secret SCI, or DOE Level Q clearance
Preferred:
• 1+ years of systems administration, site reliability engineering, or DevOps experience
• Experience with containerization technologies (e.g. Docker, Kubernetes)
• Experience with the Linux shell as well as configuring and extending Linux instances (e.g. kernel modules, Control Groups, Public Key Infrastructure, iptables, interfaces)
• 1+ years of experience with Python and Python-based development frameworks
• Strong understanding of Kubernetes, Docker, or similar technologies
• Strong understanding of message queue technologies such as RabbitMQ or Kafka
• Strong understanding of virtualization and hypervisor technologies
• Understanding of databases and performance tuning
• Experience with identity management and authentication protocols
• Focus on identifying performance bottlenecks and applying performance improvement techniques
• Excellent communication skills, with the ability to communicate with customers, peers, management, etc. in both formal and informal situations
• Ability to quickly learn new tools and frameworks
• Experience with dynamic system configuration templating using Jinja, YAML and Helm
Company:
SpaceX develops and operates rockets, satellite networks, and AI infrastructure, including launch, connectivity, and cloud services. Founded in 2002, the company is headquartered in Hawthorne, USA, with a team of 1001-5000 employees. It is currently a late-stage company.