Job Summary:
Tesla is at the forefront of innovation in AI and supercomputing and is seeking a highly skilled Software Engineer to join its Supercomputing team. The role involves designing, developing, and deploying software for the high-performance computing infrastructure that supports AI, Full Self-Driving, and other engineering initiatives.
Responsibilities:
• Design and develop software components for our infrastructure control plane, including: resource management and allocation; provisioning & automation of infrastructure components; monitoring and management of datacenter resources; integration with distributed storage systems
• Collaborate with cross-functional teams to ensure seamless integration of our software with our datacenter infrastructure
• Develop and maintain code for infrastructure software, focusing on areas such as: scalability & performance optimization; availability, reliability, & fault tolerance; automation & orchestration of datacenter operations
• Work closely with the operations team to ensure smooth deployment and operation of infrastructure software
• Participate in the testing and validation of infrastructure software to ensure it meets quality and reliability standards
• Collaborate with other engineers to identify and resolve technical issues and to continuously improve the design and operation of our datacenter infrastructure
Qualifications:
Required:
• Degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience
• 5+ years of experience in software development, with a focus on infrastructure software and datacenter operations for large-scale GPU/HPC clusters
• Strong programming skills in languages such as Python, Go, or Bash
• Experience with Slurm resource management and job scheduling systems
• Experience with distributed high performance storage systems
• Strong understanding of system design principles, including scalability, availability, and reliability
• Experience with agile development methodologies and version control systems such as Git
• Excellent problem-solving skills, with the ability to analyze complex technical issues and develop creative solutions
• Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams
• Knowledge of containerization and orchestration technologies, such as Docker or Kubernetes
Company:
Tesla is an electric vehicle and clean energy company that provides electric cars, solar products, and renewable energy solutions. Founded in 2003, the company is headquartered in Austin, Texas, USA, and has a team of 10,001+ employees. Its company stage is listed as Late Stage.