Job Summary:
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company is seeking an experienced Training Infrastructure Engineer to manage its training cluster and implement distributed training algorithms for its AI researchers.
Responsibilities:
• Design, deploy, and maintain Figure's training clusters
• Architect and maintain scalable deep learning frameworks for training on massive robot datasets
• Work with AI researchers to train new model architectures at large scale
• Implement distributed training and parallelization strategies to reduce model development cycles
• Implement tooling for data processing, model experimentation, and continuous integration
Qualifications:
Required:
• Strong software engineering fundamentals
• Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
• Experience with Python and PyTorch
• Experience managing HPC clusters for deep neural network training
• Minimum of 4 years of professional, full-time experience building reliable backend systems
Preferred:
• Experience managing cloud infrastructure (AWS, Azure, GCP)
• Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
• Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)
Company:
Figure is an AI robotics company that develops autonomous general-purpose humanoid robots. Founded in 2022, the company is headquartered in San Jose, USA, with a team of 201-500 employees. The company is currently in the growth stage.