Job Summary:
Figure is an AI robotics company developing autonomous general-purpose humanoid robots. They are seeking an experienced Training Infrastructure Engineer to manage their training cluster and implement distributed training algorithms for AI researchers.
Responsibilities:
โข Design, deploy, and maintain Figure's training clusters
โข Architect and maintain scalable deep learning frameworks for training on massive robot datasets
โข Work together with AI researchers to implement training of new model architectures at a large scale
โข Implement distributed training and parallelization strategies to reduce model development cycles
โข Implement tooling for data processing, model experimentation, and continuous integration
Qualifications:
Required:
โข Strong software engineering fundamentals
โข Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
โข Experience with Python and PyTorch
โข Experience managing HPC clusters for deep neural network training
โข Minimum of 4 years of professional, full-time experience building reliable backend systems
Preferred:
โข Experience managing cloud infrastructure (AWS, Azure, GCP)
โข Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
โข Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)
Company:
Figure is an AI robotics company that develops autonomous general-purpose humanoid robots. Founded in 2022, the company is headquartered in San Jose, USA, with a team of 201-500 employees. The company is currently Growth Stage.