Job Summary:
Bot Auto is revolutionizing the transportation of goods with cutting-edge autonomous trucks. They are seeking a highly skilled Software Engineer to design, develop, and scale machine learning infrastructure, focusing on model evaluation, training workflows, and data management.
Responsibilities:
• Architect and own a scalable, end-to-end model evaluation platform for perception and prediction models central to autonomous driving. Define metrics, design for scale, and make results actionable for researchers.
• Partner with research scientists to optimize and scale distributed training workflows. Integrate experiment tracking and reproducibility into the model lifecycle from day one.
• Design and maintain a versioned, high-quality training data store that accelerates model development and supports rapid iteration.
• Build automated pipelines spanning data preparation, model training, validation, and deployment — enabling fast experimentation and reproducible outcomes.
• Contribute to tooling and infrastructure that powers high-throughput, high-accuracy data annotation at scale.
• Develop production ML services that treat models as products — with reliability, observability, and continuous improvement built in.
• Maintain and evolve a robust data storage and access layer (S3 data lake, Delta Lake) underpinning annotation, evaluation, and training workflows.
• Build scalable, reliable data collection pipelines supporting diverse vehicle dispatch missions.
• Develop foundational services and packages that provide clean, performant access to autonomous driving data across the stack.
Qualifications:
Required:
• Educational Background: Bachelor's or Master's in Computer Science, or equivalent practical experience.
• Programming Skills: Strong proficiency in Python; working knowledge of C++.
• ML/DL Infrastructure Experience: Demonstrated hands-on experience building or scaling at least one of the following in a production environment:
  ◦ Evaluation platforms: automated model benchmarking, metric computation, and regression tracking across model versions.
  ◦ Training infrastructure: distributed training pipelines, experiment tracking, and model lifecycle management (e.g. W&B, MLflow, ClearML).
  ◦ Dataset curation & feature stores: versioned dataset management, data lineage, and tooling for high-quality training data at scale.
  ◦ Annotation platforms: tooling or pipelines that support high-throughput, high-accuracy labeling workflows.
• Distributed Systems: Strong experience with distributed computing and container orchestration (Kubernetes, Spark, or comparable frameworks).
• Ability to operate independently: scope ambiguous problems, make sound architecture decisions, and drive them to completion.
Preferred:
• C++ experience in performance-sensitive or safety-critical applications.
• Full-stack service development experience.
• Prior work in autonomous driving or robotics.
Company:
Transforming American Transportation with Autonomous Trucks. Founded in 2023, the company is headquartered in Houston, USA, and has a team of 51-200 employees. It is currently at the growth stage.