Job Summary:
FieldAI’s Irvine team is where embodied AI meets real robots, real sensors, and real field deployments. The Senior Machine Learning Platform Engineer will design and manage scalable ML infrastructure, develop cloud-based pipelines, and ensure the reliability of MLOps workflows while mentoring junior engineers.
Responsibilities:
• Design and manage scalable ML infrastructure using infrastructure-as-code (IaC) tools (Terraform, CloudFormation).
• Develop and optimize cloud-based pipelines for training, evaluation, and inference on multimodal datasets.
• Build and operate data systems for large-scale video ingestion, indexing, and storage.
• Maintain MLOps workflows for versioning, experiment tracking, reproducibility, and CI/CD.
• Ensure reliability and observability with monitoring, logging, and alerting.
• Collaborate with AI/ML Engineers to productionize workflows.
• Optimize infrastructure for performance and cost across cloud and edge.
• Enforce best practices in security, compliance, and maintainability.
• Mentor and manage junior engineers, providing technical guidance and career development.
Qualifications:
Required:
• Bachelor’s/Master’s in Computer Science, Engineering, or related field (or equivalent experience).
• 4+ years of industry experience in ML infrastructure or platform engineering.
• Strong coding skills in Python and/or TypeScript and a solid foundation in software engineering best practices.
• Proven experience with distributed systems, cloud platforms (AWS preferred), containerization and orchestration (Docker, Kubernetes/EKS, Ray), and serverless computing.
• Hands-on experience building ML pipelines for distributed training and large-scale inference.
• Strong knowledge of data management at scale, including preprocessing and retrieval of video/image datasets.
• Proficiency with CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and automation.
• Familiarity with MLOps tools (MLflow, Kubeflow, Airflow).
• Experience with system monitoring and observability in production.
Preferred:
• Experience with vector databases (OpenSearch, Pinecone, Weaviate) for indexing and retrieval.
• Familiarity with distributed training frameworks (Horovod, DDP/FSDP, DeepSpeed, Ray).
• Hands-on experience with GPU orchestration and auto-scaling (Karpenter, SageMaker, EKS).
• Experience with agentic AI deployment workflows, orchestration frameworks, and retrieval-augmented generation.
• Strong knowledge of security and compliance in ML and cloud environments.
Company:
FieldAI is the general-purpose brain making robots autonomous in complex, risky, real-world environments. Founded in 2023, the company is headquartered in Mission Viejo, USA, and has 201-500 employees. The company is currently in the growth stage.