About Troveo
Troveo is building the next-generation data platform to train AI video models. Troveo offers the world's largest library of AI video training data, featuring millions of hours of licensed video content. Our end-to-end data pipeline connects creators, rights holders, and AI research labs, enabling scalable, compliant, and innovative uses of video for AI application and model development.
We are an early-stage, high-growth venture backed by forward-thinking investors, and we are seeking an innovative strategic engineer to help us scale.
Role Overview
The Senior Machine Learning Engineer will play a central role in designing, building, and optimizing large-scale machine learning pipelines for AI video model training. You'll work across the full ML lifecycle, from structuring massive datasets to deploying, evaluating, and training models in production.
This is a hands-on, high-impact role for an engineer who thrives on scale, autonomy, and cross-functional collaboration. You will combine deep technical expertise with strong communication and business acumen, translating models into measurable costs, performance targets, and real-world outcomes.
Key Responsibilities
Data Curation & Indexing Pipelines
- Architect and implement large-scale pipelines for video ingestion, metadata extraction, and indexing using vector databases and embedding models to enable fast, semantic retrieval.
- Design annotation workflows integrating active learning, weak supervision, and human-in-the-loop systems to curate high-quality labeled datasets for video models.
- Contribute to optimizing data partitioning, sharding, and caching strategies to handle petabyte-scale video corpora, ensuring low-latency search and robust data lineage.
Model Training & Evaluation
- Develop and fine-tune multimodal models (e.g., CLIP variants, transformer-based encoders) for video embeddings, scene segmentation, and relevance ranking using PyTorch and Hugging Face.
- Build evaluation frameworks with metrics like NDCG, mAP, and annotation consistency scores to iteratively improve search accuracy and annotation efficiency.
- Deploy models via containerized services with A/B testing and monitoring for drift detection in production search and annotation pipelines.
- Collaborate with Product and Operations teams to translate ML performance into business insights and cost implications.
Infrastructure & Optimization
- Scale ML infrastructure on AWS, leveraging multi-GPU clusters and distributed training to accelerate embedding computation and indexing jobs.
- Implement testing and deployment processes across large distributed systems and fine-tune open-source models; working knowledge of training large models is a plus.
- Implement automated CI/CD for model versioning, hyperparameter tuning, and resource orchestration to minimize compute costs and maximize GPU utilization.
- Profile and tune systems for bottlenecks in vector similarity search, batch annotation, and real-time querying.
Cross-Functional Collaboration
- Partner with product, research, and data teams to align ML outputs with business KPIs, such as search latency targets and annotation throughput.
- Translate technical trade-offs (e.g., recall vs. precision in embeddings) into actionable insights for stakeholders, fostering adoption in video discovery features.
- Work closely with data engineers, research scientists, and product teams to align model performance with strategic business goals.
- Communicate technical concepts clearly to both technical and non-technical stakeholders.
- Take ownership of project outcomes in a fast-paced, startup environment.
Qualifications & Experience
- 6+ years in ML engineering, with a focus on information retrieval, embedding systems, or data annotation pipelines.
- Proven track record building scalable indexing and search infrastructure, including vector stores and similarity search algorithms.
- Expertise in Python and PyTorch for core model development; hands-on experience with Hugging Face Transformers for multimodal embeddings and fine-tuning.
- Working experience with video, computer vision, and multimodal LLMs.
- Hands-on experience deploying models in production environments and measuring model accuracy.
- Proficiency in MLOps tools (e.g., MLflow, Weights & Biases) for experimentation, versioning, and deployment.
- Hands-on experience with production ML deployment, evaluation metrics for retrieval/annotation tasks, and cost-optimized scaling on cloud platforms like AWS.
- Strong analytical skills for dissecting performance in large distributed systems; familiarity with multi-GPU training and vector databases preferred.
- Excellent communication to bridge technical depth with strategic priorities in collaborative settings.
Nice to Have
- Prior experience training video models or working with video-based datasets.
- Demonstrated expertise in GPU optimization and large-scale compute performance tuning.
- A blend of startup agility and big tech rigor.
- Contributions to open-source development and projects.
- Experience working with search ranking algorithms.
Location & Compensation
- Location: Strong preference for candidates based in the San Francisco Bay Area.
- Compensation: $200,000 - $400,000 base salary + equity.
Why Join Troveo?
- Work at the cutting edge of AI, video, and large-scale data infrastructure.
- Build systems that directly power the next generation of AI video models.
- Collaborate with a world-class team of engineers, researchers, and industry experts.
- High autonomy, high impact: your work will shape the foundation of our platform.
- Competitive compensation with meaningful equity upside.