Job Summary:
Obvio is a company focused on preventing traffic-related deaths through innovative AI technology. It is seeking a Senior AI Infrastructure Engineer to build and optimize the core ML infrastructure layer, including orchestration, compute, and data management systems.
Responsibilities:
• Build the orchestration layer.
• Design and implement a scalable workflow system to ingest, route, and process incoming events.
• Define the stages of the pipeline — ingestion, preprocessing, inference, validation, and delivery — and build something that handles failures gracefully at high throughput.
• Scale the inference fleet.
• Build the compute layer that parallelizes processing across the event backlog and handles burst capacity as our camera fleet grows.
• Design the worker pool, queueing, and autoscaling strategy for GPU-bound workloads on ECS.
• Design the data plumbing.
• Own the path from edge device to pipeline output — storage, metadata, and the triggers that drive processing.
• Build something that is observable, debuggable, and auditable end-to-end.
• Build the model serving and lifecycle layer.
• Stand up the infrastructure that loads versioned CV models and handles inference reliably.
• Optimize for GPU utilization and throughput where it matters — dynamic batching, multi-model serving, and model optimizations like quantization or TensorRT/ONNX.
• Ensure new model versions can be promoted and rolled back without pipeline downtime.
• Set the engineering standard.
• Write the playbooks — runbooks, deployment procedures, testing standards — that the team builds on as we grow.
Qualifications:
Required:
• 6+ years building and operating production backend or data-intensive systems at scale, with meaningful experience working on ML-heavy pipelines.
• You've owned something through its full lifecycle — design, deployment, scaling, and on-call — and you've done it in a context where ML inference was a first-class part of the system.
• You've used a workflow orchestration tool to build production pipelines, not just evaluated one.
• You're comfortable with the building blocks (compute, queues, storage, networking), and you think in terms of cost, reliability, and operational simplicity rather than just what works.
• You've built or operated pipelines where ML inference is a core stage, and you understand what those workloads need — throughput constraints, GPU economics, model versioning, and keeping model performance visible in production.
• You don't need to have trained the models, but you know how to run them reliably at scale.
• You don't reach for the first framework you know. You understand the problem, evaluate tradeoffs honestly, and build something that fits the actual scale and constraints.
Preferred:
• Experience with CV or video pipelines is a plus.
Company:
Obvio provides AI-powered traffic safety solutions using solar-powered monitoring cameras. Founded in 2023, the company is headquartered in San Carlos, USA, with a team of 51-200 employees. The company is currently at the growth stage.