Applied AI Lab Job
Compensation: Competitive base salary and meaningful equity
Benefits: Health & dental insurance, gym reimbursement, daily team meals, commuter benefits
We're an applied AI lab building coding agents. Julius executes ~1M lines of code every 36 hours for 1M+ users and has generated 3M+ visualizations. All code runs in sandboxes (isolated remote containers) that we manage. We're revenue-generating and backed by AI Grant, Y Combinator, Bessemer Venture Partners, and the founders of Vercel, Notion, Perplexity, Palantir, Replit, Zapier, Intercom, and Dropbox.
The Role
Build and scale the code-execution sandboxes that power Julius across cloud environments (AWS and GCP). We orchestrate 500k+ containers per month, and that number is growing. You'll own reliability, performance, and security for multi-tenant compute.
What You'll Do
- Design and operate secure, multi-tenant container infrastructure with fast startup and smart autoscaling.
- Ship cloud deployments (Helm/Terraform) with SSO, network controls, and audit logging.
- Drive observability (metrics, traces, logs) with clear SLOs; lead incident response.
- Optimize images, scheduling, networking, and cost; build fair-use and rate-limiting controls.
What You Bring
- Production experience with Kubernetes and container internals (Docker/containerd); strong networking fundamentals.
- Cloud (AWS/GCP/Azure) and IaC (Terraform/Helm).
- Monitoring/Logging (Prometheus, Grafana, OpenTelemetry, ELK/Vector).
- Security best practices for containerized, multi-tenant systems.
Nice to Have
- gVisor/Kata/Firecracker; Cilium/eBPF; GPU scheduling; serverless autoscaling (KEDA/Knative/Karpenter).
- You've built an AI side project and enjoy tinkering with LLMs.
Why Julius
Small, senior team; massive impact surface; hard infra problems at meaningful scale.