Job Summary:
Together AI is a research-driven artificial intelligence company focused on building inference infrastructure for voice applications. The company is seeking a Senior Platform Engineer to own the API and infrastructure layer for voice workloads and to ensure the reliability and performance of its voice platform for production-grade applications.
Responsibilities:
• Own the real-time API layer (WebSocket + HTTP streaming) that powers Together's voice platform.
• Design autoscaling and orchestration for voice workloads running on tens of thousands of GPUs.
• Build the developer experience — APIs, observability, and tooling — for a fast-growing product area.
• Work with production voice customers (contact centers, AI agents, communication platforms) to ship what they actually need.
• Build and harden real-time WebSocket and HTTP streaming APIs for STT and TTS — including connection lifecycle management, backpressure, error handling, and reconnection, at the reliability bar needed for production voice agents.
• Design and ship autoscaling for voice model endpoints that handles bursty, real-time traffic patterns — accounting for concurrent connection limits, streaming state, and hard latency ceilings.
• Implement voice-specific API features: word-level alignment, real-time speaker diarization, audio format flexibility (G.711/mu-law for telephony, PCM, WebRTC formats), pronunciation controls, and multi-context WebSocket support.
• Build voice-specific observability — latency breakdowns, audio quality signals, and dashboards that help both the team and customers debug issues.
• Own multi-model normalization across our model partners (Cartesia, Deepgram, Rime, and others), ensuring consistent API behavior regardless of the underlying provider.
• Collaborate with the ML engineering side of the team on the interface between the API layer and the model serving stack, ensuring latency and reliability requirements are met end-to-end.
• Contribute to developer experience: API design, documentation, integration cookbooks, a playground, and showcases of how best-in-class voice agents are built.
• Lay the groundwork for multiple new products down the line.
Qualifications:
Required:
• 5+ years of experience building large-scale, real-time distributed systems and API services.
• Deep expertise in real-time streaming infrastructure — WebSocket server architecture, Server-Sent Events, bidirectional streaming, connection multiplexing, and stateful protocol design.
• Expert-level programming in TypeScript and Python; experience with Rust is a plus.
• Strong distributed systems fundamentals: load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads.
• Experience with Kubernetes — including custom autoscalers, resource management, and health checking for stateful services.
• Strong product sense — you care about API ergonomics and think about what developers building voice apps actually need.
• Comfort working on a small, early-stage team where you'll wear multiple hats and move fast.
• Experience with audio or media protocols (WebRTC, G.711, PCM encoding) is a strong plus.
• Familiarity with ML model serving infrastructure and how inference engines work is a plus — you'll interface with the serving layer regularly.
• Bachelor's or Master's degree in Computer Science, Computer Engineering, or related field, or equivalent practical experience.
Preferred:
• Full-stack experience (React, Next.js) is a nice-to-have for contributing to developer-facing tooling.
Company:
Together AI provides a cloud platform for developing, training, fine-tuning, and deploying generative AI models. Founded in 2022, the company is headquartered in San Francisco, USA, and has a team of 201-500 employees. It is currently a growth-stage company.