Chronosphere Jobs (NOW HIRING)

Chronosphere information

What are the typical challenges faced by engineers working on observability platforms like Chronosphere?

Engineers working on observability platforms such as Chronosphere often encounter challenges related to handling massive volumes of real-time telemetry data and ensuring low-latency performance. Balancing scalability with cost efficiency, maintaining reliability during rapid growth, and integrating seamlessly with customers' diverse cloud-native environments are also common hurdles. Team members regularly collaborate with product, customer success, and infrastructure teams to address these issues, making strong communication and problem-solving skills especially valuable in this role.

What are the key skills and qualifications needed to thrive as a Cloud Platform Engineer at Chronosphere, and why are they important?

To thrive as a Cloud Platform Engineer at Chronosphere, you need strong expertise in cloud computing, distributed systems, and programming languages like Go or Python, often supported by a degree in computer science or related experience. Familiarity with Kubernetes, CI/CD pipelines, observability tools, and cloud providers such as AWS or GCP is typically required. Strong problem-solving and communication skills, along with the ability to collaborate in fast-paced, remote teams, make candidates stand out. These skills ensure the reliable delivery, scalability, and performance of Chronosphere's cloud-native observability platform.

What is a Chronosphere?

A Chronosphere typically refers to a device or concept related to time manipulation, often seen in science fiction or certain video games. In the context of gaming, particularly in the game Dota 2, Chronosphere is an ultimate ability used by the hero Faceless Void that creates a time-freezing sphere, trapping all units inside except Faceless Void. In technology, Chronosphere is also the name of a cloud-native observability platform designed to monitor and troubleshoot large-scale, distributed systems. Understanding the context is important to determine which meaning is relevant to your needs.

What is the difference between Chronosphere and Data Engineer roles?

Aspect | Chronosphere | Data Engineer
Required credentials | Cloud monitoring, observability tools, sometimes certifications in cloud platforms | Data management, SQL, programming languages, certifications like Google Cloud or AWS
Work environment | Cloud-based, monitoring and observability platforms | Data warehouses, pipelines, cloud or on-premises data systems
Employer and industry usage | Tech companies, cloud service providers, SaaS firms | Finance, healthcare, tech, retail, any data-driven industry

Chronosphere focuses on cloud monitoring and observability solutions, helping companies track system performance. Data Engineers build and maintain data pipelines and infrastructure. Both operate within tech environments, but Chronosphere-related work emphasizes system monitoring, whereas Data Engineers focus on data processing and management.

More about Chronosphere jobs
Infographic showing Chronosphere job openings in the United States as of June 2026, with employment types broken down as 100% Full Time. Highlights a job distribution of 75% Physical and 25% Remote.

Member of Technical Staff, Site Reliability Engineer

Vapi

San Francisco, CA • On-site

$200K - $270K/yr

Full-time

Medical, Dental, Vision

Posted 16 days ago


Job description

Voice AI that resolves, not transfers.
Most phone systems trap callers in menus and scripts. Vapi is the platform for deploying voice agents that know your business and can listen, adapt, and resolve in minutes.
  • The numbers: 1 billion calls. 1 million developers. 10x enterprise ARR growth
  • The customers: Amazon Ring, ServiceTitan, New York Life, Intuit, Kavak, and thousands more, from YC startups to the Fortune 500
  • The news: a $50M Series B led by Peak XV Partners, with Bessemer Venture Partners, Kleiner Perkins, M12 (Microsoft's Venture Fund), Y Combinator, and our earlier backers. Total raised: $72M

Why We're Hiring This Role:
  • 99.99% call completion is the number this role drives. Vapi runs live phone calls - a p99 spike means callers drop. We've had 15 stability-gap outages worth learning from, and we need someone who runs incident command, owns SLOs and error budgets, and builds the reliability culture from scratch.
  • This is not a bash-and-YAML role. You'll ship code (Go or TypeScript) for services that monitor and manage the platform: auto-remediation, capacity forecasters, oncall tooling. Capacity planning, load testing, and KEDA-based autoscaling for Vapi's wscaler and workerpool-cron-scaler are on your plate.
What You'll Do:
  • 30 Days: Join the oncall rotation. Walk the 15 stability-gap incidents and turn the patterns into a prioritized reliability backlog. Define the first set of SLOs for the call-completion path.
  • 60 Days: Stand up error budgets and SLO-based alerting in Chronosphere/Prometheus for the highest-impact services. Run the first proper load test against provider rate limits and per-org concurrency. Tune autoscaling for wscaler / workerpool-cron-scaler.
  • 90 Days: Ship a real platform service - capacity forecaster, auto-remediation, or oncall tooling - in Go or TypeScript. Own the postmortem process. Drive a measurable improvement in p99 call completion or MTTR.
Who You Are:
Must-haves
  • You've run incident command and postmortem discipline at scale on a real oncall rotation.
  • You've operated SLOs and error budgets in Chronosphere, Prometheus, Grafana, or Datadog.
  • You've done capacity planning and load testing for production systems with real users.
  • You're fluent in Kubernetes production ops: pod crash diagnosis, HPA/VPA tuning, PodDisruptionBudgets, graceful shutdown.
  • You know backpressure and autoscaling patterns - KEDA, custom metrics scaling.

Nice-to-haves
  • You ship code, not just scripts. You can build platform services in Go or TypeScript (matches Vapi's cluster-manager, database-health, wscaler, incidentManager).
  • Real-time / latency-sensitive product background where degraded means a dropped call, not a slow dashboard.

Tech stack you'll work in
  • Languages: Go and TypeScript (you ship code, not just scripts), Bash.
  • Observability: Chronosphere, Prometheus, Grafana, Datadog, OpenTelemetry.
  • Orchestration: Kubernetes on EKS - production ops (HPA/VPA tuning, PodDisruptionBudgets, graceful shutdown, pod crash diagnosis).
  • Autoscaling and backpressure: KEDA, custom metrics scaling (matches Vapi's wscaler and workerpool-cron-scaler).
  • Load testing: script-based load testing, provider rate-limit auditing, per-org concurrency auditing.
  • Vapi services you'll touch or build: cluster-manager, database-health, wscaler, incidentManager.

Where you likely come from
  • A real-time / latency-sensitive product (Discord, Zoom, Mux, Twitch, Twilio, LiveKit, Cloudflare, a trading firm, a gaming backend), or a FAANG SRE / Production Engineer (Google, Uber, Twitter/X, Meta) who misses being hands-on.
  • Weak fit: SRE from analytics or CRM backends where "degraded" means a slow dashboard, not a dropped call. Anyone uncomfortable reading or writing code.
Why Vapi:
  • Generational impact: Build the human interface for every business
  • Ownership culture: 70% of the company are previous founders
  • Kind team: The founders, Jordan and Nikhil, are Canadians
  • Tier-1 Investors: YC, KP seed, Bessemer Series A
What We Offer:
  • Real stake: We offer a competitive salary and excellent equity ownership
  • Comprehensive health coverage: medical, dental, and vision plans
  • Team love: We love hanging out, and we do quarterly off-sites
  • Flexible time off: take what you need

More: catered meals, transportation, gym, and a $10k annual L&D budget