Observability Datadog Jobs (NOW HIRING)

Leidos

Cloud Engineer - Senior (Observability - Datadog)

$57 - $76.25/hr

The Cloud Engineer - Senior (Observability - Datadog) supports the SEC ISS contract by engineering, operating, and continuously improving the enterprise observability platform across hybrid cloud and ...

Datadog

Partner Solutions Architect (GSI) - AMER West

Denver, CO · On-site

$143K - $209K/yr

Observability: Datadog, Splunk, AppD, New Relic or Dynatrace * Public Cloud: AWS, Azure, GCP, or Alibaba * Containerization: Docker, Kubernetes, OpenShift, Cloud Foundry * Dev/Scripting: Python ...

Datadog

Partner Solutions Architect (GSI) - AMER East

Boston, MA · On-site

$143K - $209K/yr

Ova Technologies

Datadog Tester / Observability QA Engineer

Alpharetta, GA · On-site

Job Title: Datadog Tester / Observability QA Engineer. Job Summary: We are looking for a Datadog Tester responsible for validating monitoring, logging, tracing, and alerting implementations using ...

Datadog

Partner Solutions Architect (GSI) - AMER West

Denver, CO · Hybrid

$64.75 - $85.50/hr

Datadog

Staff Software Engineer - ML Observability

Boston, MA · On-site

$234K - $300K/yr

The ML Observability team builds cutting-edge tools to monitor, explain, and improve AI systems in ... At Datadog, we place value in our office culture - the relationships and collaboration it builds ...

Datadog

Partner Solutions Architect (GSI) - AMER East

Boston, NY · Hybrid

$59.50 - $78.50/hr

Datadog

Senior Product Solutions Architect - LLM Observability

Boston, MA · On-site

$167K - $244K/yr

Datadog's LLM Observability product enables organizations to monitor, troubleshoot, and optimize large-scale LLM-powered applications with confidence, while meeting requirements around data privacy ...

MM International

AWS DevOps Engineer

Plano, TX · On-site

$50.50 - $69.25/hr

Observability - Datadog, Splunk * ALM/Documentation - JIRA, Confluence * Containerizations - Kubernetes, EKS

Datadog

Senior Software Engineer - Observability Visibility

New York, NY · On-site

$134K - $176K/yr

Observability and Resilience Enablement focuses on closing the loop between how Datadog engineers detect and respond to issues and incidents and how those learnings translate into measurable risk ...

Datadog

Senior Software Engineer - Observability Visibility

New York, NY · Hybrid

$134K - $176K/yr

Datadog

AI Research Scientist - Datadog AI Research (DAIR)

New York, NY · On-site

Trained Agents for Observability -- Post-training models to operate autonomously across Datadog's domain. SRE incident response is our first target, with a clear path to code repair, security ...

Datadog

AI Research Engineer - Datadog AI Research (DAIR)

New York, NY

... model), Datadog AI Research tackles high-risk, high-reward problems grounded in real-world ... World Models for Observability -- Training multimodal foundation models that learn the joint ...

Vkore Solutions

Datadog Consultant

San Ramon, CA · On-site

San Ramon, CA. Face-to-face interview. Datadog key responsibilities: Design, implement, and manage observability solutions using Datadog (Logs, Metrics, APM, RUM, Synthetics, etc.). Develop real-time ...

Datadog

AI Research Scientist - Datadog AI Research (DAIR)

New York, NY

Datadog

AI Research Engineer - Datadog AI Research (DAIR)

New York, NY · On-site

Datadog

Datadog for Startups Founding Engineering Lead

San Francisco, CA · On-site

$205K - $240K/yr

The Datadog for Startups (DDFS) program helps the next generation of fast-scaling companies adopt best-in-class observability and security from day one. We're looking for the technical engine of this ...

Datadog

Staff Software Engineer - ML Observability

Boston, NY · Hybrid

Datadog

Staff Software Engineer - Logs Observability Pipelines

New York, NY · On-site

$234K - $300K/yr

Datadog is seeking a Staff Software Engineer to help shape the future of our Bring Your Own Cloud (BYOC) Logs offering by unifying observability pipelines with log management software that customers ...

Datadog

Datadog for Startups - Forward Deployed Engineering Lead

New York, NY · On-site

$153K - $188K/yr

Build out a catalog of FDE offerings spanning observability quickstarts, custom integration ... Datadog experience is a strong plus. * Communicate exceptionally across audiences from engineering ...

Observability Datadog information

How much do Observability Datadog jobs pay per hour?

As of Jun 29, 2026, the average hourly pay for Observability Datadog jobs in the United States is $17.34, according to ZipRecruiter salary data. Most workers in this role earn between $16.35 and $18.03 per hour, depending on experience, location, and employer.

Is it hard to get hired at Datadog?

Getting hired for an Observability Datadog role typically requires relevant technical skills such as experience with monitoring tools, cloud platforms, and scripting. The hiring process often involves technical interviews, coding assessments, and demonstrating knowledge of observability concepts, making it competitive but achievable with proper preparation.

Who is Datadog's biggest competitor?

In observability, Datadog's biggest competitors include New Relic, Splunk, and Dynatrace, which offer similar monitoring and analytics solutions. These competitors provide cloud-based observability tools used by IT and DevOps teams to monitor infrastructure, applications, and performance metrics.

What are the key skills and qualifications needed to thrive as an Observability Engineer specializing in Datadog, and why are they important?

To excel as an Observability Engineer with a focus on Datadog, you need a strong background in IT operations, cloud infrastructure, and monitoring concepts, often supported by relevant degrees or certifications. Familiarity with Datadog's platform, scripting languages (like Python or Bash), and integrations with cloud services (AWS, Azure, GCP) is typically required. Analytical thinking, proactive problem-solving, and the ability to collaborate across teams are vital soft skills in this role. These skills ensure effective system monitoring, rapid incident response, and ongoing performance optimization in complex environments.

Does Datadog do observability?

Datadog is a platform that provides observability solutions, including monitoring, tracing, and logging for cloud infrastructure and applications. As an Observability Datadog professional, understanding these tools and how to implement them is essential for ensuring system performance and reliability.

What are Observability Datadog roles?

Observability Datadog roles typically refer to professionals who implement, manage, and optimize observability practices using the Datadog platform. These specialists focus on monitoring application performance, infrastructure health, and ensuring real-time visibility into system operations. They configure dashboards, set alerts, and analyze logs, traces, and metrics to detect and resolve issues quickly. Their work helps organizations maintain system reliability, optimize performance, and improve incident response.

How does an Observability Datadog specialist typically collaborate with development and operations teams?

An Observability Datadog specialist works closely with both development and operations teams to ensure that applications and infrastructure are properly monitored. They often participate in sprint planning and incident response meetings, helping teams define meaningful metrics, set up dashboards, and configure alerting policies. Collaboration also includes training team members on best practices for using Datadog and troubleshooting monitoring issues together. This cross-functional role ensures that all stakeholders have visibility into system health and can respond quickly to performance or reliability concerns.

What is the average salary at Datadog?

The average salary for an Observability Datadog role varies depending on experience and location but typically ranges from $100,000 to $140,000 annually. Salaries for related positions such as cloud engineers or monitoring specialists may also include bonuses and stock options. Candidates with skills in cloud monitoring, DevOps, and experience with Datadog tools tend to earn higher compensation.

What is the difference between an Observability Datadog specialist and a Cloud Engineer?

Aspect	Observability Datadog	Cloud Engineer
Primary Focus	Monitoring, analytics, and visualization of system performance	Designing, implementing, and managing cloud infrastructure
Required Skills	Monitoring tools, scripting, data analysis	Cloud platforms, scripting, infrastructure as code
Certifications	Datadog certifications, cloud provider certifications	AWS, Azure, or GCP certifications
Work Environment	IT operations, DevOps teams	Cloud infrastructure teams, DevOps

While both roles involve cloud technologies, Observability Datadog specialists focus on monitoring and analyzing system performance, whereas Cloud Engineers design and maintain cloud infrastructure. Understanding these differences helps organizations assign the right skills to each role.

Infographic showing Observability Datadog job openings in the United States as of June 2026, with employment types broken down into 97% Full Time and 3% Contract. It highlights a job distribution of 78% Physical, 6% Hybrid, and 16% Remote, with an average salary of $36,065 per year, or $17.3 per hour.
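The annual and hourly averages quoted above are roughly consistent with each other if one assumes a 2,080-hour work year (40 hours a week for 52 weeks). That assumption is not stated in the source, and the function names below are purely illustrative. A minimal sketch of the conversion:

```python
# Assumption (not from the source): a full-time year is 40 hours x 52 weeks.
HOURS_PER_YEAR = 40 * 52  # 2,080 hours


def hourly_to_annual(hourly_rate: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly rate to an approximate annual figure, rounded to cents."""
    return round(hourly_rate * hours_per_year, 2)


def annual_to_hourly(annual_salary: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an annual salary to an approximate hourly rate, rounded to cents."""
    return round(annual_salary / hours_per_year, 2)


if __name__ == "__main__":
    # Figures quoted on this page: $17.34 per hour and $36,065 per year.
    print(hourly_to_annual(17.34))
    print(annual_to_hourly(36065))
```

Under this assumption, $17.34 per hour works out to about $36,067 per year, close to the quoted $36,065; the small gap is presumably rounding in the published figures. The same conversion can be applied to the hourly ranges shown on individual listings to compare them with annual ranges.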

Cloud Engineer - Senior (Observability - Datadog)

Leidos

Remote

$57 - $76.25/hr

Full-time

Posted 9 days ago

Key responsibilities

Engineer, operate, and continuously improve the enterprise observability platform across hybrid cloud and containerized environments.
Build, tune, and maintain dashboards, monitors, SLOs/SLIs, and alerting policies to produce actionable signals and minimize alert noise.
Lead data-driven investigation and resolution of complex performance, latency, saturation, and reliability issues using observability telemetry and tools.

Leidos rating

8.4

Based on 147 frontline employees who took The Breakroom Quiz

56th of 430 rated business services

Job description

The Cloud Engineer - Senior (Observability - Datadog) supports the SEC ISS contract by engineering, operating, and continuously improving the enterprise observability platform across hybrid cloud and containerized environments. This role is hands-on: instruments services with distributed tracing, code-level profiling, and custom metrics; builds and tunes Datadog (or comparable) dashboards, alerts, APM, log pipelines, RUM, and synthetic monitors; then uses that telemetry to solve production performance, reliability, and capacity problems. The engineer partners with cloud, platform, and application teams to embed observability into Azure, AWS, and container platforms (OpenShift/Kubernetes), and drives reduction of alert noise, mean time to detect (MTTD), and mean time to resolve (MTTR). This position provides senior technical leadership for APM/distributed tracing strategy, SLO/SLI engineering, and data-driven operational decision-making in a 24x7x365 operating environment.
"STRONG DATADOG EXPERIENCE NEEDED"
PRIMARY RESPONSIBILITIES
Observability Platform Engineering
- Engineer and operate the enterprise observability stack (Datadog or comparable), including metrics, logs, traces, APM, RUM, synthetic monitoring, and network performance monitoring.
- Build, tune, and maintain dashboards, monitors, SLOs/SLIs, and alerting policies that produce actionable signal and minimize noise.
- Instrument services, infrastructure, and containerized workloads using agents, OpenTelemetry, and language-specific APM tracers (Java, .NET, Python, Node.js, Go) with consistent span tagging, W3C TraceContext propagation, and unified service tagging across the estate.
- Develop and maintain integrations between observability platforms, ITSM (ServiceNow), CI/CD pipelines, and on-call/paging workflows.
- Define and enforce a unified tagging standard (environment, service, version, team/ownership, data classification, cost center) across metrics, logs, and traces; manage tag cardinality, governance, and custom business tags to keep telemetry queryable, attributable, and cost-controlled.
Cloud and Container Monitoring Engineering
- Design and deliver monitoring coverage for Microsoft Azure and AWS workloads, including PaaS services, serverless, networking, identity, managed databases, and cloud-native data services.
- Engineer managed database observability across AWS RDS/Aurora (MySQL, PostgreSQL, SQL Server, Oracle), Azure SQL/PostgreSQL/MySQL, and NoSQL/cache services (DynamoDB, Cosmos DB, ElastiCache/Redis), including query-level performance analytics, slow-query and execution-plan capture, lock/deadlock/wait analysis, connection pool and session monitoring, replication lag, storage/IOPS saturation, and backup/HA health -- correlating database spans with upstream APM traces.
- Engineer container-platform observability for OpenShift/Kubernetes, covering cluster health, control plane, nodes, pods, namespaces, ingress, service mesh, and workload APM.
- Build standardized, reusable monitoring modules deployable via infrastructure-as-code (Terraform, Bicep, ARM) and CI/CD.
- Support hybrid visibility across on-premises, cloud, and containerized workloads with correlated telemetry.
Performance Engineering and Problem Solving
- Lead data-driven investigation and resolution of complex performance, latency, saturation, and reliability issues across the estate.
- Use APM distributed traces, service/dependency maps, continuous code profiling (CPU, memory, lock contention), database query analytics, exception/error tracking, and RUM-to-backend trace correlation to isolate bottlenecks in applications, platforms, middleware, and downstream dependencies.
- Partner with engineering teams to define and implement remediation, tuning, and architectural improvements based on telemetry evidence.
- Define and implement trace-based SLOs, deployment tracking, and change-correlation workflows so performance regressions are detected and attributed to specific releases, versions, or configuration changes.
- Provide senior technical leadership during major incidents, delivering impact analysis, contributing to root-cause analysis, and owning post-incident observability gaps.
Capacity, Reliability, and Continuous Improvement
- Analyze operational telemetry and trend data to identify capacity risks, recurring constraints, and opportunities for efficiency.
- Build and maintain capacity and performance dashboards and reports that communicate posture, risk, and recommendations to technical and leadership stakeholders.
- Define capacity thresholds, alert baselines, and trigger points for scaling, technology refresh, and resource reallocation.
- Drive continuous improvement of observability coverage, alert quality, runbook linkage, and operational maturity aligned to SEC SLA/KPI expectations.
REQUIRED QUALIFICATIONS
Citizenship/Work Authorization: Must meet contract requirements.
Clearance: Ability to obtain and maintain SEC Public Trust (or higher if required).
EXPERIENCE
- Minimum 8 years of experience in IT infrastructure or platform engineering roles, including 5+ years focused on observability, performance engineering, or site reliability engineering.
- Demonstrated experience engineering and operating an enterprise observability platform (Datadog strongly preferred; equivalent experience with Dynatrace, New Relic, Splunk Observability, or Grafana/Prometheus stacks considered).
- Proven experience building APM and distributed tracing coverage for production multi-tier applications -- including language-specific tracer deployment, custom instrumentation of business transactions, service/dependency mapping, continuous profiling, and RUM-to-backend trace correlation -- across cloud and containerized workloads.
- Proven experience leading complex production performance and reliability problem-solving from telemetry to remediation.
- Hands-on experience monitoring Kubernetes or OpenShift clusters and containerized workloads in production.
TECHNICAL SKILLS
- Enterprise observability platforms (Datadog or comparable): metrics, logs, traces, APM, RUM, synthetic, NPM
- Instrumentation with OpenTelemetry, Datadog agents/SDKs, and language-specific APM tracers (Java, .NET, Python, Node.js, Go) including custom spans, trace sampling strategies, W3C TraceContext propagation, and continuous profiling
- Microsoft Azure and AWS monitoring services and integrations (Azure Monitor, Log Analytics, CloudWatch, AWS X-Ray)
- Container and Kubernetes/OpenShift observability, including cluster, workload, and service mesh telemetry
- Cloud database monitoring: AWS RDS/Aurora (including Performance Insights), Azure SQL/PostgreSQL/MySQL (Query Performance Insight), and NoSQL/cache (DynamoDB, Cosmos DB, ElastiCache/Redis); query-level performance tuning, execution-plan analysis, and Datadog DBM or equivalent deep database APM
- Infrastructure-as-code for monitoring (Terraform, Bicep, ARM) and CI/CD-driven monitor/dashboard deployment
- APM and distributed tracing: service/dependency maps, trace analytics, RUM-to-backend correlation, exception/error tracking, deployment tracking, and trace-based SLOs
- Unified tagging strategy and cardinality governance across metrics/logs/traces (environment, service, version, ownership, data classification, cost center), including custom tag enrichment and tag-driven access/cost controls
- Alert engineering, SLO/SLI design, error budget management, and alert-noise reduction
- Performance engineering, capacity analysis, and telemetry-driven root-cause analysis
- Integration of observability with ITSM (ServiceNow) and on-call/paging workflows
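As a rough illustration of the error budget management listed above, an availability SLO target can be turned into a budget of allowed "bad" minutes. This is a minimal sketch, assuming a time-based SLO over a rolling 30-day window; the function names and the window length are illustrative and are not taken from the posting or from any Datadog API.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO target.

    slo_target is a fraction, e.g. 0.999 for a 99.9% availability objective.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(error_budget_minutes(0.999))
    # After 10.8 bad minutes, about three quarters of the budget is left.
    print(budget_remaining(0.999, 10.8))
```

A team might alert when the remaining fraction drops below an agreed threshold, or gate risky releases on it; the right threshold depends on the service and is not specified in the posting.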
PREFERRED QUALIFICATIONS
- Experience supporting federal agency IT environments under FISMA/FedRAMP/NIST-aligned security and compliance requirements.
- Datadog certification (Fundamentals and/or Administrator) or comparable enterprise observability certification.
- Hands-on experience with Red Hat OpenShift Virtualization (CNV/KubeVirt) or other KubeVirt-based container virtualization observability.
- Experience with eBPF-based observability tooling and service mesh telemetry (Istio, Linkerd).
- Experience implementing SLOs and error budgets at enterprise scale and integrating them into operational governance.
- Experience with cost-aware observability practices, including telemetry volume optimization and retention tuning.
- Experience integrating observability outputs with executive reporting, SLA/KPI dashboards, and capacity forecasting.
- ITIL 4 Foundation
- AWS Certified Solutions Architect - Associate (or higher)
- Microsoft Certified: Azure Administrator Associate (or higher)
- Red Hat Certified Specialist in OpenShift Administration (or equivalent)
- HashiCorp Terraform Associate
WORK ENVIRONMENT / OTHER
Operational Support: Supports a 24x7x365 operating environment; participates in a defined on-call rotation and may require surge support based on operational needs.
Location: Telework
Travel: As required per contract direction.
EDUCATION & EXPERIENCE
BS and 4 - 8 years of prior relevant experience or Master's with 2 - 6 years of prior relevant experience. Preferred degree in a relevant field (e.g., Information Technology, Computer Science, Engineering).
If you're looking for comfort, keep scrolling. At Leidos, we outthink, outbuild, and outpace the status quo - because the mission demands it. We're not hiring followers. We're recruiting the ones who disrupt, provoke, and refuse to fail. Step 10 is ancient history. We're already at step 30 - and moving faster than anyone else dares.
Original Posting: May 19, 2026
For U.S. Positions: While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.
Pay Range: $87,100.00 - $157,450.00
The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

About Leidos

Sourced by ZipRecruiter

At Leidos, we deliver innovative solutions through the efforts of our diverse and talented people who are dedicated to our customers' success. We empower our teams, contribute to our communities, and operate sustainable practices. Everything we do is built on a commitment to do the right thing for our customers, our people, and our community.

Industry: IT services

Company size: 10,000+ employees

Headquarters location: Reston, VA, US

Website: leidos.com

What Leidos employees say

Pay

Only some people get paid breaks

Only some people get paid when they’re sick

The job rarely spills into unpaid time

Benefits

Only some people get separate paid time off for sick days and vacation

Most people say they can afford the health insurance

Most people get paid time off

Hours and flexibility

Less than 4 weeks notice of work schedule

Most people don’t worry about their hours

Only some people can choose their shifts

Workplace

Most people feel treated with respect

Most people get breaks without interruption

Some people are stressed out
