Observability AIOps Engineer Jobs (NOW HIRING)

Systems Engineer

Fort Belvoir, VA · Hybrid

$190K - $218K/yr

Mission-Critical Observability: Architect and maintain Splunk AIOps solutions across unclassified ... Engineer secure data ingestion pipelines for telemetry data from cross-domain solutions and ...

Senior Site Reliability Engineer, AIOPs

Santa Clara, CA · On-site

$67 - $89/hr

Proven ownership of reliability for an observability/AIOps platform: SLOs/SLIs, on-call, addressing ... Proven programming experience building automation tools or services - ideally in Python, or similar ...

Observability AIOps Engineer information

What are some common challenges faced by Observability AIOps Engineers in integrating monitoring solutions across diverse technology stacks?

Observability AIOps Engineers often encounter challenges when integrating monitoring and analytics tools across a mix of legacy systems, cloud-native applications, and various third-party platforms. Ensuring consistent data collection, normalization, and visualization can be complex due to differing protocols, data formats, and tool compatibility. Collaboration with development, operations, and security teams is crucial to address these challenges, streamline workflows, and maintain a unified observability platform. Staying current with evolving AIOps technologies and best practices is also vital for continued success in this dynamic role.
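As a rough illustration of the normalization problem described above, the sketch below maps records from two differently shaped sources onto one shared schema. The source names (`legacy_syslog`, `cloud_app`), the field names, and the target schema are all hypothetical and chosen only for the example; a real pipeline would take them from the tools actually in use.

```python
from datetime import datetime, timezone


def normalize_event(raw: dict, source: str) -> dict:
    """Map a source-specific record onto one shared schema.

    Both input formats here are hypothetical, for illustration only.
    """
    if source == "legacy_syslog":
        # Assumed legacy shape: epoch seconds, 'host', 'level', 'msg'.
        ts = datetime.fromtimestamp(raw["epoch"], tz=timezone.utc)
        service, severity, message = raw["host"], raw["level"], raw["msg"]
    elif source == "cloud_app":
        # Assumed cloud-native shape: ISO 8601 timestamp with a UTC offset.
        ts = datetime.fromisoformat(raw["time"]).astimezone(timezone.utc)
        service, severity, message = raw["service"], raw["severity"], raw["message"]
    else:
        raise ValueError(f"unknown source: {source}")

    # Shared schema: UTC ISO 8601 timestamp, upper-case severity.
    return {
        "timestamp": ts.isoformat(),
        "service": service,
        "severity": severity.upper(),
        "message": message,
    }


legacy = normalize_event(
    {"epoch": 0, "host": "db01", "level": "warn", "msg": "disk usage high"},
    "legacy_syslog",
)
cloud = normalize_event(
    {
        "time": "2024-01-01T02:00:00+02:00",
        "service": "api",
        "severity": "Error",
        "message": "upstream timeout",
    },
    "cloud_app",
)
```

Once both records share the same keys, a UTC timestamp, and a consistent severity format, they can be stored, correlated, and visualized together regardless of where they came from.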

What is an Observability AIOps Engineer?

An Observability AIOps Engineer is a technology professional who implements and manages observability tools and practices, often applying artificial intelligence for IT operations (AIOps). The role exists to keep systems reliable, performant, and available by monitoring, analyzing, and automating responses to IT incidents. These engineers integrate data from logs, metrics, and traces to gain real-time insight, helping organizations detect and resolve issues quickly. The role combines expertise in software engineering, monitoring solutions, automation, and machine learning to improve the overall health and efficiency of IT environments.

What are the key skills and qualifications needed to thrive as an Observability AIOps Engineer, and why are they important?

To thrive as an Observability AIOps Engineer, you need expertise in systems monitoring, data analytics, and automation, plus a strong understanding of IT infrastructure, often supported by a degree in computer science or a related field. Familiarity with tools such as Prometheus, Grafana, the ELK stack, Splunk, and AIOps platforms, along with cloud certifications (AWS, Azure, or GCP), is typically required. Strong problem-solving skills, collaboration, and a proactive mindset help you stand out in identifying and addressing system anomalies. These skills and qualities are crucial for maintaining high system reliability, reducing downtime, and enabling data-driven decision-making in complex IT environments.

What is the difference between an Observability AIOps Engineer and a Site Reliability Engineer?

Aspect | Observability AIOps Engineer | Site Reliability Engineer
Primary Focus | Monitoring, analyzing, and improving system observability using AI and automation | Ensuring system reliability, scalability, and performance of services
Skills & Certifications | Knowledge of AI/ML, monitoring tools, scripting, cloud platforms | Systems engineering, scripting, cloud infrastructure, incident management
Work Environment | DevOps teams, monitoring platforms, AI tools | Operations, development teams, cloud environments
Industry Usage | Tech companies, cloud providers, organizations focusing on AI-driven monitoring | Large-scale tech firms, SaaS providers, internet services

While both roles focus on system performance and reliability, the Observability AIOps Engineer specializes in using AI and automation to enhance system observability, whereas the Site Reliability Engineer concentrates on maintaining overall system stability and scalability. The two roles often collaborate but have distinct core responsibilities.

Senior AI Ops Engineer

TDI (Tetrad Digital Integrity)

Fort Belvoir, VA • On-site

$118K - $162K/yr

Full-time

Posted 4 days ago


Job description

Job Summary:
Tetrad Digital Integrity (TDI) is a leading-edge cybersecurity firm with a mission to safeguard and protect our customers from increasing threats and vulnerabilities in this digital age. TDI is seeking a Senior AIOps Engineer to lead ITSM transformation efforts within a secure mission environment, orchestrating integrations across various teams and utilizing advanced analytics to ensure high availability of mission-essential infrastructure.
Responsibilities:
• Lead AIOps platform integration efforts across Network Engineering, ServiceNow, and SolarWinds teams to establish unified observability and telemetry capabilities.
• Architect and maintain Splunk AIOps and ITSI solutions across classified and unclassified environments, delivering real-time situational awareness, event correlation, and automated incident remediation through ServiceNow integration.
• Develop and deploy advanced analytics and machine learning models using Splunk MLTK to detect anomalies, identify cyber threats, predict infrastructure issues, and reduce alert fatigue.
• Engineer secure telemetry ingestion and correlation pipelines from enterprise infrastructure, cross-domain solutions, and tactical edge systems to provide a comprehensive view of operational health.
• Support defensive cyber operations by integrating AIOps insights into security workflows, while ensuring compliance with DoD STIGs, IL5/IL6 requirements, and maintaining technical and architectural documentation.
Qualifications:
Required:
• Active TS/SCI security clearance
• Candidates must possess DoD IAT Level II certification (e.g., Security+ CE, CySA+, GSEC, or SSCP)
• Bachelor's degree and 7+ years of Splunk Enterprise experience, including architecture, cluster administration, and advanced SPL development.
• 3+ years of experience implementing AIOps workflows and integrating Splunk with ServiceNow or other enterprise ITSM platforms.
• Experience building, tuning, and deploying machine learning models using Splunk MLTK.
• Strong scripting and automation skills, including Python, API integrations, custom search commands, and automated remediation solutions.
• Must be able to present designs, plans, and analyses of alternatives to technical leadership boards for approvals.
Preferred:
• Splunk Enterprise Certified Architect or Splunk ITSI Certified Admin.
• Experience with Cloud Native Computing Foundation (CNCF) observability tools in secure hybrid multi-cloud environments (Azure/AWS).
Company:
For over 20 years, TDI’s one and only passion has been delivering cybersecurity solutions to effectively manage the business of cyber. Founded in 2001, the company is headquartered in Washington, USA, and has a team of 51-200 employees. The company is currently in the growth stage.