Observability AIOps Engineer Jobs (NOW HIRING)

Observability AIOps Engineer information

What are the key skills and qualifications needed to thrive as an Observability AIOps Engineer, and why are they important?

To thrive as an Observability AIOps Engineer, you need expertise in systems monitoring, data analytics, and automation, plus a strong understanding of IT infrastructure, often supported by a degree in computer science or a related field. Familiarity with tools like Prometheus, Grafana, the ELK stack, Splunk, and AIOps platforms is typically required, as are certifications in cloud platforms (AWS, Azure, or GCP). Strong problem-solving skills, collaboration, and a proactive mindset help you stand out in identifying and addressing system anomalies. These skills and qualities are crucial for maintaining high system reliability, reducing downtime, and enabling data-driven decision-making in complex IT environments.

What are some common challenges faced by Observability AIOps Engineers in integrating monitoring solutions across diverse technology stacks?

Observability AIOps Engineers often encounter challenges when integrating monitoring and analytics tools across a mix of legacy systems, cloud-native applications, and various third-party platforms. Ensuring consistent data collection, normalization, and visualization can be complex due to differing protocols, data formats, and tool compatibility. Collaboration with development, operations, and security teams is crucial to address these challenges, streamline workflows, and maintain a unified observability platform. Staying current with evolving AIOps technologies and best practices is also vital for continued success in this dynamic role.
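
To make the normalization challenge above concrete, here is a minimal Python sketch that maps two differently shaped records into one shared schema. Everything in it is an assumption for illustration: the `Event` schema, the field names of both source formats, and the severity mappings are hypothetical and do not come from any specific monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical common schema that every source is mapped into."""
    timestamp: datetime  # always timezone-aware UTC
    host: str            # lower-cased for consistent joins
    severity: str        # one of: "info", "warning", "critical"
    message: str


def numeric_severity(level: int) -> str:
    """Map a syslog-style numeric level (0 = most severe, 7 = least)."""
    if level <= 3:
        return "critical"
    if level == 4:
        return "warning"
    return "info"


def from_legacy(record: dict) -> Event:
    """Assumed legacy format: epoch seconds and a numeric severity."""
    return Event(
        timestamp=datetime.fromtimestamp(record["epoch"], tz=timezone.utc),
        host=record["hostname"].lower(),
        severity=numeric_severity(record["sev"]),
        message=record["msg"],
    )


def from_cloud(record: dict) -> Event:
    """Assumed cloud format: ISO 8601 timestamp with offset and a text level."""
    level = record["level"].lower()
    severity = {"error": "critical", "warn": "warning"}.get(level, "info")
    return Event(
        timestamp=datetime.fromisoformat(record["time"]).astimezone(timezone.utc),
        host=record["resource"]["host"].lower(),
        severity=severity,
        message=record["text"],
    )
```

Once both sources produce the same `Event` shape, dashboards and correlation logic can be written once against that shape instead of once per tool.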

What is an Observability AIOps Engineer?

An Observability AIOps Engineer is a technology professional who implements and manages observability tools and practices, often leveraging artificial intelligence for IT operations (AIOps). Their role is to ensure system reliability, performance, and uptime by monitoring, analyzing, and automating responses to IT incidents. They integrate data from logs, metrics, and traces to gain real-time insights, helping organizations quickly detect and resolve issues. This role combines expertise in software engineering, monitoring solutions, automation, and machine learning to improve the overall health and efficiency of IT environments.
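
As a small illustration of what "detecting issues from metrics" can look like in practice, the Python sketch below flags points in a metric series that deviate sharply from their recent history, using a trailing-window z-score. This is one simple, generic technique chosen for illustration, not the method of any particular AIOps platform; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: response times in milliseconds with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101]
print(find_anomalies(latency_ms))  # prints [6]
```

Production systems usually add seasonality handling and alert suppression on top of a basic detector like this, since a fixed window and threshold produce false positives on metrics with daily or weekly patterns.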

What is the difference between an Observability AIOps Engineer and a Site Reliability Engineer?

Aspect | Observability AIOps Engineer | Site Reliability Engineer
Primary Focus | Monitoring, analyzing, and improving system observability using AI and automation | Ensuring system reliability, scalability, and performance of services
Skills & Certifications | Knowledge of AI/ML, monitoring tools, scripting, cloud platforms | Systems engineering, scripting, cloud infrastructure, incident management
Work Environment | DevOps teams, monitoring platforms, AI tools | Operations, development teams, cloud environments
Industry Usage | Tech companies, cloud providers, organizations focusing on AI-driven monitoring | Large-scale tech firms, SaaS providers, internet services

While both roles focus on system performance and reliability, the Observability AIOps Engineer specializes in leveraging AI and automation to enhance system observability, whereas the Site Reliability Engineer concentrates on maintaining overall system stability and scalability. The two roles often collaborate but have distinct core responsibilities.

Infographic showing Observability AIOps Engineer job openings in the United States as of May 2026: 100% of openings are full time, and the work arrangement breakdown is 74% physical (on-site), 15% hybrid, and 11% remote.

Engineer/Dev with Security Clearance

Global Enterprise Services, LLC

Fort Belvoir, VA • Hybrid

Other

Posted 5 days ago


Job description

Engineer/Dev
GES is seeking a Senior AIOps Engineer to support critical mission operations within a secure environment and lead the transformation of our IT Service Management (ITSM) capabilities. This role is responsible for the design, deployment, and management of AIOps solutions that enhance the reliability and security of Department of War (DoW) networks and systems. Acting as the technical lead for this initiative, you will orchestrate integrations across existing Network Engineering, ServiceNow, and SolarWinds teams. You will utilize Splunk and the Machine Learning Toolkit (MLTK) to provide descriptive and predictive analytics and establish closed-loop automated incident response, ensuring the high availability of mission-essential infrastructure.

Primary Responsibilities
• Cross-Functional Leadership: Lead the AIOps platform initiative by acting as the primary technical liaison to existing Network Engineering, ServiceNow, and SolarWinds administration teams to establish unified telemetry pipelines.
• ITSM Orchestration & Automation: Architect closed-loop remediation workflows by deeply integrating Splunk ITSI alerts with ServiceNow Event Management and Incident Management modules.
• Mission-Critical Observability: Architect and maintain Splunk AIOps solutions across unclassified and classified enclaves to provide real-time situational awareness.
• Infrastructure Telemetry Integration: Normalize and correlate network performance and fault data from SolarWinds with server and application logs to provide a holistic view of enterprise health.
• Advanced ML Development: Deploy custom machine learning models via Splunk MLTK to identify anomalous behavior, potential cyber threats, and infrastructure degradations.
• Secure Data Integration: Engineer secure data ingestion pipelines for telemetry data from cross-domain solutions and tactical edge devices.
• Incident Reduction: Utilize IT Service Intelligence (ITSI) to correlate multi-source events, reducing noise and prioritizing high-impact mission alerts.
• Cyber Defense Support: Collaborate with the Cyber Security Service Provider (CSSP) to integrate AIOps insights into defensive cyber operations (DCO).
• Compliance & Documentation: Ensure all observability tools comply with DoW STIGs and IL5/IL6 protocols; develop and maintain architectural documentation and compliance traceability.
• Mission Alignment: Stay current on AIOps and related capabilities relevant to DoD, federal, and intelligence mission systems.
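
For candidates unfamiliar with the noise-reduction idea behind the Incident Reduction responsibility above, the Python sketch below groups repeated alerts for the same host and check into a single episode when they arrive close together in time. It is a generic illustration only: it does not use or reproduce Splunk ITSI, and the alert fields (`host`, `check`, `ts`), the function names, and the 300-second window are all hypothetical.

```python
from collections import defaultdict


def _summarize(host, check, items):
    """Collapse a run of related alerts into one episode record."""
    return {
        "host": host,
        "check": check,
        "first_ts": items[0]["ts"],
        "last_ts": items[-1]["ts"],
        "count": len(items),
    }


def group_alerts(alerts, window_seconds=300):
    """Group alerts sharing (host, check) when each arrives within
    `window_seconds` of the previous alert in the same group."""
    by_key = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        by_key[(alert["host"], alert["check"])].append(alert)

    episodes = []
    for (host, check), items in by_key.items():
        current = [items[0]]
        for alert in items[1:]:
            if alert["ts"] - current[-1]["ts"] <= window_seconds:
                current.append(alert)  # still part of the same episode
            else:
                episodes.append(_summarize(host, check, current))
                current = [alert]      # gap too long: start a new episode
        episodes.append(_summarize(host, check, current))
    return sorted(episodes, key=lambda e: e["first_ts"])


# Four raw alerts (timestamps in seconds) collapse into three episodes.
raw = [
    {"host": "sw1", "check": "link", "ts": 0},
    {"host": "sw1", "check": "link", "ts": 60},
    {"host": "db1", "check": "cpu", "ts": 30},
    {"host": "sw1", "check": "link", "ts": 1000},
]
for episode in group_alerts(raw):
    print(episode)
```

Grouping by an exact key and a fixed time gap is the simplest possible policy; real correlation engines typically also use topology and service dependencies to decide which alerts belong together.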

Required Qualifications
• Security Clearance: Active Top Secret / Sensitive Compartmented Information (TS/SCI) required at time of hire.
• Certification: Active IAT Level II certification (e.g., Security+ CE, CySA+, GSEC, or SSCP) required.
• Citizenship: United States Citizenship is required.
• Platform Experience: 7+ years of experience with Splunk Enterprise, including architectural design, cluster management, and advanced Search Processing Language (SPL).
• AIOps & ITSM: 3+ years of experience implementing AIOps workflows, including integration with enterprise ITSM solutions (ServiceNow) for automated root cause analysis and remediation.
• Machine Learning: Proven track record of building, testing, and tuning supervised and unsupervised models within the Splunk MLTK.
• Scripting & Automation: Advanced scripting skills for developing custom search commands, API integrations, and automating remediation tasks (e.g., Python).
• Leadership: Experience leading technical working groups and directing the efforts of adjacent infrastructure and development teams.
• Operational Experience: Prior experience working within a DoW/DoD Operations Center (NOC/SOC) or supporting mission-critical systems and networks.
• Communication: Must be able to present designs, plans, and analyses of alternatives to technical leadership boards for approvals.

Desired Qualifications
• Enterprise Aggregation: Experience aggregating and correlating telemetry from diverse tools, specifically SolarWinds, ServiceNow, and VMware vCenter.
• Expert Certification: Splunk Enterprise Certified Architect or Splunk ITSI Certified Admin.
• Cloud Observability: Experience with Cloud Native Computing Foundation (CNCF) observability tools in secure hybrid multi-cloud environments (Azure/AWS).
• RMF/ATO Knowledge: Understanding of the Risk Management Framework (RMF) and the Authorization to Operate (ATO) process for AI/ML workloads.