Observability AIOps Engineer Jobs (NOW HIRING)

AIOps Engineer

Fort Belvoir, VA · On-site

$190K - $218K/yr

Mission-Critical Observability: Architect and maintain Splunk AIOps solutions across unclassified ... Engineer secure data ingestion pipelines for telemetry data from cross-domain solutions and ...

... AIOps Engineer to support mission-critical operations within a highly secure environment and drive ... Ensure observability platforms comply with applicable STIGs and IL5/IL6 security requirements while ...

Lead AIOps Engineer - AZURE

Fremont, CA

$112K - $147K/yr

Ideal Candidate Profile The ideal candidate is a Lead AIOps Engineer with deep Azure expertise and ... Experience with monitoring, observability, alert management, event correlation, and operational ...

DevOps Engineer (AIOps)

Tempe, AZ · On-site

$50.50 - $69.25/hr

The AIOps Engineer is a technical operational engineering role responsible for improving DevOps, ... Improve observability and operational visibility through dashboards, monitoring integrations ...

DevOps Engineer (AIOps)

Hanover, MD · On-site

$52 - $71.25/hr

Job Summary: The AIOps Engineer is a technical operational engineering role responsible for ... Improve observability and operational visibility through dashboards, monitoring integrations ...

DevOps Engineer (AIOps)

Jacksonville, FL · On-site

$49 - $67/hr

The AIOps Engineer is a technical operational engineering role responsible for improving DevOps, ... Improve observability and operational visibility through dashboards, monitoring integrations ...

DevOps Engineer (AIOps)

Hanover, MD · On-site

$52 - $71.25/hr

The AIOps Engineer is a technical operational engineering role responsible for improving DevOps, ... Improve observability and operational visibility through dashboards, monitoring integrations ...

Lead AIOps Engineer

Fremont, CA

$112K - $147K/yr

Lead AIOps Engineer Location: Fremont, CA (Hybrid Onsite) Duration: 12+ Months Role Summary We are ... Experience working with monitoring, observability, scripting, and release validation * Must have ...

AI Operations AIOps Engineer

Fremont, CA

$76K - $102K/yr

... relevant AIOps experience is a key requirement with some experience in SRE. * Total 12+ years of ... Experience working with monitoring, observability, scripting, and release validation * Must have ...

SRE AIOps Engineer (AWS)

Irvine, CA

$61.25 - $81.25/hr

Senior SRE AIOps Engineer (AWS) Experience Level: 5-8 Years Location: USA Primary Environment ... Hands-on experience with observability and monitoring tools: * Amazon CloudWatch (Logs Insights ...

Senior SRE AIOps Engineer (AWS)

Irvine, CA

$61.25 - $81.25/hr

Role Title: Senior SRE AIOps Engineer (AWS) Experience Level: 5-8 Years Location: Irvine, CA ... Hands-on experience with observability and monitoring tools: * Amazon CloudWatch (Logs Insights ...

Observability AIOps Engineer information

What are some common challenges faced by Observability AIOps Engineers in integrating monitoring solutions across diverse technology stacks?

Observability AIOps Engineers often encounter challenges when integrating monitoring and analytics tools across a mix of legacy systems, cloud-native applications, and various third-party platforms. Ensuring consistent data collection, normalization, and visualization can be complex due to differing protocols, data formats, and tool compatibility. Collaboration with development, operations, and security teams is crucial to address these challenges, streamline workflows, and maintain a unified observability platform. Staying current with evolving AIOps technologies and best practices is also vital for continued success in this dynamic role.
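As a rough illustration of the normalization problem described above, the following Python sketch maps records from two sources with different formats onto one common schema. Everything in it is an assumption made for illustration: the `Event` fields, the `from_legacy_syslog` and `from_cloud_json` helpers, the input field names, and the severity mapping are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common schema that every source is mapped onto (hypothetical)."""
    timestamp: datetime
    host: str
    severity: str
    message: str


# Hypothetical severity vocabularies, mapped to one shared scale.
SEVERITY_MAP = {
    "2": "critical", "3": "error", "4": "warning",            # numeric, syslog-style
    "crit": "critical", "error": "error", "warn": "warning",  # textual levels
}


def from_legacy_syslog(record: dict) -> Event:
    """Legacy source (assumed shape): epoch seconds and a numeric severity."""
    return Event(
        timestamp=datetime.fromtimestamp(record["epoch"], tz=timezone.utc),
        host=record["hostname"].lower(),
        severity=SEVERITY_MAP.get(str(record["sev"]), "info"),
        message=record["msg"],
    )


def from_cloud_json(record: dict) -> Event:
    """Cloud-native source (assumed shape): ISO 8601 time and a textual level."""
    return Event(
        timestamp=datetime.fromisoformat(record["time"]).astimezone(timezone.utc),
        host=record["resource"]["name"].lower(),
        severity=SEVERITY_MAP.get(record["level"].lower(), "info"),
        message=record["text"],
    )


# Two differently shaped inputs end up in one sortable, comparable list.
events = [
    from_cloud_json({
        "time": "2023-11-14T22:13:25+00:00",
        "resource": {"name": "api-7f"},
        "level": "WARN",
        "text": "p99 latency above threshold",
    }),
    from_legacy_syslog({
        "epoch": 1700000000,
        "hostname": "DB01",
        "sev": 3,
        "msg": "disk latency high",
    }),
]
events.sort(key=lambda e: e.timestamp)
```

Once every source is reduced to the same fields, timestamps in UTC, and one severity scale, downstream dashboards and correlation rules can treat the events uniformly. A real pipeline would also need to handle missing fields and unknown severity values more carefully than this sketch does.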

What is an Observability AIOps Engineer?

An Observability AIOps Engineer is a technology professional who implements and manages observability tools and practices, often applying artificial intelligence for IT operations (AIOps). The role exists to keep systems reliable, performant, and available by monitoring, analyzing, and automating responses to IT incidents. These engineers integrate data from logs, metrics, and traces to gain real-time insights, helping organizations detect and resolve issues quickly. The role combines expertise in software engineering, monitoring solutions, automation, and machine learning to improve the overall health and efficiency of IT environments.

What are the key skills and qualifications needed to thrive as an Observability AIOps Engineer, and why are they important?

To thrive as an Observability AIOps Engineer, you need expertise in systems monitoring, data analytics, and automation, along with a strong understanding of IT infrastructure, often supported by a degree in computer science or a related field. Familiarity with tools such as Prometheus, Grafana, the ELK stack, Splunk, and AIOps platforms is typically required, as are certifications in cloud platforms (AWS, Azure, or GCP). Strong problem-solving skills, collaboration, and a proactive mindset help you stand out in identifying and addressing system anomalies. These skills and qualities are crucial for maintaining high system reliability, reducing downtime, and enabling data-driven decision-making in complex IT environments.

What is the difference between an Observability AIOps Engineer and a Site Reliability Engineer?

Aspect | Observability AIOps Engineer | Site Reliability Engineer
Primary Focus | Monitoring, analyzing, and improving system observability using AI and automation | Ensuring system reliability, scalability, and performance of services
Skills & Certifications | Knowledge of AI/ML, monitoring tools, scripting, cloud platforms | Systems engineering, scripting, cloud infrastructure, incident management
Work Environment | DevOps teams, monitoring platforms, AI tools | Operations, development teams, cloud environments
Industry Usage | Tech companies, cloud providers, organizations focusing on AI-driven monitoring | Large-scale tech firms, SaaS providers, internet services

While both roles focus on system performance and reliability, the Observability AIOps Engineer specializes in using AI and automation to enhance system observability, whereas the Site Reliability Engineer concentrates on maintaining overall system stability and scalability. The two roles often collaborate but have distinct core responsibilities.

AIOps Engineer

JCS Solutions LLC

Fort Belvoir, VA • On-site

Full-time

Posted 14 days ago


Job description

Grow, innovate, and generate progress: Harness your expertise to solve challenges and celebrate success!
JCS Solutions LLC is seeking a Senior AIOps Engineer to support critical mission operations within a secure environment and lead the transformation of our IT Service Management (ITSM) capabilities. This role is responsible for the design, deployment, and management of AIOps solutions that enhance the reliability and security of Department of War (DoW) networks and systems.
Job Summary:
Acting as the technical lead for this initiative, you will orchestrate integrations across existing Network Engineering, ServiceNow, and SolarWinds teams. You will utilize Splunk and the Machine Learning Toolkit (MLTK) to provide descriptive and predictive analytics and establish closed-loop automated incident response, ensuring the high availability of mission-essential infrastructure.
What you will do:
  • Cross-Functional Leadership: Lead the AIOps platform initiative by acting as the primary technical liaison to existing Network Engineering, ServiceNow, and SolarWinds administration teams to establish unified telemetry pipelines.
  • ITSM Orchestration & Automation: Architect closed-loop remediation workflows by deeply integrating Splunk ITSI alerts with ServiceNow Event Management and Incident Management modules.
  • Mission-Critical Observability: Architect and maintain Splunk AIOps solutions across unclassified and classified enclaves to provide real-time situational awareness.
  • Infrastructure Telemetry Integration: Normalize and correlate network performance and fault data from SolarWinds with server and application logs to provide a holistic view of enterprise health.
  • Advanced ML Development: Deploy custom machine learning models via Splunk MLTK to identify anomalous behavior, potential cyber threats, and infrastructure degradations.
  • Secure Data Integration: Engineer secure data ingestion pipelines for telemetry data from cross-domain solutions and tactical edge devices.
  • Incident Reduction: Utilize IT Service Intelligence (ITSI) to correlate multi-source events, reducing noise and prioritizing high-impact mission alerts.
  • Cyber Defense Support: Collaborate with the Cyber Security Service Provider (CSSP) to integrate AIOps insights into defensive cyber operations (DCO).
  • Compliance & Documentation: Ensure all observability tools comply with DoW STIGs and IL5/IL6 protocols; develop and maintain architectural documentation and compliance traceability.
  • Mission Alignment: Stay current on AIOps and related capabilities relevant to DoD, federal, and intelligence mission systems.
What you will bring:
  • Security Clearance: Active Top Secret / Sensitive Compartmented Information (TS/SCI) required at time of hire.
  • Certification: Active IAT Level II certification (e.g., Security+ CE, CySA+, GSEC, or SSCP) required.
  • Citizenship: United States Citizenship is required.
  • Platform Experience: 7+ years of experience with Splunk Enterprise, including architectural design, cluster management, and advanced Search Processing Language (SPL).
  • AIOps & ITSM: 3+ years of experience implementing AIOps workflows, including integration with enterprise ITSM solutions (ServiceNow) for automated root cause analysis and remediation.
  • Machine Learning: Proven track record of building, testing, and tuning supervised and unsupervised models within the Splunk MLTK.
  • Scripting & Automation: Advanced scripting skills for developing custom search commands, API integrations, and automating remediation tasks (e.g., Python).
  • Leadership: Experience leading technical working groups and directing the efforts of adjacent infrastructure and development teams.
  • Operational Experience: Prior experience working within a DoW/DoD Operations Center (NOC/SOC) or supporting mission-critical systems and networks.
  • Communication: Must be able to present designs, plans, and analyses of alternatives to technical leadership boards for approvals.
How you will wow us:
  • Enterprise Aggregation: Experience aggregating and correlating telemetry from diverse tools, specifically SolarWinds, ServiceNow, and VMware vCenter.
  • Expert Certification: Splunk Enterprise Certified Architect or Splunk ITSI Certified Admin.
  • Cloud Observability: Experience with Cloud Native Computing Foundation (CNCF) observability tools in secure hybrid multi-cloud environments (Azure/AWS).
  • RMF/ATO Knowledge: Understanding of the Risk Management Framework (RMF) and the Authorization to Operate (ATO) process for AI/ML workloads.
JCS Solutions (JCS) is a premier technology firm providing innovative solutions and high-quality services in defense, national security, and civilian sectors. JCS offers enterprise-wide solutions including cloud computing, software development, cybersecurity, digital modernization, and management consulting for the federal government. At JCS, we elevate our customers' mission through the application of technology and professional services. Our commitment to investing in our workforce drives innovation and progress for our clients, employees, and communities. JCS is both a Great Place to Work certified company and one of The Washington Post's Top Places to Work.
Our employees embody our core values, and we are looking for others who do too!
  • Customer Experience: Strive for excellence and delight our clients
  • Innovation: Embrace creative thinking to enable continual growth and powerful solutions
  • Accountability: Take ownership of and pride in our actions and service delivery
  • Inspire: Be inspired to be your best self and have fun in the process
  • Integrity: Do the right thing, the right way, every time!
  • Stewardship: The careful and responsible management of something entrusted to our care.
Commitment to non-discrimination: All qualified applicants will receive consideration for employment without regard to any status protected by applicable federal, state, or local laws.