AIOps Jobs (Now Hiring)

AIOps Engineer

$100K - $150K/yr

About the Role Nelnet is seeking an AIOps Engineer to own the operational backbone of our Enterprise AI platforms and build the AI Agents that power our Shared Services teams. Reporting to the ...

Netcool Developer with AIOps Cloud Pak. Location: Tampa, FL; Dallas, TX; Basking Ridge, NJ; Ashburn, VA (100% onsite). Duration: 6 months. Candidate must have experience in: 1. Telecom experience is a must 2. IBM ...

Wells Fargo is seeking a Principal Engineer - AIOps to join Platform Strategy & Transformation as part of Commercial & Corporate and Investment Management Technology (CCIBT) group. Learn more about ...

AIOps information

What is a $900,000 AI job?

A $900,000 AI job typically refers to a high-level position in artificial intelligence, such as senior AI engineer, AI research director, or executive roles like Chief AI Officer, often requiring advanced skills in machine learning, deep learning, and data analysis. These roles usually involve leadership, strategic planning, and extensive experience, and they may be found in large tech companies or organizations investing heavily in AI development.

What is the salary of AIOps engineer?

The salary of an AIOps engineer typically ranges from $80,000 to $150,000 annually, depending on experience, location, and the complexity of the role. Senior positions or those requiring specialized skills in machine learning and automation may offer higher compensation.

Is AIOps a good career?

AIOps is a growing field that combines artificial intelligence and IT operations to automate and improve system management. It requires skills in data analysis, machine learning, and cloud platforms, making it a valuable and in-demand career path with strong job prospects. Professionals in AIOps often work in environments that emphasize automation, monitoring tools, and continuous learning.

What job makes $10,000 a month without a degree?

An AIOps engineer can potentially earn $10,000 or more per month by managing IT operations with automation, monitoring tools, and cloud platforms. Success in this role depends on technical skills, experience, and certifications, rather than formal degrees, and often involves working in tech companies or consulting environments.

What are the typical day-to-day responsibilities of someone working in an AIOps position?

In an AIOps role, your primary responsibilities include monitoring IT infrastructure, analyzing large volumes of system data, and proactively automating responses to incidents using machine learning models and advanced analytics. You will collaborate closely with IT, DevOps, and cybersecurity teams to detect anomalies, optimize system performance, and reduce downtime. Troubleshooting and refining automation scripts are part of the daily workflow, along with participating in incident response and post-mortem analysis. This role requires continual learning, as you will regularly implement new tools and processes to stay ahead of emerging technology trends and operational challenges.

What are the key skills and qualifications needed to thrive in an AIOps position, and why are they important?

To thrive in an AIOps role, you need a strong background in IT operations, data analytics, and automation technology, often supported by a degree in computer science or a related field. Familiarity with monitoring tools like Splunk, ELK Stack, and AI/ML platforms, as well as certifications in cloud platforms or DevOps practices, is highly valuable. Excellent problem-solving skills, adaptability, and effective communication are essential for collaborating with cross-functional teams. These competencies enable AIOps professionals to effectively predict, identify, and resolve IT issues rapidly, ensuring seamless and efficient system operations.

What is an AIOps job?

An AIOps job involves using artificial intelligence and machine learning to enhance IT operations by automating processes, analyzing vast amounts of data, and identifying patterns to prevent issues. AIOps professionals work with tools that help in real-time monitoring, anomaly detection, and incident response to improve system reliability and efficiency. They collaborate with IT teams to reduce downtime, improve performance, and streamline operations.

More about AIOps jobs
Infographic: AIOps job openings in the United States as of June 2026. Employment types: 95% Full Time, 1% Part Time, 4% Contract. Work arrangement: 76% Physical, 13% Hybrid, 11% Remote.

Senior Site Reliability Engineer, AIOps

NVIDIA

Santa Clara, CA • On-site

$67 - $89/hr

Full-time

This job post has expired today. Applications are no longer accepted.


Job description

Job Summary:
NVIDIA has been transforming computer graphics and computing for over 25 years, and they are seeking a Senior Site Reliability Engineer to join their innovative team. The role involves operating an AI Data Center AIOps platform, ensuring uptime, performance, and data integrity while collaborating with engineering teams to create actionable insights and automation.
Responsibilities:
• Continuously monitor platform health via dashboards/logs/metrics, automate recurring checks, and keep reliability + resource efficiency on track.
• Own Kubernetes deployments end-to-end (runbooks, canary checks, post-deploy validation), and lead rollbacks/remediations when needed.
• Lead first-level incident triage: collect diagnostics, identify likely root causes, and hand off clear, actionable findings to engineering.
• Build and maintain runbooks/SOPs/checklists, pushing continuous improvement through automation.
• Manage deployment infrastructure and packaging (Helm + Terraform/IaC) to keep environments scalable, consistent, and reproducible.
• Contribute in adjacent functional areas to grow and help your team members!
Qualifications:
Required:
• BS/MS in CS/CE (or equivalent experience) and 5+ years operating production distributed systems as SRE/DevOps/Platform Ops.
• Proven ownership of reliability for an observability/AIOps platform: SLOs/SLIs, on-call, addressing incidents, and follow-up evaluations that drive measurable improvements.
• Deep Kubernetes + containers experience (deploying, debugging, scaling) for telemetry-heavy microservices—ingestion, processing, storage, APIs, and UI.
• Automation-first approach: solid scripting (Python/Bash), CI/CD, and infrastructure-as-code (Terraform + Helm) to deliver safe rollouts (canaries/rollbacks), reproducible environments, and minimal toil.
• Clear communicator who writes excellent runbooks/docs and can translate ambiguous requirements into concrete operational practices and dependable customer-facing reliability.
Preferred:
• Strong Linux + networking fundamentals, distributed systems instincts, and hands-on ops for Kubernetes/services/streaming stacks are ideal; bonus for experience with observability platforms at scale.
• Experience building safe automation that operators trust: canary releases, automated rollback criteria, 'monitoring for the monitoring' (lag/drop/error budgets), and replay/backfill pipelines with correctness checks.
• Strong in distributed/streaming systems operations (Kafka/Pulsar, Flink/Spark, ClickHouse/Elastic/TSDBs, object storage)—and can reason about backpressure, hotspots, and failure domains end-to-end.
• Proven programming experience building automation tools or services — ideally in Python, or similar languages — to simplify operations and scale recurring processes.
• Proven experience running large‑scale production deployments and multiple Kubernetes environments or clusters across teams or customers, coordinating changes and rollouts with minimal disruption.
• Hands‑on experience with observability tools: you know your way around dashboards, metrics, logs, and traces using platforms like Prometheus, Grafana, or similar.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, CA, USA, and has more than 10,000 employees. The company is currently late stage.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry: Computer and electronic product manufacturing

Company size: 10,000+ employees

Headquarters location: Santa Clara, CA, US

Year founded: 1993