Fault Management Engineer Jobs (NOW HIRING)

JB061509 - Sr Embedded SW & Control Engin

Irvine, CA · On-site

$133.10K - $174.40K/yr

Title: Sr Embedded SW & Control Engineer * No. of Positions: 5 * Location: Irvine CA * Experience ... Fault Management Testing & Validation MIL, SIL, HIL, Hardware Bring-Up, Test Automation Safety ...

We are seeking a curious and self-motivated UEFI BIOS Engineer to join our team and contribute to ... Familiarity with RAS concepts (fault management, telemetry, recovery). Ability to anticipate ...

CISCO Support Engineer Duration: 3 Years Work Location: Atlanta GA 30308 Troubleshoot and resolve ... teams within the VOIP NRC and Fault Management Groups. Will be responsible for performing ...

Fault Management Engineer information

Salary scale: $29.5K · $111.1K · $183.5K

How much do fault management engineer jobs pay per year?

As of May 29, 2026, the average yearly pay for a fault management engineer in the United States is $111,144, according to ZipRecruiter salary data. Most workers in this role earn between $75,500 and $143,000 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Fault Management Engineer, and why are they important?

To thrive as a Fault Management Engineer, you need a solid understanding of networking principles, troubleshooting methodologies, and a relevant degree in engineering or information technology. Familiarity with network management systems (NMS), SNMP, fault monitoring tools like Nagios or SolarWinds, and certifications such as CCNA or CompTIA Network+ are typically required. Analytical thinking, attention to detail, and effective communication are crucial soft skills for diagnosing issues and coordinating resolutions. These skills ensure quick identification and resolution of network faults, maintaining system reliability and minimizing downtime.

How does a Fault Management Engineer typically collaborate with other IT teams during incident resolution?

A Fault Management Engineer works closely with network operations, system administrators, and support teams to swiftly identify and resolve system faults. During incidents, they coordinate troubleshooting efforts, communicate findings, and escalate issues to specialized teams when necessary. This collaboration ensures minimal downtime and helps maintain service reliability. Effective communication and teamwork are essential, as engineers often participate in cross-functional meetings and post-incident reviews to improve future response strategies.

What does a Fault Management Engineer do?

A Fault Management Engineer is responsible for monitoring, detecting, and resolving faults or issues within a network or system to ensure optimal performance and minimal downtime. They use specialized tools to identify problems, analyze incident reports, and coordinate with technical teams for quick resolution. Their duties often include implementing automated monitoring solutions, performing root cause analysis, and documenting incidents to prevent future occurrences. Overall, they play a crucial role in maintaining the reliability and efficiency of IT infrastructure.

What is the difference between a Fault Management Engineer and a Network Operations Center (NOC) Technician?

Aspect | Fault Management Engineer | Network Operations Center (NOC) Technician
Certifications | Network+ or CCNA, fault management certifications | Network+ or CCNA, basic troubleshooting certifications
Work Environment | Design, analyze, and resolve network faults, often in a technical or engineering setting | Monitor network performance, respond to alerts, and perform troubleshooting in a control room
Employer & Industry | Telecom, ISPs, large enterprise networks | Telecom, ISPs, data centers, enterprise IT

Fault Management Engineers focus on diagnosing and resolving complex network faults, often working on system design and analysis. NOC Technicians monitor network health and handle routine troubleshooting. Both roles are essential in maintaining network reliability but differ in scope and responsibilities.

More about Fault Management Engineer jobs

$120K - $140K/yr

Full-time

Posted 13 days ago


Job description

Must Have Technical/Functional Skills
We are looking for a skilled Senior Developer with 4 to 8 years of experience in designing, developing, integrating, and automating the telecom Fault Management (FM) layer within the OSS environment. This position involves building event probes, correlation and enrichment logic, and integrating with inventory and topology systems using AI-driven automation.
Skills:
FM Platforms: IBM Netcool/NOI, AI Ops, HPE TeMIP, OpenNMS, ServiceNow EM
Programming: Python, Shell, Java, JavaScript, SQL, XML/XSLT, JSON
Integration: REST/SOAP APIs, Kafka, Webhooks, SNMP, Syslog, MQTT
Database: Oracle, MySQL, PostgreSQL, MongoDB
Cloud: Kubernetes, Docker, AWS/Azure monitoring integration
Automation: Ansible, NOI Runbooks, ServiceNow Flows, custom scripts
AI/Analytics: Experience with anomaly detection, AI Ops, or log analytics (ELK/Splunk)
Standards: 3GPP, ITU-T X.733, SID, TMF API 642, TMF 621, TMF 656
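As an illustration of the event normalization work these skills support, here is a minimal Python sketch that maps vendor-specific severity strings onto the perceived severity levels named in ITU-T X.733. The vendor vocabulary in `VENDOR_SEVERITY_MAP` and the `normalize_severity` helper are hypothetical and are not taken from any FM platform listed above.

```python
from enum import Enum


class PerceivedSeverity(Enum):
    """Perceived severity levels named in ITU-T X.733."""
    CLEARED = "cleared"
    INDETERMINATE = "indeterminate"
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"
    WARNING = "warning"


# Hypothetical vendor vocabulary; a real deployment would load this
# mapping from per-vendor configuration.
VENDOR_SEVERITY_MAP = {
    "crit": PerceivedSeverity.CRITICAL,
    "critical": PerceivedSeverity.CRITICAL,
    "maj": PerceivedSeverity.MAJOR,
    "major": PerceivedSeverity.MAJOR,
    "min": PerceivedSeverity.MINOR,
    "minor": PerceivedSeverity.MINOR,
    "warn": PerceivedSeverity.WARNING,
    "warning": PerceivedSeverity.WARNING,
    "clear": PerceivedSeverity.CLEARED,
    "cleared": PerceivedSeverity.CLEARED,
}


def normalize_severity(raw: str) -> PerceivedSeverity:
    """Map a raw vendor severity string to an X.733 level.

    Unknown values fall back to INDETERMINATE rather than raising,
    so one malformed event does not stop the event pipeline.
    """
    return VENDOR_SEVERITY_MAP.get(raw.strip().lower(),
                                   PerceivedSeverity.INDETERMINATE)


print(normalize_severity(" CRIT ").value)
print(normalize_severity("vendor-specific-7").value)
```

Falling back to an indeterminate level is one possible design choice; a stricter pipeline might instead route unmapped values to a review queue.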
Roles & Responsibilities
Design and develop modules for event collection, normalization, and correlation across multi-vendor network domains such as RAN, Core, Transport, IP/MPLS, and IT Infrastructure.
Build rules, probes, gateways, and policies within FM platforms like IBM Netcool/NOI.
Build custom adapters and APIs for northbound and southbound system integrations.
Develop logic for alarm enrichment, delay, threshold, alarm resync, suppression, deduplication, and root cause analysis.
Configure/refine FM database and message bus efficiency to handle high event volumes.
Integrate FM systems with Inventory & Topology, GIS, Performance Management, and Trouble Ticketing.
Implement automation using NOI Runbooks, Ansible, and AI/ML engines.
Define fault correlation across domains and service impact assessments.
Develop event-driven automation and closed-loop scenarios, including auto-ticketing and auto-healing.
Work alongside AI/ML teams to enable predictive fault analytics, anomaly detection, and noise reduction.
Contribute to the development of AI-based FM assistants for triage, root cause analysis, alarm prediction, and closed loop orchestration.
Understand data migration, alarm model harmonization, and integration of blueprinting.
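
To make the deduplication and suppression responsibilities above more concrete, here is a minimal Python sketch under stated assumptions. The `Alarm` fields, the `linkDown` alarm type, and the `parent_of` topology map are all hypothetical. Platforms such as IBM Netcool/NOI implement this kind of logic through their own rules and policies, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    node: str
    alarm_type: str
    severity: str
    timestamp: float
    count: int = 1  # number of raw events folded into this alarm


def deduplicate(events):
    """Collapse events sharing (node, alarm_type) into one alarm with a tally."""
    active = {}
    for ev in events:
        key = (ev.node, ev.alarm_type)
        if key in active:
            existing = active[key]
            existing.count += 1
            existing.timestamp = max(existing.timestamp, ev.timestamp)
            existing.severity = ev.severity  # keep the most recent severity
        else:
            active[key] = Alarm(ev.node, ev.alarm_type, ev.severity, ev.timestamp)
    return list(active.values())


def suppress_children(alarms, parent_of):
    """Drop alarms on nodes whose upstream parent has a linkDown alarm.

    A very simple topology-based suppression: the parent's alarm is
    treated as the probable root cause of its children's alarms.
    """
    down = {a.node for a in alarms if a.alarm_type == "linkDown"}
    return [a for a in alarms if parent_of.get(a.node) not in down]


events = [
    Alarm("r1", "linkDown", "critical", 1.0),
    Alarm("r1", "linkDown", "critical", 2.0),
    Alarm("sw1", "unreachable", "major", 2.5),
]
alarms = deduplicate(events)
remaining = suppress_children(alarms, parent_of={"sw1": "r1"})
for alarm in remaining:
    print(alarm.node, alarm.alarm_type, alarm.count)
```

In this example the two `linkDown` events on `r1` fold into a single alarm, and the `sw1` alarm is suppressed because its assumed parent `r1` is down. Production correlation would also need time windows, alarm clearing, and resync handling, which are left out here.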
Salary Range: $120,000-$140,000 a year