
Fault Management Engineer Jobs (NOW HIRING)

... programming languages. You will be involved in all phases of product development, from initial ... Participate in project scoping, design, fault management, and safety considerations * Develop and ...

... fault management logic and contingency responses. * Communicate concise technical updates on ... engineering teams, mission management, and stakeholders. What you bring to this role: * Hands-on ...

Senior Flight Software Engineer

Long Beach, CA · On-site

$128K - $169K/yr

Vast is seeking a Senior Flight Software Engineer to own flight software development and ... Modify C&DH software - command and data handling, telemetry definitions, fault management updates ...

... fault management, and communication management * Author and maintain platform service ... Collaborate with the hardware engineering team on compute architecture, resource allocation, and ...

... cases for Fault Management & Diagnostics, Performance & Timing, Integrated S/W • Design and ... engineer through the development process and iterate code accordingly • Perform software ...

Fault Management Engineer information

Yearly pay: $29.5K (low), $111.1K (average), $183.5K (high)

How much do fault management engineer jobs pay per year?

As of Jun 19, 2026, the average yearly pay for a fault management engineer in the United States is $111,144.00, according to ZipRecruiter salary data. Most workers in this role earn between $75,500.00 and $143,000.00 per year, depending on experience, location, and employer.

What engineers make $500,000?

Senior engineers in high-demand fields such as software engineering and data engineering can earn $500,000 or more annually, especially with extensive experience and advanced skills in competitive industries. Compensation at that level usually combines base salary, bonuses, and stock options, particularly at large tech companies or in leadership positions. Pay for Fault Management Engineers is typically far lower: most earn between $75,500 and $143,000 per year.

What engineers make $300,000 a year?

Annual pay of $300,000 or more is rare for Fault Management Engineers, most of whom earn between $75,500 and $143,000 per year. The highest earners are senior engineers with extensive experience, specialized skills in network systems, and Cisco or Juniper certifications. These roles often involve managing complex network infrastructure in high-demand environments and may include bonuses or stock options that add to total compensation.

How does a Fault Management Engineer typically collaborate with other IT teams during incident resolution?

A Fault Management Engineer works closely with network operations, system administrators, and support teams to swiftly identify and resolve system faults. During incidents, they coordinate troubleshooting efforts, communicate findings, and escalate issues to specialized teams when necessary. This collaboration ensures minimal downtime and helps maintain service reliability. Effective communication and teamwork are essential, as engineers often participate in cross-functional meetings and post-incident reviews to improve future response strategies.

What is the difference between a Fault Management Engineer and a Network Operations Center (NOC) Technician?

Aspect | Fault Management Engineer | Network Operations Center (NOC) Technician
Certifications | Network+ or CCNA, fault management certifications | Network+ or CCNA, basic troubleshooting certifications
Work Environment | Design, analyze, and resolve network faults, often in a technical or engineering setting | Monitor network performance, respond to alerts, and perform troubleshooting in a control room
Employer & Industry | Telecom, ISPs, large enterprise networks | Telecom, ISPs, data centers, enterprise IT

Fault Management Engineers focus on diagnosing and resolving complex network faults, often working on system design and analysis. NOC Technicians monitor network health and handle routine troubleshooting. Both roles are essential in maintaining network reliability but differ in scope and responsibilities.

What is the highest paid job in engineering?

In engineering, roles such as petroleum engineers, aerospace engineers, and engineering managers tend to have the highest salaries, often exceeding $150,000 annually. Senior positions requiring advanced skills, certifications, and leadership responsibilities typically command the highest compensation in the field.

What are the key skills and qualifications needed to thrive as a Fault Management Engineer, and why are they important?

To thrive as a Fault Management Engineer, you need a solid understanding of networking principles and troubleshooting methodologies, along with a relevant degree in engineering or information technology. Employers typically require familiarity with network management systems (NMS), SNMP, and fault monitoring tools such as Nagios or SolarWinds, as well as certifications such as CCNA or CompTIA Network+. Analytical thinking, attention to detail, and effective communication are crucial soft skills for diagnosing issues and coordinating resolutions. Together, these skills let engineers identify and resolve network faults quickly, maintaining system reliability and minimizing downtime.

What does a Fault Management Engineer do?

A Fault Management Engineer is responsible for monitoring, detecting, and resolving faults or issues within a network or system to ensure optimal performance and minimal downtime. They use specialized tools to identify problems, analyze incident reports, and coordinate with technical teams for quick resolution. Their duties often include implementing automated monitoring solutions, performing root cause analysis, and documenting incidents to prevent future occurrences. Overall, they play a crucial role in maintaining the reliability and efficiency of IT infrastructure.

What is the role of an FM engineer?

A Fault Management (FM) engineer is responsible for monitoring, diagnosing, and resolving network faults to ensure system reliability and performance. They use network management tools and protocols to detect issues, perform root cause analysis, and coordinate repairs, often working in 24/7 environments. Strong technical skills and knowledge of network infrastructure are essential for this role.
More about Fault Management Engineer jobs
Staff Engineer, Memory Systems Architecture

Samsung Semiconductor

San Jose, CA

Full-time

Medical, Dental, Vision, Life, Retirement, PTO

Posted 12 days ago


Samsung Electronics rating

Company rating: 6.7 out of 10

Based on 49 frontline employees who took The Breakroom Quiz

111th of 139 rated electronics manufacturers


Job description

Please Note:

To provide the best candidate experience amidst our high application volumes, each candidate is limited to 10 applications across all open jobs within a 6-month period.

Advancing the World's Technology Together

Our technology solutions power the tools you use every day, including smartphones, electric vehicles, hyperscale data centers, IoT devices, and so much more. Here, you'll have an opportunity to be part of a global leader whose innovative designs are pushing the boundaries of what's possible and powering the future.

We believe innovation and growth are driven by an inclusive culture and a diverse workforce. We're dedicated to empowering people to be their true selves. Together, we're building a better tomorrow for our employees, customers, partners, and communities.

Samsung Semiconductor is now hiring a Staff Engineer, Memory Systems Architecture. Conventional DRAM failure analysis (FA) has relied on electrical FA and physical FA. In the data center era, however, field failure information is much easier to track. Using this data set, the Fault Management team identifies DRAM failure modes and abnormalities and projects failure rates.

You will be part of an incubation team working on in-field telemetry intended to transform the Customer Quality Experience for Samsung memory products. We see Fault Management as the future of quality: minimizing system downtime in AI/ML hardware deployments and the workloads of the future. We analyze trends and patterns in enormous volumes of memory fleet telemetry to bucketize failures and perform virtual root-cause analysis. This telemetry analysis helps us design solutions that proactively avoid system downtime. We conduct research and development both in-house and in collaboration with industry partners, with the opportunity to publish our findings through whitepapers and conferences. We are looking for innovative and passionate thinkers who can work in a start-up environment and are excited to shape the future of data centers around the world. Join us in our mission!

What You'll Do

  • Apply your knowledge of SoC controllers and memory operation, including RAS features, to find and recommend better solutions for reducing the field DRAM failure rate.
  • Communicate improved ECC schemes to customers based on Samsung DRAM failure modes (DQ and burst).
  • Interface with customers to establish the value add of enabling in-field fault management architecture.
  • Contribute to the standardization of DRAM/HBM failure logging in the Open Compute Project (OCP).
  • Propose and develop platform RAS (Reliability, Availability, Serviceability) algorithms for memory fault management, such as page offlining and hPPR, and conduct proofs of concept (POCs) with known-failing DIMMs in real servers and applications.

Location: Daily onsite presence at our San Jose headquarters in alignment with our Flexible Work Policy.

Job ID: 42886

What You Bring

  • Bachelor's degree with 10+ years of relevant industry experience, a Master's degree with 8+ years, or a PhD with 5+ years of experience in hardware fault management, reliability, data center fleet management, or a related technical field. (Must)
  • Knowledge of platform memory subsystems and platform RAS (Reliability, Availability, Serviceability) features such as ECC, page offlining, hPPR, and hardware sparing.
  • Experience with ECC design, verification, and reverse engineering.
  • Understanding of address mapping between the CPU and memory.
  • Experience modifying memory controller registers.
  • Linux kernel commit experience.
  • Understanding of DRAM and HBM failure modes.
  • You're inclusive, adapting your style to the situation and diverse global norms of our people.
  • An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
  • You're collaborative, building relationships, humbly offering support, and openly welcoming new approaches.
  • Innovative and creative, you proactively explore new ideas and adapt quickly to change.

What We Offer
The pay range below is for all roles at this level across all US locations and functions. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience. We also offer incentive opportunities that reward employees based on individual and company performance.
This is in addition to our diverse package of benefits centered around the wellbeing of our employees and their loved ones. In addition to the usual Medical/Dental/Vision/401k, our inclusive rewards plan empowers our people to care for their whole selves. An investment in your future is an investment in ours.

Give Back: With a charitable giving match and frequent opportunities to get involved, we take an active role in supporting the community.
Enjoy Time Away: You'll start with 4+ weeks of paid time off a year, plus holidays and sick leave, to rest and recharge.
Care for Family: Whatever family means to you, we want to support you along the way, including a stipend for fertility care or adoption, medical travel support, and virtual vet care for your fur babies.
Prioritize Emotional Wellness: With on-demand apps and free confidential therapy sessions, you'll have support no matter where you are.
Stay Fit: Eating well and being active are important parts of a healthy life. Our onsite Café and gym, plus virtual classes, make it easier.
Embrace Flexibility: Benefits are best when you have the space to use them. That's why we facilitate a flexible environment so you can find the right balance for you.

Base Pay Range
$163,000 - $253,000 USD

Equal Opportunity Employment Policy

Samsung Semiconductor takes pride in being an equal opportunity workplace dedicated to fostering an environment where all individuals feel valued and empowered to excel, regardless of race, religion, color, age, disability, sex, gender identity, sexual orientation, ancestry, genetic information, marital status, national origin, political affiliation, or veteran status.

When selecting team members, we prioritize talent and qualities such as humility, kindness, and dedication. We extend comprehensive accommodations throughout our recruiting processes for candidates with disabilities, long-term conditions, neurodivergent individuals, or those requiring pregnancy-related support. All candidates scheduled for an interview will receive guidance on requesting accommodations.

Our Commitment to Innovation and Fairness

At Samsung Semiconductor, we use Artificial Intelligence (AI) tools in the recruitment process to enhance efficiency. However, AI is used as a support tool, not a final decision-maker. All hiring decisions are made by our human recruiting team and hiring managers to ensure every candidate is evaluated fairly and holistically.

Recruiting Agency Policy

We do not accept unsolicited resumes. Only authorized recruitment agencies that have a current and valid agreement with Samsung Semiconductor, Inc. are permitted to submit resumes for any job openings.

Applicant AI Use Policy

At Samsung Semiconductor, we support innovation and technology. However, to ensure a fair and authentic assessment, we prohibit the use of generative AI tools to misrepresent a candidate's true skills and qualifications. Permitted uses are limited to basic preparation, grammar, and research, but all submitted content and interview responses must reflect the candidate's genuine abilities and experience. Violation of this policy may result in immediate disqualification from the hiring process.

Trade Secret Notice

By submitting an application, you agree not to disclose to Samsung—or encourage Samsung to use—any confidential or proprietary information (including trade secrets) belonging to a current or former employer or other entity.

Applicant Privacy Policy
https://semiconductor.samsung.com/about-us/careers/us/privacy/
