Senior Staff Data Center Operations Engineer, GPU Hardware Architecture
Sunnyvale, CA · On-site
$129K - $173K/yr
Create precision SOPs for high-stakes GPU repairs (e.g., baseboard swaps, manifold maintenance) and develop diagnostic tooling that allows Site Ops to identify NVLink flapping, PCIe degradations, or ...
Senior Staff Data Center Operations Engineer, GPU Hardware Architecture
San Francisco, CA · On-site
$130K - $173K/yr
Create precision SOPs for high-stakes GPU repairs (e.g., baseboard swaps, manifold maintenance) and develop diagnostic tooling that allows Site Ops to identify NVLink flapping, PCIe degradations, or ...
GPU Architect
Milpitas, CA · On-site
Job Summary We are seeking a highly accomplished experienced GPU Architect to define the next ... understand repair rates, SLAs, uptime curves. * NPI Manufacturing: The role requires a deep ...
Principal Software Engineer, GPU Compute
San Mateo, CA · On-site
$153K - $206K/yr
Drive GPU reliability and performance at fleet scale, defining the detection, diagnosis, and automated repair of unhealthy accelerators before they impact production. * Evaluate and onboard new GPU ...
Principal Software Engineer, GPU Compute
San Mateo, CA · On-site
$345K - $399K/yr
Drive GPU reliability and performance at fleet scale, defining the detection, diagnosis, and automated repair of unhealthy accelerators before they impact production. * Evaluate and onboard new GPU ...
Senior HPC & GPU Infrastructure Engineer
San Francisco, CA · On-site
$150K - $220K/yr
Coordinate with data center staff, hardware vendors, and on-site technicians for repairs, RMA ... Lead deployment of new GPU nodes, including BIOS configuration, NUMA tuning, GPU topology ...
Debug Repair Technicians
Memphis, TN · On-site
$25 - $28/hr
As a Debug Repair Technician you will: * Perform advanced board-level troubleshooting and ... GPU, FPGA, ASIC, memory, and power subsystems Diagnose high speed digital, power distribution, and ...
Facilities Maintenance (GSE) Technician II
Kansas City, MO · On-site
$17.25 - $23.75/hr
Aircraft 400Hz Generator (GPU) repairs * Solid State GPU's. Transformers. * Hot work/welding and solder * Love efficiency and can help build improvements in processes * Dream about airplanes Other ...
Software Development Director, AI Infrastructure
Austin, TX · On-site
$250K/yr
This is a hands-on leadership role requiring experience working with GPU servers, their configuration, validation, benchmarking, debugging, diagnosis and repairs. You will drive technical strategy ...
Infrastructure Repair Technician
Huntsville, AL · On-site
$25 - $30/hr
Infra Repair Technician Location - Huntsville, AL, US 35810 Note - May need to work in shifts ... Server hardware (e.g., GPU, CPU, Motherboard) * Storage systems (e.g., Hard Drives, SSDs) * Network ...
Manufacturing Engineer, Repair Operations
Santa Clara, CA · On-site
$81K - $110K/yr
An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can ... As a Manufacturing Engineer in our Repair Operations team, you will play a key role in ensuring ...
Associate Product Manager
$90K - $105K/yr
The Associate Product Manager will support and coordinate GPU-related operations across RMA, Service, Repair, Technical Support, and Production teams, with a focus on improving workflow efficiency ...
Associate Product Manager
San Jose, CA · On-site
$90K - $105K/yr
The Associate Product Manager will support and coordinate GPU-related operations across RMA, Service, Repair, Technical Support, and Production teams, with a focus on improving workflow efficiency ...
Perform repairs and replacements of faulty components, including but not limited to: • Server hardware (e.g., GPU, CPU, Motherboard) • Storage systems (e.g., Hard Drives, SSDs) • Network ...
An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can ... Onboard new repair partners and manage day-to-day repair operations across our sites, ensuring all ...
GPU Cluster Operations & Diagnostics * Perform advanced diagnostics, repair, and maintenance on NVIDIA H100, H200, and B300 GPUs and multi-node GPU clusters * Monitor cluster health and respond ...
GPU Repair information

| Hourly wage | Share of jobs |
|---|---|
| $12.50 - $14.29 | 2% |
| $14.29 - $16.08 | 7% |
| $16.08 - $17.88 | 16% |
| $17.88 - $19.67 | 21% |
| $19.67 - $21.46 | 17% |
| $21.46 - $23.25 | 11% |
| $23.25 - $25.04 | 11% |
| $25.04 - $26.84 | 6% |
| $26.84 - $28.63 | 4% |
| $28.63 - $30.42 | 3% |
| $30.42 - $32.21 | 1% |

$17.82 is the 25th percentile. Wages below this are outliers.
The median wage is $20 / hr.
$23.34 is the 75th percentile. Wages above this are outliers.
How much do GPU repair jobs pay per hour?
What are the key skills and qualifications needed to thrive as a GPU Repair Technician, and why are they important?
What are GPU repair services?
What are some common challenges faced in GPU repair, and how can they be addressed?
What is the difference between GPU Repair and GPU Technician?
| Aspect | GPU Repair | GPU Technician |
|---|---|---|
| Certifications | Hardware repair certifications, e.g., CompTIA A+ | Same as GPU Repair; often includes electronics or hardware certifications |
| Work Environment | Repair shops, electronics labs, or service centers | Electronics labs, repair shops, or manufacturing facilities |
| Job Focus | Diagnosing and fixing GPU hardware issues | Diagnosing, repairing, and maintaining GPUs and related components |
| Industry Usage | Common in the electronics repair industry | Used in the electronics manufacturing and repair sectors |
GPU Repair and GPU Technician roles overlap significantly: both focus on diagnosing and fixing GPU hardware issues, and both require similar certifications and work environments. GPU Technicians often have broader responsibilities, including maintenance and testing. The main difference lies in scope: GPU Repair is more specialized in hardware fixes, while GPU Technicians may handle a wider range of electronic components.

Other
Posted 7 days ago
Job description
About us
Graphcore is one of the world's leading innovators in Artificial Intelligence compute.
It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.
As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. We are opening a new AI Engineering Campus in Austin, which will play a central role in Graphcore's work building the future of AI computing.
Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.
Job Summary
We are seeking a highly accomplished, experienced GPU Architect to define the next generation of AI accelerators and multi-GPU cluster architecture. As the demand for trillion-parameter LLM training and high-throughput localized inference accelerates, the role of GPU architecture has never been more critical. In this role, you will lead the technology characterization, reliability, and interconnect performance strategies that ensure our compute fabrics scale flawlessly. You will collaborate deeply across hardware, firmware, and AI silicon teams to build GPU infrastructure capable of pushing the absolute limits of parallel processing and hardware efficiency.
Responsibilities and Duties
- Hardware-Software Co-Design: Collaborate with software engineering to ensure the AI compute and rack-level hardware architectures fundamentally accelerate lower-level ML frameworks and localized inference engines (e.g., vLLM, Ollama, TensorRT).
- Performance Modeling: Build and analyze cycle-accurate simulators and analytical models to identify bottlenecks, forecast workload performance, and guide architectural trade-offs.
- Influence long-term silicon architecture roadmaps with our GPU SoC teams. Mentor engineering teams and drive strict engineering standards from feasibility to tape-out and post-silicon validation.
- Reliability: As a platform-level GPU architect, the candidate must have extensive knowledge of reliability and quality, including but not limited to the ability to calculate MTBF, FIT rates, IEFR, IFR, and lifecycle bathtub curves in order to understand repair rates, SLAs, and uptime curves.
- NPI Manufacturing: The role requires deep knowledge of manufacturing processes in order to detect and correct any inadequate manufacturing frameworks that can impact the overall quality of the products we deploy in our datacenters.
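As an illustration of the reliability arithmetic named above, here is a minimal sketch using the standard textbook definitions: one FIT is one failure per 10^9 device-hours, and a constant failure rate (the flat part of the bathtub curve) is assumed. The function names, the 1,000 FIT figure, the 24-hour repair time, and the fleet size are hypothetical values chosen for the example, not figures from this posting.

```python
import math

HOURS_PER_YEAR = 8760
FIT_HOURS = 1e9  # one FIT = one failure per 10^9 device-hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Convert a FIT rate to MTBF in hours (constant failure rate assumed)."""
    return FIT_HOURS / fit


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that one unit fails within a year, exponential model."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_repairs_per_year(fit: float, fleet_size: int) -> float:
    """Expected number of failures per year across a fleet of identical units."""
    return fleet_size * fit * HOURS_PER_YEAR / FIT_HOURS


if __name__ == "__main__":
    fit = 1000.0          # hypothetical FIT rate for one accelerator
    mttr = 24.0           # hypothetical mean time to repair, in hours
    fleet = 10_000        # hypothetical fleet size

    mtbf = fit_to_mtbf_hours(fit)
    print(f"MTBF: {mtbf:.0f} h")
    print(f"AFR: {annualized_failure_rate(mtbf):.4%}")
    print(f"Availability: {availability(mtbf, mttr):.6f}")
    print(f"Expected repairs/year: {expected_repairs_per_year(fit, fleet):.1f}")
```

Numbers like these feed directly into repair-rate planning and SLA uptime targets; early-life and wear-out phases of the bathtub curve would need a non-constant hazard model such as a Weibull distribution, which this sketch does not cover.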
Candidate Profile
Essential:
- Experience: 10+ years of deep experience in GPUs, AI accelerators, or highly parallel computer systems in areas of qualification, manufacturing, and programming.
- Microarchitecture Expertise: Understanding of SIMD/SIMT execution models, instruction scheduling, and hardware acceleration for machine learning algorithms.
- Manufacturing: Deep knowledge of advanced manufacturing techniques for building AI compute units and rack-level L11 liquid-cooled solutions.
- Systems Interconnects: Extensive hands-on experience characterizing data pathways across RDMA environments and hardware clustering protocols.
- Programming & Tooling: Proficiency in C++, Python, or similar languages for performance modeling, GPU technology characterization, and workload profiling.
- Analytical Rigor: Exceptional ability to characterize complex AI mathematical operations into efficient hardware implementations.
- Education: BS or MS or equivalent experience in Computer Engineering or Electrical Engineering.
Desirable:
- Specific Topology Experience: Direct experience qualifying Rack-scale GPU designs including but not limited to NPI manufacturing, testing, quality and reliability calculations.
We welcome people of different backgrounds and experiences and are committed to building an inclusive work environment that makes Graphcore a great home for everyone. We are an equal opportunity employer and want to build a work environment where everyone is happy, productive and respectful so they can do their best work. If you have a disability or additional need that requires accommodation, just let us know.