We are seeking an ambitious Senior Solutions Architect - AI Factory Deployment to join our NVIDIA Infrastructure Specialists team in Santa Clara! This role is uniquely positioned to develop, deploy ...
Senior Linux Security Engineer
Durham, NC · On-site
$118K - $160K/yr
You will partner with our Prime Security Architects across infrastructure, driving security improvements across compute, storage, containers, AI, and platform services. You'll collaborate extensively ...
Field Application Engineer (FAE) - Strategic SiC Power Platforms - West Coast
Durham, NC · On-site +1
Data Centers & AI (high-density power supplies, AI accelerators, server infrastructure) Key Responsibilities Technical Leadership * Act as the global technical lead for assigned West Coast customers ...
Linux Kernel Developer
Raleigh, NC · On-site
We are building the enterprise software infrastructure to capture, catalog, refine, enrich, and protect massive datasets and make them available for real-time data analysis and AI training and ...
NVIDIA's Infrastructure Specialists team is hiring a Senior Solutions Architect - AI Factory Observability & Visualization! This remote role develops full-spectrum visibility that supports the smooth ...
NVIDIA is hiring experienced software engineers with Kubernetes experience to help scale up its AI Infrastructure. We expect you to have significant software engineering experience with Kubernetes ...
Senior Software Engineer
Raleigh, NC · On-site +1
$95K - $158K/yr
Are you passionate about building scalable AI infrastructure and shaping governance for cutting-edge GenAI solutions? Do you enjoy collaborating across teams while mentoring others and translating ...
Drive pipeline development and revenue growth for HPE Networking's data center networking, routing, and AI infrastructure connectivity portfolio across assigned commercial and lower-tier enterprise ...
DevOps Engineer (East Coast)
Raleigh, NC · On-site +1
$51.25 - $70.25/hr
We are building the enterprise software infrastructure to capture, catalog, refine, enrich, and protect massive datasets and make them available for real-time data analysis and AI training and ...
Infrastructure & Capital Projects - Senior Director - Energy Compliance Consulting, ANS
Raleigh, NC · Remote
... and AI. Together, we're transforming how capital projects are planned, managed, and executed ... At Accenture Infrastructure & Capital Projects, you'll do exactly that. You'll help develop and ...
ASIC Design and Verification Engineer - AI
$131K - $160K/yr
Knowledge of AI Infrastructure development, including MCP integration * API integration experience (REST/GraphQL) * Familiarity with cloud platforms (AWS, Azure) and CI/CD practices for ML tools
AI Infrastructure hourly wage distribution

| Hourly wage range | Share of jobs |
|---|---|
| $27.34 - $32.57 | 3% |
| $32.57 - $37.79 | 3% |
| $37.79 - $43.02 | 13% |
| $43.02 - $48.24 | 10% |
| $48.24 - $53.47 | 12% |
| $53.47 - $58.69 | 16% |
| $58.69 - $63.92 | 16% |
| $63.92 - $69.15 | 11% |
| $69.15 - $74.37 | 9% |
| $74.37 - $79.60 | 5% |
| $79.60 - $84.82 | 3% |

25th percentile: $46.21/hr (wages below this are outliers)
Median: $56.60/hr
75th percentile: $65.23/hr (wages above this are outliers)
What is the difference between AI Infrastructure and Data Engineer roles?
| Aspect | AI Infrastructure | Data Engineer |
|---|---|---|
| Required Credentials | Bachelor's in CS, Engineering, or related; knowledge of cloud platforms and AI tools | Bachelor's in CS, Data Science, or related; programming and database skills |
| Work Environment | Cloud environments, AI model deployment, infrastructure setup | Data pipelines, database management, data processing |
| Employer & Industry Usage | Tech companies, AI startups, cloud providers | Tech firms, finance, healthcare, e-commerce |
AI Infrastructure professionals focus on building and maintaining the hardware and software systems that support AI models, while Data Engineers develop and manage data pipelines and databases. Both roles require technical skills and often collaborate, but they serve different core functions within AI and data ecosystems.

Full-time
Posted 5 days ago
Job description
We are seeking an ambitious Senior Solutions Architect - AI Factory Deployment to join our NVIDIA Infrastructure Specialists team in Santa Clara! This role is uniquely positioned to develop, deploy, and validate AI factories end to end. You will focus on running and debugging AI/LLM workloads and benchmarks on Linux-based GPU clusters, using NCCL and collectives like AllReduce and AllToAll to improve performance and scalability.
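To make the two collectives concrete, here is a minimal single-process Python sketch of what they compute. It models each GPU rank as a Python list and implements a sum-AllReduce with the classic ring schedule (a reduce-scatter phase followed by an all-gather phase), plus the data movement of AllToAll. It illustrates semantics only: NCCL selects its own algorithms and transports at runtime, and the function names here are hypothetical.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum-AllReduce over len(buffers) ranks using a ring schedule."""
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % n != 0:
        raise ValueError("sketch assumes buffer length is divisible by rank count")
    size = length // n
    data = [list(b) for b in buffers]  # copy so callers' buffers are untouched

    def chunk(i: int) -> slice:
        return slice(i * size, (i + 1) * size)

    # Reduce-scatter: at step s, rank r sends chunk (r - s) % n to its right
    # neighbour, which adds it in. After n - 1 steps, rank r owns the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        msgs = [(r, (r - step) % n, data[r][chunk((r - step) % n)]) for r in range(n)]
        for src, c, payload in msgs:
            dst = (src + 1) % n
            data[dst][chunk(c)] = [a + b for a, b in zip(data[dst][chunk(c)], payload)]

    # All-gather: at step s, rank r forwards reduced chunk (r + 1 - s) % n and
    # the right neighbour overwrites its stale copy of that chunk.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n, data[r][chunk((r + 1 - step) % n)]) for r in range(n)]
        for src, c, payload in msgs:
            data[(src + 1) % n][chunk(c)] = payload
    return data


def all_to_all(buffers: List[List[float]]) -> List[List[float]]:
    """Rank i's j-th chunk is delivered to rank j, landing in slot i."""
    n = len(buffers)
    size = len(buffers[0]) // n
    return [
        [x for src in range(n) for x in buffers[src][dst * size:(dst + 1) * size]]
        for dst in range(n)
    ]


print(ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
# [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
print(all_to_all([[0, 1, 2, 3], [10, 11, 12, 13]]))
# [[0, 1, 10, 11], [2, 3, 12, 13]]
```

The ring schedule is shown because each rank only ever talks to its neighbour and sends roughly 2(n-1)/n of its buffer in total, which is why it is a common reference point when reasoning about AllReduce bandwidth.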
As part of our world-class team, you will apply observability and automation to improve benchmarking and validation. You will serve as the expert when workloads or benchmarks do not perform as expected. You will collaborate across NVIDIA to ensure AI factories are ready for customers, validating hardware and software for modern AI deployments.
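When a distributed job underperforms or hangs, per-rank observability data is often the quickest way to locate the cause. The sketch below assumes two hypothetical per-rank metrics are already being exported: the duration of each step's compute phase (forward and backward, excluding time blocked in collectives) and the timestamp of each rank's last reported progress. Compute time is used rather than total step time because in synchronous data-parallel training the fast ranks wait inside collectives for the slow one, so total step times tend to look alike across ranks. The thresholds are illustrative defaults, not recommended values.

```python
import statistics
from typing import Dict, List


def find_stragglers(compute_ms: Dict[int, List[float]], slowdown: float = 1.2) -> List[int]:
    """Ranks whose median compute time exceeds the cross-rank median by `slowdown`x."""
    per_rank = {rank: statistics.median(ts) for rank, ts in compute_ms.items() if ts}
    if not per_rank:
        return []
    typical = statistics.median(per_rank.values())
    return sorted(rank for rank, t in per_rank.items() if t > slowdown * typical)


def find_hung_ranks(last_progress_s: Dict[int, float], now_s: float,
                    timeout_s: float = 300.0) -> List[int]:
    """Ranks that have not reported progress within `timeout_s` seconds."""
    return sorted(rank for rank, ts in last_progress_s.items() if now_s - ts > timeout_s)


compute = {0: [100, 101, 99], 1: [100, 102, 98], 2: [150, 148, 152], 3: [101, 99, 100]}
print(find_stragglers(compute))                                        # [2]
print(find_hung_ranks({0: 1000.0, 1: 600.0, 2: 990.0}, now_s=1000.0))  # [1]
```

Medians are used so that a single slow step (for example a checkpoint write) does not mark an otherwise healthy rank as a straggler.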
What You Will Be Doing:
Set up, adjust, and verify AI factory environments across multi-GPU and multi-node Linux clusters.
Ensure configurations align with guidelines for NCCL, collectives, and distributed training frameworks.
Own the execution of key AI/LLM benchmarks, including setup, orchestration, result collection, and analysis.
Investigate and resolve issues when training jobs or benchmarks fail, hang, or underperform.
Build and improve observability for AI factories (metrics, logs, traces, dashboards) to understand workload behavior and system health.
Develop automation (Python, Shell) for running benchmarks, collecting results, and performing regression checks.
Examine communication patterns and NCCL usage for AI/LLM workloads, concentrating on collectives such as AllReduce and AllToAll.
Recommend changes to job configuration, parallelism strategies, and cluster settings to improve throughput, latency, and scaling efficiency.
Work closely with hardware, software, networking, datacenter, and product teams to prepare AI factories for customer use.
Contribute to documentation, guidelines, and readiness collateral that support internal collaborators and customer-facing teams.
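Scaling efficiency, one of the optimization targets listed above, is commonly computed by comparing aggregate throughput at N GPUs against linear extrapolation from a smaller baseline run. A minimal sketch, assuming throughput is measured in one consistent unit such as tokens or samples per second; the numbers in the example are made up for illustration:

```python
from typing import Dict


def scaling_efficiency(throughput: Dict[int, float]) -> Dict[int, float]:
    """Map GPU count to scaling efficiency relative to the smallest measured count.

    efficiency(N) = (T_N / T_base) / (N / N_base); 1.0 means perfectly linear scaling.
    """
    base_n = min(throughput)
    base_t = throughput[base_n]
    return {n: (t / base_t) / (n / base_n) for n, t in sorted(throughput.items())}


measured = {8: 1000.0, 16: 1900.0, 32: 3600.0}  # illustrative tokens/s, not real data
print({n: round(e, 3) for n, e in scaling_efficiency(measured).items()})
# {8: 1.0, 16: 0.95, 32: 0.9}
```

A falling curve like this one is the usual cue to look at communication overhead, parallelism strategy, or network settings at the larger scales.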
What We Need to See:
Bachelor's degree or equivalent experience in Computer Science, Mathematics, Engineering, Physics, or related field.
6+ years of experience managing Linux-based systems in HPC, distributed systems, or large-scale AI/ML settings.
Hands-on experience running AI/ML workloads on multi-GPU and/or multi-node clusters, with practical knowledge of NCCL.
Solid grasp of collective communication patterns, particularly AllReduce and AllToAll, and how they are applied in contemporary ML/LLM training.
Familiarity with LLM training and/or inference workflows using frameworks such as PyTorch or TensorFlow.
Proficiency with Python and Shell/Bash for scripting, automation, and tooling.
Experience with benchmarking (crafting, executing, and interpreting performance benchmarks).
Comfortable working with observability data (metrics, logs, dashboards) to troubleshoot and optimize complex distributed workloads.
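Interpreting collective benchmarks correctly matters when comparing runs across cluster sizes. NVIDIA's open-source nccl-tests suite reports an algorithm bandwidth (algbw, data size divided by time) and a bus bandwidth (busbw) that applies a per-collective correction factor so results can be compared with hardware link speeds regardless of rank count. The sketch below reproduces that arithmetic for a few collectives, using the factors given in nccl-tests' performance notes; the function name and example timings are hypothetical, and it is a reading aid rather than a substitute for the tool's own output.

```python
from typing import Callable, Dict, Tuple

# Bus-bandwidth correction factors as documented for nccl-tests (n = number of ranks).
BUS_FACTOR: Dict[str, Callable[[int], float]] = {
    "allreduce": lambda n: 2 * (n - 1) / n,
    "allgather": lambda n: (n - 1) / n,
    "reducescatter": lambda n: (n - 1) / n,
    "alltoall": lambda n: (n - 1) / n,
}


def nccl_bandwidths(op: str, size_bytes: int, time_s: float, nranks: int) -> Tuple[float, float]:
    """Return (algbw, busbw) in GB/s (1 GB = 1e9 bytes) for one timed collective."""
    algbw = size_bytes / time_s / 1e9
    return algbw, algbw * BUS_FACTOR[op](nranks)


print(nccl_bandwidths("allreduce", 1_000_000_000, 0.125, 8))  # (8.0, 14.0)
print(nccl_bandwidths("alltoall", 1_000_000_000, 0.125, 8))   # (8.0, 7.0)
```

The same measured time therefore implies very different link utilization for AllReduce and AllToAll, which is why busbw rather than algbw is the number to compare against expected fabric bandwidth.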
Strong communication skills and the ability to work effectively with cross-functional teams.
Ways to Stand Out From the Crowd:
Experience with AI factory or large-scale AI infrastructure build, deployment, or operations.
Background in HPC performance engineering, SRE, or systems performance analysis for GPU-accelerated environments.
Familiarity with observability stacks (e.g., metrics/monitoring, logging, tracing systems) used for large distributed systems.
Experience building automation and CI-style pipelines for running and validating benchmarks at scale.
Demonstrated desire to use AI to solve practical problems, improve workflows, and guide data-driven decisions.
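For CI-style benchmark validation, the core step is typically a comparison of each new run against a stored baseline. A minimal sketch, assuming results have already been collected into dictionaries keyed by metric name; the metric names and the 5% tolerance are hypothetical placeholders, and each metric declares whether higher or lower values are better:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Metric:
    name: str
    higher_is_better: bool


def find_regressions(baseline: Dict[str, float], current: Dict[str, float],
                     metrics: List[Metric], tolerance: float = 0.05) -> List[str]:
    """Describe every metric that is missing or got worse by more than `tolerance` (a fraction)."""
    problems = []
    for m in metrics:
        if m.name not in baseline or m.name not in current:
            problems.append(f"{m.name}: missing from baseline or current run")
            continue
        base, cur = baseline[m.name], current[m.name]
        # Signed relative change where a positive value means "worse";
        # assumes baseline values are positive.
        worse = (base - cur) / base if m.higher_is_better else (cur - base) / base
        if worse > tolerance:
            problems.append(f"{m.name}: {base:g} -> {cur:g} ({worse:.1%} worse)")
    return problems


METRICS = [Metric("tokens_per_sec", True), Metric("step_time_ms", False)]
baseline = {"tokens_per_sec": 1000.0, "step_time_ms": 200.0}
current = {"tokens_per_sec": 930.0, "step_time_ms": 205.0}
print(find_regressions(baseline, current, METRICS))
# ['tokens_per_sec: 1000 -> 930 (7.0% worse)']
```

In a pipeline, a non-empty result would typically fail the job, for example by exiting with a non-zero status, so regressions block a change instead of surfacing later on a dashboard.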
You will also be eligible for equity and benefits.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
About NVIDIA
Sourced by ZipRecruiter
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.
Industry
Computer and electronic product manufacturing
Company size
10,000+ Employees
Headquarters location
Santa Clara, CA, US
Year founded
1993