Deep Learning Performance Architect Jobs (NOW HIRING)

We are now looking for a Senior Performance Architect for Nemotron! At NVIDIA, we are redefining ... Experience with deep learning frameworks like PyTorch, TRT-LLM, VLLM, SGLang * A Growth mindset and ...

Performance Architect

Westlake, TX · On-site

$160.40K/yr

... learning, sensor fusion, coaching workflows, reporting, alert management engines, high accuracy ... Deep expertise in performance observability including distributed tracing, telemetry, logs, APM ...

Performance Architect

Austin, TX · Hybrid

$165.50K/yr

As a Performance Architect in Embedded Processor Architecture, Engineering & Solutions (EPAES) team, you will provide deep technical expertise and technical leadership in analyzing and optimizing ...

Systems Performance Architect

Beaverton, OR · On-site

$173.80K/yr

Our System Performance Architecture group includes a team of interdisciplinary Performance ... learning new things from deep technical topics to user workflows. Strong interpersonal skills and ...

... deep learning (DL), high-performance computing (HPC), cloud service providers (CSP), gaming ... Come join the CPU performance architecture team and help us push performance boundaries for all our ...

Deep Learning Performance Architect information

$156.5K

$168K

How much do deep learning performance architect jobs pay per year?

As of May 31, 2026, the average yearly pay for a deep learning performance architect in the United States is $167,842.00, according to ZipRecruiter salary data. Most workers in this role earn between $167,000.00 and $167,000.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Deep Learning Performance Architect, and why are they important?

To thrive as a Deep Learning Performance Architect, you need a strong background in computer science, deep learning frameworks, parallel computing, and optimization techniques, typically supported by a relevant degree and experience in AI or high-performance computing. Familiarity with tools such as TensorFlow, PyTorch, CUDA, and profiling or benchmarking systems is essential. Analytical problem-solving, effective communication, and a collaborative mindset help professionals excel in cross-functional teams and resolve complex performance bottlenecks. These skills are vital for optimizing AI workloads, ensuring scalability, and maximizing the efficiency of deep learning models in production environments.

What are some common challenges faced by Deep Learning Performance Architects when optimizing large-scale neural network models?

Deep Learning Performance Architects often encounter challenges such as balancing model accuracy with computational efficiency, managing memory constraints on specialized hardware, and optimizing inference or training speed across different platforms. They frequently need to profile and analyze bottlenecks at both the algorithmic and hardware levels, often requiring close collaboration with software engineers and hardware designers. Staying current with rapidly evolving deep learning frameworks and hardware accelerators is also essential to ensure optimal performance and scalability.

What is a Deep Learning Performance Architect?

A Deep Learning Performance Architect is a specialized professional who designs, analyzes, and optimizes the performance of deep learning systems and models. They work to improve the efficiency, speed, and scalability of machine learning algorithms on various hardware platforms such as GPUs, TPUs, and CPUs. Their role often involves collaborating with software engineers and data scientists to identify bottlenecks and implement solutions that enhance computational capabilities for AI workloads. By doing so, they ensure that deep learning applications run faster and more efficiently, making the best use of available resources.

What is the difference between a Deep Learning Performance Architect and a Machine Learning Engineer?

Aspect | Deep Learning Performance Architect | Machine Learning Engineer
Credentials | Advanced degrees in AI, deep learning, or related fields; certifications in deep learning frameworks | Degrees in computer science, data science, or related fields; certifications in machine learning tools
Work Environment | Research labs, AI development teams, performance optimization settings | Data-driven projects, model development, deployment environments
Industry Usage | Tech companies, AI research firms, organizations focusing on deep learning optimization | Tech companies, startups, enterprises applying machine learning solutions

The Deep Learning Performance Architect specializes in optimizing deep learning models for efficiency and scalability, focusing on hardware and software performance. In contrast, Machine Learning Engineers develop, train, and deploy machine learning models across various applications. While both roles require strong technical skills, the Architect emphasizes performance tuning and system optimization, whereas the Engineer focuses on model development and implementation.

Senior Performance Architect, Nemotron

Nvidia Corporation

Santa Clara, CA • On-site

$196.10K/yr

Full-time

Posted 13 days ago


Job description

We are now looking for a Senior Performance Architect for Nemotron! At NVIDIA, we are redefining the future of AI systems through deep model-system-hardware co-design. We are looking for a forward-thinking Nemotron Performance Architect to shape the next generation of Nemotron models through performance modeling, analysis, and forward projections. In this role, you will predict before we build - developing high-fidelity models to evaluate how architectural choices translate into real-world deployment efficiency. You will ensure that future models achieve Pareto-optimal trade-offs across accuracy, throughput, and interactivity on target platforms.
Recent efforts such as LatentMoE architectures and the Nemotron Super model exemplify the kind of performance-driven co-design you will help advance, where modeling insights directly shape model architecture and system efficiency at scale. This role sits at the center of Generative AI evolution, partnering across research, framework development, compiler, and hardware teams to guide decisions that determine how efficiently intelligence scales in production.
What You'll Be Doing:
  • Develop high-fidelity analytical performance models to prototype emerging algorithmic techniques & hardware optimizations to drive model-hardware co-design for the Nemotron family of models.
  • Prioritize features to guide future software and hardware roadmap based on detailed performance modeling and analysis
  • Model end-to-end performance impact of emerging GenAI workflows - such as Speculative Decoding, Agentic Pipelines, Inference-time compute scaling, RL etc. - to understand future datacenter needs
  • This position requires you to keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.

What we need to see:
  • A minimum qualification of a Master's degree (or equivalent experience) in Computer Science, Electrical Engineering or related fields.
  • Strong background in computer architecture, roofline modeling, queuing theory and statistical performance analysis techniques.
  • Solid understanding of ML fundamentals, model parallelism and inference serving techniques.
  • Proficiency in Python (and optionally C++) for simulator design and data analysis.
  • 3+ years of hands-on experience in system evaluation of AI/ML workloads or performance analysis, modeling and optimizations for AI.
  • Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.
  • Experience with deep learning frameworks like PyTorch, TRT-LLM, vLLM, SGLang.
  • A growth mindset and pragmatic "measure, iterate, deliver" approach.

Ways to Stand Out from the Crowd
  • Proven track record of working in multi-functional teams, spanning algorithms, software and hardware architecture.
  • Ability to distill complex analyses into clear recommendations for both technical and non-technical collaborators.
  • Experience with GPU computing (CUDA)

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until May 23, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993