Job Summary:
NVIDIA is seeking outstanding Performance Architects to help analyze and develop the next generation of architectures that accelerate AI and high-performance computing applications. The role involves designing and evaluating hardware architectures that improve the performance and efficiency of AI workloads, and optimizing large-scale deep learning workloads.
Responsibilities:
• Design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.
• Analyze and optimize large-scale deep learning workloads, especially LLM inference/training in real-world deployments.
• Build and use performance and power models (Python/C++) to drive architecture and product decisions.
• Identify and resolve system bottlenecks across compute, memory, and interconnect.
• Evaluate power, performance, and area (PPA) trade-offs and guide feature prioritization for next-generation GPU/ASIC designs.
• Partner closely with software, systems, and product teams to align hardware capabilities with workload requirements.
Qualifications:
Required:
• MS or PhD in a relevant field (Computer Science, Electrical Engineering, Computer Engineering, etc.) or equivalent experience.
• 5+ years of hands-on experience in GPU/ASIC architecture, parallel computing, or system performance engineering.
• Experience with deep learning workloads in production environments (training and/or inference).
• Proficiency in Python and C++ for building performance models, simulators, or analysis tools.
• Solid understanding of system architecture: memory hierarchy, data movement, and scalability.
• Prior experience debugging, profiling, and performance tuning on real systems.
• Ability to work across teams and drive decisions in fast-paced product environments.
Preferred:
• Experience translating workload behavior into concrete hardware or system-level improvements.
• Practical experience with LLM inference optimization: batching, disaggregation, KV-cache management, latency/throughput tuning.
• Familiarity with production inference systems (e.g., scheduling, multi-node scaling, resource utilization).
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, USA, and has more than 10,000 employees. The company is currently late stage.