Job Summary:
NVIDIA is redefining the future of AI systems through deep model–system–hardware co-design and is looking for a Senior Performance Architect for Nemotron. In this role, you will develop high-fidelity performance models to evaluate architectural choices and ensure future models achieve optimal trade-offs between accuracy and efficiency.
Responsibilities:
• Develop high-fidelity analytical performance models to prototype emerging algorithmic techniques and hardware optimizations, driving model-hardware co-design for the Nemotron family of models.
• Prioritize features to guide the future software and hardware roadmap based on detailed performance modeling and analysis.
• Model the end-to-end performance impact of emerging GenAI workflows (such as speculative decoding, agentic pipelines, inference-time compute scaling, and RL) to understand future datacenter needs.
• Keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.
Qualifications:
Required:
• A Master's degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field.
• Strong background in computer architecture, roofline modeling, queuing theory and statistical performance analysis techniques.
• Solid understanding of ML fundamentals, model parallelism and inference serving techniques.
• Proficiency in Python (and optionally C++) for simulator design and data analysis.
• 3+ years of hands-on experience in system evaluation of AI/ML workloads or in performance analysis, modeling, and optimization for AI.
• Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.
• Experience with deep learning frameworks such as PyTorch, TRT-LLM, vLLM, and SGLang.
• A growth mindset and a pragmatic 'measure, iterate, deliver' approach.
Preferred:
• Proven track record of working in multi-functional teams, spanning algorithms, software and hardware architecture.
• Ability to distill complex analyses into clear recommendations for both technical and non-technical collaborators.
• Experience with GPU computing (CUDA).
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, USA, and has more than 10,000 employees. It is currently a late-stage company.