Job Summary:
NVIDIA is leading groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. They are seeking a Senior Software Architect to help co-design next-gen data center platforms and scalable communications software for deep learning and HPC applications.
Responsibilities:
• Investigate opportunities to improve communication performance by identifying bottlenecks in today's systems.
• Design and implement new communication technologies to accelerate AI and HPC workloads.
• Explore innovative solutions in HW and SW for our next-generation platforms as part of co-design efforts involving GPU, Networking, and SW architects.
• Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and drive innovations.
• Use simulation to explore the performance of large GPU clusters (think scales of hundreds of thousands of GPUs).
Qualifications:
Required:
• M.S./Ph.D. degree in CS/CE or equivalent experience.
• 5+ years of relevant experience.
• Excellent C/C++ programming and debugging skills.
• Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
• Deep understanding of operating systems, computer and system architecture.
• Solid grounding in the fundamentals of network architecture, topology, algorithms, and communication scaling relevant to AI and HPC workloads.
• Strong experience with Linux.
• Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Preferred:
• Expertise in related technology and passion for what you do.
• Experience with CUDA programming and NVIDIA GPUs.
• Knowledge of high-performance networks like InfiniBand, RoCE, NVLink, etc.
• Experience with deep learning frameworks such as PyTorch, TensorFlow, etc.
• Knowledge of deep learning parallelism strategies and how they map to the communication subsystem.
• Experience with HPC applications.
• Strong collaborative and interpersonal skills and a proven track record of effectively guiding and influencing within a dynamic and multi-functional environment.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, USA, and has more than 10,000 employees. It is currently a late-stage company.