Job Summary:
NVIDIA, a fast-growing technology company leading the AI revolution, is seeking a Senior DL Algorithms Engineer to work on LLM/Omni model optimization. The role focuses on performance analysis and optimization of deep learning workloads across the hardware/software stack.
Responsibilities:
• Enable and optimize state-of-the-art open models (like Nemotron and Cosmos) on NVIDIA’s accelerated inference SW stack.
• Contribute new features, fix bugs and deliver production code to open-source frameworks like TRT-LLM, vLLM, SGLang, FlashInfer, etc.
• Profile and analyze bottlenecks across the full inference stack to push the boundaries of inference performance.
• Benchmark state-of-the-art offerings and perform competitive analysis for NVIDIA’s SW/HW stack.
• Co-design with partner teams to develop the next generation of AI models and services.
Qualifications:
Required:
• PhD in CS, EE or CSEE or equivalent experience.
• 3+ years of relevant experience.
• Strong background in deep learning and neural networks, in particular inference.
• Experience with performance profiling, analysis and optimization, especially for GPU-based applications.
• Proficiency in PyTorch or an equivalent AI framework, or experience developing HPC-heavy applications.
• Deep understanding of computer architecture, and familiarity with the fundamentals of GPU architecture.
Preferred:
• Proven experience with processor and system-level performance optimization.
• Deep understanding of modern LLM/Diffusion architectures.
• Strong fundamentals in algorithms.
• GPU programming experience (CUDA or OpenCL) is a strong plus.
Company:
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. Founded in 1993, the company is headquartered in Santa Clara, USA, and has more than 10,000 employees. It is currently a late-stage company.