Job Summary:
AMD builds innovative products that enhance computing experiences across domains including AI and data centers. The company is seeking a Senior Fellow in Engineering to lead technical efforts in machine learning workload performance and optimization, driving performance strategy across multiple generations of AMD GPUs.
Responsibilities:
• Define and drive the technical strategy and roadmap for ML model optimization across AMD platforms.
• Serve as the highest-level technical authority in performance optimization, guiding architecture and implementation decisions.
• Lead performance tuning, profiling, and analysis of large-scale models (LLMs, diffusion, multimodal, RecSys, generative AI) across single-node and distributed environments.
• Drive hardware-software co-design initiatives to influence future GPU architectures and system-level optimizations.
• Collaborate across engineering, research, and customer-facing teams to deliver best-in-class performance.
• Develop advanced methodologies, tools, and infrastructure for performance estimation, modeling, and benchmarking.
• Provide technical mentorship to senior engineers and influence best practices across the organization.
• Represent AMD in external technical forums, benchmarks, and customer engagements.
• Communicate complex technical findings and recommendations to senior leadership and stakeholders.
Qualifications:
Required:
• Recognized technical expert with a strong track record of driving large-scale performance optimization initiatives in ML systems.
• Deep expertise in ML hardware architecture, software optimization, and performance modeling.
• Strong understanding of how model architectures map to low-level software and hardware execution.
• Comfortable operating across the full stack, from model design to kernel-level optimization, with an understanding of the performance implications at each layer.
• Ability to influence without authority, mentor senior engineers, and provide technical direction across multiple teams.
• Strong knowledge of modern generative model architectures, including state-of-the-art LLMs, diffusion models, and multimodal systems.
• Experience with distributed inference and large-scale deployment.
• PhD or master's degree with equivalent experience in Computer Science, Electrical Engineering, or related field.
Preferred:
• Sr. Fellow level of experience in performance engineering, ML systems, or related domains, with demonstrated technical leadership at scale.
• Deep expertise in performance analysis, modeling, and hardware/software co-optimization.
• Proven impact on optimizing large-scale ML workloads on modern accelerator architectures.
• Strong ability to influence cross-functional teams without direct authority.
• Experience contributing to industry benchmarks, open-source ecosystems, or published research is a plus.
Company:
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions. Founded in 1969, the company is headquartered in Santa Clara, USA, and has more than 10,000 employees. The company is currently late-stage.