Job Summary:
Red Hat is the world’s leading provider of enterprise open source software solutions and is seeking a highly motivated research intern to join its Machine Learning Research Team. The role involves researching and implementing networking techniques for machine learning workloads and contributing to efforts that enhance distributed LLM inference.
Responsibilities:
• Research, through experimentation and theoretical modeling, the network bandwidth requirements and trade-offs in Prefill-Decode (P/D) disaggregated LLM serving.
• Research and implement networking techniques/methods for high-performance KV cache transfers in deployment setups without RDMA networking.
• Conduct experiments to evaluate the impact of newly developed non-RDMA KV cache transfer techniques on performance (latency and throughput) in P/D LLM serving.
• Collaborate with researchers and engineers to integrate the networking techniques/methods into real-world distributed inference workflows (e.g., in llm-d).
• Document findings and contribute to technical reports, research theses, blog posts, or research publications.
Qualifications:
Required:
• Currently pursuing a Master's (with research) or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, or a related field.
• Strong programming skills in C/C++, Rust, and Python.
• Experience with the Linux network stack including frameworks such as DPDK or eBPF/XDP.
• Strong analytical and problem-solving skills.
• Excellent communication skills and ability to work in a team-oriented research environment.
Preferred:
• Familiarity with distributed LLM serving, including prefill/decode disaggregation and KV cache transfers.
Company:
Red Hat is a software company that offers enterprise open source software solutions. It is a subsidiary of IBM. Founded in 1993, the company is headquartered in Raleigh, North Carolina, USA, and has more than 10,000 employees. The company is currently late stage.