At Tenstorrent, we believe the future of computing must be open, which is why our interns don't just watch from the sidelines - they help build the core of it. We provide a "code-to-career" pipeline where students collaborate with industry experts to solve high-stakes problems in RISC-V and AI hardware-software co-design. By joining us, you are taking an internship that helps democratize high-performance computing and make it accessible to everyone.
As a CPU Performance Architect Intern at Tenstorrent, you will join a team of world-class CPU designers and performance architects to shape the future of high-performance computing. You will research emerging CPU workloads, analyze their impact on next-generation CPUs and accelerators, and build innovative solutions that maximize hardware performance.
This role is on-site, based in our Santa Clara, CA office. We may also consider remote candidates on a case-by-case basis.
Who You Are
- Currently pursuing a Ph.D. in Computer Architecture, Parallel Computing, Artificial Intelligence/Machine Learning, or a related field, with a strong publication record.
- Strong foundation in computer organization, computer architecture, parallel computation and digital logic design.
- Hands-on experience with Linux, performance and power profiling tools, silicon PMU profiling, and hardware performance counters.
- Proficient in Shell scripting, Python, and C++, with experience in PyTorch, benchmarking, and targeted microbenchmarks.
- Strong familiarity with modern AI productivity workflows, with the ability to effectively use AI tools.
What We Need
- Conduct deep research into cutting-edge CPU application trends, workload characterization, and advanced performance and power modeling techniques.
- Build robust profiling, automation, and visualization tools that accelerate workload analysis and streamline production workflows.
- Identify microarchitectural bottlenecks, reduce workloads for efficient simulation, and leverage hardware-software co-design principles to propose hardware-aware optimizations.
- Enable and optimize critical CPU workloads, utilizing advanced parallel computation techniques to unlock the full potential of Tenstorrent hardware and contribute novel research or patentable ideas.
What You Will Learn
- How CPU workload analysis informs next-generation accelerator architecture and performance strategy.
- Practical approaches to workload characterization, reduction, performance modeling, and simulation at scale.
- How to profile, analyze, and optimize real-world software, ranging from industry-standard SPEC CPU benchmarks to emerging agentic AI workloads across hardware and software boundaries.
- Techniques for identifying performance bottlenecks and optimizing the execution of multi-agent AI systems on modern microarchitectures.
- How CPU designers, performance architects, and software teams work together to turn analysis into product impact.
USA Hiring Timelines
This internship opportunity is available in each of our three terms, with the following recruitment cycles:
- Winter Term: Jan-Apr work term, Sept-Dec recruit.
- Summer Term: May-Aug work term, Oct-Apr recruit.
- Fall Term: Sept-Dec work term, Jan-Aug recruit.
Please note these timelines are for reference only. Actual timelines may vary.
Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.