Internship Inference Jobs (NOW HIRING)

Job Summary We are seeking talented Fall '26, Spring '27, and Summer '27 Inference Architecture interns to join our team and contribute to the design of next-generation AI accelerators. This role ...

Preferred: • Internship or project that deployed a microservice or ML inference demo. • Coursework/research with PyTorch or TensorFlow; simple CUDA projects a plus. • Familiarity with Grafana ...

OR

$466K - $750K/yr

The Machine Learning and Inference Research team is a dedicated research team building up Netflix ... Actively mentors others, such as interns or junior peers. Generally, our compensation structure ...

• Internship or project that deployed a microservice or ML inference demo. • Coursework/research with PyTorch or TensorFlow; simple CUDA projects a plus. • Familiarity with Grafana/Prometheus ...

OR · On-site

Internship Inference information

How much do inference internships pay per hour?

As of Jun 6, 2026, the average hourly pay for an inference internship in the United States is $17.31, according to ZipRecruiter salary data. Most workers in this role earn between $14.42 and $19.23 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as an Inference Intern, and why are they important?

To thrive as an Inference Intern, you generally need a strong background in machine learning, statistics, and programming, often supported by coursework or a degree in computer science or related fields. Familiarity with frameworks like TensorFlow or PyTorch, experience with model deployment tools, and knowledge of cloud platforms such as AWS or GCP are commonly required. Strong analytical thinking, problem-solving abilities, and effective communication help interns contribute meaningfully to research and team projects. These skills are crucial for successfully developing, testing, and deploying inference models in real-world applications.

What is an inference internship?

An inference internship is typically an internship focused on machine learning inference, the stage at which a trained model is run on new inputs to produce predictions. Interns in these roles usually help deploy, optimize, and evaluate models. For example, they benchmark latency and throughput, improve speed or accuracy, and integrate models into production systems. Companies use these internships to give students hands-on experience with applied ML infrastructure and to inform future hiring decisions. The work can also help interns understand their strengths and areas for development.

What types of projects and responsibilities can I expect during an Internship in Inference, and how do these experiences contribute to professional growth?

As an intern focusing on inference, you will typically work on projects involving the deployment, optimization, and evaluation of machine learning models, often supporting a research or engineering team. Responsibilities may include running model benchmarks, improving inference speed or accuracy, and assisting with integration of models into production environments. These tasks provide hands-on experience with real-world data and infrastructure, allowing you to develop technical skills and collaborate closely with data scientists and engineers. Such exposure not only enhances your understanding of applied machine learning but also builds a strong foundation for future roles in AI and data science.

Inference Intern

Etched

San Jose, CA • On-site

Internship

Posted 28 days ago


Job description

About Etched
Etched is building the world's first AI inference system purpose-built for transformers, delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.
Job Summary
We are seeking talented Fall '26, Spring '27, and Summer '27 Inference Architecture interns to join our team and contribute to the design of next-generation AI accelerators. This role focuses on developing and optimizing compute architectures that deliver exceptional performance and efficiency for transformer workloads. You will work on cutting-edge architectural problems and performance modeling over the course of your internship.
Key responsibilities
  • Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
  • Assist in building, enhancing, and scaling Sohu's runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
  • Contribute to optimizing routing and communication layers using Sohu's collectives.
  • Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
  • Develop and leverage a deep understanding of Sohu to co-design both HW instructions and model architecture operations to maximize model performance.
  • Implement high-performance software components for the Model Toolkit.

You may be a good fit if you have
  • Progress towards a Bachelor's, Master's, or PhD degree in computer science, computer engineering, applied mathematics, or a related field.
  • Proficiency in Python and C++.
  • Understanding of performance-sensitive or complex distributed software systems, e.g. Linux internals, accelerator architectures (e.g. GPUs, TPUs), compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
  • Experience porting applications to non-standard accelerator hardware or hardware platforms.
  • Deep knowledge of transformer model architectures and/or inference serving stacks (vLLM, SGLang, etc.).

Strong candidates may have some experience with
  • Rust.
  • Low-latency, high-performance applications using both kernel-level and user-space networking stacks.
  • Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
  • Solid grasp of transformer architectures, particularly Mixture-of-Experts (MoE).
  • Building applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
  • Familiarity with PyTorch or JAX.
  • Math competitions (AIME, AMC, etc.).

We encourage you to apply even if you do not believe you meet every qualification.
Program details
  • 12-week paid internship
  • Generous housing support for those relocating
  • Daily lunch and dinner in our office
  • Based at our office in San Jose, CA
  • Direct mentorship from industry leaders and world-class engineers
  • Opportunity to work on one of the most important problems of our time

For any questions, contact internships@etched.com.
How we're different
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in West San Jose, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.