Compression Internship Jobs (NOW HIRING)

DFT Intern

San Jose, CA · On-site

$17.50 - $23.50/hr

We are looking for Summer '26, Fall '26, Spring '27, and Summer '27 interns. You may be a good fit ... Knowledge of DFT concepts such as MBIST, scan insertion, and scan compression * Experience with ...

In this role, you will develop embedded software for image and video compression/processing ... Internship or direct related experience in embedded software development. Experience working with ...

Junior Engineering Intern

Addison, TX · On-site

$16.25 - $21/hr

Throughout your internship, you'll work alongside experienced engineers to enhance your skills and ... data compression, machine learning, and search technologies, with a focus on applying these ...

Exposure to model compression or quantization concepts such as INT8, FP16, or related approaches ... Internship, research, or project experience in deep learning model deployment, inference ...

Free Red-Light Therapy, Compression, Cryotherapy, Red-Light Sauna * Free floats + Contrast Therapy at Float 41 * Hands on training experience * Flexible schedule * Co-coaching style sessions Ideal ...

Compression Internship information

How much do compression internship jobs pay per hour?

As of Jun 24, 2026, the average hourly pay for a compression internship in the United States is $17.31, according to ZipRecruiter salary data. Most workers in this role earn between $14.42 and $19.23 per hour, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Compression Intern, and why are they important?

To succeed as a Compression Intern, you need a solid background in computer science fundamentals, algorithms, and data structures, often supported by progress toward a relevant degree. Familiarity with programming languages like C++ or Python and experience using source control systems such as Git are typically required. Strong analytical thinking, problem-solving abilities, and effective communication skills help interns tackle technical challenges and collaborate with team members. These skills and qualities are crucial for contributing to data compression projects and learning effectively in a technical internship environment.

What is a Compression Internship?

A Compression Internship is a temporary, hands-on learning position where interns work with data compression algorithms, techniques, and software systems. Interns typically assist in developing, optimizing, or testing methods that reduce the size of data for storage or transmission. These roles are often found in tech companies that handle large amounts of data, such as streaming services or cloud storage providers. The internship offers practical experience in computer science fields like signal processing, machine learning, and software engineering. It is ideal for students or recent graduates interested in systems performance and efficiency.

What can I expect from the team environment and mentorship during a Compression Internship?

During a Compression Internship, you can expect to work closely with experienced engineers and data scientists, often in a collaborative team setting focused on optimizing algorithms and data storage solutions. Interns typically receive mentorship through regular check-ins, code reviews, and project guidance, which helps build both technical and problem-solving skills. The environment is usually fast-paced and encourages learning, with opportunities to contribute to real projects, ask questions, and participate in team discussions. This structure not only supports your technical growth but also helps you develop professional communication and teamwork abilities.

What is the difference between a Compression Internship and a Compression Engineer?

Aspect | Compression Internship | Compression Engineer
Required Credentials | Typically pursuing or recent graduate in engineering or related field | Bachelor's or Master's in Mechanical, Civil, or Petroleum Engineering
Work Environment | Internship programs, entry-level tasks, supervised | Full-time, project management, design, and analysis
Employer & Industry Usage | Oil & gas, manufacturing, energy sectors | Oil & gas, energy, industrial sectors

The Compression Internship provides hands-on experience for students or recent graduates, focusing on learning and supporting compression systems. In contrast, a Compression Engineer is a full-time professional responsible for designing, analyzing, and maintaining compression equipment. While internships are temporary and educational, engineers hold permanent roles with greater responsibilities.

Member of Technical Staff - Model Optimization and Inference (New Grad)

Nuance Labs

Seattle, WA • On-site

$200K - $300K/yr

Full-time

Retirement, PTO

Posted 13 days ago


Job description

About Nuance Labs
Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person.
We're a research company, with PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and industry experience from Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.
How Nuance Differentiates
Most conversational AI avatars today are hacks - a face slapped on a speech-to-speech pipeline, stuck in the uncanny valley: emotionless, mechanical, one-turn-at-a-time. Current systems take 2-5 seconds to respond; natural conversation requires sub-500ms. That's up to a 10x improvement, and it demands rethinking the entire stack.
That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.
About the Role
We can train a great model. The next problem is making it fast enough to actually use in a real-time conversation - and that gap is enormous. A model that responds in 3 seconds is a demo. A model that responds in under 500ms is a product.
We're looking for someone who's excited about taking trained models and squeezing every last millisecond out of them. You understand - or want to deeply understand - the full stack from model weights to serving infrastructure: quantization, KV cache optimization, kernel-level acceleration, batching strategies. You've worked with vLLM, SGLang, or similar frameworks (through coursework, research, internships, or open-source) and have opinions about where they fall short.
This posting is aimed at early-career engineers finishing or recently finished with a BS, MS, or PhD. We don't require a PhD - we care about systems intuition, engineering chops, and the appetite to go deep.
Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal system that must satisfy strict real-time latency constraints. There's a lot of unsolved optimization work here, and we want someone who finds that genuinely exciting and is ready to grow fast alongside people who've built these systems before.
What You'll Do
  • Contribute to end-to-end inference optimization across our model stack - LLMs, audio models, and diffusion-based components
  • Implement and tune KV cache strategies for long-context conversations, including eviction policies, compression, and memory-efficient attention
  • Work with inference serving frameworks (vLLM, SGLang, TensorRT-LLM, etc.) and extend them for our specific workloads
  • Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks
  • Build internal tooling that makes optimization work faster and more rigorous - profiling viewers, end-to-end inference test harnesses, and other infrastructure that helps the team move quickly
  • Accelerate diffusion model inference - consistency models, step distillation, caching strategies, and custom kernel optimizations
  • Apply quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
  • Work closely with research and infrastructure to ensure new models ship with optimized serving from day one
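For candidates less familiar with the quantization item above, here is a minimal sketch of what INT8 quantization involves. It assumes the simplest variant (symmetric, one scale per tensor) and uses plain Python lists; the function names are hypothetical and this is an illustration, not a description of Nuance Labs' method or of GPTQ/AWQ.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative assumption).

    Maps floats to integer codes in [-127, 127] using a single scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero (or empty) tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -1.27, 0.5, 0.999]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Rounding error per value is bounded by half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each value is stored in one byte instead of four (FP32) or two (FP16), which is where the memory saving comes from; the cost is the rounding error measured by `max_error`.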
What We're Looking For
  • BS, MS, or PhD in CS, ML, or a related field - completed or in the final stretch
  • Strong fundamentals in LLM inference or ML systems - KV caching, memory layout, attention kernels, batching, or serving - picked up through coursework, research, internships, or open-source. You don't need to have shipped at production scale yet; you do need to learn fast and go deep.
  • Exposure to inference serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar) - even at a research or hobby level
  • Strong Python and PyTorch skills; familiarity with CUDA or Triton is a significant plus
  • A systematic approach to profiling and optimization - you measure first, then optimize
  • Curiosity about diffusion inference, speculative decoding, quantization, or other inference-time acceleration techniques
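As an illustration of the "measure first, then optimize" point above, here is a minimal latency harness sketch using only the Python standard library. The 500 ms budget comes from the posting; `fake_model_step`, the run counts, and the nearest-rank percentile choice are assumptions made for the example, not the team's actual tooling.

```python
import math
import time

LATENCY_BUDGET_MS = 500.0  # the sub-500ms target quoted in the posting


def measure_latency_ms(fn, runs=20, warmup=3):
    """Time fn() several times and return sorted latencies in milliseconds."""
    for _ in range(warmup):  # discard warm-up runs (caches, lazy init)
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sorted(samples)


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending list."""
    if not sorted_samples:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_samples)))
    return sorted_samples[rank - 1]


def fake_model_step():
    """Hypothetical stand-in for one inference call."""
    sum(i * i for i in range(10_000))


samples = measure_latency_ms(fake_model_step)
p50 = percentile(samples, 50)
p95 = percentile(samples, 95)
within_budget = p95 <= LATENCY_BUDGET_MS
print(f"p50={p50:.2f}ms p95={p95:.2f}ms within_budget={within_budget}")
```

Reporting a tail percentile such as p95 rather than the mean reflects that a real-time conversation is judged by its slowest responses, not its average one.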
Bonus Points
  • Internship or research experience with LLM inference, ML systems, or model serving
  • Contributions to open-source inference frameworks (vLLM, SGLang, TensorRT-LLM, etc.)
  • CUDA / Triton kernel work, even at a research or hobby scale
  • Publications or research projects in MLSys, model compression, or inference optimization
  • Familiarity with multimodal or streaming inference architectures
  • Experience with hard latency SLAs in any real-time system
Compensation
$200,000 - $300,000 base salary, plus meaningful equity. We think long-term ownership matters and structure equity accordingly.
Logistics
  • Location: In-person in Seattle, five days a week - we believe in the compounding value of working shoulder-to-shoulder.
  • Visa sponsorship: We sponsor visas (O-1, H-1B, green card) from day one.
  • AI-native tooling: Do your best work with the best tools, including unlimited tokens.
Benefits
  • Health: HSA plan with ~$2,000 in annual company contributions - roughly 2x what most big tech companies put in.
  • Time off: 15 days of PTO plus public holidays, and we close the office for a full week at year-end.
  • Food: Lunch, drinks, and snacks on us every workday - the small thing that quietly makes the day better.
  • Commuter benefits: We help cover the cost of getting to the office.
  • 401(k): In the works.

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.