Deep Learning Quantization Jobs in New York (NOW HIRING)

Implement techniques such as distillation, quantization, and pruning to aggressively accelerate ... Strong experience in deep learning systems and infrastructure * Expertise in PyTorch, CUDA, Triton ...

... quantization, compression, and resource-efficient AI, to drive performance improvements and ... Research experience in machine learning, deep learning, natural language processing, and/or ...

... quantization, compression, and resource-efficient AI, to drive performance improvements and ... Research experience in deep learning, reinforcement learning, natural language processing, computer ...

Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vllm, sglang, CUDA, and ... Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous ...

Software Engineer - Model Performance

Manhattan, NY · On-site

$154K/yr

Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vllm, sglang, CUDA, and ... Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous ...

Sr. AI Engineer

New York, NY · On-site

$114K - $157K/yr

Optimize inference performance and cost efficiency through techniques such as model quantization ... learning, and deep learning 5. Experience with AI platforms like PyTorch or TensorFlow 6. ...

Sr. AI Engineer

Manhattan, NY · Remote

$114K - $157K/yr

Optimize inference performance and cost efficiency through techniques such as model quantization ... learning, and deep learning 5. Experience with AI platforms like PyTorch or TensorFlow 6. ...

Sr. AI Engineer

Manhattan, NY · Remote

$97K - $140K/yr

Optimize inference performance and cost efficiency through techniques such as model quantization ... learning, and deep learning 5. Experience with AI platforms like PyTorch or TensorFlow 6. ...


Deep Learning Quantization information

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What is the difference between a Deep Learning Quantization Engineer and a Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the commonly used 32-bit floating-point values to lower bit-width formats, such as 16-bit floating point or 8-bit integers, quantization significantly reduces the memory footprint and computational requirements of deep learning models. This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy levels. Quantization is widely used in model optimization for faster inference and lower power consumption.
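
As a rough illustration of the idea described above, the following is a minimal sketch of affine (asymmetric) 8-bit quantization in plain Python. The function names and the sample weights are hypothetical and chosen only for this example; production toolkits implement the same mapping with many refinements (per-channel scales, calibration, fused kernels).

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers with an affine (scale + zero-point) scheme."""
    qmin, qmax = -128, 127
    # Widen the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 if every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the 8-bit representation."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights, stored as 32-bit floats in the original model.
weights = [-0.62, -0.10, 0.0, 0.33, 0.91]
q_weights, scale, zero_point = quantize_int8(weights)
restored = dequantize(q_weights, scale, zero_point)

for original, q, approx in zip(weights, q_weights, restored):
    print(f"{original:+.4f} -> {q:4d} -> {approx:+.4f}")
```

Each value now fits in one byte instead of four, which is where the memory saving comes from; the small gap between the original and restored values is the quantization error that has to be kept within acceptable limits.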

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
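
The accuracy-versus-efficiency trade-off mentioned above can be sketched with a small experiment. The example below is an assumption-laden illustration, not a benchmark: it uses synthetic Gaussian values as stand-in weights, a hypothetical `fake_quantize` helper that applies symmetric quantization and immediately converts back to floats, and mean absolute error as a proxy for accuracy loss. A real evaluation would measure task accuracy on validation data and on the target hardware.

```python
import random


def fake_quantize(values, bits):
    """Symmetric quantize-then-dequantize at the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    # Fall back to 1.0 if every value is zero, to avoid dividing by zero.
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and quantized values."""
    approx = fake_quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, approx)) / len(values)


random.seed(0)  # fixed seed so the run is repeatable
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

errors = {bits: mean_abs_error(weights, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error = {err:.4f}")
```

Lower bit widths save more memory but introduce larger error, which is why quantized models usually need per-device testing and tuning before deployment.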
[Infographic: Deep Learning Quantization job openings in New York as of June 2026. Employment types: 1% Internship, 3% As Needed, 8% Full Time, 86% Part Time, 2% Temporary. Work arrangement: 71% Physical, 3% Hybrid, 26% Remote.]

Research Engineer, Video Generation

Mirage

New York, NY • On-site

$175K - $275K/yr

Full-time

Medical, Dental, Vision, Retirement, PTO

Posted 26 days ago


Job description

Mirage is an AI-native video platform that intelligently orchestrates production and editing through natural language. Our models leverage contextual awareness to execute the same creative decisions a professional editor would, dramatically improving productivity for experienced teams while making video creation accessible to anyone.
We're an interdisciplinary team addressing some of the most difficult technical and creative challenges in generative media. As an early member of our team, you'll tackle foundational problems that remain largely unsolved across the industry, driving an outsized impact on the future of creative expression.

More about us

Product (Captions by Mirage)

Research (Seeing Voices, technical-white-paper)

Updates (Mirage on X / twitter)

TechCrunch, Forbes AI 50, Fast Company (press)

Our Investors

We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, General Catalyst, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Please note that all of our roles require you to be in person at our NYC HQ (located in Union Square).

About the Role
Mirage is seeking a Research Engineer to build and scale the systems powering our video generation models. You'll work closely with researchers to optimize performance, improve efficiency, and bring cutting-edge models into production.

This role sits at the intersection of research and systems engineering, focusing on making advanced models faster, more efficient, and capable of ultra-low latency, real-time generation.

Responsibilities

  • Train and optimize large-scale video and multimodal models

  • Improve efficiency across training and inference (memory, latency, cost)

  • Implement techniques such as distillation, quantization, and pruning to aggressively accelerate diffusion and autoregressive generation

  • Build and maintain distributed training systems

  • Optimize GPU utilization, parallelism, and throughput

  • Develop tooling for experimentation, evaluation, and debugging

  • Translate research models into robust, production-ready systems

  • Monitor and improve model performance in real-world usage

What makes you a great fit

  • 2+ years of professional industry experience

  • Strong experience in deep learning systems and infrastructure

  • Expertise in PyTorch, CUDA, Triton, and distributed training (FSDP, etc.)

  • Experience scaling and optimizing large models under low-latency inference constraints

  • Strong debugging and performance profiling skills

  • Ability to move quickly from prototype to production

  • Strong software engineering fundamentals

Benefits:
  • Comprehensive medical, dental, and vision plans

  • 401K with employer match

  • Commuter Benefits

  • Catered lunch multiple days per week

  • Dinner stipend every night if you're working late and want a bite!

  • Grubhub subscription

  • Health & Wellness Perks

  • Multiple team offsites per year with team events every month

  • Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.

Please note benefits apply to full time employees only.

Compensation Range: $175K - $275K