
Deep Learning Quantization Jobs in New York (NOW HIRING)

Senior ML Engineer

New York, NY · On-site +1

$114K - $157K/yr

Advanced Python and deep learning proficiency (PyTorch, HuggingFace Transformers, spaCy ... models via quantization, batching, and throughput tuning * Proficiency with inference ...

The ideal candidate blends deep machine learning expertise with modern software engineering ... Knowledge of model fine-tuning techniques and local LLM quantization/hosting. Familiarity with ...

The ideal candidate blends deep machine learning expertise with modern software engineering ... Knowledge of model fine-tuning techniques and local LLM quantization/hosting. Familiarity with ...

... data analysis, vector quantization, decision tree methods, EM methods, Bayesian methods ... Demonstration of deep knowledge of large language models and deep neural networks for practical ...

AI Researcher - Vatic Labs

Manhattan, NY · On-site

$175K - $250K/yr

... data analysis, vector quantization, decision tree methods, EM methods, Bayesian methods ... Demonstration of deep knowledge of large language models and deep neural networks for practical ...

AI Researcher

New York, NY · On-site

$175K - $250K/yr

... data analysis, vector quantization, decision tree methods, EM methods, Bayesian methods ... Demonstration of deep knowledge of large language models and deep neural networks for practical ...

... quantization, batching, and KV‑cache reuse. * Instrument deep observability (metrics, traces ... Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Deep Learning Quantization information

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, often backed by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are the standout soft skills. These abilities matter because the core of the role is optimizing models for deployment on resource-constrained hardware without giving up accuracy or performance.

What is the difference between a Deep Learning Quantization Engineer and a Machine Learning Engineer?

| Aspect | Deep Learning Quantization Engineer | Machine Learning Engineer |
| --- | --- | --- |
| Required credentials | Advanced degree in AI, Computer Science, or a related field; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or a related field; programming skills |
| Work environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments |
| Industry usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries |

Deep Learning Quantization Engineers focus on reducing model size and improving inference speed through techniques such as weight and activation quantization, often targeting specialized hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for a wide range of applications. Both roles require knowledge of AI and programming, but quantization work specializes in model optimization, whereas Machine Learning Engineers work broadly across model development and deployment.
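
To make the size reduction concrete, the sketch below estimates how much memory a model's weights occupy at different bit widths. It is a back-of-the-envelope calculation only: the helper `weight_memory_gb` and the 7-billion-parameter model size are illustrative assumptions, and the estimate ignores activations and the small overhead of storing quantization parameters.

```python
# Hypothetical helper: estimate weight storage for a model at a given bit width.
# Ignores activations and the small overhead of storing quantization parameters.
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 1e9


if __name__ == "__main__":
    # Illustrative 7-billion-parameter model; the size is an assumption.
    num_params = 7_000_000_000
    for label, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{label}: {weight_memory_gb(num_params, bits):.1f} GB")
```

For this assumed model size the script prints 28.0 GB for FP32, 14.0 GB for FP16, 7.0 GB for INT8, and 3.5 GB for INT4; each halving of the bit width halves weight storage.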

What is deep learning quantization?

Deep learning quantization is the process of reducing the numerical precision used to represent a neural network's parameters (weights), its activations, or both. Converting the standard 32-bit floating-point (FP32) values to lower-precision formats such as 16-bit floating point (FP16 or BF16) or 8-bit integers (INT8) shrinks a model's memory footprint and computational requirements; storing weights in INT8 instead of FP32, for example, cuts weight storage by a factor of four. This makes it practical to deploy models efficiently on edge devices and mobile hardware while keeping accuracy at an acceptable level. Quantization is widely used in model optimization for faster inference and lower power consumption.
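
The core arithmetic of 8-bit quantization can be sketched in a few lines of plain Python. This is a minimal example of asymmetric (affine) per-tensor quantization, one common scheme among several; the function names `quantize_affine` and `dequantize_affine` are hypothetical, and production toolkits layer calibration, per-channel scales, and fused integer kernels on top of this idea. Each real value is mapped to an integer q such that value ≈ scale * (q - zero_point).

```python
def quantize_affine(values, num_bits=8):
    """Asymmetric (affine) per-tensor quantization of floats to unsigned integers.

    Returns (q_values, scale, zero_point) such that
    value is approximately scale * (q - zero_point).
    """
    qmin, qmax = 0, 2**num_bits - 1  # e.g. 0..255 for 8 bits
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    # Integer that real 0.0 maps to, clamped into the valid range.
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # Round each value to the nearest step, shift by zero_point, and clamp.
    q_values = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q_values, scale, zero_point


def dequantize_affine(q_values, scale, zero_point):
    """Map integers back to approximate real values."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # toy values, not from a real model
    q, scale, zp = quantize_affine(weights)
    restored = dequantize_affine(q, scale, zp)
    for w, qi, r in zip(weights, q, restored):
        print(f"{w:+.4f} -> {qi:3d} -> {r:+.4f}")
```

Forcing the range to include 0.0 guarantees that zero survives the round trip exactly, which matters for padding and ReLU outputs; every other value is recovered to within about half a quantization step.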

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing accuracy against computational efficiency: lowering precision can degrade model quality, especially at very low bit widths or when weights or activations contain large outliers. Ensuring hardware compatibility and optimizing for different targets (such as CPUs, GPUs, or edge devices) can also require extensive testing and tuning, since the supported low-precision formats and kernels differ between platforms. Collaboration with data scientists, software engineers, and hardware specialists is often essential to deploy quantized models successfully at scale. Staying current with the latest quantization techniques and frameworks also helps in overcoming these challenges.
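
The accuracy side of that trade-off is easy to see on a toy example. The sketch below, using a hypothetical two-row weight matrix, compares symmetric INT8 quantization with one scale for the whole tensor (per-tensor) against one scale per row (per-channel). A single large outlier stretches the per-tensor scale so far that small weights round to zero, while per-channel scaling preserves them. The helper names and values are illustrative assumptions; whether per-channel scales are available in practice depends on the toolkit and the target hardware.

```python
def fake_quant_symmetric_int8(values, max_abs):
    """Quantize to int8 codes in [-127, 127] with scale = max_abs / 127, then dequantize."""
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [scale * max(-127, min(127, round(v / scale))) for v in values]


def mean_abs_error(original, approx):
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


def compare_granularity(matrix):
    """Return (per_tensor_error, per_channel_error) for a list of weight rows."""
    flat = [v for row in matrix for v in row]
    # Per-tensor: one scale derived from the largest magnitude in the whole matrix.
    tensor_max = max(abs(v) for v in flat)
    per_tensor = [v for row in matrix for v in fake_quant_symmetric_int8(row, tensor_max)]
    # Per-channel: each row (output channel) gets its own scale.
    per_channel = [
        v for row in matrix for v in fake_quant_symmetric_int8(row, max(abs(x) for x in row))
    ]
    return mean_abs_error(flat, per_tensor), mean_abs_error(flat, per_channel)


if __name__ == "__main__":
    # Hypothetical weights: row 1 contains a single large outlier (8.0).
    weights = [
        [0.011, -0.023, 0.017, -0.004],
        [0.020, -0.015, 8.0, 0.010],
    ]
    tensor_err, channel_err = compare_granularity(weights)
    print(f"per-tensor mean abs error:  {tensor_err:.6f}")
    print(f"per-channel mean abs error: {channel_err:.6f}")
```

With the per-tensor scale of 8.0 / 127, every weight in row 0 is smaller than half a step and rounds to zero, so that row is effectively erased; per-channel scaling gives row 0 its own much finer step and keeps its values.
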
Infographic: Deep Learning Quantization job openings in New York as of June 2026. Employment types: 1% Internship, 3% As Needed, 8% Full Time, 86% Part Time, and 2% Temporary. Work arrangements: 71% Physical, 3% Hybrid, and 26% Remote.

Senior ML Engineer

Invoca

New York, NY • On-site, Remote

$114K - $157K/yr

Full-time

Medical, Dental, Vision, Retirement, PTO

Posted 17 days ago


Job description

About Invoca

Invoca is an AI-powered revenue execution platform that brings together marketing, commerce, and contact center teams to turn every customer interaction into measurable, profitable growth. Join our dynamic, fast-growing team, where innovation and collaboration are at the core of our culture.

About the Team

The Data Platform team owns the full ML lifecycle at Invoca, from model training and fine-tuning through inference optimization and production APIs. We move quickly, swarm on hard problems, and care deeply about code quality, reliability, and each other's growth. Learn more on our blog or check out our open source projects.

About the Role

We're hiring a Senior ML Engineer to own the productionization layer of Invoca's ML stack — model serving, inference optimization, fine-tuning, and the APIs and pipelines that tie it all together. You'll be a primary driver of the infrastructure powering our Context Engine and agentic AI workflows, working closely with Data Scientists, Data Engineers, and Applied AI Engineers.

Core Focus & Primary Ownership

  • Lead End-to-End MLOps and Productionization: Architect, implement, and maintain CI/CD pipelines for ML artifacts — including model evaluation, versioning, and automated deployment. Serve as the primary SME for operational excellence across the Invoca ML stack.
  • Design and Optimize SLM/LLM Deployment: Own the full inference infrastructure: model serving on Triton Inference Server, Baseten, and Kubernetes-based GPU infrastructure. Profile and tune for low latency and high throughput, and build robust, scalable APIs for internal and external model access.

Broader Contributions

  • Fine-Tune Language Models: Apply parameter-efficient fine-tuning methods (LoRA, QLoRA, PEFT) to adapt transformer-based SLMs and LLMs for high-impact NLP applications in conversation intelligence.
  • Evolve ML Infrastructure: Contribute to model training infrastructure, data pipelines, and data lake foundations to keep the systems powering our models reliable and scalable.
  • Collaborate Across Teams: Partner closely with Data Scientists, Data Engineers, and Applied AI Engineers to build the foundational ML systems behind Invoca's agentic AI products.
  • Deliver Customer Value: Work with product and engineering to understand customer needs and ship ML solutions that make a measurable difference.

What You Bring

  • 5+ years of ML Engineering experience with a strong production focus
  • Advanced Python and deep learning proficiency (PyTorch, HuggingFace Transformers, spaCy)
  • Demonstrated track record deploying and maintaining transformer-based NLP models in production
  • Hands-on experience fine-tuning SLMs/LLMs (LoRA, QLoRA, PEFT) and optimizing models via quantization, batching, and throughput tuning
  • Proficiency with inference infrastructure: Triton, Baseten, vLLM, TGI, SageMaker, Vertex AI, or similar
  • Experience building production-grade APIs that expose ML models to downstream consumers
  • Familiarity with MLOps tooling, model monitoring, and eval platforms (Braintrust, MLflow, or equivalent)
  • B.S. in Computer Science, Engineering, Statistics, or equivalent; advanced degree a plus
  • Familiarity with RLHF or preference training is a bonus

📍 Location

This is a remote-first role. We are currently hiring in the following locations:

United States: Greater Los Angeles Area (including Santa Barbara and San Diego) · SF Bay Area · Denver Metro · Austin Metro · Chicago Metro · Greater NYC Area

Canada: Toronto (AI/ML technical roles only)

Candidates must be based within about a two-hour drive of these areas. Occasional business travel may be required.

Please note that we are unable to provide initial visa sponsorship for this position.

Salary, Benefits & Perks:

At Invoca, all new hires in the U.S. receive benefits starting on day one of employment. Benefits for teammates outside the U.S. may vary in accordance with their country's laws and regulations.

Our benefits offerings include:

  • Flexible Time Off – We encourage a healthy work-life balance. Our flexible paid time off policy allows you to recharge and take time away as needed.
  • Paid Holidays – Invoca provides 16 U.S. paid holidays, including a winter break, giving you ample opportunity to refresh and spend time with friends and family.
  • Health Benefits – Our healthcare program includes medical, dental, and vision coverage, with multiple plan options so you can choose what works best for you and your family. Fertility assistance is also included.
  • Retirement – Invoca offers a 401(k) plan through Fidelity with a company match of up to 4%.
  • Stock Options – All employees are invited to share in Invoca's success through stock options.
  • Mental Health Program – Well-being support on a broad range of issues is available through our SpringHealth program.
  • Paid Family Leave – Up to 6 weeks of 100% paid leave is provided for baby bonding, adoption, and caring for family members.
  • Paid Medical Leave – Up to 12 weeks of 100% paid leave is provided for childbirth and medical needs.
  • InVacation – As a thank-you to our long-term team members, we offer a bonus after 7 years of service.
  • Wellness Subsidy – We provide a subsidy that can be applied toward gym memberships, fitness classes, and more.
  • Position Base Range – $152,000 - $228,000 USD, plus bonus and equity