Deep Learning Quantization Jobs in Freehold, NJ (NOW HIRING)

Machine Learning Engineer

New York, NY · Hybrid

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Optimize model inference for production environments using quantization, pruning, and hardware ... Expertise in Python and deep learning frameworks (PyTorch, TensorFlow, Hugging Face). * Hands-on ...

Computer Vision/ML Engineer

New York, NY · On-site

$122K - $143.90K/yr

The position We are looking for our lead deep learning engineer to spearhead the development of our ... Optimize models for embedded deployment using quantization, pruning, TensorRT, and NVIDIA Triton

Implement techniques such as distillation, quantization, and pruning to aggressively accelerate ... Strong experience in deep learning systems and infrastructure * Expertise in PyTorch, CUDA, Triton ...

... quantization, compression, and resource-efficient AI, to drive performance improvements and ... Research experience in machine learning, deep learning, natural language processing, and/or ...

... quantization, compression, and resource-efficient AI, to drive performance improvements and ... Research experience in deep learning, reinforcement learning, natural language processing, computer ...

Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vllm, sglang, CUDA, and ... Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous ...

Sr. AI Engineer

New York, NY · On-site

$114.30K - $157K/yr

Optimize inference performance and cost efficiency through techniques such as model quantization ... learning, and deep learning 5. Experience with AI platforms like PyTorch or TensorFlow 6. ...

Senior ML Engineer

New York, NY · On-site +1

$114.30K - $157K/yr

Advanced Python and deep learning proficiency (PyTorch, HuggingFace Transformers, spaCy ... models via quantization, batching, and throughput tuning * Proficiency with inference ...

AI Researcher

New York, NY · On-site

$175K - $250K/yr

... data analysis, vector quantization, decision tree methods, EM methods, Bayesian methods ... Demonstration of deep knowledge of large language models and deep neural networks for practical ...

Senior ML Engineer, Fauna

New York, NY · On-site

$114.30K - $157K/yr

You'll bring deep expertise in reinforcement learning, computer vision, and supervised learning ... deployment (quantization, pruning, TensorRT, ONNX) - Experience with physics simulation ...

... quantization, batching, and KV‑cache reuse. * Instrument deep observability (metrics, traces ... Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Deep Learning Quantization information

Salary chart for Freehold, NJ (values shown): $11K, $84K, $140.2K

How much do deep learning quantization jobs pay per year?

As of Jun 1, 2026, the average yearly pay for deep learning quantization jobs in Freehold, NJ is $84,014, according to ZipRecruiter salary data. Most workers in this role earn between $72,100 and $139,200 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
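The accuracy-versus-efficiency trade-off described above can be illustrated with a minimal sketch. It assumes symmetric, per-tensor rounding of randomly generated stand-in weights; the function name `quantization_error` and the Gaussian test data are hypothetical and not taken from any listing. It only shows that rounding error tends to grow as the bit width shrinks, not how a real model's accuracy would change.

```python
import random

def quantization_error(values, bits):
    """Mean absolute error after symmetric quantization to a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels                  # real-valued width of one integer step
    restored = [
        max(-levels, min(levels, round(v / scale))) * scale
        for v in values
    ]
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

# Hypothetical stand-in for a layer's weights (assumption: roughly Gaussian).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

errors = {bits: quantization_error(weights, bits) for bits in (16, 8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits:>2}-bit: mean abs error = {err:.6f}")
```

In practice the acceptable bit width depends on the model and the target hardware, which is why the testing and tuning mentioned above is usually needed.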

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the typically used 32-bit floating-point values to lower bit-width formats, such as 16-bit floating point or 8-bit integers, quantization reduces the memory footprint and computational requirements of deep learning models. This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy levels. Quantization is widely used in model optimization for faster inference and lower power consumption.
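As a rough illustration of the idea, the sketch below maps a handful of 32-bit float weights to 8-bit integers and back. It assumes the simplest scheme (symmetric, per-tensor scaling); the helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and production toolkits typically add calibration, per-channel scales, and hardware-specific kernels.

```python
from array import array

def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]

# Hypothetical example weights (not from any real model).
weights = [0.42, -1.27, 0.003, 0.88, -0.51]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Storage cost: 4 bytes per float32 value versus 1 byte per int8 value.
fp32_bytes = len(array("f", weights).tobytes())
int8_bytes = len(array("b", q_weights).tobytes())

# Rounding error is bounded by half a quantization step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 codes:", q_weights)
print("bytes fp32 vs int8:", fp32_bytes, int8_bytes)
print("max round-trip error:", max_error)
```

The 4x storage reduction is the memory saving referred to above; the round-trip error is the price paid for it.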

What is the difference between a Deep Learning Quantization role and a Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What cities near Freehold, NJ are hiring for Deep Learning Quantization jobs?

Cities near Freehold, NJ with the most Deep Learning Quantization job openings:

Core ML Engineer: Deep Learning Architecture

Mecka AI

New York, NY • On-site

$160K - $250K/yr

Full-time

Posted 2 days ago


Job description

The Role
We're hiring an ML and Optimization Specialist to lead model architecture improvements across all of Mecka's pipelines.
This role is heavily focused on foundational deep learning engineering rather than applied ML. We are looking for an engineer who natively writes, debugs, and modifies internal model architectures from the ground up, moving beyond utilizing off-the-shelf models or standard fine-tuning.
Many of our current ML systems rely heavily on frame-by-frame models, but all of our data is inherently temporal. Your immediate focus will be converting and optimizing these models for temporal inference - a critical unlock for pipeline performance.
Beyond that, you'll be the go-to person for model-level debugging, architecture design, and optimization across the organization. This is a high-leverage, deeply technical role for someone who thinks at the architecture level.
Responsibilities
Immediate Priorities
  • Temporal model conversion - migrate frame-by-frame models to temporal architectures that leverage sequential data
  • Benchmark and validate temporal models against existing frame-based baselines
Ongoing
  • Lead model architecture improvements across all pipelines (CV, pose estimation, etc.)
  • Tune and debug ML models at the model architecture level - modifying structural code, writing custom layers, and addressing the underlying math, rather than relying solely on high-level APIs or hyperparameter tuning
  • Profile and optimize model performance (latency, throughput, memory)
  • Evaluate and introduce new architectures, training strategies, and optimization techniques
  • Collaborate with CV, ML, and infrastructure teams to deploy improved models
Who You Are
Required Skills
  • Deep expertise in ML model architecture design and optimization
  • Ability to tune and debug models at the architecture level - diagnosing issues in attention mechanisms, loss landscapes, gradient flow, etc.
  • Strong experience with temporal/sequential models (transformers, RNNs, temporal convolutions, state-space models)
  • Proficiency in PyTorch (or equivalent) at a research-engineering level
  • Experience optimizing models for production deployment
Strong Signals
  • Published papers or production experience with video understanding or temporal perception
  • Experience with model distillation, quantization, or efficient inference
  • Background in computer vision model architectures
  • Experience converting or adapting pre-trained models to new domains/modalities
  • Familiarity with ONNX, TensorRT, or similar inference optimization tools
You Are
  • Obsessed with model internals - you think in terms of structural architecture and custom implementations, rather than just training runs and applied endpoints
  • Able to move between research papers and production code
  • A strong communicator who can explain architecture tradeoffs to cross-functional teams
Why This Role
  • Own the model architecture strategy across all of Mecka's pipelines
  • Solve a critical temporal modeling challenge with immediate impact
  • Work at the intersection of perception, robotics, and ML systems
  • High ownership in a fast-moving, well-funded robotics AI company