Deep Learning Quantization Jobs in Secaucus, NJ (NOW HIRING)

Machine Learning Engineer

New York, NY · Hybrid

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Machine Learning Engineer

Manhattan, NY · On-site

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Machine Learning Engineer

Manhattan, NY · Hybrid

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Strong understanding of deep learning architectures for image and text recognition. * Familiarity ... Preferred Qualifications * Experience with model quantization and optimization for mobile ...

Computer Vision/ML Engineer

Brooklyn, NY · On-site

$117.20K - $138.30K/yr

The position: We are looking for our lead deep learning engineer to spearhead the development of our ... Optimize models for embedded deployment using quantization, pruning, TensorRT, and NVIDIA Triton

Deep Learning Quantization information

Salary chart values (Secaucus, NJ): $11.2K · $85.3K · $142.3K

How much do deep learning quantization jobs pay per year?

As of Jun 1, 2026, the average yearly pay for deep learning quantization jobs in Secaucus, NJ is $85,285, according to ZipRecruiter salary data. Most workers in this role earn between $73,200 and $141,300 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
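The accuracy-versus-efficiency tradeoff described above can be illustrated with a small experiment. The following is a minimal sketch, not a production method: it assumes symmetric per-tensor quantization of randomly generated weights, and the helper names `fake_quantize` and `mean_abs_error` are hypothetical. It shows how the rounding error grows as the bit width shrinks.

```python
import random


def fake_quantize(values, num_bits):
    """Quantize to signed integers of num_bits, then map back to floats.

    Assumption: symmetric per-tensor quantization with one shared scale.
    """
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0  # guard against all-zero input
    scale = max_abs / qmax
    return [scale * max(-qmax, min(qmax, round(v / scale))) for v in values]


def mean_abs_error(original, approx):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)


random.seed(0)
# Stand-in for a layer's weights; real models will have other distributions.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

errors = {
    bits: mean_abs_error(weights, fake_quantize(weights, bits))
    for bits in (8, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error = {err:.4f}")
```

In practice the error that matters is the drop in task accuracy on a validation set, which usually has to be measured per model and per target device rather than inferred from weight error alone.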

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the commonly used 32-bit floating-point values to lower bit-width formats such as 16-bit floating point or 8-bit integers, quantization reduces the memory footprint and computational requirements of deep learning models (an 8-bit value takes a quarter of the storage of a 32-bit one). This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy levels. Quantization is widely used in model optimization for faster inference and lower power consumption.
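As a rough illustration of the idea, the sketch below maps a handful of 32-bit floats to 8-bit unsigned integers using a scale and a zero point, then maps them back. It assumes a simple affine (asymmetric) per-tensor scheme; the function names and the sample weights are hypothetical, and real toolkits add calibration, per-channel scales, and other refinements.

```python
import struct


def quantize_affine(values, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1] with a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [scale * (x - zero_point) for x in q]


weights = [-0.8, -0.3, 0.0, 0.4, 1.2]  # illustrative values only
q, scale, zero_point = quantize_affine(weights)
restored = dequantize_affine(q, scale, zero_point)

# Storage comparison: 4 bytes per float32 value versus 1 byte per 8-bit code.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(bytes(q))
print(q, restored)
print(f"float32: {fp32_bytes} bytes, 8-bit: {int8_bytes} bytes")
```

The restored values differ slightly from the originals; that rounding error is the source of the accuracy loss that quantization can introduce.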

What is the difference between a Deep Learning Quantization role and a Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.


Core ML Engineer: Deep Learning Architecture

Mecka AI

New York, NY • On-site

$160K - $250K/yr

Full-time

Posted 2 days ago


Job description

The Role
We're hiring an ML and Optimization Specialist to lead model architecture improvements across all of Mecka's pipelines.
This role is heavily focused on foundational deep learning engineering rather than applied ML. We are looking for an engineer who natively writes, debugs, and modifies internal model architectures from the ground up, moving beyond utilizing off-the-shelf models or standard fine-tuning.
Many of our current ML systems rely heavily on frame-by-frame models, but all of our data is inherently temporal. Your immediate focus will be converting and optimizing these models for temporal inference - a critical unlock for pipeline performance.
Beyond that, you'll be the go-to person for model-level debugging, architecture design, and optimization across the organization. This is a high-leverage, deeply technical role for someone who thinks at the architecture level.
Responsibilities
Immediate Priorities
  • Temporal model conversion - migrate frame-by-frame models to temporal architectures that leverage sequential data
  • Benchmark and validate temporal models against existing frame-based baselines
Ongoing
  • Lead model architecture improvements across all pipelines (CV, pose estimation, etc.)
  • Tune and debug ML models at the model architecture level - modifying structural code, writing custom layers, and addressing the underlying math, rather than relying solely on high-level APIs or hyperparameter tuning
  • Profile and optimize model performance (latency, throughput, memory)
  • Evaluate and introduce new architectures, training strategies, and optimization techniques
  • Collaborate with CV, ML, and infrastructure teams to deploy improved models
Who You Are
Required Skills
  • Deep expertise in ML model architecture design and optimization
  • Ability to tune and debug models at the architecture level - diagnosing issues in attention mechanisms, loss landscapes, gradient flow, etc.
  • Strong experience with temporal/sequential models (transformers, RNNs, temporal convolutions, state-space models)
  • Proficiency in PyTorch (or equivalent) at a research-engineering level
  • Experience optimizing models for production deployment
Strong Signals
  • Published papers or production experience with video understanding or temporal perception
  • Experience with model distillation, quantization, or efficient inference
  • Background in computer vision model architectures
  • Experience converting or adapting pre-trained models to new domains/modalities
  • Familiarity with ONNX, TensorRT, or similar inference optimization tools
You Are
  • Obsessed with model internals - you think in terms of structural architecture and custom implementations, rather than just training runs and applied endpoints
  • Able to move between research papers and production code
  • A strong communicator who can explain architecture tradeoffs to cross-functional teams
Why This Role
  • Own the model architecture strategy across all of Mecka's pipelines
  • Solve a critical temporal modeling challenge with immediate impact
  • Work at the intersection of perception, robotics, and ML systems
  • High ownership in a fast-moving, well-funded robotics AI company