Deep Learning Quantization Jobs in Secaucus, NJ (NOW HIRING)

Software Engineer - Model Performance

Manhattan, NY · On-site

$154.30K/yr

Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vllm, sglang, CUDA, and ... Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous ...

Sr. AI Engineer

New York, NY · On-site

$114.30K - $157K/yr

Optimize inference performance and cost efficiency through techniques such as model quantization ... learning, and deep learning 5. Experience with AI platforms like PyTorch or TensorFlow 6. ...

Sr. AI Engineer

Manhattan, NY · Remote

$114.60K - $157.40K/yr

Optimize inference performance and cost efficiency through techniques such as model quantization ... learning, and deep learning 5. Experience with AI platforms like PyTorch or TensorFlow 6. ...

Lead Machine Learning Engineer-MLOps

Brooklyn, NY · On-site

$108.30K - $142.60K/yr

Implement quantization techniques and deploy large language models (LLMs) to maximize efficiency ... Deep knowledge and passion for data science fundamentals, training and deploying models

Senior ML Engineer

New York, NY · On-site +1

$114.30K - $157K/yr

Advanced Python and deep learning proficiency (PyTorch, HuggingFace Transformers, spaCy ... models via quantization, batching, and throughput tuning * Proficiency with inference ...

The ideal candidate blends deep machine learning expertise with modern software engineering ... Knowledge of model fine-tuning techniques and local LLM quantization/hosting. Familiarity with ...

... data analysis, vector quantization, decision tree methods, EM methods, Bayesian methods ... Demonstration of deep knowledge of large language models and deep neural networks for practical ...

Deep Learning Quantization information

$11.2K

$85.3K

$142.3K

How much do deep learning quantization jobs pay per year?

As of Jun 1, 2026, the average yearly pay for deep learning quantization in Secaucus, NJ is $85,285.00, according to ZipRecruiter salary data. Most workers in this role earn between $73,200.00 and $141,300.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
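To make the accuracy-versus-efficiency trade-off concrete, here is a minimal sketch in plain Python. It assumes a simple symmetric, per-tensor rounding scheme and uses randomly generated stand-in weights; the function name `quantization_error` and the sample data are illustrative assumptions, not part of any particular toolkit. It measures how round-trip error grows as the bit width shrinks.

```python
import random


def quantization_error(values, bits):
    """Mean absolute round-trip error of symmetric quantization at a given bit width."""
    levels = 2 ** (bits - 1) - 1          # largest representable signed integer
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    total = 0.0
    for v in values:
        # Round to the nearest integer level, clamp to the signed range, map back.
        q = max(-levels, min(levels, round(v / scale)))
        total += abs(v - q * scale)
    return total / len(values)


# Hypothetical stand-in for a layer's weights: 1000 samples from a unit Gaussian.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]

errors = {bits: quantization_error(weights, bits) for bits in (16, 8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits:>2}-bit: mean abs error = {err:.6f}")
```

In this toy setting, each step down in bit width increases the error, which is the kind of effect that typically has to be checked against task accuracy on the target hardware before a quantized model ships.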

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the commonly used 32-bit floating-point values to lower bit-width formats, such as 16-bit floating point or 8-bit integers, quantization reduces the memory footprint and computational requirements of deep learning models. This technique helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy levels. Quantization is widely used in model optimization for faster inference and lower power consumption.
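As a rough illustration of the idea, the sketch below quantizes a handful of made-up 32-bit weights to signed 8-bit integers using one common scheme (symmetric, per-tensor scaling). The helper names `quantize_int8` and `dequantize` and the sample values are assumptions for illustration only; production toolkits use more elaborate calibration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    max_abs = max(abs(v) for v in values)
    # One scale factor maps the largest magnitude onto the int8 limit of 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate floating-point values."""
    return [x * scale for x in q]


# Hypothetical weights, standing in for one small tensor.
weights = [0.42, -1.27, 0.003, 0.88, -0.51]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * 4   # 32-bit floats take 4 bytes each
int8_bytes = len(weights) * 1   # 8-bit integers take 1 byte each
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("bytes: fp32 =", fp32_bytes, "int8 =", int8_bytes)
print("max round-trip error:", max_error)
```

Storage drops by a factor of four, at the cost of a round-trip error of at most about half the scale factor per value.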

What is the difference between Deep Learning Quantization vs Machine Learning Engineer?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

Software Engineer - Model Performance

Baseten

Manhattan, NY • On-site

$154.30K/yr

Other

Medical, Dental, Vision, Retirement, PTO

Posted 19 days ago


Job description

ABOUT BASETEN
Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer. By uniting applied AI research, flexible infrastructure, and seamless developer tooling, we enable companies operating at the frontier of AI to bring cutting-edge models into production. We're growing quickly and recently raised our $300M Series E, backed by investors including BOND, IVP, Spark Capital, Greylock, and Conviction. Join us and help build the platform engineers turn to to ship AI products.
THE ROLE
Are you passionate about advancing the application of artificial intelligence? We are looking for a Software Engineer focused on ML performance to join our dynamic team. This role is ideal for someone who thrives in a fast-paced startup environment and is eager to make significant contributions to the exciting field of LLM Inference. If you are a backend engineer who thrives on making things faster and is excited about open-source ML models, we look forward to your application.
EXAMPLE INITIATIVES
You'll get to work on these types of projects as part of our Model Performance team:
  • Baseten Embeddings Inference: The fastest embeddings solution available
  • The Baseten Inference Stack
  • Driving model performance optimization
RESPONSIBILITIES
  • Implement, refine, and productionize cutting-edge techniques (quantization, speculative decoding, kv cache reuse, chunked prefill and LoRA) for ML model inference and infrastructure.
  • Deep dive into underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vllm, sglang, CUDA, and other libraries to debug ML performance issues.
  • Apply and scale optimization techniques across a wide range of ML models, particularly large language models.
  • Collaborate with a diverse team to design and implement innovative solutions.
  • Own projects from idea to production.
REQUIREMENTS
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
  • Experience with one or more general-purpose programming languages, such as Python or C++.
  • Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous batching).
  • Strong familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM.
  • Demonstrated interest and experience in LLMs.
  • Deep understanding of GPU architecture.
  • Bonus:
    • Proficiency in enhancing the performance of software systems, particularly in the context of large language models (LLMs).
    • Experience with CUDA or similar technologies.
    • Deep understanding of software engineering principles and a proven track record of developing and deploying AI/ML inference solutions.
    • Experience with Docker and Kubernetes.
BENEFITS
  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employee and dependents
  • Flexible PTO policy including company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
  • Paid parental leave
  • Fertility and family-building stipend through Carrot
  • Company-facilitated 401(k)
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.

Apply now to embark on a rewarding journey in shaping the future of AI! If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.
At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.
We are an Equal Opportunity Employer and will consider qualified applicants with criminal histories in a manner consistent with applicable law (for example, the requirements of the San Francisco Fair Chance Ordinance, where applicable).