Deep Learning Quantization Jobs in New York (NOW HIRING)

Machine Learning Engineer

New York, NY · Hybrid

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Machine Learning Engineer

Manhattan, NY · Hybrid

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Machine Learning Engineer

Manhattan, NY · On-site

$145K - $180K/yr

Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

Strong understanding of deep learning architectures for image and text recognition. * Familiarity ... Preferred Qualifications * Experience with model quantization and optimization for mobile ...

Computer Vision/ML Engineer

Brooklyn, NY · On-site

$117K - $138K/yr

The position We are looking for our lead deep learning engineer to spearhead the development of our ... Optimize models for embedded deployment using quantization, pruning, TensorRT, and NVIDIA Triton

Computer Vision/ML Engineer

New York, NY · On-site

$122K - $143K/yr

The position We are looking for our lead deep learning engineer to spearhead the development of our ... Optimize models for embedded deployment using quantization, pruning, TensorRT, and NVIDIA Triton

Optimize model inference for production environments using quantization, pruning, and hardware ... Expertise in Python and deep learning frameworks (PyTorch, TensorFlow, Hugging Face). * Hands-on ...

Implement techniques such as distillation, quantization, and pruning to aggressively accelerate ... Strong experience in deep learning systems and infrastructure * Expertise in PyTorch, CUDA, Triton ...

Deep Learning Quantization information

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What is the difference between a Deep Learning Quantization role and a Machine Learning Engineer role?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the typical 32-bit floating-point values to lower bit-width formats such as 16-bit floating point or 8-bit integers, quantization reduces the memory footprint and computational requirements of deep learning models. This helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy. Quantization is widely used in model optimization for faster inference and lower power consumption.
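As a rough illustration of the idea, the sketch below quantizes a handful of 32-bit float weights to 8-bit integers and back. It assumes symmetric per-tensor quantization, which is only one of several schemes in use; the function names and the sample weights are hypothetical and not taken from any particular toolkit.

```python
from array import array

def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed scheme): floats -> int8."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # hypothetical example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = array("f", weights).itemsize * len(weights)  # 4 bytes per value
int8_bytes = q.itemsize * len(q)                          # 1 byte per value
print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The storage drops by a factor of four, and each restored value differs from the original by at most about half a quantization step.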

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy with computational efficiency, as quantization can sometimes lead to a drop in model performance. Additionally, ensuring hardware compatibility and optimizing for different devices (such as CPUs, GPUs, or edge devices) can require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to successfully deploy quantized models at scale. Staying updated with the latest quantization techniques and frameworks is also important for overcoming these challenges.
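The accuracy side of that trade-off can be sketched numerically. The example below measures the mean round-trip error of the same values at 8, 4, and 2 bits. It assumes symmetric per-tensor quantization and uses synthetic sine values as stand-in weights, so the numbers are illustrative and say nothing about any real model.

```python
import math

def quantization_error(values, bits):
    """Mean absolute round-trip error at a given bit width (symmetric scheme assumed)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / levels if max_abs > 0 else 1.0
    restored = [max(-levels, min(levels, round(v / scale))) * scale for v in values]
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

weights = [math.sin(i) for i in range(64)]  # synthetic stand-in for model weights
errors = {bits: quantization_error(weights, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Fewer bits mean a smaller model but a larger reconstruction error, which is why quantized models are usually re-evaluated on the target task before deployment.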

What cities in New York are hiring for Deep Learning Quantization jobs?

Cities in New York with the most Deep Learning Quantization job openings:

[Infographic: Deep Learning Quantization job openings in New York as of June 2026. Employment types: 1% Internship, 3% As Needed, 8% Full Time, 86% Part Time, 2% Temporary. Work location: 71% Physical, 3% Hybrid, 26% Remote.]

Lead Software Engineer - AI/ML Deep Learning & GPU ML Serving

Chase

Jersey City, NJ

Other

Medical, Retirement

Posted 3 days ago


JPMorgan Chase & Co. rating: 8.1 out of 10

Based on 468 frontline employees who took The Breakroom Quiz

46th of 141 rated banks


Job description

Lead Software Engineer

Be an integral part of an agile team that's constantly pushing the envelope to enhance, build, and deliver top-notch technology products.

As a Lead Software Engineer at JPMorgan Chase within the Commercial and Investment Banking team, you will play a pivotal role in an agile team, enhancing and delivering secure, stable, and scalable technology products. As a core technical contributor, you will drive critical technology solutions across multiple technical areas, supporting the firm's business objectives.

Job Responsibilities

  • Lead the design, development, and troubleshooting of software solutions, applying innovative approaches to complex technical challenges.
  • Write secure, high-quality production code and maintain algorithms integrated with firm systems.
  • Produce architecture and design artifacts for advanced applications, ensuring compliance with design constraints.
  • Analyze and visualize large, diverse data sets to improve software applications and systems.
  • Identify and resolve hidden issues and patterns in data to enhance code quality and system architecture.
  • Collaborate with software engineering communities to explore and adopt emerging technologies.
  • Guide system design and architecture discussions, focusing on reliability and scalability.
  • Optimize deep learning models for production inference, including quantization and batching.
  • Deploy and manage GPU workloads in Kubernetes environments.
  • Build scalable, low-latency systems using web services and APIs.
  • Partner with product and program management teams to deliver business-driven solutions.

Required qualifications, capabilities, and skills

  • Formal training or certification on software engineering concepts and 5+ years applied experience
  • Professional software development experience, with emphasis on ML systems.
  • Strong proficiency in Python and experience with ML frameworks (TensorFlow, PyTorch, or similar).
  • Experience with cloud technologies (Docker, Kubernetes, EKS) and public clouds (AWS, GCP).
  • Hands-on experience with ML model serving frameworks (TorchServe, TensorFlow Serving, Triton Inference Server).
  • Experience deploying and managing GPU workloads in Kubernetes.
  • Familiarity with scalable, low-latency systems based on web services and APIs.
  • Experience with NoSQL databases (Cassandra or equivalent) for high-throughput data access.
  • Understanding of GPU resource management and cost optimization.
  • Experience with modern microservices architecture.
  • Ability to lead the design of large-scale systems and evaluate tradeoffs.

Preferred qualifications, capabilities, and skills

  • MS/PhD in Computer Science, Machine Learning, or a related field.
  • Proficiency in Java, Python, Scala, or C++.
  • Experience with graph neural networks and graph processing frameworks (DGL, PyTorch Geometric, NetworkX).
  • Knowledge of GPU programming (CUDA) and performance optimization.
  • Experience with model monitoring, A/B testing, and ML observability tools.
  • Familiarity with MLOps tools and practices (MLflow, Kubeflow, SageMaker).
  • Experience serving large-scale models and optimizing for performance.

FEDERAL DEPOSIT INSURANCE ACT:

This position is subject to Section 19 of the Federal Deposit Insurance Act. As such, an employment offer for this position is contingent on JPMorgan Chase's review of criminal conviction history, including pretrial diversions or program entries.

About Us

JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world's most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.

We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.

We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.

JPMorgan Chase & Co. is an Equal Opportunity Employer, including Disability/Veterans

About the Team

J.P. Morgan's Commercial & Investment Bank is a global leader across banking, markets, securities services and payments. Corporations, governments and institutions throughout the world entrust us with their business in more than 100 countries. The Commercial & Investment Bank provides strategic advice, raises capital, manages risk and extends liquidity in markets around the world.

