
Deep Learning Quantization Jobs in Severn, MD (NOW HIRING)


Optimize model inference for production environments using quantization, pruning, and hardware ... Expertise in Python and deep learning frameworks (PyTorch, TensorFlow, Hugging Face). * Hands-on ...


... skills and deep learning experience with PyTorch, TensorFlow, or JAX. • You have hands-on ... quantization or other optimization techniques to improve inference efficiency. • You have strong ...

You dive deep. It's important for you to really know how things work. You're always building ... Experience with model compression techniques (quantization, pruning, distillation) * Contributions ...

Deep understanding of machine learning architectures, model selection, training, and optimization ... Strong background in AI/ML performance optimization, including model compression, quantization, or ...


Deep Learning Quantization information

Severn, MD salary figures: $12.2K, $93.3K (average), $155.6K

How much do deep learning quantization jobs pay per year?

As of May 31, 2026, the average yearly pay for deep learning quantization jobs in Severn, MD is $93,254, according to ZipRecruiter salary data. Most workers in this role earn between $80,000 and $154,500 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Deep Learning Quantization Engineer, and why are they important?

To excel as a Deep Learning Quantization Engineer, you need a strong background in machine learning, applied mathematics, and computer science, usually supported by an advanced degree in a related field. Familiarity with deep learning frameworks (such as TensorFlow or PyTorch), quantization toolkits, and hardware acceleration platforms is crucial. Analytical thinking, problem-solving, and clear technical communication are standout soft skills in this role. These abilities are essential for efficiently optimizing models for deployment on resource-constrained hardware while maintaining accuracy and performance.

What are some common challenges faced when implementing deep learning quantization in production environments?

One of the main challenges in implementing deep learning quantization is balancing model accuracy against computational efficiency, because representing weights and activations with fewer bits introduces rounding error that can reduce accuracy. Ensuring hardware compatibility and optimizing for different targets (such as CPUs, GPUs, or edge devices) can also require extensive testing and tuning. Collaboration with data scientists, software engineers, and hardware specialists is often essential to deploy quantized models at scale. Staying current with quantization techniques and frameworks helps in overcoming these challenges.
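The accuracy-versus-efficiency trade-off described above can be illustrated with a small sketch. It assumes a simple symmetric "quantize then dequantize" scheme applied to synthetic Gaussian values standing in for model weights; the function names and the test data are hypothetical and not taken from any particular toolkit.

```python
import random
from typing import Dict, List


def fake_quantize(values: List[float], bits: int) -> List[float]:
    """Symmetric quantize-then-dequantize at the given bit width."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0  # guard against all-zero input
    scale = max_abs / qmax                        # real-valued size of one integer step
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic stand-in for a layer's weights (an assumption for illustration).
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

errors: Dict[int, float] = {
    bits: mean_abs_error(weights, fake_quantize(weights, bits))
    for bits in (16, 8, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits:>2}-bit: mean abs error = {err:.6f}")
```

Fewer bits mean a coarser grid and a larger reconstruction error, which is why low bit widths usually need calibration or accuracy checks on the real model before deployment.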

What is deep learning quantization?

Deep learning quantization is the process of reducing the precision of the numbers used to represent a neural network's parameters, activations, or both. By converting the commonly used 32-bit floating-point values to lower bit-width formats such as 16-bit floating point or 8-bit integers, quantization reduces the memory footprint and computational requirements of deep learning models. This helps deploy models efficiently on edge devices and mobile hardware while maintaining acceptable accuracy. Quantization is widely used in model optimization for faster inference and lower power consumption.
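As a rough illustration of the idea, the sketch below maps a handful of floating-point values to signed 8-bit integers using an affine (scale and zero-point) scheme, then maps them back. This is one common formulation, written in plain Python under simplifying assumptions (a single scale for the whole list, made-up example values); production toolkits differ in details such as per-channel scales and calibration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine (asymmetric) quantization of floats to signed 8-bit integers."""
    qmin, qmax = -128, 127
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)     # integer that represents real 0.0
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 8-bit integers back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical example values standing in for model weights.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("restored:   ", [round(r, 4) for r in restored])
print("max error:  ", max_error)
```

Each value now needs 1 byte instead of the 4 bytes of a 32-bit float, at the cost of a small rounding error that stays within one quantization step (`scale`) in this sketch.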

What is the difference between a Deep Learning Quantization role and a Machine Learning Engineer role?

Aspect | Deep Learning Quantization | Machine Learning Engineer
Required Credentials | Advanced degrees in AI, Computer Science, or related fields; knowledge of neural networks | Bachelor's or Master's in CS, Data Science, or related fields; programming skills
Work Environment | Research labs, AI development teams, hardware optimization settings | Software development teams, data-driven projects, product-focused environments
Industry Usage | AI hardware optimization, model deployment, edge computing | Model development, data analysis, software solutions across industries

Deep Learning Quantization focuses on reducing model size and improving inference speed through techniques like weight and activation quantization, often in hardware or embedded systems. Machine Learning Engineers develop, implement, and optimize machine learning models for various applications. While both roles require knowledge of AI and programming, Deep Learning Quantization is more specialized in model optimization techniques, whereas Machine Learning Engineers work broadly on model development and deployment.

What cities near Severn, MD are hiring for Deep Learning Quantization jobs?

Cities near Severn, MD with the most Deep Learning Quantization job openings:

[Infographic: Deep Learning Quantization job openings in Severn, MD as of May 2026. Employment type: 88% full time, 12% contract. Work arrangement: 68% in-person, 6% hybrid, 26% remote. Average salary: $93,254 per year, or $44.8 per hour.]

Intern, Information Tech

thechronicle

Washington, DC • On-site

$17 - $22.75/hr

Other

Posted 11 days ago


Job description

The AI/LLM Development Intern will join the engineering team to accelerate projects involving Generative AI, Large Language Models, and Agentic AI frameworks. The intern will work closely with engineering and editorial teams on the continuing development of an AI/LLM-driven solution to identify patterns and trends in higher education data sets.

DUTIES AND RESPONSIBILITIES:

  • LLM Application Development: Build, test, and deploy applications using frameworks like LangChain, LangGraph, or CrewAI to automate tasks.
  • RAG Pipelines: Develop and optimize Retrieval-Augmented Generation systems to connect LLMs with internal, proprietary, or external data sources.
  • Model Fine-Tuning & Evaluation: Assist in fine-tuning state-of-the-art LLMs, running experiments, and evaluating model performance using metrics for helpfulness and safety.
  • Performance Optimization: Implement techniques such as quantization (INT8/FP8), KV cache optimization, or flash attention to optimize inference latency and throughput.
  • Tool-Use & Function Calling: Develop and configure "skills" or "plugins" that allow AI agents to interact directly with internal APIs, databases, and GTM tools (e.g., Clay, Salesforce, Slack) to perform tasks such as updating account plans or triggering renewal sequences.
  • End-to-End Automation: Identify manual bottlenecks in the customer lifecycle and architect end-to-end automated solutions that integrate disparate data sources (Product, CRM, Support) into a unified execution layer.

 

REQUIRED EXPERIENCE AND QUALIFICATIONS:

  • Current pursuit of a Bachelor’s, Master’s, or PhD in Computer Science, Artificial Intelligence, Data Science, or related field.
  • Strong proficiency in Python and experience with AI orchestration frameworks (e.g., LangChain, Semantic Kernel, or AutoGen).

KNOWLEDGE, SKILLS AND ABILITIES:

  • Experience working with RESTful APIs and modern data structures (JSON, SQL, NoSQL).
  • Experience with vector databases (ChromaDB, Pinecone, Milvus).
  • Familiarity with deep learning libraries like PyTorch, TensorFlow, or JAX. Hands-on experience or academic projects working with OpenAI APIs, Hugging Face Transformers, or open-source LLMs (Llama, BERT).
  • Familiarity with enterprise integrations, i.e., connecting LLMs to existing backend systems.

 

PHYSICAL DEMANDS & WORK ENVIRONMENT:

  • Ability to perform work indoors in climate-controlled private work area with minimal noise, performing primarily sedentary work with limited physical exertion and lifting of up to 30 lbs.
  • Ability to routinely perform work on computer for an average of 4-8 hours per day, when necessary.
  • Ability to work extended hours whenever required or requested by management.