PyTorch Developer Jobs in Madison, WI (NOW HIRING)

Manager, ML Ops Infrastructure

Middleton, WI · Remote

$110K - $144K/yr

Python programming experience with fluency in ML frameworks (PyTorch, TensorFlow) and LLM APIs (OpenAI, Anthropic, Azure OpenAI). * Experience with the modern AI/ML toolchain including model serving ...

The AI Intern will collaborate with data scientists, engineers, and cross-functional teams to ... Demonstrated proficiency in Python and familiarity with AI frameworks such as TensorFlow, PyTorch ...

PyTorch Developer information

What are the key skills and qualifications needed to thrive as a PyTorch Developer, and why are they important?

To thrive as a PyTorch Developer, you need strong programming skills in Python, a solid grasp of machine learning concepts, and experience with deep learning frameworks, especially PyTorch itself. Familiarity with tools like CUDA, Jupyter Notebooks, and version control systems (e.g., Git) is typically expected, along with knowledge of cloud platforms or relevant certifications. Problem-solving ability, effective collaboration, and clear communication are crucial soft skills for success in this role. Together, these skills let you build, optimize, and deploy machine learning models efficiently in real-world applications.

What are some common challenges PyTorch Developers face when deploying machine learning models to production environments?

PyTorch Developers often encounter challenges when moving models from research to production, such as optimizing inference speed and memory usage, ensuring compatibility with export and deployment formats like TorchScript or ONNX, and managing dependencies across different systems. Integrating PyTorch models into existing software stacks and maintaining reproducibility can also be complex. Close collaboration with DevOps and data engineering teams is crucial to address these issues and ensure smooth deployment.

What is a PyTorch Developer?

A PyTorch Developer is a software engineer or data scientist who specializes in using PyTorch, an open-source machine learning library, to build and deploy deep learning models. Their responsibilities typically include designing neural network architectures, training and evaluating models, and optimizing code for performance. PyTorch Developers work in fields such as artificial intelligence, computer vision, and natural language processing, collaborating with teams to solve complex problems using machine learning. They are proficient in Python and have a strong understanding of deep learning concepts. Additionally, they often contribute to research, development, and the deployment of AI solutions in production environments.

What is the difference between a PyTorch Developer and a Machine Learning Engineer?

Aspect | PyTorch Developer | Machine Learning Engineer
Required Credentials | Bachelor's or higher in CS, experience with PyTorch | Bachelor's or higher in CS, data science, or related field, with ML experience
Work Environment | Research labs, AI startups, tech companies focusing on deep learning | Tech companies, finance, healthcare, often involving deployment and scaling ML models
Industry Usage | Primarily in AI research and development teams | Across industries implementing ML solutions in production

While both roles require knowledge of machine learning and experience with PyTorch, a PyTorch Developer mainly focuses on developing and optimizing deep learning models using PyTorch. A Machine Learning Engineer often has a broader scope, including deploying, maintaining, and scaling ML models across various platforms and industries.

Infographic showing PyTorch Developer job openings in Madison, WI as of May 2026, with employment types broken down into 86% Full Time, 2% Part Time, and 12% Contract, and a work-location split of 80% Physical, 5% Hybrid, and 15% Remote.

Manager, ML Ops Infrastructure

Paradigm

Middleton, WI • Remote

$110K - $144K/yr

Full-time

Posted 19 days ago


Job description

Paradigm is a software company transforming the way that the residential, construction & building product industries operate across the globe. We are looking for a Manager, ML Ops Infrastructure to be part of revolutionizing these industries.

We're looking for a hands-on technical leader to build and scale the ML Ops infrastructure that powers our AI capabilities in production. You'll oversee the end-to-end platform for deploying, serving, and operating ML models and AI agents, what we call our "agent factory": a repeatable framework that makes shipping AI-powered features as reliable and routine as deploying any other service.

This role sits at the intersection of ML operations, platform engineering, and cloud infrastructure. You'll build the compute, orchestration, and deployment pipelines that take ML experiments from notebooks to production, creating self-service tooling so data scientists and ML engineers can deploy with confidence and speed.

What You Will Do:

  • Build and lead a team of ML Ops engineers focused on production deployment frameworks for AI/ML systems, including hiring, mentoring, and providing technical guidance.

  • Design and operate Kubernetes-based infrastructure for ML workloads including model training, real-time inference, LLM serving, and agent orchestration.

  • Create the core ML Ops platform: model versioning, deployment automation, registries, serving infrastructure, and CI/CD pipelines purpose-built for ML and AI agent workflows.

  • Architect and manage GPU-accelerated compute for training and inference, optimizing for both performance and cost through spot instances, auto-scaling, and efficient resource allocation.

  • Build self-service deployment tooling that enables data scientists and ML engineers to push models and agents to production without manual infrastructure work.

  • Build the infrastructure for agentic AI: tool-calling, multi-step workflows, orchestration frameworks, multi-agent systems, and agent lifecycle management.

  • Implement production-grade deployment strategies (canary, blue/green) with rollback capabilities, observability, drift detection, and performance monitoring.

  • Partner with data science, ML engineering, and SRE teams to align infrastructure with deployment requirements and reliability SLOs.

  • Drive continuous improvement in deployment velocity, cost efficiency, and operational maturity across the ML platform, including evaluating and integrating tools like MLflow, Kubeflow, and emerging agent frameworks.

What You Need to Succeed:

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

  • 7+ years in infrastructure engineering, DevOps, or platform engineering, with at least 3 years focused on ML/AI infrastructure.

  • 1+ years of experience building and leading teams that operate production ML systems, or demonstrated tech lead experience with direct influence over team processes and career growth.

  • Track record deploying and managing ML models in production. You understand the full lifecycle from training to serving to monitoring.

  • Hands-on experience with GPU computing, model optimization, and ML-specific infrastructure patterns.

  • Hands-on experience with Kubernetes and container orchestration for ML workloads (Kubeflow, KServe, Ray, or similar).

  • Experience working with Azure cloud services such as Azure ML, Azure OpenAI, Azure Databricks, and GPU-accelerated compute (GPU VMs, AKS with GPU node pools).

  • Experience using infrastructure as code tools (Terraform or equivalent) with ML infrastructure patterns.

  • Python programming experience with fluency in ML frameworks (PyTorch, TensorFlow) and LLM APIs (OpenAI, Anthropic, Azure OpenAI).

  • Experience with the modern AI/ML toolchain including model serving (vLLM, Triton, TorchServe), ML Ops platforms (MLflow, Kubeflow, W&B), vector databases (pgvector, Azure AI Search), and agent orchestration frameworks. Familiarity with RAG architectures, fine-tuning workflows, and embedding pipelines at scale.

  • You are a bridge-builder who translates fluently between ML practitioners and infrastructure teams.

  • You are a systems thinker who balances performance, cost, and reliability while building for scale.

  • You are collaborative, curious, and driven to enable teams to ship AI capabilities faster than they thought possible.

Ready to Join? Apply now at myparadigm.com/careers/