#W2 only
Job title: MLOps Platform Engineer
Location: Reston, VA (in-person interviews, so local East Coast candidates only)
Description:
MLOps Platform Engineer
The Data Modeling Analytics & AI Engineering team is seeking an experienced MLOps
Platform Engineer to design, build, and support enterprise-grade machine learning operations
capabilities. This role will play a key part in enabling scalable, reliable, and secure ML model
development and deployment across our cloud and container platforms.
This is a hands-on engineering role requiring strong expertise in AWS, Kubernetes (EKS),
CI/CD automation, containerization, and ML platform operations. The ideal candidate will have
solid engineering fundamentals combined with practical knowledge of ML workflows,
deployment patterns, and platform reliability.
Key Responsibilities
Platform Engineering & Operations
· Engineer, manage, and support MLOps platform components across AWS and EKS-based
environments.
· Oversee deployment, configuration, and operation of infrastructure used for ML training, batch
inference, and real-time model serving.
· Ensure platform availability, resilience, and performance across dev, test, and production
environments.
· Implement role-based access controls (RBAC), network policies, and scalable namespace
designs within EKS.
Model Deployment & CI/CD Automation
· Build and support CI/CD pipelines (GitLab) for model packaging, container image builds,
vulnerability scanning, and automated deployment flows.
· Enable standardized model release processes including environment promotion, versioning, and
rollback workflows.
· Integrate CI/CD with ML frameworks, model repositories, artifacts, and runtime environments.
Container & Kubernetes Workloads
· Design and manage EKS workloads supporting containerized ML jobs and microservices.
· Implement auto-scaling, resource quotas, cluster optimization, and multi-tenant workload
isolation.
· Support GPU and CPU-based training/inference workloads.
Monitoring, Observability & Optimization
· Implement logging, monitoring, and alerting for ML pipelines, model endpoints, batch jobs,
and platform components.
· Analyze compute, storage, and data transfer usage to optimize cost efficiency across ML
workloads.
· Perform incident response, root cause analysis, and long-term remediation planning.
Collaboration & Enablement
· Partner with Data Scientists, ML Engineers, and application teams to operationalize end-to-end
machine learning solutions.
· Provide technical guidance on best practices for ML model lifecycle management, deployment
patterns, and scalable architectures.
· Contribute to documentation, runbooks, onboarding materials, and internal knowledge bases.
---
Required Qualifications
· 3+ years of hands-on experience with AWS services, including EKS, EC2, S3, IAM,
CloudWatch, and ECR.
· Strong experience operating and troubleshooting Kubernetes (preferably AWS EKS).
· Proficiency in containerization (Docker) and orchestration concepts.
· Strong programming/scripting experience in Python and Bash.
· Experience building and managing CI/CD pipelines (GitLab or equivalent).
· Familiarity with machine learning workflows, including training, inference, and model
monitoring.
· Experience with infrastructure-as-code (Terraform or CloudFormation).
· Experience supporting production platforms, including incident management and root cause
analysis.