Job Summary:
Altera is the world’s largest pure-play FPGA solutions provider, focused on delivering programmable technologies. The company is seeking a Senior MLOps & AI Infrastructure Engineer to architect, build, and operationalize machine learning systems at scale. The role collaborates with various teams to design end-to-end ML pipelines and deliver AI-powered capabilities.
Responsibilities:
• Design, build, and maintain scalable ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC environments
• Build MLOps infrastructure including experiment tracking, model registry, feature stores, and automated retraining workflows
• Implement CI/CD/CT (Continuous Training) pipelines for ML models using tools such as Kubeflow, MLflow, Airflow, or similar
• Containerize ML workloads with Docker and orchestrate at scale using Kubernetes and GPU node pools
• Develop, fine-tune, and deploy large-scale models including LLMs, GNNs, and reinforcement learning agents for EDA and chip design applications
• Apply advanced techniques: transfer learning, quantization, pruning, distillation, and RLHF for production-grade model efficiency
• Implement A/B testing frameworks and shadow deployments for safe model rollout
• Benchmark and optimize model inference performance on GPU/TPU clusters
• Build and maintain data pipelines for large-scale structured and unstructured datasets (terabyte-scale)
• Collaborate with data teams to design feature engineering systems and maintain data quality for ML training
• Implement data versioning and lineage tracking (DVC, Delta Lake, or similar)
• Manage cloud ML infrastructure on AWS (SageMaker), Azure (AML), or GCP (Vertex AI) with cost and performance optimization
• Automate infrastructure provisioning using Terraform or CloudFormation for GPU-backed ML environments
• Build monitoring, alerting, and observability systems for model performance drift, data quality, and system health
• Support HPC schedulers (LSF, Slurm) for large-scale distributed training jobs
• Partner with research scientists to productionize experimental models with engineering rigor
• Mentor junior engineers and define ML engineering best practices across the organization
• Drive adoption of AI/ML solutions within semiconductor, EDA, and simulation workflows
Qualifications:
Required:
• Bachelor’s or Master’s degree in Computer Science, Machine Learning, Statistics, or related field and 10+ years of industry experience
• 10+ years of experience across ML engineering, data science, and MLOps, including frameworks (PyTorch, TensorFlow, JAX, Hugging Face) and production model deployment at scale
• 8+ years of experience with parallelism strategies (FSDP, DeepSpeed, data/model parallelism)
• 10+ years of Python programming experience
• 8+ years of experience in cloud ML platforms (AWS, GCP, Azure), Docker/Kubernetes, and CI/CD pipelines
• 5+ years of hands-on experience with MLflow, W&B, or Neptune for tracking and reproducibility
Preferred:
• PhD in Computer Science, Machine Learning, Statistics, or related field
• Experience applying ML/AI to semiconductor, EDA, or chip design domains (e.g., timing prediction, place & route optimization, DRC closure)
• Familiarity with HPC schedulers such as LSF or Slurm and GPU cluster management for training workloads
• Knowledge of LLM fine-tuning, Retrieval-Augmented Generation (RAG) architectures, and AI agent frameworks such as LangChain or AutoGen
• Experience with graph neural networks (GNNs) or geometric deep learning for circuit and netlist analysis
• Background in reinforcement learning for optimization problems
• Exposure to zero-trust security, DevSecOps, and compliance automation for ML systems
• Experience working with large-scale simulation pipelines and synthetic data generation
• Experience at organizations such as NVIDIA, AMD, Intel, Google DeepMind, or similar AI/HPC-focused companies
• Published research or open-source contributions in ML, MLOps, or AI for EDA
• Experience building AI-powered developer tools or copilot-style products
• Familiarity with Synopsys, Cadence, or Siemens EDA toolchains and associated data formats
Company:
Altera provides programmable logic devices and design software for various applications. It is a sub-organization of Intel. Founded in 1983, the company is headquartered in San Jose, USA, and has 1,001 to 5,000 employees. The company is currently late-stage.