Senior Terraform Engineer (Hands-On) – Azure Platform, AI/ML & GenAI
We are hiring a hands-on Senior Terraform Engineer who combines deep technical build expertise with strong stakeholder leadership. You will work independently with Application Leads to understand workload needs and provision end-to-end infrastructure via Terraform across Azure (and, where needed, other clouds). This is not an advisory role: you design, build, and ship databases, AI/ML platforms, networking, security/IAM, and supporting cloud services at enterprise scale.
Key Responsibilities
- Design, implement, and maintain Terraform for Azure resources, including compute, storage, databases (SQL/NoSQL/data platforms), networking (VNets, Private Endpoints, Azure Firewall, load balancers), and IAM/security (Microsoft Entra ID (formerly Azure AD), RBAC, Key Vault, Azure Policy).
- Provision AI/ML & GenAI services (Azure ML, Azure OpenAI, AI Studio, AKS, Databricks) to support training, deployment, and scalable inference.
- Build reusable Terraform modules, enforce standards (naming, tagging, policy), manage remote state and workspaces, and implement drift detection and automated validation.
- Engage 1:1 with Application Leads to translate requirements into concrete Terraform plans, delivery timelines, and acceptance criteria.
- Lead technical working sessions, unblock dependencies, and own end-to-end provisioning as changes are promoted from lower to higher environments (dev → test → stage → prod).
- Communicate progress, risks, and cutover plans to engineering managers and product owners in clear, concise, action-oriented updates.
- Integrate IaC with Azure DevOps/GitHub Actions (pipelines, approvals, policy gates).
- Implement MLOps/LLMOps hooks (registries, endpoints, monitoring, logging/alerts via Azure Monitor/MLflow/Prometheus).
- Optimize for performance, cost, reliability, and resiliency; support on-call/incident response for critical platforms.
- Enforce least-privilege IAM, private networking, encryption, secrets management, and guardrails aligned to enterprise controls.
- Support DR/BCP and data sovereignty patterns; produce change records and infrastructure documentation for audit readiness.
- Where required, integrate with AWS/GCP for identity federation, networking, and data/AI service interoperability, using Terraform providers and standardized modules.
Required Qualifications
- Bachelor’s or Master’s degree in CS/Engineering, or equivalent experience.
- 8+ years in Cloud/DevOps/Platform roles; 5+ years hands-on Terraform in Azure (AzureRM provider, modules, backends, workspaces).
- Proven delivery of end-to-end provisioning with Terraform across databases, core cloud services, AI/ML services, networking, and IAM/security.
- Practical experience with Azure ML, AKS, Databricks, Azure OpenAI/AI Studio for production AI/ML and GenAI workloads.
- Strong skills in Kubernetes (AKS), Docker, Helm, and infrastructure patterns for model training/serving.
- Automation scripting in Python, PowerShell, or Bash; Git-based workflows, code reviews, and pipeline governance.
- Stakeholder leadership: able to operate independently with Application Leads, drive decisions, and land outcomes under deadlines.
- Excellent written and verbal communication, tailored to both engineers and managers.
Preferred Qualifications
- Certifications: Azure DevOps Engineer Expert, Azure AI Engineer Associate, HashiCorp Terraform Associate.
- Experience with Bicep/ARM/Pulumi, Kubeflow/MLflow/Azure ML Pipelines, and cost optimization for AI/ML workloads.
- Background in regulated enterprises (finance/healthcare) with policy-as-code and audit needs.