Job Title: Senior Data Engineer
Location: Minneapolis, MN (Remote)
Job Description:
AI Dojo Databricks SRE/Support Engineer
As a Databricks SRE and Support Engineer, you will work on operations related to AI Dojo (an AI/ML upskilling program developed by Optum/UHG) on Databricks. This individual contributor (IC) role requires experience working on large-scale AI/ML platforms and ensuring their stability, reliability, scalability, and performance. Experience with modern infrastructure and DevOps tools and paradigms, as well as proven hands-on knowledge of Databricks, is a must.
Primary Responsibilities:
•   Continuous support: Provide continuous SRE support to thousands of geographically distributed users on the AI Dojo Databricks platform: respond to tickets, triage issues, and liaise with customers.
•   Automation & DevOps: Improve existing Infrastructure as Code (IaC) according to DevOps best practices.
•   Systems Monitoring: Develop and maintain monitoring frameworks to respond promptly to outages and other service interruptions.
Required Qualifications:
•   Bachelor's degree in computer science, information technology, or a related field.
•   6+ years of infrastructure experience: Proven experience working on large-scale, cloud-based, enterprise-level software platforms and a deep understanding of the Databricks environment. In particular:
•   Experience building GitHub Actions pipelines, including composite actions, OIDC federation for cloud provider identity acquisition, and use of environments and deployment controls.
•   Experience building Databricks Asset Bundle and Terraform pipelines to manage and deploy Databricks Platform and Workspace resources.
•   Fluency in Python, experience with the Databricks Python SDK to perform Workspace operations, and familiarity with PySpark and Delta Lake.
•   Deep familiarity with Databricks APIs and with the Databricks CLI for provisioning Workspace identities and filesystem resources, and for querying account-level and workspace-level Users, Groups, and Service Principals.
•   Strong understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks.
•   3+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, and the like.
•   3+ years of experience working in geographically distributed support teams.