Job Summary:
Berkley Alternative Markets IO is a company focused on reimagining the insurance industry through innovative technology. They are seeking a Databricks Data Engineer to design, build, deploy, and maintain scalable data pipelines in cloud environments, enabling analytics and machine learning at scale.
Responsibilities:
• Design, build, and maintain high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark.
• Convert and modernize existing SSIS package logic into cloud-native Databricks pipelines using PySpark notebooks, Delta Live Tables (DLT), and Databricks Workflows.
• Implement reliable batch and streaming pipelines with robust data quality and validation frameworks.
• Optimize pipeline performance using Photon, efficient file formats, partitioning, Z-ordering, and caching strategies.
• Develop and manage datasets within Delta Lake, ensuring ACID reliability, schema evolution, versioning, and time travel.
• Architect feature-rich data layers including: Bronze (raw ingestion), Silver (validated, conformed), Gold (analytics-ready and ML-ready).
• Implement data governance using Unity Catalog for fine-grained access control, lineage, auditability, and metadata management.
• Partner with data scientists and data engineers to create feature pipelines, model training pipelines, and production scoring pipelines.
• Deploy and operationalize models using MLflow, Databricks Model Registry, and Databricks Workflows.
• Use Databricks built-in AI SQL functions such as ai_query, ai_forecast, and ai_analyze_sentiment to generate actionable insights from large amounts of unstructured or structured raw data.
• Implement monitoring for: Pipeline failures, Data/feature drift, Model performance degradation, Operational SLAs/SLIs/SLOs.
• Build automated CI/CD workflows using GitHub Actions or Azure DevOps for notebook deployment, pipeline testing, and environment promotion.
• Collaborate with data engineers to design reliable data products on Delta Lake; leverage Delta Live Tables (DLT) for declarative pipelines when applicable.
• Enforce Unity Catalog for lineage, permissions, and audit; manage secrets, tokens, and keys securely (e.g., Databricks secrets, Key Vault/Secrets Manager).
• Work closely with cross-functional teams: data engineers, data scientists, product managers, and business stakeholders.
• Serve as a Databricks SME, championing best practices, code standards, governance, and reusable frameworks.
• Document architecture, workflows, data models, runbooks, and operational procedures.
Qualifications:
Required:
• Minimum of 3 years of experience in Databricks, PySpark notebooks, Python, DevOps, software development, and data engineering.
• Proficient in designing, building, deploying, and maintaining high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark notebooks.
• Proficient in building, deploying, and operating production ML models such as supervised, unsupervised, and anomaly detection, including techniques for imbalanced datasets.
• Proficient with ML engineering and MLOps, including model versioning, CI/CD for ML, monitoring, drift detection, and automated retraining.
• Proficient in Python, including pandas and PySpark DataFrames.
• Expert-level SQL skills, including stored procedures; experience with SSIS, SSRS, and Power BI is a plus.
• Proficient with cloud data engineering platforms, such as Azure, Databricks, Spark, or SQL, and batch and streaming pipelines.
• Familiar with Databricks built-in AI functions such as ai_query, ai_gen, ai_classify, ai_forecast, and ai_analyze_sentiment, and able to use them to extract actionable insights from large amounts of unstructured or structured raw data.
• Experience with Python and ML frameworks, such as PyTorch or TensorFlow.
• Experience improving data quality, lineage, and observability in enterprise data environments and operationalizing rules and model-driven scoring for prioritization, routing, or case selection.
• Experience with predictive analytics, machine learning and artificial intelligence desired.
• A Bachelor's degree in Computer Science, Management Information Systems, Engineering, Math, Physics, or a related quantitative field is required (4-year degree).
• Ability to travel locally and nationally up to 5% of the time.
Preferred:
• Certified Databricks Data Engineer Associate or Professional is a plus.
• Master's degree preferred.
• Experience in the commercial insurance industry is a plus.
Company:
In an environment where complexity grows every day, the need for insight-driven decisions has never been greater. Founded in 2019, the company is headquartered in Manassas, USA, with a team of 51-200 employees. The company is currently at the growth stage.