Job Summary:
Berkley Alternative Markets IO is a company focused on reimagining the insurance industry through innovative technology. They are seeking a Databricks Data Engineer to design, build, deploy, and maintain scalable data pipelines in cloud environments, enabling analytics and machine learning at scale.
Responsibilities:
• Design, build, and maintain high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark.
• Convert and modernize existing SSIS package logic into cloud-native Databricks pipelines using PySpark notebooks, Delta Live Tables (DLT), and Databricks Workflows.
• Implement reliable batch and streaming pipelines with robust data quality and validation frameworks.
• Optimize pipeline performance using Photon, efficient file formats, partitioning, Z-ordering, and caching strategies.
• Develop and manage datasets within Delta Lake, ensuring ACID reliability, schema evolution, versioning, and time travel.
• Architect feature-rich data layers, including Bronze (raw ingestion), Silver (validated, conformed), and Gold (analytics-ready and ML-ready).
• Implement data governance using Unity Catalog for fine-grained access control, lineage, auditability, and metadata management.
• Partner with data scientists and data engineers to create feature pipelines, model training pipelines, and production scoring pipelines.
• Deploy and operationalize models using MLflow, Databricks Model Registry, and Databricks Workflows.
• Use Databricks built-in AI SQL functions such as ai_query, ai_forecast, and ai_analyze_sentiment to generate actionable insights from large amounts of structured or unstructured raw data.
• Implement monitoring for pipeline failures, data/feature drift, model performance degradation, and operational SLAs/SLIs/SLOs.
• Build automated CI/CD workflows using GitHub Actions or Azure DevOps for notebook deployment, pipeline testing, and environment promotion.
• Collaborate with data engineers to design reliable data products on Delta Lake; leverage Delta Live Tables (DLT) for declarative pipelines when applicable.
• Enforce Unity Catalog for lineage, permissions, and audit; manage secrets, tokens, and keys securely (e.g., Databricks secrets, Key Vault/Secrets Manager).
• Work closely with cross-functional teams: data engineers, data scientists, product managers, and business stakeholders.
• Serve as a Databricks SME, championing best practices, code standards, governance, and reusable frameworks.
• Document architecture, workflows, data models, runbooks, and operational procedures.
Qualifications:
Required:
• Minimum of 3 years of experience in Databricks, PySpark notebooks, Python, DevOps, software development, and data engineering.
• Proficient in designing, building, deploying, and maintaining high-performance, scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and PySpark notebooks.
• Proficient in building, deploying, and operating production ML models (supervised, unsupervised, and anomaly detection), including techniques for handling imbalanced datasets.
• Proficient with ML engineering and MLOps, including model versioning, CI/CD for ML, monitoring, drift detection, and automated retraining.
• Proficiency in Python, including pandas and PySpark DataFrames.
• Expert-level SQL skills, including stored procedures; experience with SSIS, SSRS, and Power BI is a plus.
• Proficient with cloud data engineering platforms and tools such as Azure, Databricks, Spark, and SQL, and with batch and streaming pipelines.
• Familiar with Databricks built-in AI functions such as ai_query, ai_gen, ai_classify, ai_forecast, and ai_analyze_sentiment, and able to use them to extract actionable insights from large amounts of structured or unstructured raw data.
• Experience with Python and ML frameworks, such as PyTorch or TensorFlow.
• Experience improving data quality, lineage, and observability in enterprise data environments and operationalizing rules and model-driven scoring for prioritization, routing, or case selection.
• Experience with predictive analytics, machine learning and artificial intelligence desired.
• A four-year bachelor’s degree in Computer Science, Management Information Systems, Engineering, Math, Physics, or a related quantitative field is required.
• Ability to travel locally and nationally up to 5% of the time.
Preferred:
• Databricks Certified Data Engineer Associate or Professional certification is a plus.
• Master’s degree preferred.
• Experience in the commercial insurance industry is a plus.
Company:
In an environment where complexity grows every day, the need for insight-driven decisions has never been greater. Founded in 2019, the company is headquartered in Manassas, USA, with a team of 51-200 employees. The company is currently at the growth stage.