Job Description:
Key Responsibilities
Data Engineering & Pipeline Development
Design, develop, and maintain end-to-end data pipelines in Databricks using Spark and Delta Lake
Build and optimize ELT/ETL processes for structured and unstructured data ingestion into the Data Lakehouse
Implement scalable ingestion patterns (batch and event-driven) from internal systems, third-party APIs, and cloud sources
Develop data models (bronze, silver, gold layers) to support enterprise reporting, analytics, and downstream consumption
Data Platform & Integration
Integrate the Data Lakehouse with enterprise tools such as Tableau, Alteryx, and machine learning platforms
Design and implement data access controls, identity management, and secure data sharing mechanisms
Support API-based integrations and downstream data consumption patterns
Data Quality, Governance & Controls
Implement data quality checks, reconciliation processes, and monitoring within Databricks pipelines
Ensure adherence to enterprise data governance standards, including lineage, metadata, and audit requirements
Support regulatory and compliance requirements (e.g., data integrity, privacy, and security controls)
Cloud & Automation
Develop and manage workflows using orchestration tools (e.g., Airflow, Control-M)
Automate data pipelines, deployments, and operational processes through CI/CD pipelines
Leverage cloud-native services (AWS/Azure) for data processing, storage, and event-driven architectures
Operations & Support
Monitor, troubleshoot, and optimize data pipelines and Spark workloads for performance and reliability
Support production data platforms, including incident resolution and root cause analysis
Ensure high availability, data integrity, and SLA adherence across enterprise data systems
Collaboration
Partner with data architects, data scientists, BI teams, and business stakeholders to deliver data solutions
Participate in Agile ceremonies and contribute to iterative delivery of data products
Translate business requirements into scalable technical data solutions
Required Qualifications
8+ years of experience in data engineering, data platforms, or related roles
Hands-on experience with Databricks, Apache Spark (PySpark), and Delta Lake
Strong SQL and data modeling skills (relational and dimensional)
Experience building and supporting data pipelines in a cloud environment (AWS or Azure)
Experience with ELT/ETL tools (e.g., Fivetran, custom ingestion frameworks)
Familiarity with data orchestration tools (Airflow, Control-M)
Experience working in Agile development environments
Preferred Qualifications
Experience in financial services or regulated environments (e.g., banking, risk, regulatory reporting)
Knowledge of data governance frameworks and tools (e.g., Collibra)
Experience with real-time or streaming data pipelines
Exposure to machine learning pipelines and feature engineering in Databricks
Cloud certifications (AWS, Azure, or Databricks)
Technical Skills
Databricks (Lakehouse architecture, notebooks, jobs, Unity Catalog)
Spark / PySpark
SQL (advanced querying and optimization)