Senior Data Engineer – Azure / Python ETL Modernization
Remote (U.S.) with minimal travel (2–3 times per year)
Overview
We’re hiring a Senior Data Engineer to lead enterprise ETL modernization initiatives, transitioning legacy data pipelines (e.g., Informatica, on-prem data warehouses) into modern Azure-based, Python-driven data platforms.
This is a hands-on engineering role focused on building scalable data pipelines, refactoring legacy logic into Python/PySpark, and delivering production-grade data solutions that support analytics, reporting, and downstream data use cases.
The right candidate will have a strong background in Python-based data engineering and Azure data services, plus experience modernizing legacy ETL environments.
Core Responsibilities
ETL Modernization (Primary Focus)
- Refactor and migrate legacy ETL pipelines (e.g., Informatica) into Python/PySpark-based pipelines
- Translate business logic into scalable, code-driven transformations (not tool-based ETL)
- Support large-scale migration from on-prem data warehouses to Azure
Data Pipeline Engineering
- Build and maintain pipelines using Azure Data Factory, Synapse Pipelines, and/or Databricks
- Develop reusable, parameter-driven frameworks for ingestion and transformation
- Implement ELT patterns leveraging SQL pushdown and distributed processing
Python & Spark Development
- Develop and optimize PySpark jobs for large-scale data processing
- Write clean, testable Python code for transformation, orchestration, and data quality
- Integrate with APIs and external data sources
Data Architecture & Modeling
- Implement lakehouse architecture (ADLS Gen2, Delta Lake, Parquet)
- Design dimensional models (star/snowflake) for analytics use
- Handle SCD (Type 1/2), CDC, and complex transformation logic
Platform & DevOps
- Build CI/CD pipelines using Azure DevOps (YAML, Terraform/Bicep)
- Implement monitoring, logging, and alerting (Azure Monitor, Log Analytics)
- Ensure security and access controls (RBAC, Key Vault, networking)
Required Skills
- Strong hands-on experience with Python for data engineering (non-negotiable)
- Solid experience with PySpark / Spark-based processing frameworks
- Experience with Azure Data Factory, Synapse, or Databricks
- Advanced SQL (complex transformations, optimization, performance tuning)
- Experience working with modern data lakes (ADLS Gen2, Delta Lake)
- Experience with ETL modernization or legacy system migration
- Familiarity with CI/CD and DevOps practices in data engineering
Preferred Experience
- Background migrating Informatica or similar ETL tools into Python-based frameworks
- Experience with large enterprise data warehouse environments (Teradata, SQL Server, Oracle)
- Exposure to regulated environments (healthcare, financial services, etc.)
- Snowflake experience is a plus
Why This Role Is Different
- Focus on real modernization work, not legacy ETL maintenance
- Heavy emphasis on Python-first data engineering
- Opportunity to influence architecture and engineering standards
- Long-term work on a high-impact enterprise data platform