Role Overview
We are seeking a Data Engineer with strong expertise in Databricks, Python, and PySpark, coupled with experience in CI/CD pipelines. The ideal candidate will have a solid background in data management, data warehousing, and data integration, with proven experience in developing scalable, high-performance data solutions on cloud platforms.
Key Responsibilities
- Design, build, and optimize data pipelines using Databricks, PySpark, and Python.
- Develop and maintain data quality rules, transformations, and mappings to ensure data accuracy and consistency.
- Write and optimize complex SQL queries for large-scale data processing.
- Work with cloud platforms (AWS/Azure/GCP) to deliver secure, scalable solutions.
- Support data integration, data warehousing, and cleansing initiatives.
- Collaborate with cross-functional teams to deliver high-quality solutions following Agile Scrum practices.
- Troubleshoot production issues in Oracle and MS SQL Server environments and drive timely resolution.
- Implement CI/CD pipelines for continuous integration, testing, and deployment.
- Follow best practices for data management and the software development lifecycle (SDLC).
Required Skills & Experience
- 6-8 years of overall IT/data engineering experience.
- Must have: Databricks, Python, PySpark, and CI/CD experience.
- 3-5 years of experience in data management, warehousing, integration, and cleansing.
- Strong SQL programming skills (Oracle and MS SQL Server preferred).
- 2+ years of experience with Python (Perl is a plus).
- 3+ years working with cloud technologies (AWS, Azure, or GCP).
- Strong analytical, problem-solving, and troubleshooting skills.
- Experience in Agile Scrum delivery and SDLC processes.
Nice-to-Have Skills
- Experience with data governance and metadata management.
- Exposure to ETL tools and orchestration frameworks (Airflow, ADF, etc.).
- Knowledge of DevOps practices and containerization (Docker, Kubernetes).