Job Summary:
Responsible for designing, building, and optimizing large-scale data pipelines and ETL processes in cloud environments (Azure & GCP). Develop high-performance data ingestion, transformation, and analytics solutions while ensuring data quality, scalability, and reliability. Collaborate with global teams to implement cloud-based data warehouses and analytics platforms.
Key Responsibilities:
- Build and maintain cloud-based data pipelines using Azure Databricks, Azure Data Factory (ADF), Azure Data Lake, Azure Synapse, BigQuery, and other GCP data services.
- Develop ETL frameworks and reusable solutions to process large volumes of structured and unstructured data.
- Perform performance tuning on Spark (Scala/Python), Hive, and SQL queries to ensure efficient data processing.
- Design and implement star and snowflake schemas for analytical data warehouses.
- Ensure data quality, governance, and compliance across cloud environments.
- Collaborate with global teams to deliver high-quality, production-ready solutions and optimize system performance.
- Troubleshoot and resolve data processing and pipeline issues.
Required Skills & Experience:
- Strong expertise in Azure (Databricks, ADF, Data Lake, Synapse) and GCP (BigQuery, Dataproc, Cloud Storage, Cloud Composer).
- Hands-on experience with Spark (Scala/Python), Hive, SQL, and performance tuning.
- Solid knowledge of relational data modeling and data warehouse concepts (star and snowflake schemas).
- Experience with Kafka or other streaming technologies.
- Proficiency in Python, Scala, and SQL for data engineering tasks.
- Strong analytical, problem-solving, and communication skills.
Competencies:
- Cloud Data Engineering (Azure & GCP)
- Big Data Processing and Analytics
- ETL Development and Optimization
- Data Warehousing & Modeling
- Performance Tuning & Data Quality
Preferred Skills:
- Snowflake, MySQL, and NoSQL databases
- Streaming data pipelines and real-time analytics
- Experience with CI/CD for data pipelines