Skills:
- Hadoop & Cloudera Ecosystem (CDP/CDH, HDFS, YARN, MapReduce)
- Spark & Data Processing (Spark Core, SQL, PySpark)
- Version Control (Git)
- ETL Pipeline Development & Optimization
- Cloud & DevOps (CI/CD, Docker, Kubernetes, CDP Cloud)
- Big Data Tools (Hive, Impala, HBase)
- Programming (Python / Scala / Shell)
- Data Storage & File Formats (Parquet, Avro, ORC, JSON)
- Cluster Management & Performance Tuning
- Data Ingestion Tools (Sqoop, Flume, Kafka)
- Experience with real-time processing (Spark Streaming / Kafka)
- Knowledge of data governance and security frameworks
- Exposure to cloud platforms (Azure / AWS / GCP)
- Basic understanding of data warehousing concepts

Responsibilities:
- Develop and maintain big data solutions using the Cloudera Hadoop platform
- Design and implement scalable ETL pipelines for large datasets
- Work on data ingestion, processing, and transformation using Spark and Hadoop tools
- Optimize data workflows, query performance, and storage strategies
- Manage and maintain Hadoop clusters (Cloudera distribution)
- Work with structured and unstructured data across multiple sources
- Collaborate with cross-functional teams (Data Engineers, Analysts, DevOps, and Business teams)
- Implement data security, governance, and access controls
- Ensure system performance, reliability, and scalability
- Stay updated with the latest advancements in the AI/ML and GenAI ecosystem

Certification:
- Cloudera Certified Associate (CCA) Data Analyst
- Cloudera Certified Professional (CCP) Data Engineer (highly preferred)
- Cloudera Certified Administrator (CCA Admin)