Job Summary:
Tata Consultancy Services is seeking a highly skilled and motivated Databricks Certified Engineer to design, build, and optimize scalable data pipelines and ETL workflows using the Databricks Data Intelligence Platform. The ideal candidate will be responsible for writing robust Python and Spark code, ensuring data quality, and implementing data governance across cloud environments.
Responsibilities:
• Pipeline Development: Design, build, and maintain scalable ETL/ELT data pipelines using PySpark, Delta Lake, Auto Loader, and Databricks Workflows.
• Data Transformation & Processing: Design transformations that process batch and streaming data through the Medallion Architecture (Bronze, Silver, and Gold layers).
• Data Governance & Security: Implement access controls and data masking policies using Unity Catalog to secure Personally Identifiable Information (PII) and ensure compliance.
• Performance Tuning: Optimize Spark jobs, troubleshoot memory bottlenecks, and adjust cluster configurations for cost and compute efficiency.
• Proactive Risk Identification: Identify and address data complexities, hidden challenges, and potential risks in data pipelines and the Databricks ecosystem before they cause failures, keeping data solutions robust, secure, and efficient.
• Cross-Functional Collaboration: Partner with Data Scientists and Analysts to curate datasets, support machine learning models (MLflow), and provide integrated reporting.
• Develop and maintain comprehensive documentation for data pipelines, data models, and ETL processes.
• Participate in code reviews to maintain high-quality code standards.
• Troubleshoot and resolve issues in data pipelines and Databricks clusters.
Qualifications:
Required:
• Databricks Certified Engineer credential (Associate or Professional level), validating foundational or advanced skills in the platform.
• In-depth knowledge of the Databricks Data Intelligence Platform, including notebooks, Delta Lake, MLflow, Unity Catalog, Auto Loader, and Databricks Workflows.
• Strong proficiency in developing complex data transformations and analytics using PySpark.
• Experience with Apache Iceberg for open table format management.
• Expert-level proficiency in Python for data manipulation, scripting, and application development.
• Advanced proficiency in SQL for data querying and manipulation.
• Experience with shell scripting for automation and job orchestration.
• Hands-on experience with Databricks deployed on major cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
• Deep understanding of distributed computing, data warehousing principles, ETL/ELT processes, and data modeling.
• Bachelor's degree in Computer Science.
Preferred:
• Familiarity with CI/CD tools (e.g., Databricks Asset Bundles, GitHub Actions, GitLab CI/CD) and orchestration tools such as Apache Airflow.
• Knowledge of Hive for data storage and querying.
• Familiarity with Kubernetes for deploying and managing containerized applications.
• Experience with Git or other version control systems.
Company:
Tata Consultancy Services is a business solutions company that specializes in information technology services and consulting. It is part of the Tata Group. Founded in 1968, the company is headquartered in Mumbai, India, and has more than 10,000 employees. It is currently classified as a late-stage company.