Job Summary: We are seeking a Databricks Developer with strong Java expertise to design, build, and maintain large-scale data processing solutions and AI/ML platforms. The role focuses on developing scalable data pipelines, optimizing Spark jobs, and ensuring data governance and security in cloud environments such as AWS or Azure. The ideal candidate will work closely with data scientists, ML engineers, and business teams to deliver reliable, high-performance data solutions.
Experience: 5+ years in software or data engineering, preferably with Databricks and cloud data platforms
Key Responsibilities:
- Data Pipeline Development: Design, develop, and maintain scalable ETL/ELT processes and data pipelines using Java and Apache Spark on Databricks.
- Performance Optimization: Tune Spark jobs for performance, stability, and cost-efficiency; troubleshoot issues such as data skew and out-of-memory errors.
- Cloud Integration: Integrate Databricks solutions with cloud-native services (AWS/Azure IAM, Storage, Networking).
- Automation & CI/CD: Build automation using Java APIs and Infrastructure-as-Code tools such as Terraform; manage orchestration and monitoring.
- Collaboration & Support: Partner with data scientists, ML engineers, and business teams to gather requirements, define compute needs, and support production environments.
- Governance & Security: Implement data governance policies, RBAC, encryption, and compliance standards using Delta Lake and Unity Catalog.
- Code Quality: Write clean, efficient Java code following best practices and participate in code reviews.
Required Skills & Qualifications:
- Proficiency in Java (Java 8+) and Spark fundamentals (DataFrames, SQL, RDDs).
- Hands-on experience with Databricks (workspace management, clusters, jobs, Delta Lake, MLflow, Unity Catalog).
- Deep understanding of cloud infrastructure (AWS or Azure).
- Experience with CI/CD pipelines (GitHub Actions, Azure DevOps), Terraform, and monitoring tools (Grafana, Prometheus).
- Knowledge of Agile development methodologies.
- Strong analytical, problem-solving, and communication skills.
Technical Requirements:
- Programming Languages: Java, J2EE
- Cloud Technologies: AWS, Azure
- Frameworks: Struts, Spring, Spring Boot, Microservices, Kafka, Spark
- Databases: Oracle, MySQL, MongoDB, HBase, DB2
- Web/App Servers: WebLogic, Tomcat, WebSphere
- Web Technologies: ReactJS, Angular
- Build/ETL Tools: Maven, Jenkins, Pentaho, Databricks
Preferred Qualifications / Certifications:
- Databricks Certified Data Engineer Professional certification
- Experience with large-scale distributed systems and AI/ML pipelines