Overview:
Job Title: Big Data Engineer
Location: Atlanta, GA / Tampa, FL / Dallas, TX
Job Summary
We are seeking an experienced Big Data Engineer to design, build, and optimize large-scale data pipelines and distributed systems across cloud and on-prem platforms. The ideal candidate will have strong expertise in Spark, Hadoop ecosystems, cloud data services, ETL/ELT design, streaming platforms, and best practices for scalable data processing.
Responsibilities
- Design, develop, and maintain big data pipelines using Spark (PySpark/Scala), Hadoop, Kafka, and distributed computing frameworks.
- Build and optimize ETL/ELT pipelines for structured and unstructured data across cloud and on-prem data platforms.
- Work with cloud technologies (AWS, Azure, or GCP) including Data Lake, Databricks, EMR, Glue, Dataflow, Synapse, or Snowflake.
- Develop Delta Lake / Lakehouse architecture for high-performance ingestion and processing.
- Build real-time streaming solutions using Kafka, Spark Streaming, Kinesis, or Event Hubs.
- Collaborate with data architects, analysts, and application teams to gather data requirements and deliver scalable solutions.
- Implement and maintain CI/CD pipelines, automated jobs, and orchestration using Airflow, Azure Data Factory, or Glue Workflows.
- Optimize data pipelines for performance, cost efficiency, and reliability.
- Ensure data quality, validation, governance, and lineage best practices across the ecosystem.
- Troubleshoot and resolve production issues in a high-availability environment.
Required Skills & Qualifications
- 5-8+ years of experience as a Big Data Engineer or Data Engineer.
- Strong hands-on skills with Spark (PySpark or Scala), Hadoop, Hive, HDFS, and MapReduce.
- Experience working with Databricks, EMR, Glue, Synapse, Dataproc, or equivalent big-data compute engines.
- Proficiency in Python or Scala for data engineering.
- Experience with Kafka or other event-streaming technologies.
- Strong understanding of cloud data architectures (AWS S3/Glue/EMR | Azure ADLS/ADF/Databricks | GCP BigQuery/Dataproc).
- Solid SQL skills and experience with relational and NoSQL databases.
- Experience with version control (Git) and CI/CD tools (Jenkins, Azure DevOps, GitHub Actions).
- Hands-on experience with Airflow, ADF, or other orchestration and scheduling tools.
- Familiarity with data modeling, data governance, and best practices for data quality.