Overview:
Job Title: Big Data Developer / Spark Scala Engineer
Experience: 7+ Years
Location: O'Fallon, MO
Job Summary
We are seeking a highly skilled Big Data Developer / Spark Scala Engineer with strong expertise in large-scale distributed data processing, streaming architectures, and real-time analytics pipelines. The ideal candidate should have deep hands-on experience with Apache Spark, Scala, Kafka, Apache NiFi, and distributed object storage platforms such as Apache Ozone and Ceph.
This role requires strong production support capabilities, performance tuning expertise, and experience building mission-critical streaming systems with strict SLA requirements.
Required Skills
Core Technologies
- Scala
- Python (PySpark)
- SQL
- Apache Spark:
  - Spark Core
  - Spark SQL
  - Structured Streaming
- Kafka
- Apache NiFi
Storage & Infrastructure
- Apache Ozone
- Ceph
- Distributed object storage concepts
- Linux
- Git
- CI/CD pipelines
- Monitoring and logging tools
Technical Expertise
- Spark performance tuning:
  - CPU optimization
  - Memory tuning
  - Shuffle optimization
  - I/O optimization
- Streaming semantics:
  - Exactly-once processing
  - At-least-once processing
- Streaming observability:
  - Lag monitoring
  - Throughput analysis
  - Backpressure handling
- Experience supporting mission-critical production systems with strict SLAs
Key Responsibilities
- Design, develop, and maintain large-scale Spark applications using Scala and PySpark
- Build and operate streaming data pipelines using Kafka and Spark Structured Streaming
- Implement stateful streaming patterns including:
  - Windowing
  - Watermarking
  - Late data handling
  - Checkpointing
- Develop replay and reprocessing workflows using Kafka offsets and partitions
- Build ingestion and routing workflows using Apache NiFi
- Develop scalable ETL/ELT pipelines optimized for:
  - Low latency
  - Fault tolerance
  - High scalability
- Optimize Spark workloads through partitioning strategies and performance tuning
- Integrate Spark applications with Apache Ozone, Ceph, and distributed storage platforms
- Ensure data quality, auditability, and reconciliation across pipelines
- Support production monitoring, incident management, and root cause analysis
- Contribute to reusable frameworks, engineering standards, and best practices
- Participate in architecture reviews, code reviews, and technical documentation
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field
- Strong production experience with Apache Spark and distributed systems
- Advanced proficiency in Scala and PySpark
- Strong experience with Kafka-based streaming architectures
- Hands-on experience with Spark Structured Streaming and Apache NiFi
- Strong SQL expertise with structured and semi-structured datasets
- Experience working with object storage and distributed storage systems
- Strong Linux, shell scripting, and Git skills