Position: Data Engineer (Python, PySpark & AWS)
Location: McLean, VA
Duration: Long-term contract
Role Summary
We are seeking an experienced Data Engineer with strong expertise in Python, PySpark, and AWS cloud data services. The ideal candidate will design, build, and optimize scalable data pipelines so that high-quality data is available for analytics, reporting, and business operations. This role requires hands-on development, strong problem-solving skills, and experience with large-scale distributed systems and data platforms.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines using Python and PySpark
- Build and optimize data ingestion, transformation, and processing frameworks
- Work with AWS cloud services such as S3, Glue, EMR, Lambda, Redshift, Athena, and DynamoDB
- Partner with data architects, analysts, and BI teams to deliver high-quality data solutions
- Perform data profiling, quality checks, and validation for accuracy and consistency
- Automate data workflows and improve data pipeline performance
- Implement best practices for security, monitoring, version control, and CI/CD
- Troubleshoot complex data and pipeline issues in a distributed environment
- Document solutions, data dictionaries, lineage, and technical workflows
Required Skills & Qualifications
- 12+ years of hands-on data engineering experience
- Strong programming skills in Python, including data structures and OOP
- Deep expertise with PySpark for distributed data processing
- Proficiency with the AWS cloud data ecosystem (S3, Glue, EMR, Lambda, Redshift, Athena, Step Functions, IAM)
- Strong SQL experience and optimization techniques
- Hands-on experience with ETL/ELT pipeline development
- Experience with Docker, Git, and CI/CD tools
- Understanding of data modeling and schema design (star and snowflake schemas)
- Experience working in Agile/Scrum environments