Data Engineer
Hartford, CT or Remote | Contract | No third-party C2C
Job Responsibilities:
As a Data Engineer, you will have the opportunity to expand your skills in a variety of areas while working on a data-focused team. On this team, you will help architect and deliver a wide variety of code artifacts. You will build scalable and secure ETL solutions for operational, reporting, and analytical data needs. In addition, you'll gain experience in CI/CD by using Jenkins, Terraform, and Ansible. Our group focuses on full-cycle engineering, from requirements to production support, which means you will have the opportunity to work on a variety of problems while gaining Big Data experience. Although this is a specialized role, you will have the freedom to expand your responsibilities and try new things that interest you. You will be challenged to:
- Design, develop, and maintain fault-tolerant, highly distributed, and robust ETL platforms for various business use cases.
- Analyze large sets of structured and semi-structured data for business analytics and ETL design.
- Translate business needs and vision into roadmaps, project deliverables, and organizational strategies.
- Design and implement ETL solutions using cloud-native platforms.
- Collaborate with analytics and business teams to design data models that feed business intelligence tools, increasing data accessibility and encouraging data-driven solutions.
Skills and Experience Required:
- Solid experience designing and developing data pipelines for data ingestion and transformation using Spark.
- Distributed computing experience using PySpark.
- Good understanding of the Spark framework and Spark architecture.
- Experience working in cloud-based big data infrastructure.
- Excellent at troubleshooting performance and data skew issues.
- Must have a good understanding of Spark runtime metrics and be able to tune applications based on those metrics.
- Deep knowledge of partitioning and bucketing concepts for data ingestion.
- Good understanding of AWS services such as Glue, Athena, S3, Lambda, and CloudFormation.
- Preferred: working knowledge of implementing data lake ETL using AWS Glue, Databricks, etc.
- Experience with data modelling techniques for cloud data stores and on-prem databases such as Teradata, Teradata Vantage (TDV), etc.
- Preferred: working experience with ETL development in Teradata Vantage and data migration from on-prem to Teradata Vantage.
- Proficiency in SQL, relational and non-relational databases, query optimization and data modelling.
- Experience with source code control systems such as GitLab.
- Experience with large-scale distributed relational and NoSQL database systems.