The Data Engineer is responsible for developing and supporting the data fabric: ETL processes, data movement, metadata management, and data preparation, working closely with the Data Platform Engineer and other data and application stakeholders. The Data Engineer works with product teams to understand, analyze, document, and efficiently implement reusable components and services that support data delivery, data lineage, and data APIs for relational and NoSQL data sources, various file formats, and streaming data. Close collaboration with data supplier, customer, and data consumer teams is expected to ensure effective, efficient delivery against schedules and requirements.
The Data Engineer brings a passion for applying data techniques, toolsets, and best practices, with command of SQL and Python and/or R as core programming skills. The Data Engineer adopts DataOps principles, practices, and approaches, utilizing automation and orchestration for code control, environment management, and data management. The Data Engineer will bring strong data skills and a good understanding of AWS as a platform, coupled with the motivation to learn proactively and enhance the skills needed to support the tasks at hand and emerging scenarios.
The successful candidate will be able to rapidly support the assessment, planning, and development of data pipelines, feeds, and models, along with other reusable components, using AWS Glue, Athena, S3, API Gateway, and web service backends.
Understand requirements and specifications such as transformation rules and logic, compute cost optimization, data SLAs, and data reliability engineering needs
Utilize identified Python and AWS tools and services, specifically data-oriented Python libraries and utilities such as pandas, Vaex, and Great Expectations, to create reusable, modular components
Create and execute medium- to high-complexity SQL scripts to validate and test data before and during the lifecycle of report and dataset development
Develop and maintain simple to mid-complexity reports and dashboards in tools such as Tableau and AWS QuickSight
The candidate must have a successful track record of handling complex datasets with minimal supervision and will be expected to support a variety of structured and semi-structured data in streaming and batch frameworks. Responsibilities include troubleshooting, monitoring, and coordinating defect resolution for all dashboard components, including access and performance, as well as the creation and support of all data visualizations, queries, components, and modules within the current scope of the system.
Bachelor's degree in Computer Science or related discipline
5+ years of hands-on experience supporting and enabling complex dashboards with SQL and NoSQL backends
3+ years of work experience with AWS-based data sources such as RDS, Athena, and Redshift
1+ year of experience using complex datasets and formats such as JSON, Avro, and Parquet
1+ year of experience in data profiling and quality management
1+ year of experience working with CI/CD tools, including Git, for ETL and script repositories
Nice-to-Have Skills/Experience (not mandatory):
Experience working in AWS environments; any AWS certification is advantageous
Experience with big data tools such as EMR/Spark and Databricks/PySpark is an advantage
Experience working with database versioning tools such as Flyway is an advantage