Company Description
STEMtech has a client in the Atlanta area looking for a Data Engineer to help optimize operations by manipulating and aggregating disparate operational and back-office data sources into a format that is easily digestible by both data scientists and statistically adept colleagues. The Data Engineer's core responsibility will be to combine large volumes of disparate, complex data, conduct quality checks on it, manipulate it, and ensure continuous access to clean operational data for data scientists and other stakeholders. In addition, he/she will assist in developing the data pipeline to ensure ongoing data collection, consolidation, and management.
Job Description
• Creates data ingestion pipelines and processes based on jointly defined requirements
• Profiles and analyzes data to identify gaps and potential data quality issues; works with business SMEs to resolve these issues
• Identifies relationships between disparate data sources
• Uses Python, R, Informatica, and other big data tools and technologies to code data engineering routines
• Designs and develops data engineering routines for feature extraction, feature generation, and feature engineering
• Works with data scientists and business SMEs to gather requirements and present the relevant details in the data
• Designs and develops the data architecture jointly with the data architect, and ensures its security and maintenance
• Explores suitable options, then designs and creates data pipelines (data lakes / data warehouses) for specific analytical solutions
• Identifies gaps and implements solutions for data security, data quality, and process automation
• Builds data tools and products that automate manual effort and make data easily accessible
• Supports maintenance, bug fixing, and performance analysis along the data pipeline
• Assesses the existing architecture and data maturity, and identifies gaps
• Gathers requirements, assesses gaps, and builds roadmaps and architectures to help the analytics-driven organization achieve its goals
Qualifications
• 8-10 years of experience in data engineering and data lake development using any Hadoop ecosystem
• Bachelor's degree in Computer Science or Engineering, and/or a background in Mathematics and Statistics; Master's or other advanced degree a plus
• Previous leadership experience
• Experience with big data platforms (e.g., Hadoop, MapReduce, Spark, HBase, HDInsight, Databricks, Hive) and with programming or scripting languages such as UNIX shell and Python
• Has used SQL, PL/SQL, or T-SQL with RDBMSs such as Teradata, MS SQL Server, or Oracle in production environments
• Experience with reporting and BI packages (e.g., Power BI, Tableau, SAP BO)
• Strong critical-thinking and problem-solving skills
• Demonstrated success working on cross-functional teams to meet a common goal
• Self-starter with a high sense of urgency
Additional Information
All your information will be kept confidential according to EEO guidelines.