The Big Data Engineer will create and manage the uninterrupted flow of information by designing and maintaining pipelines that make data easily accessible across the enterprise. You will build automated data pipelines to ingest, store, process, and analyze our data, including the data structures and architectures needed for ingestion, processing, and deployment in large-scale, data-intensive applications. The Big Data Engineer must ensure that optimal ETL/ELT solutions are developed by applying best practices to data modeling, code development, and automation.
• Design, develop, and maintain an optimal data pipeline architecture using both structured data sources and big data, across on-premises and cloud-based environments, in both batch and real-time (streaming) modes.
• Develop and automate ETL code using scripting languages, ETL tools, and job scheduling software to support all reporting and analytical data needs.
• Design and build dimensional data models to support the data warehouse initiatives.
• Assemble large, complex data sets that meet the analytical needs of the data science team.
• Assess new data sources to better understand availability and quality of data.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data pipeline performance, re-designing infrastructure for greater scalability and access to information.
• Participate in requirements gathering sessions to distill technical requirements from business requests.
• Collaborate with business partners to productionize, optimize, and scale enterprise analytics.
• Collaborate with data architects and modelers on data store designs and best practices.
• Bachelor’s degree in Computer Science, Engineering, Information Science, Math, or a related discipline
• Data engineering, data management, or cloud certification is a plus

Experience/Minimum Requirements:
• Five (5)+ years’ experience in traditional and modern Big Data technologies (HDFS, Hadoop, Hive, Pig, Sqoop, Kafka, Apache Spark, HBase, Oozie, NoSQL databases, PostgreSQL, Git, Python, REST APIs, Snowflake, etc.)
• Two (2)+ years’ experience building data platforms using the Azure stack (Azure Data Factory, Azure Databricks, etc.)
• Experience with object-oriented/functional scripting languages: Python, Java, C++, Scala
• Experience extracting, querying, and joining large data sets at scale
• Experience using Snowflake to build data marts with the data residing in Azure storage is a plus

Other Skills/Abilities:
• Thorough understanding of relational, columnar, and NoSQL database architectures and industry best practices for development
• Understanding of dimensional data modeling for designing and building data warehouses
• Excellent advanced SQL coding and performance tuning skills
• Experience parsing data formats such as XML/JSON and leveraging external APIs
• Understanding of agile development methodologies
• Ability to work in a team-oriented, collaborative environment; good interpersonal skills
• Strong analytical and problem-solving skills; ability to weigh suggested technical solutions against the original business needs and choose the most cost-effective solution
• Keen attention to detail and ability to assess the impact of design changes prior to implementation
• Self-driven, highly motivated, and able to learn quickly
• Ability to effectively prioritize and execute tasks in a high-pressure environment
• Strong customer service orientation
• Ability to present and explain technical information to diverse audiences in a way that establishes rapport and gains understanding
• Work experience with geospatial data and spatial analytics is preferred
Working Conditions: Works in a normal office setting with no exposure to adverse environmental conditions. During COVID-19 conditions, this role will work remotely to mitigate risk. Provides off-hours support for all developed data pipelines in an on-call rotation.
Job Force Now
Why Work Here?
Collaborative culture with a winning attitude!