Remote Data Engineer - Python
Overview of the Position
The Service Research team is tasked with finding new ways to use advanced analytics to help Marketing and Operations run the business and meet organizational goals. This is accomplished by engaging with our business partners to develop solutions that direct and/or generate contact, optimize customer interactions and expenses, tie service methods to policy actions and claim-level events, and predict future behaviors.
The Senior Data Engineer on this team is responsible for the ingestion, extraction, cleaning, integrity testing, integration, and mapping of data across multiple data assets within The Hartford. The data engineer will work with Data Scientists, business partners, and enterprise data and technology teams to understand, build, and troubleshoot data pipelines for downstream analytics use cases. SQL expertise is a must for data extraction from data warehouses, data marts, and operational databases. Familiarity with systems like Hadoop and Linux, and with scripting languages like Python, is preferred. The engineer should have a deep understanding of query optimization to link data across tables and across policy, claims, and accounting systems.
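As a rough illustration of the extraction and linking work described above, the sketch below pulls joined policy and claim records from a warehouse into Python. The connection string, table names (`policy`, `claim`), and column names are hypothetical placeholders for this posting, not actual Hartford schemas.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Hypothetical connection string; a real warehouse would use credentials
# and a host managed by the enterprise data team.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

# Link policy and claim records on a shared key, selecting only the
# columns the use case needs so the query stays index-friendly.
query = text("""
    SELECT p.policy_number,
           p.effective_date,
           c.claim_id,
           c.loss_date,
           c.paid_amount
    FROM policy p
    JOIN claim c
      ON c.policy_number = p.policy_number
    WHERE c.loss_date >= :start
""")

claims = pd.read_sql(query, engine, params={"start": "2023-01-01"})

# Deliver in a pre-defined hand-off format (Parquet here, as an example).
claims.to_parquet("claims_extract.parquet", index=False)
```

Key responsibilities include: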
- Understand sources of data within The Hartford, and work with SMEs to describe and understand data lineage and suitability for a use case.
- Understand data classification and adhere to the information protection and privacy restrictions on data.
- Create summary statistics/reports from data warehouses, marts, and operational data stores.
- Extract data from source systems and data warehouses, and deliver it in a pre-defined format using standard database query and parsing tools.
- Work with The Hartford information protection group to obtain approvals and adhere to its processes for data privacy and loss prevention.
- Understand ways to link or compare information already in our systems with new information.
- Perform preliminary exploratory analysis to evaluate nulls, duplicates, and other issues with data sources (internal or external); a sketch of such checks follows this list.
- Work with data scientists to understand requirements, then identify and propose data sources and alternatives.
- Produce code artifacts and documentation for reproducibility, preferably using GitHub, and hand them off to other data science teams.
- Propose ways to improve and standardize processes, enabling assessment of new data and capabilities and quick pivots to new projects.
- Maintain a Hadoop environment for the Service Research team.
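For the preliminary exploratory analysis mentioned above, a minimal sketch of null and duplicate checks might look like the following. The key columns and the input file name (`claims_extract.parquet`, carried over from the earlier sketch) are assumptions for illustration only.

```python
import pandas as pd

def profile_data_quality(df: pd.DataFrame, key_columns: list[str]) -> pd.DataFrame:
    """Summarize per-column null rates and flag duplicate key rows.

    key_columns is whatever set of columns should uniquely identify a
    row for the use case (hypothetical names are used below).
    """
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_count": df.isna().sum(),
        "null_pct": (df.isna().mean() * 100).round(2),
        "distinct_values": df.nunique(),
    })
    dup_rows = int(df.duplicated(subset=key_columns).sum())
    print(f"{dup_rows} duplicate rows on key {key_columns}")
    return summary

# Hypothetical usage against the extract produced in the earlier sketch.
df = pd.read_parquet("claims_extract.parquet")
print(profile_data_quality(df, key_columns=["policy_number", "claim_id"]))
```

A summary like this gives data scientists a quick read on whether a source is fit for the use case before any modeling work begins.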