Alphanumeric is hiring a DATA ENGINEER to work remotely out of the Richmond, VA area with an established stronghold in the financial and insurance industries.
Pay Range: $93 - $100
As an IT team member on the data solutions and data engineering team, you will be joining a team that is transforming data solutions and capabilities. You will focus on improving out-of-the-box Databricks capabilities and executing project-related scope in Databricks.
You will be improving out-of-the-box Databricks capabilities with Databricks dashboards, frameworks, and infrastructure Delta tables. An example is building a framework so other teams can collect the data needed for a Databricks dashboard heat map that shows which pipelines are current against their SLA in a data streaming or micro-batch environment.
You will be using the medallion architecture, with many data ingestion (bronze), standardization (silver), and curation (gold) functions, to improve data usability, applying ER and dimensional modeling to create better data structures, values, and derived attributes than legacy data solutions provide. A specific scope is leveraging our underwriting data to improve our understanding of future policy costs. This includes using AI to extract data from images and PDFs, which are the source of many medical documents received about a person's health before a policy is approved.
You will be expected to perform the entire SDLC process, from high-level requirements definition through build, testing, and production support.
What you will be doing
* Working with business users on data requirements.
* Teaming with architects and a tech lead on the design for specific project scope.
* Estimating work effort.
* Performing all roles in the SDLC process - build and testing are the core roles for this position.
* Creating documentation and training, and consulting with other teams so they know how to use the frameworks you have built or enhanced, such as the SLA data freshness framework for all pipelines mentioned above.
* Implementing and maintaining data security policies and procedures, ensuring compliance with industry and company standards and regulations.
* Working with downstream and upstream teams, including onshore/offshore vendor teams.
* Quickly upskilling in any additional cloud technologies as required.
* Taking on new challenges with different technologies or features in Databricks, such as Intelligent Document Processing and Genie coding, and deciding on best practices for using these features so others can leverage your work and experience.
What are your technical capabilities
* Minimum 3 years of full-time experience using Databricks on Azure or AWS with the medallion architecture and data models.
* Using or creating coding frameworks or functions for common data pipeline standardization processes such as standardizing missing values, dates, reference data management, and data repair or imputation for missing values.
* Creating derived business values with simple to complex logic.
* Creating standard dimensions and fact tables with SCD2 time tracking.
* Databricks orchestration skills with batch and streaming pipelines.
* Creating unit tests, performing SIT, and collaborating with users to debug UAT results.
* Debugging data defects
* Databricks compute and storage optimization skills.
* Comprehensive understanding of automation and deployment tools such as CI/CD and GitLab.
* Strong understanding of Agile and development methodologies including story creation and defining technical tasks with effort estimates.
Preferred Qualifications
* Five years of experience with Databricks in a transformation environment where you owned and built significant capabilities from scratch.
* Experience with insurance business processes and knowledge of insurance data
* Experience using data governance platforms, including hands-on experience building data observability.
* Capability to understand data models and how to make minor changes or attribute additions.
Hands on Skills
* Expert in SQL and intermediate or better in PySpark programming.
* Experience using Lakeflow and coding data quality expectations.
* Intermediate or better on Databricks features that have been GA for two years or longer, including Unity Catalog, Spark UI, Job Scheduling, etc.
* Capability to learn newer features such as Genie AI coding, Intelligent Document Processing, self-service workspaces, etc.
General Skills
* Excellent communication skills to create training videos and communicate with business users and other IT teams
* Problem solving
* Design thinking
* Teamwork & Collaboration
Employment Type: CONTRACTOR