Job Summary:
Medpace is a full-service clinical contract research organization (CRO) that provides clinical development services to the biotechnology, pharmaceutical, and medical device industries. The company is seeking a full-time Data Engineer to join its AI team, focusing on handling unconventional data and supporting the integration of large language models into data processing.
Responsibilities:
• Handle unconventional data, such as unstructured content from websites and varied content types (documents, images, etc.), and load it into different data lakes using software solutions such as Snowflake, Azure, SQL, and Python;
• Apply (and, where needed, train) Large Language Models (LLMs) in the Extract, Transform, and Load (ETL) of large data corpora into a data lake;
• Participate in Natural Language Processing (NLP) extraction of unstructured data into structured metadata using semantic understanding tools (Python, REST APIs);
• Help ensure that all incoming external content is handled in compliance with the latest US and EU AI legislation, including requirements for the security, confidentiality, and privacy of protected health information (PHI);
• Collect, analyze, and document user requirements, working with AI engineers to align data sources with downstream system integrations;
• Create software applications that help users understand and visualize data flows from inception to derivation, maintaining version control and following the software development lifecycle (SDLC) process: requirements gathering, design, development, testing, release, and maintenance;
• Participate in the software validation process by developing, reviewing, and/or executing test plans, test cases, and test scripts;
• Communicate with team members regarding projects, development, tools, and procedures; and
• Provide end-user support, including application setup, installation, and maintenance.
Qualifications:
Required:
• Bachelor's Degree in Computer Science, Data Science, or a related field
• 1-3+ years of experience in Data Engineering
• Background working with AI tools for data extraction, natural language processing, and the conversion of varied unstructured content into structured metadata
• Knowledge of developing dimensional data models from unstructured content, and awareness of the advantages and limitations of star schema and snowflake schema designs
• Solid ETL development and reporting skills, grounded in a detailed understanding of business processes and measures
• Knowledge of REST APIs
• Good knowledge of SQL Server databases and the Python programming language
• Excellent analytical, written and oral communication skills
Preferred:
• Knowledge of the Snowflake cloud data warehouse and Azure cloud
• Knowledge of C# is a bonus, as is experience working with Azure data fabric
Company:
Medpace, Inc., a clinical research organization, provides clinical development services for the pharmaceutical and biotechnology industries. Founded in 1992, the company is headquartered in Cincinnati, USA, and has 5,001-10,000 employees. The company is currently Late Stage.