About the Company:
The National Hispanic Health Research Institute (NHHRI) is guided by the expertise and vision of some of the nation’s most accomplished and influential leaders. Our National Advisory Board brings together distinguished professionals from medicine, research, policy, technology, and community health to ensure our work is innovative, inclusive, and impactful. Together with our dedicated Board of Directors, they provide strategic direction and oversight to advance equitable health research for all communities.
About the Role:
As a Data Engineer Fellow, you will support the development and maintenance of NHHRI’s cloud-based population health data warehouse by designing scalable schemas, integrating public-sector datasets, and ensuring data quality and reliability.
NHHRI seeks a Data Engineer Fellow to design and implement relational structures within the organization’s BigQuery environment. The Fellow will standardize, clean, and integrate federal and state datasets into a governed warehouse framework, supporting board reporting, research analysis, and future AI-enabled analytics.
This role focuses on building the structured data foundation that enables visualization, research, and strategic decision-making.
Minimum Qualifications:
- Bachelor’s degree (completed or in progress) in Computer Science, Data Science, Information Systems, Statistics, or related field
- Demonstrated proficiency in SQL, including joins, aggregations, and table creation (DDL)
- Experience working with structured datasets (CSV, Excel, relational tables)
- Experience cleaning, transforming, and standardizing data
- Basic proficiency in Python (or similar scripting language) for data processing
- Familiarity with geographic identifiers (e.g., FIPS, GEOID) or demonstrated ability to work with geographic data structures
- Strong analytical and problem-solving skills
- Ability to document data transformations and assumptions clearly
- Ability to work independently and meet defined project milestones
Preferred Qualifications:
- Experience working with cloud data warehouses (e.g., BigQuery)
- Experience designing relational schemas or dimensional models (e.g., fact and dimension tables)
- Experience integrating public-sector datasets (e.g., Census, CDC, CMS, BLS)
- Experience building or supporting ETL/ELT workflows
- Familiarity with data governance concepts (metadata, lineage, documentation practices)
- Exposure to AI/ML concepts or interest in applying AI techniques to structured population health datasets
- Experience using AI-assisted data tools for cleaning or query generation
- Interest in contributing to AI-enabled analytics built on top of structured data foundations
Responsibilities:
- Design and implement scalable relational schemas in BigQuery
- Develop and maintain core reference tables (e.g., dim_geography, dim_time)
- Build and maintain standardized fact tables for integrated indicators
- Load, clean, and standardize federal and state datasets
- Harmonize mixed geographic levels (state, county, tract, ZIP as applicable)
- Transform datasets into consistent long-format structures where appropriate
- Write validation and quality-control SQL queries
- Optimize tables using partitioning and clustering strategies
- Monitor query efficiency and usage
- Maintain organized raw, staging, and curated data structures
- Document schema design decisions and transformation logic
- Support reproducibility and governance standards
- Collaborate with Visualization and Research Fellows to ensure consistent metric definitions
- NOTE: This is an unpaid fellowship.
Skills:
The ideal candidate demonstrates strong SQL proficiency and practical experience designing scalable relational schemas within a cloud-based data warehouse environment. They possess the ability to clean, standardize, and integrate structured public datasets across mixed geographies and time dimensions. The Fellow is detail-oriented, execution-driven, and capable of building reliable, well-documented data structures that support visualization, research analysis, and future AI-enabled analytics aligned with NHHRI’s long-term objectives.