Data Infrastructure Jobs (NOW HIRING)

Helix AI Engineer, Data Infrastructure

San Jose, CA

Figure is an AI Robotics company developing a general purpose humanoid. Our humanoid robot is designed for commercial tasks and the home. We are ...

Data & Infrastructure Engineer

Draper, UT · On-site

$107K - $128.50K/yr

Position Overview: We are seeking a skilled and motivated Data & Infrastructure Engineer to own and evolve the data platform that powers WorkBay's operations, analytics, and strategic decision-making.

Software Engineer, Data Infrastructure

San Francisco, CA · On-site +1

$134.90K - $162K/yr

In the coming years, we're focused on building the data infrastructure layer for Figma's AI-powered products, driving cost and performance optimizations across our data stack, scaling our ingestion ...

Senior Data Infrastructure Engineer

Reston, VA · On-site

$112.90K - $153.50K/yr

The Senior Data Infrastructure Engineer is responsible for orchestrating, deploying, maintaining, and scaling cloud or on-premise infrastructure targeting big data ...

OR · On-site

Ability to Obtain Public Trust We are seeking a Data Infrastructure Engineer to build and operate the data platform that powers AI/ML analytics modules. You will design and implement scalable data ...

Data Infrastructure Engineer

$117.20K - $140.70K/yr

Ability to Obtain Public Trust We are seeking a Data Infrastructure Engineer to build and operate the data platform that powers AI/ML analytics modules. You will design and implement scalable data ...

Data Infrastructure Software Engineer

Mountain View, CA · On-site +1

$134.10K - $161K/yr

NextSense's vision is to be the foundation of brain health and establish a new paradigm in neurocare. Powered by a disruptive brain-sensing earbud technology and ...

Research Engineer, Data Infrastructure

Palo Alto, CA · On-site

$126.40K - $165.70K/yr

Role Summary: This role focuses on building and operating the next generation of data infrastructure at Mistral AI. You will be a core contributor to our evolution ...

Data Infrastructure information

Salary scale: $25K to $195.5K, with the average at $123.3K.

How much do data infrastructure jobs pay per year?

As of May 28, 2026, the average yearly pay for data infrastructure in the United States is $123,312.00, according to ZipRecruiter salary data. Most workers in this role earn between $99,500.00 and $156,000.00 per year, depending on experience, location, and employer.

What is a Data Infrastructure job?

A Data Infrastructure job focuses on designing, building, and maintaining the systems that store, process, and manage data for an organization. This includes databases, data pipelines, cloud storage, and data processing frameworks to ensure efficient data flow and accessibility. Professionals in this role work with technologies like SQL, NoSQL, Hadoop, Spark, and cloud platforms to support data engineers, analysts, and scientists. The goal is to provide a scalable, reliable, and secure foundation for handling large volumes of data.

What are the key skills and qualifications needed to thrive in the Data Infrastructure position, and why are they important?

To thrive in Data Infrastructure, you need a solid understanding of data architecture, database management, and distributed systems, often supported by a degree in computer science or a related field. Proficiency with tools such as SQL, Hadoop, Spark, AWS, and certifications like Google Cloud Professional Data Engineer are highly valued. Strong problem-solving abilities, effective teamwork, and clear communication help professionals excel in this collaborative and fast-evolving area. These skills ensure robust, scalable data systems that support reliable analytics and decision-making across the organization.

What are some typical challenges faced in a Data Infrastructure role and how are they addressed?

Professionals in Data Infrastructure often face challenges such as scaling systems to handle growing data volumes, ensuring data security, and maintaining high availability. Addressing these requires proactive system monitoring, automation, regular performance tuning, and implementing best practices for backup and disaster recovery. Collaboration with data engineering, analytics, and IT security teams is essential to resolve bottlenecks and optimize data flows. Staying current with emerging technologies also helps in innovating and improving existing infrastructure over time.
Infographic: Data Infrastructure job openings in the United States as of May 2026. Employment types: 2% As Needed, 50% Full Time, 36% Part Time, 2% Temporary, and 10% Contract. Work arrangement: 95% Physical, 3% Hybrid, and 2% Remote. Average salary: $123,312 per year, or $59.3 per hour.
Data Infrastructure Engineer

Glyphic Biotechnologies

Berkeley, CA • Hybrid

Other

Posted 4 days ago


Job description

What we are looking for in you

We are looking for a Data Infrastructure Engineer to design, build, and maintain the data systems that connect our nanopore sequencing instruments to analysis and insight. Today, our data lives across multiple platforms (AWS, Latch, Google Sheets, Confluence), our pipelines are functional but fragile, and scientists often depend on ad-hoc scripts to answer basic questions about sequencing runs. You will change that.

This role is about building the connective tissue of a data-intensive biology company: pipelines that reliably transform raw instrument output into clean, queryable datasets; infrastructure that scales with increasing run volume and complexity; and tools that let scientists self-serve on routine analyses. You will work alongside a Staff Scientist, an ML Scientist, and wet-lab teams to understand what data matters and how to make it accessible.

This is a hybrid role, with an expectation that you spend up to ~20% of your time (on average) on-site with the team in Berkeley, CA, to build a more complete understanding of Glyphic's technology and to stay calibrated with the on-site research team. The role will require some flexibility for additional on-site collaboration as projects require.

What you'll do

Data Pipelines & Automation

  • Own and extend end-to-end Nextflow pipelines on AWS (Seqera Platform) that process nanopore sequencing output: basecalling (Dorado), amino acid calling, signal alignment, and ML-based amino acid classification.
  • Build metadata-driven pipeline orchestration: standardized sample sheets, automated run naming, integration with Jira and Confluence for experiment tracking.
  • Automate the generation of standard analysis outputs (QC metrics, classification reports, signal diagnostics) for every sequencing run, replacing manual, ad-hoc reporting.
  • Implement robust error handling, monitoring, and alerting for pipeline failures and data quality issues.

Data Modeling & Storage

  • Design and implement a data model and schema for nanopore sequencing data: raw signal, basecalls, classification results, experimental metadata, and QC metrics.
  • Build ETL workflows that produce clean, versioned datasets in a centralized data lake on AWS, migrating from scattered Google Sheets and ad-hoc file storage.
  • Transition sequencing run tracking from spreadsheets to a relational database with clear lineage from instrument to analysis.
  • Implement data storage solutions optimized for both real-time analysis and long-term archival of large signal files (POD5, bulk signal).

Visualization & Self-Serve Analytics

  • Deploy and maintain data visualization tools (dashboards, interactive browsers) that allow scientists to independently explore sequencing metrics: yields, classification accuracy, plate-level comparisons, signal quality trends.
  • Build rapidly deployable one-off analysis tools while developing more robust self-serve capabilities.
  • Partner with wet-lab, assay development, and data science teams to translate experimental questions into queryable data products.
  • Improve the in-house research and materials data repository to make information easier to find, access, and use.

AI-Augmented Development

  • Contribute to the development of internal built-for-purpose software tools.
  • Leverage AI coding tools (Claude Code, Copilot, etc.) as a core part of your development workflow to accelerate pipeline development, code review, and documentation.
  • Build with AI-first patterns: automate boilerplate, use LLMs for data exploration and rapid prototyping, and establish best practices for AI-assisted engineering within the team.
  • Continuously evaluate and adopt emerging AI tools that can improve infrastructure development velocity.

What You Need

Required:

  • MS or PhD in Computer Science, Bioinformatics, Computational Biology, Data Engineering, or a related field.
  • 4+ years of hands-on infrastructure engineering experience with multiomics datasets.
  • Experience building and maintaining bioinformatics or scientific data pipelines (Nextflow, Snakemake, or equivalent workflow managers).
  • Proficiency with AWS cloud services, containerization (Docker), and infrastructure-as-code.
  • Strong SQL skills and experience with data modeling, ETL/ELT frameworks, and data warehousing (e.g., PostgreSQL, DuckDB, BigQuery, or Snowflake).
  • Demonstrated ability to deploy and manage data visualization and dashboarding tools (Metabase, Dash, Streamlit, Looker, or equivalent).
  • Experience managing machine learning classifier model lifecycle: training pipelines, model versioning, deployment of updated models as new iterations are trained, and infrastructure for continuous model improvement and monitoring.
  • Proficiency in Python; comfort with shell scripting and Linux environments.

Nice to have:

  • Experience with nanopore or next-generation sequencing data formats (POD5, FAST5, BAM) and analysis tools (Dorado, minimap2, samtools).
  • Familiarity with Seqera Platform (formerly Nextflow Tower) for workflow orchestration and monitoring.
  • Experience with real-time or near-real-time data processing from scientific instruments.
  • Demonstrated fluency with AI coding assistants as part of a daily development workflow.
  • Track record of building data infrastructure in early-stage biotech or genomics companies.

We're looking for a teammate who:

  • Navigates complex team dynamics, partnerships, and challenges with creativity and logic.
  • Operates with adaptability, urgency, and flexibility in evolving environments, thriving in ambiguity.
  • Drives work forward without needing to be asked, taking responsibility for outcomes rather than tasks.
  • Treats obstacles as problems to be creatively solved, not reasons something can't be done.
  • Applies sound judgment to the best available information, testing, learning, and iterating.
  • Shares early and directly when assumptions change, results are unclear, or timelines are at risk.

What you can expect from this role

Work environment:

  • Collaborative culture where your ideas and expertise are valued
  • Direct impact on product development and company direction

Professional growth:

  • Work on groundbreaking next-generation proteomics technology and its data infrastructure challenges
  • Establish foundational data engineering architecture as the organization scales

Compensation

Estimated Base Salary $135,300-$178,350

This is the pay range we reasonably expect to pay for this position. Individual compensation is based on various factors, including experience, education, skillset, and geographic location. This range is for the SF Bay Area, California location and may be adjusted to the labor market in other geographic areas.