Data Ingestion Jobs (NOW HIRING)

Manager, Audience Ingestion

New York, NY

$124.70K - $127.50K/yr

This role ensures that partner data is received, validated, standardized, processed, and delivered reliably into downstream systems. The role manages day-to-day ingestion operations, supports partner ...

New

Manager, Audience Ingestion

Wayne, PA · On-site

$103.30K - $105.60K/yr

This role ensures that partner data is received, validated, standardized, processed, and delivered reliably into downstream systems. The role manages day-to-day ingestion operations, supports partner ...

New

Data Ingestion information

Yearly pay scale: $24.5K, $126.1K (average), $173K

How much do data ingestion jobs pay per year?

As of May 31, 2026, the average yearly pay for data ingestion in the United States is $126,123.00, according to ZipRecruiter salary data. Most workers in this role earn between $102,000.00 and $159,000.00 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Data Ingestion Specialist, and why are they important?

To thrive as a Data Ingestion Specialist, you need a strong understanding of data pipelines, ETL processes, and experience with databases, often supported by a degree in computer science or related fields. Familiarity with tools such as Apache Kafka, Apache NiFi, SQL, and cloud platforms like AWS or Azure is typically required, along with knowledge of data integration frameworks. Strong problem-solving skills, attention to detail, and effective communication set top performers apart in this role. These skills are crucial to ensure the accurate, efficient, and secure movement of data across systems, supporting reliable analytics and business decisions.

What are some common challenges faced in a data ingestion role, and how can they be addressed?

Professionals working in data ingestion often encounter challenges such as handling diverse data formats, ensuring data quality, and maintaining data pipeline reliability. Managing large volumes of data from multiple sources can lead to issues with consistency, latency, and error handling. To address these, it is essential to implement robust validation processes, use scalable data ingestion tools, and closely collaborate with data engineering and source system teams. Continuous monitoring and regular pipeline optimization are also key practices to ensure smooth data flow and integrity.
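As a rough illustration of the "robust validation processes" mentioned above, the sketch below checks each incoming row and sets bad rows aside with the reasons they failed instead of loading them. The field names (`sample_id`, `concentration`) and the rules are hypothetical, chosen only for the example; a real pipeline would define its own.

```python
import math

# Hypothetical required fields, for illustration only.
REQUIRED_FIELDS = ("sample_id", "concentration")


def validate_row(row):
    """Return a list of human-readable issues; an empty list means the row is clean."""
    issues = []
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            issues.append(f"missing {field}")
    raw = row.get("concentration")
    if raw not in (None, ""):
        try:
            value = float(raw)
        except (TypeError, ValueError):
            issues.append("concentration is not numeric")
        else:
            # Assumed rule: negative or NaN concentrations are rejected.
            if math.isnan(value) or value < 0:
                issues.append("concentration out of range")
    return issues


def partition(rows):
    """Split rows into accepted rows and (row, issues) pairs to quarantine."""
    accepted, quarantined = [], []
    for row in rows:
        issues = validate_row(row)
        if issues:
            quarantined.append((row, issues))
        else:
            accepted.append(row)
    return accepted, quarantined
```

Keeping the issues next to each quarantined row makes it easier to report problems back to the source system team rather than silently dropping data.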

What is data ingestion?

Data ingestion is the process of collecting and importing data from various sources into a storage or processing system, such as a database, data warehouse, or data lake. It involves transferring raw data, which may come from different formats and platforms, into a centralized location for analysis and further processing. Data ingestion can be done in real-time (streaming) or in batches, depending on business needs and data velocity. This process is crucial for organizations to make informed decisions based on comprehensive, up-to-date information.
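The difference between batch and streaming ingestion can be sketched in a few lines. This is a simplified illustration, not a production design: `batches` and `ingest` are hypothetical helper names, and the `sink` list stands in for a real database, warehouse, or data lake write.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group an incoming sequence of records into batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def ingest(records, sink, batch_size=1):
    """Load records into `sink` and return how many were loaded.

    batch_size=1 approximates record-at-a-time (streaming) ingestion;
    larger values approximate batch ingestion.
    """
    loaded = 0
    for batch in batches(records, batch_size):
        sink.extend(batch)  # stand-in for a bulk write to the target system
        loaded += len(batch)
    return loaded
```

Larger batches usually mean fewer writes but higher latency before data becomes available, which is the trade-off behind choosing one mode over the other.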

What is the difference between a Data Ingestion role and a Data Analyst role?

| Aspect | Data Ingestion | Data Analyst |
| Required Credentials | Knowledge of ETL tools, SQL, basic scripting | Degree in statistics, data science, or related field; SQL, Excel, visualization skills |
| Work Environment | Data engineering teams, IT departments, cloud platforms | Business units, analytics teams, reporting environments |
| Industry Usage | Data pipelines, data warehouses, big data platforms | Data interpretation, reporting, decision support |

Data ingestion involves collecting and importing data into storage systems, focusing on data pipelines and infrastructure. Data analysts interpret and analyze this data to generate insights. While data ingestion prepares data for analysis, data analysts focus on understanding and communicating data findings. Both roles are essential in data-driven organizations but serve different functions within the data lifecycle.

More about Data Ingestion jobs
Infographic: Data Ingestion job openings in the United States as of May 2026. Employment types: 64% full-time, 36% contract. Work arrangement: 82% in-person, 9% hybrid, 9% remote. Average salary: $126,123 per year, or $60.6 per hour.

Data Engineer, Scientific Data Ingestion

Mithrl

San Francisco, CA • On-site

$150K - $200K/yr

Full-time

Medical, Dental, Vision, Retirement

Posted 5 days ago


Job description

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world's first commercially available AI Co-Scientist, a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:

  • 12X year-over-year revenue growth

  • Trusted by leading biotechs and big pharma across three continents

  • Driving real breakthroughs from target discovery to patient outcomes.

WHAT YOU WILL DO

Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure "semi-structured" or messy tabular data: extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion, so downstream analytics / the AI "Co-Scientist" always works with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.
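To give a flavor of the header cleaning and units normalization described above, here is a minimal sketch. It is not Mithrl's implementation: the alias table, the unit table, and the function names are all hypothetical, and real instrument exports would need far richer rules.

```python
import re

# Hypothetical alias table mapping cleaned header keys to canonical column names.
HEADER_ALIASES = {
    "sample": "sample_id",
    "sample_id": "sample_id",
    "conc": "concentration",
    "concentration": "concentration",
}

# Hypothetical conversion factors from common concentration units to molar (M).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def harmonize_header(raw):
    """Lowercase a free-text header, collapse punctuation and spaces, apply aliases."""
    key = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return HEADER_ALIASES.get(key, key)


def to_molar(value, unit):
    """Convert a concentration to molar; unknown units raise instead of guessing."""
    try:
        return value * UNIT_TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unknown concentration unit: {unit!r}") from None
```

For example, `harmonize_header(" Sample ID ")` and `harmonize_header("sample")` both map to `sample_id`, while `to_molar(5, "mM")` converts to molar. Raising on an unknown unit, rather than passing the value through, matches the goal of catching ambiguous data before it enters the platform.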

WHAT YOU BRING

Must-have

  • 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.

  • Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).

  • Excellent experience dealing with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.

  • Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.

  • Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.

  • Strong desire and ability to own the ingestion & normalization layer end-to-end, from raw upload → final clean dataset, with an eye for maintainability, reproducibility, and scalability.

  • Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

  • Familiarity with scientific data types and "modalities" (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).

  • Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

  • Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.

  • Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.

  • Any background in computational biology / lab-data / bioinformatics is a bonus, though not required.

WHAT YOU WILL LOVE AT MITHRL

  • Mission-driven impact: you'll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You'll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.

  • High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, build the pipelines. You'll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.

  • Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders

  • Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution

  • Speed: We ship fast (2x/week) and improve continuously based on real user feedback

  • Location: Beautiful SF office with a high-energy, in-person culture

  • Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Compensation Range: $150K - $200K