Job Summary:
MFour Data Research is a leading consumer intelligence platform focused on transforming how businesses understand consumers. They are seeking a Senior Data Engineer to own the technical core of their consumer-data pipeline, responsible for ingesting, cleaning, and unifying behavioral data into a reliable identity graph that supports their AI-driven products.
Responsibilities:
• Own MFour's consumer-data pipeline — the core engine behind insights hundreds of leading brands rely on.
• Solve a genuinely hard problem: unifying real-time app, web, purchase, ChatGPT, and foot-traffic data into one coherent identity graph.
• Sit at the center of the platform — your pipeline feeds DANI's AI query layer and every customer-facing product.
• Build something that compounds: every improvement makes DANI smarter and every downstream team faster.
• Work directly alongside the Principal Platform Engineer and across all product teams.
• Own the end-to-end, multi-source pipeline: ingest, transformation, cleaning, and delivery.
• Enforce data-quality standards at ingest — catch schema drift, anomalies, and source failures before they hit downstream systems.
• Keep the pipeline fast, scalable, and reliable for a live AI query layer, owning SLAs when upstream sources change.
• Own the identity-resolution system that merges behavioral signals into a clean, deduplicated identity graph — the foundation DANI queries against.
• Build and refine entity-matching, dedup, and merge logic that resolves conflicting signals, with clear confidence rules.
• Partner with the Principal Platform Engineer to optimize data structures and access patterns for DANI — low-latency, high-fidelity, queryable.
• Provide freshness and availability guarantees for Survey and Research fulfillment, backed by defined data contracts.
• Build monitoring and alerting across the pipeline (freshness, volume anomalies, schema violations, identity drift) so issues surface in minutes, not days.
• Own root-cause analysis for incidents — trace failures to the source, document the fix, and harden against recurrence.
• Proactively retire technical debt before it becomes a risk.
• Enforce data-handling practices that keep behavioral data compliant with MFour's privacy commitments and applicable regulations.
• Maintain clear data lineage and retention policies that satisfy internal audits and enterprise-client trust.
• Deliver data to Survey and Research fulfillment teams that's available, correctly structured, and on time — with clear ownership when it's not.
• Partner with the Product Pod to surface data constraints that shape what DANI can confidently answer.
• Document architecture, data contracts, and known failure modes so system knowledge isn't trapped in one person's head.
Qualifications:
Required:
• 5+ years of data engineering experience with clear ownership of production pipelines — not just contribution to them. You have shipped and operated multi-source data systems at scale.
• Deep expertise in batch and streaming data pipeline architectures — ingest, transformation, deduplication, and delivery — using tools such as Apache Spark, Kafka, Flink, dbt, Airflow, or equivalents.
• Hands-on experience with entity resolution, identity matching, or record linkage across multiple data sources — including the hard cases: conflicting signals, sparse data, and evolving schemas.
• Strong command of SQL and at least one general-purpose language (Python strongly preferred) for pipeline development, data validation, and operational tooling.
• Cloud data platform experience (AWS or GCP), including managed warehouses (such as Databricks or Snowflake), object storage, and cloud-native orchestration services.
• A rigorous approach to data quality: you define what 'clean' means, you build validation into the pipeline, and you treat a silent data error as seriously as a system outage.
• Familiarity with consumer data privacy requirements (CCPA, CPRA) and the practical implications for how behavioral data is collected, stored, and processed in a commercial research context.
Company:
MFour delivers validated consumer intelligence that brands can trust. Founded in 2001, the company is headquartered in Irvine, USA, and has a team of 51-200 employees. It is currently in the growth stage.