Job Summary:
Marble is a technology company founded to revolutionize the food processing industry. As a Senior Data Engineer, you will design, implement, and support automation solutions that transform the industry, focusing on building scalable data pipelines and collaborating with various teams to ensure efficient data flow.
Responsibilities:
• Architect and build scalable ETL/ELT pipelines for both batch and streaming workloads
• Design real-time ingestion and transformation workflows integrating NATS JetStream and distributed microservices
• Develop robust data models and ETL layers for ClickHouse, enabling high-performance analytics and ML feature extraction
• Manage and optimize data storage across AWS S3, ClickHouse, and operational datasets generated on-prem
• Build automation workflows for labeling data, CV pipeline pre-annotation, dataset generation, and versioning
• Ensure data quality, validation, integrity, and lineage, including automated tests and monitoring across pipelines
• Collaborate with ML and backend teams to deliver pipelines for training datasets and annotation tools
• Implement scalable compute workloads for large dataset transformations
• Define and enforce data governance best practices, including schema evolution, retention policies, and compliance requirements
• Monitor and improve data pipeline performance across multi-region environments
Qualifications:
Required:
• B.S. or M.S. in Computer Science, Data Engineering, or related field
• 4+ years of experience building production-grade data pipelines or distributed systems
• Strong proficiency in Python and SQL
• Production experience with at least one major distributed compute or orchestration framework, such as Apache Spark, Ray, or Apache Airflow (2+ years preferred)
• Experience building streaming pipelines or real-time systems (Kafka, NATS, Redis Streams, or similar)
• Deep familiarity with AWS cloud services (S3, Lambda, IAM, EC2, Glue, etc.)
• Experience with PostgreSQL, MongoDB, ClickHouse, or other relational, columnar, or NoSQL systems
• Strong understanding of data modeling, partitioning, schema evolution, and performance tuning
• Understanding of data quality, lineage, orchestration, and governance
• Ability to design systems in hybrid environments (on-prem + cloud)
• Excellent communication, documentation, and teamwork skills
Preferred:
• Experience with NATS JetStream, Kafka, or high-throughput messaging systems
• Familiarity with GPU-based CV pipelines, ML datasets, or annotation workflows
• Experience with ClickHouse Materialized Views, Replicated Tables, or S3-backed storage
• Experience working in a regulated, safety-critical, or high-uptime environment
• Experience with Nomad, Consul, Vault, or HashiCorp ecosystem
Company:
Marble is a developer of intelligent technology for meat processing. Founded in 2020, the company is headquartered in Cambridge, USA, with a team of 11-50 employees. The company is currently at an early stage.