Job Summary:
Arena Club is pioneering a fully digital marketplace for sports cards and memorabilia in the collectibles space. The company is seeking a Senior Data Engineer to build and maintain reliable data pipelines, improve data quality, and integrate data sources to support strategic decision-making and operational performance.
Responsibilities:
• Maintain and optimize inbound and outbound ETL pipelines built on AWS Glue (Python Shell & Spark ETL)
• Manage Redshift cluster performance across various schemas
• Own integrations with SaaS data sources via AppFlow and direct connectors
• Operate outbound distribution pipelines to external vendors
• Manage infrastructure, alerting, and migration state tracking
• Lead the migration from ad-hoc SQL scripts to a Bronze/Silver/Gold medallion architecture with dbt as the transformation layer
• Design and implement dimensional models (fact and dimension tables)
• Build the Silver staging layer
• Architect the real-time change data capture (CDC) pipeline
• Implement data contracts and governance at the Silver layer to insulate downstream consumers from source changes
• Implement a hot/cold storage strategy via Redshift Spectrum
• Build the Unified Access Layer
• Design and automate Glue jobs
• Configure S3 lifecycle policies for progressive cost reduction
Qualifications:
Required:
• 5+ years in data engineering with production pipeline ownership (not just analytics or BI)
• Deep AWS experience: Glue (both Python Shell and Spark ETL), Redshift, S3, IAM, EventBridge, Lambda, AppFlow
• Strong SQL: complex joins, window functions, MERGE/UPSERT patterns, Redshift-specific optimization (sort keys, dist keys, VACUUM/ANALYZE)
• Python fluency: boto3, data processing libraries, writing production Glue scripts (not just notebooks)
• Dimensional modeling: star schemas, fact/dimension design, SCD Type 1 and Type 2 implementation
• dbt: hands-on experience building and maintaining staging, intermediate, and mart models with tests and documentation
• Data warehouse operations: schema migration, incremental loads, backfill strategies, monitoring, and alerting
• Hands-on experience using AI tools (Claude or Cursor preferred; other agentic tools welcome) to ship code, build agents, and automate workflows
Preferred:
• Redshift Spectrum: experience with external schemas, Parquet/Hive partitioning, and unified hot/cold querying
• CDC / streaming: Postgres WAL, Debezium, EventBridge, or similar change data capture pipelines
• Data Mesh concepts: domain-oriented ownership, data-as-a-product thinking, federated governance
• AppFlow & SaaS integrations: configuring and troubleshooting managed connectors for Stripe, Zendesk, Mixpanel, etc.
• Cost optimization: right-sizing Glue jobs (Python Shell vs. Spark), Redshift concurrency scaling, S3 lifecycle policies
• Vendor distribution: building outbound API sync jobs with rate limiting, SFTP transfers, webhook delivery
• Familiarity with marketplace or e-commerce data (orders, payments, attribution, promo codes)
• Experience with Mixpanel, Customer.io, or Singular data exports and event schemas
• Prior experience migrating from monolithic ETL to medallion or lakehouse architectures
• Exposure to data governance tooling: data catalogs, lineage tracking, quality frameworks (e.g., Great Expectations, dbt tests)
Company:
Arena Club reimagines sports card collecting with trusted trading, grading, and a fun digital-meets-physical experience. Founded in 2021, the company is headquartered in Los Angeles, USA, and has 51-200 employees. It is currently in the growth stage.