Job Summary:
TikTok is the leading destination for short-form mobile video. The Data Ecosystem Team is seeking a Software Engineer Graduate to design and implement data architecture for large-scale recommendation systems, ensuring system reliability and performance.
Responsibilities:
• Design and implement real-time and offline data architecture for large-scale recommendation systems.
• Build scalable and high-performance streaming Lakehouse systems that power feature pipelines, model training, and real-time inference.
• Collaborate with ML platform teams to support PyTorch-based model training workflows and design efficient data formats and access patterns for large-scale samples and features.
• Own core components of our distributed storage and processing stack, from file format to stream compaction to metadata management.
Qualifications:
Required:
• Individuals who are completing or have recently completed a Bachelor’s or Master’s degree in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
• Experience building large-scale distributed systems, preferably in storage, stream processing, or ML infrastructure.
• Familiarity with modern Lakehouse technologies such as Apache Paimon, Iceberg, Delta Lake, or Hudi, especially around incremental ingestion, schema evolution, and snapshot isolation.
Preferred:
• Understanding of Apache Flink internals, with hands-on experience in state management, connectors, or UDFs.
• Experience in designing and optimizing Flink + Paimon architectures for unified batch/stream processing.
• Familiarity with feature storage and training data pipelines, and their integration with PyTorch, especially for large-scale model training.
• Knowledge of columnar file formats (Parquet, ORC, Lance) and how they are used in feature engineering or ML data loading.
• Proficiency in Java/Scala/C++, and strong debugging/performance tuning ability.
• Previous experience in Lakehouse metadata management, compaction scheduling, or data versioning is a plus.
• Knowledge of legacy data stores like HBase/Kudu is a bonus but not required.
Company:
TikTok is a short-form video entertainment app and social network platform. It is a sub-organization of ByteDance. Founded in 2003, the company is headquartered in Los Angeles, USA, with a team of 10001+ employees. The company is currently Late Stage.