Job Summary:
xAI is a company focused on creating AI systems that enhance human understanding and knowledge. The Data Engineer role involves developing systems and processes for data acquisition, preparation, and quality evaluation, ensuring that models are trained on high-quality data throughout the training lifecycle.
Responsibilities:
• Analyze the performance and impact of data used throughout the model training lifecycle
• Investigate anomalous model behavior and rigorously identify the data issues that drive poor downstream performance
• Design, build, and improve the data cleaning, transformation, and quality-control steps required to produce high-quality training data
• Research, evaluate, and develop frontier methods for improving data quality and effectiveness in AI model development
• Apply statistical techniques and empirical analysis to make informed, data-driven decisions about dataset quality and model outcomes
• Partner across teams to identify where data needs exist and define the highest-impact opportunities for new data acquisition and improvement
• Build and maintain production-grade data pipelines, tooling, and software systems that ingest, process, validate, and deliver data for training
• Develop metrics, evaluation frameworks, and monitoring systems to assess how data quality influences model behavior at scale
• Fuse data from multiple sources into reliable, usable datasets for research and production model training
• Create shared datasets, tooling, and internal data products that enable other teams to analyze, debug, and improve model performance
Qualifications:
Required:
• Bachelor’s degree in computer science, data science, physics, mathematics, or another STEM discipline
• 1+ years of data or software engineering experience (internship experience counts)
• Experience implementing or analyzing language models or neural networks
Preferred:
• Professional experience in analytics, data science, machine learning, or data engineering
• Experience building and operating production data pipelines for neural network or large-scale machine learning workloads
• Strong experience with Python and the broader ecosystem of libraries and tools used in modern machine learning and data development
• Experience working with Parquet or similar columnar storage formats in large-scale data systems
• Familiarity with Kubernetes and distributed production environments
• Experience developing predictive models and machine learning pipelines, including clustering, forecasting, anomaly detection, or related techniques
• Experience working with very large-scale datasets, including terabyte- to petabyte-scale data systems
• Strong statistical intuition and the ability to use quantitative analysis to guide technical and product decisions, including familiarity with scaling ladder design studies
• Ability to operate effectively in a dynamic environment with evolving priorities, changing requirements, and fast-moving technical challenges
• Demonstrated ability to take ownership of ambiguous problems, drive projects independently, and develop new expertise where needed
Company:
xAI is an artificial intelligence startup that develops AI solutions and tools to enhance reasoning and search capabilities. It is a sub-organization of SpaceX. Founded in 2023, the company is headquartered in Palo Alto, USA, with a team of 1001-5000 employees. The company is currently a late-stage startup.