Bytedance Data Center Jobs (NOW HIRING)

Responsibilities - Design, build, operate and optimize ByteDance's global network, including backbone, data center, public cloud and Edge/CDN. - Work with cross-functional teams including but not ...

... data center networking on a massive scale. Responsibilities - Drive the design, buildout and operation of ByteDance's Global BBone Network; - Design the next generation of TE based GBBone network ...

IRC analysts are expected to respond to all alarms/alerts set in the data center environment ... ByteDance personnel and assets. Responsibilities include triaging alerts related to weather ...

The service provider shall work in close coordination with internal project managers, data center ... to ensure adherence to ByteDance operational and safety standards. iv. Facilitate asset ...

Bytedance Data Center information

What is the difference between Bytedance Data Center and Bytedance Data Engineer?

Aspect | Bytedance Data Center | Bytedance Data Engineer
Required Credentials | Relevant degrees in computer science, data management, or related fields; certifications in data architecture or cloud platforms | Degrees in computer science, software engineering; certifications in data engineering, cloud services, or programming languages
Work Environment | Data centers, server rooms, cloud infrastructure environments | Office settings, data pipelines, cloud platforms, and coding environments
Employer & Industry Usage | Operates within Bytedance's infrastructure, focusing on data storage, processing, and infrastructure management | Develops and maintains data pipelines, ETL processes, and data systems for Bytedance's products

While both roles are integral to Bytedance's data operations, the Bytedance Data Center role focuses on managing physical and cloud infrastructure, whereas the Bytedance Data Engineer role designs and implements data processing systems. Understanding these differences helps clarify career paths and job expectations within the company's data ecosystem.

LLM AIOps Development Engineer Graduate (Data Center Networking) - 2026 Start (BS/ MS)

ByteDance

San Jose, CA • On-site

Full-time

This job post has expired today. Applications are no longer accepted.


Job description

Job Summary:
ByteDance is a leading technology company known for its innovative products like TikTok and Douyin. It is seeking a Graduate LLM AIOps Development Engineer to build autonomous data center networks, working closely with various engineering teams to enhance network operations using AI technologies.
Responsibilities:
• Build a Panoramic Network Observability Platform: Develop a streaming telemetry data pipeline for both physical and virtual networks, integrating multi-source data from gNMI, Netconf, IPFIX/NetFlow, and SNMP to provide a high-quality, real-time data foundation for AIOps.
• Develop an Intelligent Diagnostics and Root Cause Analysis System: Apply machine learning and deep learning algorithms to perform anomaly detection, correlation analysis, and intelligent noise reduction on massive volumes of network metrics, logs, and events. Swiftly pinpoint root causes of failures across the entire stack, from optical transceivers and switch hardware to protocol adjacencies and application traffic.
• Explore Innovative Applications of LLMs and Agents:
  • Intelligent Operations Assistant: Build a conversational chatbot powered by Retrieval-Augmented Generation (RAG) that understands natural language queries, automatically queries knowledge bases and monitoring data, and provides precise troubleshooting guidance and network status reports.
  • Automated Remediation and Smart Runbooks: Train operational Agents to safely and controllably invoke network change tools and APIs. Empower them to autonomously generate, recommend, or even execute remediation plans and emergency runbooks based on their understanding of failure scenarios.
• Establish Capacity and Risk Prediction Capabilities: Forecast network capacity bottlenecks, high-risk links, and "sub-healthy" devices based on historical data and business growth models, enabling proactive scaling and preventative maintenance.
• Forge a Rock-Solid Engineering System: Adhere to engineering best practices to design and develop a highly available and scalable AIOps platform. Guarantee the stability and performance of the entire pipeline, from data collection and model training to online inference and automated closed-loop actions.
Qualifications:
Required:
• Solid Fundamentals in Computer Science and Networking: A deep understanding of data center network architectures (e.g., Spine-Leaf Fabric), and proficiency in key protocols such as EVPN/VXLAN and BGP/OSPF. In-depth knowledge of the Linux network stack is essential.
• Excellent Software Engineering Skills: Mastery of Golang or Python with outstanding coding and system design abilities. Familiarity with modern software development workflows, including microservices, containerization (Docker/Kubernetes), and CI/CD.
• Rich Platform Development Experience: Practical experience in one or more of the following areas is highly desirable:
  • Big Data Processing: Familiarity with Kafka, Flink, ClickHouse/TSDB, and experience building real-time data pipelines and analytics systems.
  • Observability Technologies: Experience with Prometheus/OpenTelemetry, graph databases (e.g., Neo4j), and developing alert and event platforms.
• A Passion for AIOps/ML/LLM Practices: A keen interest in the latest advancements in Large Models and Agent technologies, with thoughtful insights or hands-on experience in their application to operations (e.g., RAG, tool use, safety evaluation).
Preferred:
• Experience in operating or developing for hyperscale (100,000+ servers) data center networks.
• Proven experience leading or making significant contributions to an LLM/Agent-based intelligent operations project with measurable business impact.
• Active contributions to open-source communities such as SONiC, P4/PINS, eBPF, Prometheus, or OpenTelemetry.
• In-depth research or practical experience in high-performance networking (RDMA/RoCE), SmartNICs (NIC Offload), or DPDK/eBPF.
• Experience building network configuration and control systems (e.g., based on SONiC, gNMI, Netconf).
Company:
ByteDance is a technology company that develops content creation platforms and services. Founded in 2012, the company is headquartered in Beijing, China, and has more than 10,000 employees. The company is currently late stage.