Rust Linux Kernel Jobs in Tennessee (NOW HIRING)

... • Optimize Linux-based systems for performance, security, and reliability, including kernel ... Rust or willingness to work in Rust is a plus, but strong coding fundamentals in at least one ...

Optimize Linux-based systems for performance, security, and reliability, including kernel tuning ... Proficiency in Rust for systems programming and performance-critical components. * Direct ...

Rust Linux Kernel information

What are the key skills and qualifications needed to thrive as a Rust Linux Kernel Developer, and why are they important?

To thrive as a Rust Linux Kernel Developer, you need deep knowledge of systems programming, strong proficiency in both Rust and C, and experience with Linux kernel internals. Familiarity with build systems (e.g., Make and the kernel's Kbuild), version control (Git), and certifications such as the Linux Foundation Certified Engineer (LFCE) can also be valuable. Excellent problem-solving skills, attention to detail, and effective collaboration are essential soft skills in this role. Together, these abilities help produce robust, secure, and maintainable kernel contributions and support smooth teamwork in complex open-source projects.

What are some common challenges faced when working on Rust integration within the Linux kernel, and how do teams typically address them?

A frequent challenge in this role is bridging the gap between Rust and the existing C-based Linux kernel codebase. This includes preserving Rust's memory-safety guarantees when calling into C, managing interoperability between the two languages, and adhering to strict kernel coding standards. Teams typically address these challenges through extensive code review, discussion on the kernel mailing lists, and active participation in the upstream kernel and Rust-for-Linux communities. Continuous learning and communication are essential, because both the Rust integration and kernel development practices are evolving rapidly.
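The interoperability problem can be illustrated in ordinary userspace Rust. The sketch below mirrors the general approach of keeping `unsafe` calls into C-style functions behind a small safe wrapper, so that only the wrapper needs careful auditing. It is an illustration under stated assumptions, not kernel code: `c_buffer_sum` is a hypothetical function that simulates a C API with an `extern "C"` Rust definition so the example compiles on its own, and `safe_sum` is likewise hypothetical. Real Rust-for-Linux code relies on generated bindings and the safe abstractions in the kernel's `kernel` crate instead.

```rust
use std::os::raw::{c_int, c_long};

// Hypothetical stand-in for a C function reached through FFI bindings.
// Like much C code, it trusts the caller to pass a valid pointer/length
// pair, so it is declared `unsafe` to call.
unsafe extern "C" fn c_buffer_sum(data: *const c_int, len: usize) -> c_long {
    let mut total: c_long = 0;
    for i in 0..len {
        // SAFETY: the caller guarantees `data` points to `len` readable elements.
        total += unsafe { *data.add(i) as c_long };
    }
    total
}

// Safe wrapper: a slice always carries a valid pointer and a matching
// length, so callers of `safe_sum` cannot violate the C function's contract.
fn safe_sum(values: &[c_int]) -> i64 {
    // SAFETY: `values.as_ptr()` is valid for `values.len()` reads for the
    // whole call, which is exactly what `c_buffer_sum` requires.
    unsafe { c_buffer_sum(values.as_ptr(), values.len()) as i64 }
}

fn main() {
    let readings: [c_int; 3] = [3, 4, 5];
    println!("sum = {}", safe_sum(&readings));
    println!("empty = {}", safe_sum(&[]));
}
```

Because the slice type enforces the pointer/length invariant, the only code a reviewer must reason about by hand is inside the two `unsafe` blocks.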

What are Rust Linux Kernel developers?

Rust Linux Kernel developers are software engineers who contribute to the Linux kernel using the Rust programming language. They focus on writing new kernel modules or components in Rust, aiming to improve safety, security, and maintainability compared to traditional C code. These developers typically have a deep understanding of both systems programming and the Linux kernel architecture. Their work is part of an ongoing effort to gradually integrate Rust into the kernel alongside existing C code.

What is the difference between a Rust Linux Kernel Developer and a C Linux Kernel Developer?

Aspect | Rust Linux Kernel Developer | C Linux Kernel Developer
Required credentials | Knowledge of Rust programming, Linux kernel basics | Proficiency in C, Linux kernel development experience
Work environment | Contributing to Linux kernel modules in Rust, in a Linux environment | Developing and maintaining Linux kernel code in C
Industry usage | Emerging in kernel development, experimental projects | Standard in Linux kernel development
Common search/comparison | Often compared for language choice in kernel modules | Traditional role, baseline for kernel development

The main difference between a Rust Linux Kernel developer and a C Linux Kernel Developer lies in the programming language used. Rust developers focus on leveraging Rust's safety features for kernel modules, while C developers work with the traditional C language. Both roles require Linux kernel knowledge, but Rust is newer and less widespread in kernel development, making it an emerging area compared to the well-established C role.

Infographic: Rust Linux Kernel job openings in Tennessee as of May 2026. Employment types: 88% full-time, 6% part-time, 2% contract, and 4% nights. Work arrangements: 66% physical (on-site), 25% hybrid, and 9% remote.

Member of Technical Staff

xAI

Memphis, TN • On-site

Full-time

Posted 21 days ago


Job description

Job Summary:
xAI is dedicated to creating AI systems that enhance humanity's understanding of the universe. They are seeking a highly skilled Member of Technical Staff to manage and enhance reliability across a multi-data center environment, focusing on automating processes and implementing observability solutions for mission-critical AI infrastructure.
Responsibilities:
• Design, develop, and deploy scalable code and services (primarily in Python and Rust, with flexibility for emerging languages) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning. We value adaptability to new tools and paradigms in the fast-evolving AI space.
• Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers—open to innovative stacks beyond traditional ones like ELK.
• Collaborate with cross-functional teams—including software development, network engineering, site operations, and facility operations (critical facilities, mechanical/electrical teams, and data center infrastructure management)—to identify reliability bottlenecks, automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation (e.g., power redundancy, cooling efficiency, and environmental monitoring integration). This role encourages broad skill sets from diverse technical backgrounds to foster innovation.
• Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs. Key Insight: By applying SWE rigor to troubleshooting, team members can create reusable diagnostic tools that accelerate resolution, turning unscheduled events (e.g., hardware faults) into opportunities for system hardening and reducing overall end-user impact through targeted SLAs that prioritize critical AI services. We seek versatile problem-solvers who adapt to bleeding-edge challenges.
• Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration (e.g., Kubernetes or emerging alternatives), and scripting for automation.
• Understand network topologies and concepts in large-scale, multi-data center environments to effectively troubleshoot connectivity, routing, redundancy, and performance issues; integrate observability into data center interconnects and facility-level controls for rapid diagnosis and automation. Key Insight: In multi-site setups, network insights allow for automated failover mechanisms that handle both digital and physical disruptions, ensuring seamless continuity for end-users during events like fiber cuts or power outages. This attracts candidates from varied networking and systems backgrounds to drive forward-thinking solutions.
• Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives to enhance overall site reliability, including joint exercises with facility teams for physical failover and recovery scenarios. We prioritize growth-minded individuals who embrace evolving practices.
• Mentor junior team members and document processes to foster a culture of automation, knowledge sharing, and adaptability to new technologies.
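The error budgets and SLAs mentioned in these responsibilities come down to simple arithmetic: an availability target (SLO) of 99.9% leaves 0.1% of the measurement window as allowable downtime. The sketch below is a minimal Rust illustration, assuming a time-based budget over a 30-day window; the function names are hypothetical and not taken from any particular SRE tool.

```rust
/// Allowed downtime in minutes for an availability target `slo`
/// (e.g. 0.999) over a window of `window_days` days.
fn error_budget_minutes(slo: f64, window_days: f64) -> f64 {
    let window_minutes = window_days * 24.0 * 60.0;
    (1.0 - slo) * window_minutes
}

/// Fraction of the budget still unspent after `downtime_minutes` of outages.
/// A negative result means the budget has been exhausted.
fn budget_remaining(slo: f64, window_days: f64, downtime_minutes: f64) -> f64 {
    let budget = error_budget_minutes(slo, window_days);
    (budget - downtime_minutes) / budget
}

fn main() {
    let budget = error_budget_minutes(0.999, 30.0);
    println!("99.9% over 30 days: {:.1} minutes of downtime allowed", budget);

    let left = budget_remaining(0.999, 30.0, 10.8);
    println!("after 10.8 minutes of downtime: {:.0}% of the budget left", left * 100.0);
}
```

With a 99.9% target, the budget is 43.2 minutes per 30 days, so 10.8 minutes of downtime consumes a quarter of it.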
Qualifications:
Required:
• Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience).
• 5+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering, preferably supporting large-scale, distributed, or production environments.
• Strong programming skills with proven production experience in Python (required for automation and tooling); experience with Rust or willingness to work in Rust is a plus, but strong coding fundamentals in at least one systems-level language (e.g., Python, Go, C++) are essential.
• Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments.
• Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes (or similar systems).
• Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools (e.g., Prometheus, Grafana, or alternatives), alerting, and dashboards.
• Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors.
• Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments.
• Experience participating in on-call rotations, incident response, post-incident reviews (blameless postmortems), and reliability practices such as error budgets or SLAs.
• Ability to collaborate effectively with cross-functional teams (software engineers, network teams, site/facility operations, mechanical/electrical teams).
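The Linux performance tuning and kernel-level understanding listed above usually involve sysctl parameters, which the kernel exposes as files under /proc/sys, with the dots in a key such as `vm.swappiness` corresponding to directory separators. The sketch below is a minimal Rust illustration, assuming a Linux host; the helper names `sysctl_path` and `read_sysctl` are hypothetical, and the simple dot-to-slash mapping ignores edge cases such as network interface names that themselves contain dots.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Map a sysctl key like "vm.swappiness" to "/proc/sys/vm/swappiness".
fn sysctl_path(key: &str) -> PathBuf {
    let mut path = PathBuf::from("/proc/sys");
    for part in key.split('.') {
        path.push(part);
    }
    path
}

/// Read the current value of a sysctl parameter, trimming the trailing newline.
fn read_sysctl(key: &str) -> io::Result<String> {
    let raw = fs::read_to_string(sysctl_path(key))?;
    Ok(raw.trim().to_string())
}

fn main() {
    for key in ["vm.swappiness", "net.core.somaxconn"] {
        match read_sysctl(key) {
            Ok(value) => println!("{} = {}", key, value),
            Err(err) => println!("{}: unavailable ({})", key, err),
        }
    }
}
```

Changing a value is done by writing to the same file (or with `sysctl -w`), which normally requires root privileges, so the sketch only reads.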
Preferred:
• 7+ years of experience in SRE or infrastructure roles, ideally in hyperscale, cloud, or AI / ML training infrastructure environments with multi-data center setups.
• Hands-on experience operating or scaling Kubernetes clusters (or equivalent orchestration) at large scale, including automation for provisioning, lifecycle management, and high-availability.
• Proficiency in Rust for systems programming and performance-critical components.
• Direct experience integrating software reliability tools with physical data center infrastructure (e.g., power, cooling, environmental monitoring, facility controls) and automating responses to physical events.
• Exposure to advanced or innovative observability stacks beyond traditional tools (e.g., exploring cutting-edge alternatives for metrics, logs, and tracing).
• Experience building automated remediation, fault tolerance, disaster recovery, capacity planning, or predictive failure detection systems.
• Background in optimizing Linux-based systems for AI workloads, GPU clusters, or high-throughput compute environments.
• Demonstrated success reducing downtime, MTTR, or improving resource efficiency (e.g., through automation or observability) in high-stakes production settings.
• Prior work with bare-metal provisioning, data center interconnects, or hybrid/multi-site failover mechanisms.
• Mentoring experience, strong documentation skills, and a track record of fostering knowledge sharing and automation culture.
• Comfort with rapid technology adaptation in fast-evolving domains like AI infrastructure.
Company:
xAI is an artificial intelligence startup that develops AI solutions and tools to enhance reasoning and search capabilities. It is a sub-organization of SpaceX. Founded in 2023, the company is headquartered in Palo Alto, USA, and has 1,001-5,000 employees. It is currently a late-stage company.