Virtual Linux Kernel Developer Jobs in Tennessee

Optimize Linux-based systems for performance, security, and reliability, including kernel tuning ... Strong programming skills with proven production experience in Python (required for automation and ...

... Virtual SOC in which Security Analysts work from home or another remote location which can slow ... High level knowledge of Windows/Linux/UNIX operating systems. * Hands-on experience a hypervisor:

Installs, configures, patches, and maintains Windows and Linux Server environments across ... Deploy and manage Azure virtual machines, storage, networking, and other cloud services to meet ...

... Virtual SOC in which Security Analysts work from home or another remote location which can slow ... Linux/Unix skills are a plus. * Strong analytical skills to define risk, identify potential threats ...

SUSE Linux Enterprise Server (SLES) Administration: Deploy, configure, and manage SLES EC2 ... Practical (hands-on) Experience with UMG primary DevOps toolsets (Chef, Terraform Enterprise, Jira ...

Collaborate with DevOps teams to implement Infrastructure-as-Code (IaC) solutions using tools like ... Expertise in managing Windows and Linux/UNIX servers, including network service integration.

Infrastructure Storage Engineer

Tullahoma, TN · On-site

$93K - $122K/yr

... virtual reality, big data analytics, and more! We are excited to continue to change and improve the ... Administer Red Hat Linux servers and OpenShift environments to support the stability and security ...

Platform Engineer IV

Nashville, TN · Remote

$85 - $90/hr

Strong Linux administration experience. * Strong experience with cloud providers such as AWS ... You may receive email and SMS notifications from the Eliassen Virtual Recruiting Team ( noreply ...

Virtual Linux Kernel Developer information

What are the key skills and qualifications needed to thrive as a Virtual Linux Kernel Developer, and why are they important?

To thrive as a Virtual Linux Kernel Developer, you need deep expertise in Linux kernel architecture, C programming, and operating system concepts, typically supported by a degree in Computer Science or related fields. Familiarity with version control systems like Git, debugging tools such as GDB, and experience with virtualization technologies (e.g., KVM, QEMU) are essential. Strong problem-solving abilities, attention to detail, and effective remote communication distinguish outstanding professionals in this role. These skills are crucial for developing robust kernel modules, diagnosing complex issues, and collaborating efficiently within distributed development teams.

What are some common challenges faced by Virtual Linux Kernel Developers when debugging complex virtualization issues?

Virtual Linux Kernel Developers often encounter intricate challenges when debugging issues related to virtualization, such as timing discrepancies, non-deterministic behavior, and compatibility with diverse hypervisors. These issues can be difficult to isolate since they may only manifest under specific workloads or hardware configurations. Collaboration with systems engineers, QA teams, and sometimes upstream kernel communities is usually essential to identify root causes and implement robust solutions. Being comfortable with low-level debugging tools, kernel logs, and patch testing in virtual environments is key to overcoming these challenges.

What does a Virtual Linux Kernel Developer do?

A Virtual Linux Kernel Developer specializes in designing, developing, and maintaining the core components of the Linux operating system, particularly in virtualized environments. This includes working on kernel modules, optimizing performance for virtual machines, and fixing bugs related to virtualization technologies. They often collaborate with open-source communities and contribute to projects that enhance Linux's compatibility with different hypervisors and cloud platforms. Their role is critical in ensuring the efficiency, security, and stability of Linux systems running in virtualized settings.

What is the difference between a Virtual Linux Kernel Developer and a Virtual Linux System Programmer?

Aspect | Virtual Linux Kernel Developer | Virtual Linux System Programmer
Primary Focus | Developing and maintaining Linux kernel code | Writing and optimizing system-level software for Linux
Required Skills | C programming, kernel architecture, debugging kernel modules | C, C++, system calls, device drivers
Work Environment | Collaborative development, version control, Linux environments | System integration, testing, Linux-based systems
Industry Usage | Open-source projects, tech companies, hardware vendors | IT services, embedded systems, enterprise solutions

While both roles involve Linux and system-level programming, Virtual Linux Kernel Developers focus on kernel code development, whereas Virtual Linux System Programmers work on system software and application interfaces. Understanding these distinctions helps in choosing the right career path or job search focus.

Member of Technical Staff

xAI

Memphis, TN • On-site

Other

Posted 21 days ago


Job description

ABOUT THE ROLE:

We are seeking a highly skilled Member of Technical Staff to join our team in managing and enhancing reliability across a multi-data center environment. This role focuses on automating processes, building and implementing robust observability solutions, and ensuring seamless operations for mission-critical AI infrastructure. The ideal candidate will combine strong coding abilities with hands-on data center experience to build scalable reliability services, optimize system performance, and minimize downtime, including through close partnership with facility operations to address physical infrastructure impacts. If you thrive in lightning-fast, distributed environments and are passionate about leveraging automation to drive efficiency, this is an opportunity to make a significant impact on our infrastructure's resilience and scalability.

In an era where AI workloads demand near-zero downtime, this position plays a pivotal role in bridging software engineering principles with physical data center realities. By prioritizing automation and observability, team members in this role can reduce mean time to recovery (MTTR) by up to 50% through proactive monitoring and automated remediation, based on industry benchmarks from high-scale environments like those at hyperscale cloud providers.

The primary objective of this team is to mitigate downtime and minimize impact to end-users from both scheduled and unscheduled maintenance, as well as events affecting onsite data centers. This is achieved through proactive automation, robust observability, and integrated software-physical reliability strategies, ensuring our AI infrastructure remains resilient, scalable, and at the cutting edge of innovation.

RESPONSIBILITIES:
  • Design, develop, and deploy scalable code and services (primarily in Python and Rust, with flexibility for emerging languages) to automate reliability workflows, including monitoring, alerting, incident response, and infrastructure provisioning. We value adaptability to new tools and paradigms in the fast-evolving AI space.
  • Implement and maintain observability tools and practices, such as metrics collection, logging, tracing, and dashboards, to provide real-time insights into system health across multiple data centers. We are open to innovative stacks beyond traditional ones like ELK.
  • Collaborate with cross-functional teams, including software development, network engineering, site operations, and facility operations (critical facilities, mechanical/electrical teams, and data center infrastructure management), to identify reliability bottlenecks and automate solutions for fault tolerance, disaster recovery, capacity planning, and physical/environmental risk mitigation (e.g., power redundancy, cooling efficiency, and environmental monitoring integration). This role encourages broad skill sets from diverse technical backgrounds to foster innovation.
  • Troubleshoot and resolve complex issues in data center environments, including hardware failures, environmental anomalies, software bugs, and network-related problems, while adhering to reliability principles like error budgets and SLAs. Key Insight: By applying SWE rigor to troubleshooting, team members can create reusable diagnostic tools that accelerate resolution, turning unscheduled events (e.g., hardware faults) into opportunities for system hardening and reducing overall end-user impact through targeted SLAs that prioritize critical AI services. We seek versatile problem-solvers who adapt to bleeding-edge challenges.
  • Optimize Linux-based systems for performance, security, and reliability, including kernel tuning, container orchestration (e.g., Kubernetes or emerging alternatives), and scripting for automation.
  • Understand network topologies and concepts in large-scale, multi-data center environments to effectively troubleshoot connectivity, routing, redundancy, and performance issues; integrate observability into data center interconnects and facility-level controls for rapid diagnosis and automation. Key Insight: In multi-site setups, network insights allow for automated failover mechanisms that handle both digital and physical disruptions, ensuring seamless continuity for end-users during events like fiber cuts or power outages. This attracts candidates from varied networking and systems backgrounds to drive forward-thinking solutions.
  • Participate in on-call rotations, post-incident reviews (blameless postmortems), and continuous improvement initiatives to enhance overall site reliability, including joint exercises with facility teams for physical failover and recovery scenarios. We prioritize growth-minded individuals who embrace evolving practices.
  • Mentor junior team members and document processes to foster a culture of automation, knowledge sharing, and adaptability to new technologies.
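
For readers unfamiliar with the error budgets mentioned in the troubleshooting responsibility above, here is a minimal sketch of the usual arithmetic: the budget is the share of a time window that an availability target allows to be lost. The function names, the 99.9% target, and the 30-day window are illustrative assumptions and do not come from the posting.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime allowed in `window` for an availability target.

    `slo_target` is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, window: timedelta,
                     downtime: timedelta) -> timedelta:
    """Return the unspent budget; a negative value means it is overspent."""
    return error_budget(slo_target, window) - downtime


if __name__ == "__main__":
    window = timedelta(days=30)           # assumed evaluation window
    observed = timedelta(minutes=50)      # assumed downtime so far
    print("budget:", error_budget(0.999, window))
    print("remaining:", budget_remaining(0.999, window, observed))
```

A negative remaining budget is the conventional signal to prioritize reliability work over new changes until the window rolls over.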
BASIC QUALIFICATIONS:
  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related technical field (or equivalent professional experience).
  • 5+ years of hands-on experience in site reliability engineering (SRE), infrastructure engineering, DevOps, or systems engineering, preferably supporting large-scale, distributed, or production environments.
  • Strong programming skills with proven production experience in Python (required for automation and tooling); experience with Rust or willingness to work in Rust is a plus, but strong coding fundamentals in at least one systems-level language (e.g., Python, Go, C++) are essential.
  • Solid experience with Linux systems administration, performance tuning, kernel-level understanding, and scripting/automation in production environments.
  • Practical knowledge of containerization and orchestration technologies, such as Docker and Kubernetes (or similar systems).
  • Experience implementing observability solutions, including metrics, logging, tracing, monitoring tools (e.g., Prometheus, Grafana, or alternatives), alerting, and dashboards.
  • Familiarity with troubleshooting complex issues in distributed systems, including software bugs, hardware failures, network problems, and environmental factors.
  • Understanding of networking fundamentals (TCP/IP, routing, redundancy, DNS) in large-scale or multi-site environments.
  • Experience participating in on-call rotations, incident response, post-incident reviews (blameless postmortems), and reliability practices such as error budgets or SLAs.
  • Ability to collaborate effectively with cross-functional teams (software engineers, network teams, site/facility operations, mechanical/electrical teams).
PREFERRED SKILLS AND EXPERIENCE:
  • 7+ years of experience in SRE or infrastructure roles, ideally in hyperscale, cloud, or AI/ML training infrastructure environments with multi-data center setups.
  • Hands-on experience operating or scaling Kubernetes clusters (or equivalent orchestration) at large scale, including automation for provisioning, lifecycle management, and high-availability.
  • Proficiency in Rust for systems programming and performance-critical components.
  • Direct experience integrating software reliability tools with physical data center infrastructure (e.g., power, cooling, environmental monitoring, facility controls) and automating responses to physical events.
  • Exposure to advanced or innovative observability stacks beyond traditional tools (e.g., exploring cutting-edge alternatives for metrics, logs, and tracing).
  • Experience building automated remediation, fault tolerance, disaster recovery, capacity planning, or predictive failure detection systems.
  • Background in optimizing Linux-based systems for AI workloads, GPU clusters, or high-throughput compute environments.
  • Demonstrated success reducing downtime, MTTR, or improving resource efficiency (e.g., through automation or observability) in high-stakes production settings.
  • Prior work with bare-metal provisioning, data center interconnects, or hybrid/multi-site failover mechanisms.
  • Mentoring experience, strong documentation skills, and a track record of fostering knowledge sharing and automation culture.
  • Comfort with rapid technology adaptation in fast-evolving domains like AI infrastructure.