Freelance Distributed Systems Engineer Jobs (NOW HIRING)

Distributed Systems Engineer, L5

OR · On-site +1

$100K - $700K/yr

We are looking for Distributed Systems Engineers to help evolve and innovate our infrastructure. We are committed to building a diverse and inclusive team to bring new perspectives as we solve the ...

Sr. Distributed Systems Engineer

San Francisco, CA · On-site

$123.10K - $168.50K/yr

Role As a distributed systems engineer, you'll work across the stack to solve problems as they come up and help build Archil volumes. You'll have significant influence over the technical and product ...

About the Role We're looking for a distributed systems engineer to expand the reach and effectiveness of our small Shared Services team. The ideal candidate uses their skills, experience, and ...

Freelance Distributed Systems Engineer information

$14

$47

$132

How much do freelance distributed systems engineer jobs pay per hour?

As of May 30, 2026, the average hourly pay for a freelance distributed systems engineer in the United States is $47.71, according to ZipRecruiter salary data. Most workers in this role earn between $24.28 and $61.78 per hour, depending on experience, location, and employer.

$122K - $166.90K/yr

Full-time

Posted 7 days ago


Job description

Job Summary:
MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) is focused on designing and operating ultra-scale GPU supercomputing systems for training foundation models. The Senior Distributed Systems Engineer will optimize communication stacks for large-scale distributed training, ensuring performance and reliability across GPU workloads.
Responsibilities:
• Design and optimize expert-parallel and hybrid-parallel communication patterns
• Drive high-performance hierarchical collectives for MoE workloads
• Co-design runtime orchestration with communication topology awareness
• Reduce tail latency and improve determinism across thousands of GPUs
• Architect fault-tolerant distributed execution under real-world cluster failures
• Communication-compute overlap and topology-aware collective optimization
• Deep debugging of NCCL, RDMA, and custom communication layers
• Hybrid expert parallel strategies in modern large-scale MoE systems
• Elastic and resilient distributed job orchestration concepts
• Congestion analysis and routing optimization across InfiniBand/RoCE fabrics
• Microbenchmarking and performance modeling for communication-heavy workloads
• Hybrid expert parallel communication for Mixture-of-Experts training
• Scaling behavior under network pressure
• Distributed orchestration for elastic, large-scale training
• Fault detection and recovery in distributed GPU workloads
• Cross-layer bottlenecks: GPU ↔ NIC ↔ PCIe ↔ NVSwitch ↔ Fabric ↔ Scheduler
Qualifications:
Required:
• Experience optimizing distributed training at 1,000+ GPU scale (or equivalent depth)
• Hands-on expertise with RDMA, InfiniBand, RoCE, and GPUDirect RDMA
• Deep familiarity with NCCL and/or UCX internals
• Strong systems programming ability (C/C++, Rust, or Go)
• Strong familiarity with modern model training frameworks such as PyTorch
• Ability to troubleshoot and profile training performance issues related to communication bottlenecks
• Ability to translate research ideas into production-grade optimizations
• Experience debugging distributed hangs, desynchronization, and performance regressions
• Include a link to your GitHub (required)
• Provide links to relevant distributed systems, HPC, or large-scale training projects
• Include a list of publications and/or public technical reports (if applicable)
• Describe the hardest distributed debugging problem you solved
• Include measurable performance improvements you have delivered
• Master’s, or Bachelor’s + 1 year of relevant experience.
Company:
Official account of Mohamed bin Zayed University of Artificial Intelligence. Dedicated to research, innovation, and empowering brilliant minds in AI. Founded in 2019, the company is headquartered in Abu Dhabi, UAE, with a team of 51-200 employees. The company is currently in the growth stage.