Machine Learning Distributed System Engineer Jobs

We're looking for a Machine Learning Engineer who can operate at the intersection of backend ... If you want to design distributed systems, deploy production ML models, and architect scalable data ...

Machine Learning Engineer

Dorchester, MA · On-site

$175K - $250K/yr

Machine Learning Engineer Chicago, United States; Hong Kong, Hong Kong; Sydney, Australia As a ... science, or distributed systems is a plus The Base Salary range for the role is included below.

Machine Learning Engineer

Chicago, IL · On-site

$175K - $250K/yr

As a Machine Learning Engineer, you will play a pivotal role in building systems that drive the ... distributed systems is a plus The Base Salary range for the role is included below. Base ...

Machine Learning Distributed System Engineer information

Annual pay scale: $31.5K · $128.8K (average) · $193.5K

How much do machine learning distributed system engineer jobs pay per year?

As of Jun 4, 2026, the average yearly pay for a machine learning distributed system engineer in the United States is $128,769, according to ZipRecruiter salary data. Most workers in this role earn between $101,500 and $155,000 per year, depending on experience, location, and employer.

What are the key skills and qualifications needed to thrive as a Machine Learning Distributed System Engineer, and why are they important?

To thrive as a Machine Learning Distributed System Engineer, you need a strong background in computer science, distributed systems, and machine learning, often supported by a relevant degree and experience in scalable system design. Proficiency with tools like TensorFlow, PyTorch, Apache Spark, and distributed computing platforms such as Kubernetes or Hadoop is essential, along with experience in programming languages like Python, Java, or Scala. Strong problem-solving, collaboration, and communication skills help you effectively design solutions and work with cross-functional teams. These skills are crucial for building robust, scalable ML systems that can handle large datasets and support real-world AI applications.

How does a Machine Learning Distributed System Engineer typically collaborate with data scientists and software engineers on large-scale projects?

As a Machine Learning Distributed System Engineer, you will frequently work alongside data scientists to help scale their models and algorithms for production deployment. Your role involves translating prototype models into distributed systems that can handle vast datasets efficiently. You'll also coordinate with software engineers to integrate these systems into the company's technology stack, ensuring reliability and scalability. Effective communication and a collaborative mindset are crucial, as you'll help bridge the gap between research and production environments.

What is a Machine Learning Distributed System Engineer?

A Machine Learning Distributed System Engineer is a professional who designs, builds, and maintains large-scale systems that enable machine learning algorithms to process and analyze data across multiple machines or clusters. Their role often involves optimizing data pipelines, ensuring system scalability and reliability, and integrating various components to support machine learning workflows. They work closely with data scientists and software engineers to make sure that ML models can be trained and deployed efficiently on distributed infrastructure.

What is the difference between a Machine Learning Distributed System Engineer and a Data Engineer?

Aspect | Machine Learning Distributed System Engineer | Data Engineer
Required Credentials | Bachelor's/Master's in CS, experience with distributed systems, ML frameworks | Bachelor's/Master's in CS, experience with data pipelines, database systems
Work Environment | Developing scalable ML systems, working with distributed computing frameworks | Building and maintaining data pipelines, ETL processes
Employer & Industry Usage | Tech companies, AI startups, research labs | Finance, healthcare, e-commerce, tech firms
Search & Comparison Intent | Focus on ML system scalability and distributed computing | Focus on data infrastructure and pipeline management

The Machine Learning Distributed System Engineer specializes in designing and implementing scalable ML systems using distributed computing frameworks, while the Data Engineer focuses on building and maintaining data pipelines and infrastructure. Both roles require strong technical skills and often overlap in data handling, but their core focus areas differ—ML system development versus data infrastructure management.

Staff ML Systems Engineer, Distributed Systems

FieldAI

Seattle, WA

Full-time

Posted 5 days ago


Job description

FieldAI’s Irvine team is where embodied AI meets real robots, real sensors, and real field deployments. Based in the heart of Southern California’s robotics ecosystem, we build risk-aware, reliable, field-ready AI systems that solve the hardest problems in robotics and unlock the full potential of embodied intelligence. If you want your work to ship, get tested on hardware, and improve through real deployments, Irvine is the place. We go beyond typical data-driven approaches or pure transformer-only architectures, combining rigorous engineering with learning systems proven in globally deployed solutions that deliver results today and get better every time our robots run in the field.

We are seeking a Senior / Staff ML Systems Engineer to architect and build the distributed infrastructure that powers large-scale machine learning workflows across the organization.

This role sits at the intersection of machine learning, distributed systems, and platform engineering. You will be responsible for designing scalable systems that support data processing, model training, evaluation, and post-processing pipelines while enabling ML teams to efficiently develop, operate, and scale production-grade workflows.

You will play a critical role in defining the architectural patterns, tooling, and infrastructure that underpin our machine learning platform.

What You'll Get To Do
  • Design and build scalable distributed machine learning pipelines across data processing, model training, evaluation, and post-processing workflows.
  • Architect distributed execution systems, including parallelization strategies, workload scheduling, resource allocation, and fault tolerance mechanisms.
  • Develop reusable abstractions, frameworks, and libraries that simplify distributed pipeline development.
  • Optimize performance across distributed CPU and GPU environments, improving throughput, utilization, and reliability.
  • Design systems that effectively manage data partitioning, memory utilization, serialization overhead, and compute efficiency.
  • Partner closely with ML engineers, data engineers, and infrastructure teams to productionize research workflows and enable large-scale model development.
  • Establish best practices and engineering standards for distributed machine learning infrastructure.
  • Evaluate and guide decisions around distributed computing frameworks, infrastructure technologies, and system design trade-offs.
  • Improve observability, debugging, monitoring, and operational tooling for distributed systems at scale.
What You Have
  • 5+ years of experience building distributed systems, backend infrastructure, machine learning platforms, or large-scale data processing systems.
  • Strong Python programming skills, including experience with concurrency, performance optimization, and systems development.
  • Experience with distributed computing frameworks such as Ray, Spark, Dask, Flink, or similar technologies.
  • Experience designing and scaling data pipelines or machine learning workflows.
  • Strong system design skills with demonstrated expertise in scalability, reliability, and performance optimization.
  • Experience diagnosing and resolving bottlenecks in distributed environments.
  • Ability to work cross-functionally and drive technical decisions across multiple teams.
The Extras That Set You Apart
  • Experience building infrastructure for machine learning training and inference systems.
  • Familiarity with modern ML frameworks such as PyTorch or TensorFlow.
  • Experience with multi-node or multi-GPU training architectures, including DDP, FSDP, DeepSpeed, or similar technologies.
  • Experience operating Kubernetes-based infrastructure and large-scale cloud systems.
  • Deep understanding of distributed systems concepts including data locality, serialization costs, scheduling, and resource management.
  • Experience with distributed debugging, observability, and workflow orchestration platforms.
  • Proven ability to establish technical direction and influence architecture across organizations.

Our salary range is highly competitive with the market, but we take into consideration an individual's background and experience in determining final salary. Base pay offered may vary depending on geographic location, job-related knowledge, skills, and experience.

In addition to competitive compensation, FieldAI offers comprehensive benefits, equity participation, and the opportunity to contribute to cutting-edge advancements in AI and robotics.

Our salary range is generous and we consider each individual’s background and experience when determining final compensation. Base pay may vary based on role scope, job-related knowledge, skills, experience, and the Irvine, California market.

Why Join FieldAI in Irvine?
In Irvine, you will work where the robots are. Our local team builds and tests systems on real hardware with real sensors, then ships them to operate in unstructured, previously unknown environments around the world. We are solving one of robotics’ hardest challenges: reliable deployment outside the lab. Our Field Foundational Models™ raise the bar for perception, planning, localization, and manipulation, with an emphasis on explainability and safety for real-world use.
You will collaborate with a world-class team that thrives on creativity, resilience, and bold thinking. We bring deep experience from organizations such as DeepMind, NASA JPL, Boston Dynamics, NVIDIA, Amazon, Tesla Autopilot, Cruise, Zoox, Toyota Research Institute, and SpaceX, along with a track record of field deployments and strong performance in DARPA challenge segments.

Be Part of the Next Robotics Revolution
We are looking for builders who want their work to leave the whiteboard and show up on robots. If you enjoy tackling tough, uncharted questions and working across disciplines, you will find your people here. Our teams span AI, software, robotics engineering, product, field deployment, and technical communication, all focused on shipping systems that perform in the real world.

Our headquarters is in Irvine, and we partner closely with teams there as well as colleagues across the US and around the world. Join us in Southern California and help define what dependable, field-ready autonomy looks like.

We value diverse perspectives and are committed to fostering an inclusive workplace. We evaluate candidates and employees based on merit, qualifications, and performance, and we do not discriminate on the basis of race, color, gender, national origin, ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, or any other legally protected status.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.