Distributed Systems Software Engineer Jobs in California

... debugging distributed applications Experience debugging at all levels of an operating system ... of software engineer career experience Expertise in at least one of C++/Objective-C/Swift ...

Systems Software Engineer

Redwood City, CA · On-site

$211K - $251K/yr

As a System Software Engineer, you'll focus on the runtime itself which encompasses the IPC layer ... Experience building observability or tracing infrastructure for distributed or multi-process ...

Systems Software Engineer

San Jose, CA · On-site

$202K - $240K/yr

Systems Software Engineer Location: San Jose, CA (Onsite) Duration: 6+ months contract Role and Responsibilities As a Senior System Software Engineer, you will join the GPU Software Validation team ...

Systems Software Engineer

Santa Clara, CA · On-site

$203K - $240K/yr

Picarro is a leading company in optical spectroscopy solutions, seeking a Systems Software Engineer to design, develop, and maintain software systems for scientific instrumentation. The role involves ...

Systems Software Engineer

Santa Clara, CA · On-site

$120K - $130K/yr

Systems Software Engineer Company: Picarro Location: Santa Clara, CA Education: Bachelor's Degree Required Position Overview Picarro is seeking a Systems Software Engineer to design, develop, and ...

Systems Software Engineer

Santa Clara, CA · On-site

$203K - $240K/yr

They are seeking a Systems Software Engineer to design, develop, and maintain robust software systems that support scientific instrumentation and data-driven applications. Responsibilities : • ...

Systems Software Engineer

Vista, CA · On-site

$178K - $211K/yr

The Systems Software Engineer will be responsible for system-level software integration, requirements, interfaces, and cybersecurity considerations, ensuring traceability and configuration management ...

Distributed Systems Software Engineer information

What are the key skills and qualifications needed to thrive as a Distributed Systems Software Engineer, and why are they important?

To thrive as a Distributed Systems Software Engineer, you need strong programming skills (often in languages like Java, Go, or C++), a deep understanding of algorithms, networking, and distributed computing concepts, typically supported by a degree in computer science or a related field. Familiarity with tools and frameworks such as Kubernetes, Apache Kafka, Docker, and cloud platforms (AWS, GCP, or Azure) is highly valued, as are certifications in cloud or devops technologies. Excellent problem-solving, teamwork, and communication skills help you design scalable solutions and collaborate across teams. These skills are crucial for building reliable, efficient, and scalable distributed systems that power modern applications and services.

What is the difference between Distributed Systems Software Engineer vs Cloud Software Engineer?

Aspect | Distributed Systems Software Engineer | Cloud Software Engineer
Required Credentials | Bachelor's in CS or related, experience with distributed architectures | Bachelor's in CS, experience with cloud platforms (AWS, Azure)
Work Environment | Develops scalable distributed applications, often in data centers or on-premises | Builds and maintains cloud-based solutions, deploying on cloud platforms
Employer & Industry Usage | Tech companies, data centers, distributed computing firms | Cloud service providers, SaaS companies, enterprises adopting cloud
Search & Comparison Intent | Understanding roles in distributed architecture | Comparing cloud-focused development roles

While both roles involve building scalable software, a Distributed Systems Software Engineer focuses on designing and implementing distributed architectures, whereas a Cloud Software Engineer specializes in deploying and managing applications on cloud platforms. The roles often overlap but differ mainly in their environment and specific technical focus.

What are the typical challenges faced by Distributed Systems Software Engineers when ensuring system reliability?

Distributed Systems Software Engineers often encounter challenges like handling network partitioning, ensuring data consistency across nodes, and effectively managing system failures. They need to design resilient architectures that can recover gracefully when components fail, and implement robust monitoring to detect issues early. Collaborating closely with DevOps, QA, and other engineering teams is crucial to address these challenges and maintain high availability and performance in complex, distributed environments.

What are Distributed Systems Software Engineers?

Distributed Systems Software Engineers are professionals who design, develop, and maintain software that runs across multiple computers or servers, working together to achieve a common goal. They build systems that are reliable, scalable, and efficient, often handling large volumes of data and user requests. Their work involves solving challenges related to network communication, data consistency, fault tolerance, and system coordination. These engineers frequently use technologies like cloud computing platforms, message queues, and databases to ensure smooth operation across distributed environments.
Infographic: Distributed Systems Software Engineer job openings in California as of June 2026. By employment type: 87% Full Time, 9% Part Time, 2% Temporary, and 2% Contract. By work arrangement: 91% Physical, 2% Hybrid, and 7% Remote.
Senior Systems Software Engineer, Observability and Telemetry Platform

Nvidia Corporation

Santa Clara, CA • On-site

Full-time

Posted 22 days ago


Job description

At NVIDIA, Senior Systems Software Engineer (SRE) is an engineering discipline: designing, building, and maintaining large-scale production systems with high efficiency and availability, using a combination of software and systems engineering practices. It is a highly specialized discipline that demands knowledge across systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open source cloud-enabling technologies such as Kubernetes and OpenStack. Senior Systems Software Engineers (SRE) at NVIDIA ensure that our internal and external-facing GPU cloud services run with the maximum reliability and uptime promised to users, while enabling developers to make changes to existing systems through careful preparation and planning, keeping an eye on capacity, latency, and performance. SRE is also a mindset and a set of engineering approaches to running and optimizing production systems. Much of our software development focuses on eliminating manual work through automation, performance tuning, and improving the efficiency of production systems.
Senior Systems Software Engineers (SRE) are responsible for the big picture of how our systems relate to each other; we use a breadth of tools and approaches to tackle a broad spectrum of problems. Practices such as limiting time spent on reactive operational work, blameless postmortems, and proactive identification of potential outages factor into the iterative improvement that is key to both product quality and exciting, dynamic day-to-day work. The Senior Systems Software Engineer (SRE) culture of diversity, intellectual curiosity, problem solving, and willingness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on relevant projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.
What you'll be doing:
  • Design, implement and support operational and reliability aspects of a large-scale Observability & Telemetry collection platform, with a focus on performance at scale, real-time monitoring, logging and alerting
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
  • Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems
  • Be part of an on call rotation to support production systems

What we need to see:
  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
  • 8+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production
  • 5+ years of experience delivering foundational infrastructure and observability platforms
  • Experience in one or more of the following: Python, Go, Perl or Ruby
  • In-depth knowledge of Linux, networking and containers

Ways to stand out from the crowd:
  • Interest in crafting, analyzing and fixing large-scale distributed systems
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. Ability to debug and optimize code and automate routine tasks
  • Experience in using or running large private and public cloud systems based on Kubernetes, OpenStack and Docker. Experience running Grafana, OpenTelemetry, Prometheus, and similar observability focused tools

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until June 28, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About Nvidia

Sourced by ZipRecruiter

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology--and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent.

Industry

Computer and electronic product manufacturing

Company size

10,000+ Employees

Headquarters location

Santa Clara, CA, US

Year founded

1993