STACK Infrastructure

60 Stack Infrastructure Software Jobs Hiring Near You

Infrastructure Software Engineer

San Jose, CA · Hybrid

$202.80K - $240.30K/yr

We build this infrastructure as software - and we engineer it with the same best practices we apply ... You'll also architect and implement a state-of-the-art observability stack with LLM integration and ...

Infrastructure Software Engineer

Seattle, WA · On-site

$120K - $200K/yr

Our OverDrive autonomy stack enables ground vehicles to navigate and operate off-road in any ... As a Software Infrastructure Engineer, you'll design, build, and maintain the internal software ...

We are building a full-stack AI cloud platform that supports developers and enterprises from data ... The role: Nebius operates large-scale, mission-critical bare-metal infrastructure. As a Software ...

OR · On-site

$108.40K - $147.40K/yr

We are seeking an AI infrastructure software engineer to join our team. You'll be instrumental in ... Co-design and implement APIs for integration with NVIDIA's resiliency stacks. * Enhance ...

STACK Infrastructure Jobs Information

What are the key skills and qualifications needed to thrive as a Software Engineer, and why are they important?

To thrive as a Software Engineer, you need strong programming skills, problem-solving abilities, and, typically, a degree in computer science or a related field. Familiarity with programming languages (such as Python, Java, or C++), version control systems (such as Git), and software development frameworks is commonly required. Attention to detail, effective communication, and teamwork are crucial soft skills for collaborating on projects and delivering robust solutions. Together, these skills help ensure the development of high-quality, reliable software that meets user needs and business objectives.

What are some common challenges software professionals face when working on large-scale projects?

Software professionals working on large-scale projects often encounter challenges such as coordinating with cross-functional teams, managing complex codebases, and ensuring consistent communication across distributed teams. Balancing the need for rapid development with maintaining code quality and meeting strict deadlines can also be demanding. Adapting to evolving requirements and integrating new technologies while minimizing disruptions are common aspects of the role, making strong organizational and collaboration skills essential.

What are software developers?

Software developers are professionals who design, create, test, and maintain software applications or systems. They use programming languages and development tools to build software that meets user needs or solves specific problems. Their responsibilities can include writing code, debugging, collaborating with other team members, and updating existing programs. Software developers work in a variety of industries, including technology, finance, healthcare, and more.

What jobs are there in software?

Jobs in software include roles such as software developer, software engineer, quality assurance tester, systems analyst, and technical support specialist. These positions often require knowledge of programming languages, software development tools, and problem-solving skills, with some roles requiring certifications or specific technical expertise.

What jobs in the US pay $300,000 a year?

Software engineering roles, especially senior positions such as principal engineers, software architects, and engineering managers, can earn $300,000 or more annually in the US. High compensation often requires extensive experience, specialized skills, and working at large tech companies or in high-demand industries, sometimes including stock options or bonuses.

What is the difference between a Software Developer and a Web Developer?

Aspect | Software Developer | Web Developer
Required credentials | Typically a degree in computer science or a related field; certifications such as Microsoft Certified or Oracle Certified | Similar credentials; often a degree in CS or web development certifications
Work environment | Develops software applications for various platforms, including desktop and mobile | Builds websites and web applications, primarily for online use
Employer and industry usage | Used across tech companies, software firms, and enterprises | Common in digital agencies, tech startups, and online businesses
Common search and comparison intent | People compare to understand the different roles in software creation | Often compared to see distinctions in web-focused development

While both software developers and web developers work in the tech industry and require similar skills and certifications, software developers create applications for various platforms, whereas web developers focus on building websites and web-based applications. The choice depends on whether you're interested in broad software solutions or web-specific projects.

Infographic (most popular job types and categories at Stack Infrastructure): Software job openings in the United States as of May 2026; employment type 100% full-time; 100% physical job distribution.

    Infrastructure Software Engineer

    Etched

    San Jose, CA • Hybrid

    $202.80K - $240.30K/yr

    Full-time

    Medical, Dental, Vision

    Posted 9 days ago


    Job description

    About Etched

    Etched is building the world's first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions of dollars from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest-growing industry in history.

    Job Summary

    Building cutting-edge model-specific ASICs requires crafting custom infrastructure and toolchains to support ultra-fast, reliable, and scalable development across the stack - from simulation to silicon. We build this infrastructure as software - and we engineer it with the same best practices we apply to our products. We hold it to the same rigor, design discipline, quality standards, and testing as our ASIC, software, and platform work.

    You will lead the development and adoption of next-generation infrastructure tooling, enabling Etched ASIC, Software, and Platform engineers to iterate faster, build more reliably, and push the boundaries of AI performance. This includes building and scaling our hybrid high-performance compute (HPC) cluster, optimized for massively parallel CI, EDA workflows, emulation, and hardware-aware job execution.

    You'll also architect and implement a state-of-the-art observability stack with LLM integration and a strong emphasis on streaming health and performance telemetry, log aggregation, distributed tracing, insight generation, synthetic testing, and smart alerting - across CI pipelines, simulation clusters, and service endpoints.

    This role demands a strong software engineering mindset, quality instincts, and deep understanding of systems. It's not just about writing scripts - it's about writing code that builds and manages infrastructure with precision, repeatability, and intent.

    Key responsibilities

    • Design and build the orchestration layers that drive our hybrid high-performance clusters, enabling simulation, synthesis, and continuous integration of AI ASICs at unprecedented scale.

    • Develop and maintain a fully programmable infrastructure control plane to ensure reproducibility, auditability, and rapid iteration across the entire stack.

    • Create tools and abstractions that empower engineers to harness massive parallelism without worrying about the underlying complexity.

    • Prototype and execute workload orchestration and migration strategies between on-premise and cloud environments, balancing performance, storage availability and replication, uptime, and cost across heterogeneous hardware and compute backends.

    • Implement real-time telemetry and tracing systems that surface insights from millions of metrics, enabling proactive debugging and system optimization.

    • Build a full observability stack, including dashboards, alerting, automated responses, and a synthetic testing framework that continuously tests infrastructure performance and reliability across application and data flows, so that issues are caught before they disrupt development and productivity workflows.
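
    As a rough illustration of the synthetic-testing and alerting ideas in the last two responsibilities, here is a minimal Python sketch, assuming a probe is any callable that exercises a workload (for example, submitting a tiny CI job). The `SyntheticProbe` class, its default thresholds, and the nearest-rank p95 rule are hypothetical choices for illustration, not Etched's actual tooling.

```python
import math
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SyntheticProbe:
    """Hypothetical synthetic check: times a probe and evaluates a windowed alert rule."""

    def __init__(self, probe: Callable[[], None], window: int = 20,
                 p95_threshold_s: float = 1.0, max_failure_ratio: float = 0.1) -> None:
        self.probe = probe                          # synthetic workload, e.g. a tiny test job
        self.window = window                        # how many recent runs the rule looks at
        self.p95_threshold_s = p95_threshold_s      # alert if p95 latency exceeds this (seconds)
        self.max_failure_ratio = max_failure_ratio  # alert if more than this fraction failed
        # Each sample is (latency in seconds, succeeded?); the oldest drops off automatically.
        self.samples: Deque[Tuple[float, bool]] = deque(maxlen=window)

    def run_once(self) -> None:
        """Run the probe once and record its wall-clock latency and outcome."""
        start = time.perf_counter()
        ok = True
        try:
            self.probe()
        except Exception:
            ok = False                              # any exception counts as a failed probe
        self.record(time.perf_counter() - start, ok)

    def record(self, latency_s: float, ok: bool = True) -> None:
        self.samples.append((latency_s, ok))

    def p95(self) -> float:
        """Nearest-rank 95th percentile of the latencies currently in the window."""
        ordered = sorted(latency for latency, _ in self.samples)
        rank = max(1, math.ceil(0.95 * len(ordered)))
        return ordered[rank - 1]

    def should_alert(self) -> bool:
        # Wait for a full window so a single slow run right after startup cannot page anyone.
        if len(self.samples) < self.window:
            return False
        failed = sum(1 for _, ok in self.samples if not ok)
        return (self.p95() > self.p95_threshold_s
                or failed / len(self.samples) > self.max_failure_ratio)


if __name__ == "__main__":
    checker = SyntheticProbe(probe=lambda: None, window=20, p95_threshold_s=0.5)
    for _ in range(19):
        checker.record(0.1)
    checker.record(2.0)
    print(checker.should_alert())   # False: with one slow run among 20, the p95 (19th of 20) is 0.1 s
    checker.record(2.0)
    print(checker.should_alert())   # True: two slow runs in the window push the p95 to 2.0 s
```

    Evaluating the rule over a full window of recent runs, rather than alerting on any single slow probe, is one common way to reduce flapping alerts; a production stack would more likely export these samples as metrics and express the rule in its alerting system.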

    Representative projects

    • Design and deploy a fully automated, scalable hybrid HPC cluster, combining bare-metal servers and switches with cloud instances, provisioned through MaaS and orchestrated via SLURM and Kubernetes, optimized for mixed EDA workloads and parallel CI pipelines.

    • Develop a real-time observability system for ASIC toolchain jobs and distributed builds, integrating Prometheus, Grafana, and VictoriaMetrics with streaming telemetry, tracing, and alerting to detect performance regressions before they hit silicon.

    • Architect and implement a programmable infrastructure-as-code control plane, using Terraform, Ansible, and Puppet, to version, audit, and redeploy every layer of Etched's development stack with deterministic reproducibility.

    • Create a zero-downtime interactive development environment that provisions and connects Jupyter and VS Code sessions to GPUs and high-memory nodes via a secure zero-trust network, abstracting away cluster state and machine failures.

    • Prototype and evaluate dynamic workload migration strategies between on-premise and cloud environments to optimize for latency, reliability, and cost across simulation and synthesis pipelines.

    • Design a synthetic testing and fault injection framework to validate how infrastructure behaves under high load, degraded hardware, and intermittent network partitions, before those conditions occur in production.
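
    The fault-injection idea in the last project can be sketched in a few lines of Python, assuming faults are modeled as exceptions raised before a call reaches its real target. `with_faults`, `InjectedFault`, and `call_with_retries` are hypothetical names for illustration; a real framework would likely also inject latency, degraded hardware, and network partitions at the infrastructure level rather than inside a single process.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Simulated failure, standing in for a dropped connection or a dead node."""


def with_faults(fn: Callable[[], T], failure_rate: float,
                rng: random.Random) -> Callable[[], T]:
    """Wrap fn so each call raises InjectedFault with probability failure_rate."""
    def wrapped() -> T:
        if rng.random() < failure_rate:
            raise InjectedFault("injected fault")
        return fn()
    return wrapped


def call_with_retries(fn: Callable[[], T], attempts: int) -> T:
    """Call fn, retrying on InjectedFault for up to `attempts` total tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except InjectedFault:
            if attempt == attempts:
                raise                       # out of retries: surface the fault to the caller
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    rng = random.Random(42)                 # seeded, so the fault schedule is reproducible
    flaky_build = with_faults(lambda: "ok", failure_rate=0.3, rng=rng)
    successes = 0
    for _ in range(1000):
        try:
            call_with_retries(flaky_build, attempts=3)
            successes += 1
        except InjectedFault:
            pass
    # With independent 30% faults and 3 tries, roughly 0.3**3 = 2.7% of calls should still fail.
    print(f"{successes}/1000 calls succeeded")
```

    Passing a seeded `random.Random` makes each fault schedule reproducible, so a run that exposes a bug can be replayed exactly.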

    You may be a good fit if you

    • Are a systems-minded software engineer who loves building foundational platforms, working close to both the metal and the cloud, and solving high-leverage problems at scale.

    • Are a deeply technical engineer who treats infrastructure as a software problem, prioritizing clean abstractions, version control, small changelists, easy rollbacks, testing, and long-term maintainability over ad hoc configuration.

    • Have strong programming skills in languages such as Python, Go, Rust, and C++, and are comfortable building production-grade tooling.

    • Possess expert-level knowledge of Linux, virtualization, containerization, and CI/CD pipelines, with a deep understanding of how to debug, optimize, and scale complex systems.

    • Are familiar with Infrastructure as Code tools like OpenTofu, Ansible, or Puppet, and enjoy designing declarative, reproducible infrastructure systems.

    • Understand and use PromQL and other telemetry query languages, have used LLMs to extract insights from real-time metrics, and know how to architect and tune observability stacks.

    • Have a track record of debugging and resolving difficult hardware-software integration problems across bare-metal systems, networks, and distributed workloads.

    • Can lead and mentor technical teams, guiding design decisions and helping others develop sound engineering instincts.

    • Have 8+ years of experience in infrastructure engineering, systems programming, or backend software development - ideally in environments where performance, scale, or hardware interaction mattered.

    • Are driven by curiosity, take initiative, and have an innate sense of ownership - you thrive in uncharted territory, design for edge cases, and love making systems more powerful, reliable, and elegant.
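
    To make "infrastructure as a software problem" concrete, here is a minimal Python sketch of the declarative pattern, in the spirit of the plan step in tools such as OpenTofu: compare a desired spec with the observed state and derive a list of create, update, and delete steps. The `plan` and `apply_plan` functions and the flat dictionary model of resources are simplifying assumptions for illustration, not how any particular tool works internally.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping

Spec = Mapping[str, Mapping[str, str]]     # resource name -> its attributes (simplified model)


@dataclass(frozen=True)
class Change:
    action: str   # "create", "update", or "delete"
    name: str     # resource name, e.g. a CI runner or a cluster node


def plan(desired: Spec, actual: Spec) -> List[Change]:
    """Diff desired against actual state; sorted names keep plans stable and reviewable."""
    changes: List[Change] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(Change("create", name))
        elif name not in desired:
            changes.append(Change("delete", name))
        elif dict(desired[name]) != dict(actual[name]):
            changes.append(Change("update", name))
    return changes


def apply_plan(changes: List[Change], desired: Spec,
               actual: Spec) -> Dict[str, Dict[str, str]]:
    """Return the state after applying changes; a stand-in for real provisioning calls."""
    state = {name: dict(attrs) for name, attrs in actual.items()}
    for change in changes:
        if change.action == "delete":
            del state[change.name]
        else:                               # create and update both write the desired attributes
            state[change.name] = dict(desired[change.name])
    return state


if __name__ == "__main__":
    desired = {"ci-runner-1": {"image": "v2"}, "ci-runner-2": {"image": "v2"}}
    actual = {"ci-runner-1": {"image": "v1"}, "old-node": {"image": "v1"}}
    changes = plan(desired, actual)
    print(changes)
    new_state = apply_plan(changes, desired, actual)
    print(plan(desired, new_state))         # []: applying the plan converged on the spec
```

    Because the desired spec is plain data, it can live in version control; rolling back is then just planning against the previous version of the spec, and re-planning after a successful apply should produce an empty change list.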

    Strong candidates may also have experience with

    • Familiarity with the Bazel build system.

    • Deep understanding of ASIC development flows, especially those involving Synopsys, Cadence, and Verilator, including how EDA tools interact with infrastructure for simulation, synthesis, and verification.

    • Hands-on experience architecting systems with AWS, GCP, or Azure, including hybrid on-prem/cloud deployments, workload migration strategies, and cloud-native orchestration tooling.

    • Experience monitoring, provisioning, and debugging bare-metal servers, network hardware, and high-performance storage systems in rack-scale environments.

    • Profiling and optimizing compute environments for single-threaded latency, memory-bound workloads, or I/O throughput, especially in the context of simulation or CI performance.

    • Building or operating telemetry systems at scale using Prometheus, Grafana, Loki, VictoriaMetrics, and tools for distributed tracing, log aggregation, and real-time alerting across heterogeneous channels (SMS, email, push alerts, etc.).

    Benefits

    • Medical, dental, and vision packages with generous premium coverage

      • $500 per month credit for waiving medical benefits

    • Housing subsidy of $2k per month for those living within walking distance of the office

    • Relocation support for those moving to San Jose (Santana Row)

    • Various wellness benefits covering fitness, mental health, and more

    • Daily lunch + dinner in our office

    • Unlimited compute budget subject to ROI justification

    How we're different

    Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

    We are a fully in-person team in San Jose (Santana Row), and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.