Job Summary:
NVIDIA AI is building the next generation of production workflow infrastructure for large-scale chip engineering. The System Software Engineer will focus on evolving existing tools and infrastructure into a clearer control-plane platform for complex engineering workflows, collaborating with senior engineers to improve diagnostics and maintain workflow-platform features.
Responsibilities:
• Build and maintain workflow-platform features across YAML configuration, generated artifacts, Make targets, Perl/Python utilities, Tcl checks, and structured output files
• Help model workflow stages, inputs, outputs, validation signals, generated files, dependencies, status, and ownership in configuration and manifests
• Create machine-readable check results, run manifests, provenance records, log summaries, and status outputs that make behavior easier to inspect and debug
• Strengthen early-failure checks for missing files, stale generated data, invalid configuration, bad environment setup, scheduler issues, and incomplete run state
• Add and test integrations with distributed job execution, shared compute, filesystem state, data-fidelity tracking, and dependency tracing
• Work with senior engineers and users to reproduce failures, trace configuration behavior, improve diagnostics, update documentation, and preserve existing workflows
Qualifications:
Required:
• B.S. or M.S. in CS, EE, CE, or equivalent experience
• 4+ years building automation, developer infrastructure, workflow platforms, distributed systems, test infrastructure, or engineering productivity tools
• Strong Linux fundamentals, including shell debugging, environment setup, filesystem behavior, process execution, logs, exit codes, and background jobs
• Practical programming experience in Python, Perl, Go, C++, or similar, plus comfort reading and modifying Make, YAML, JSON, and shell-based infrastructure
• Ability to reason carefully about configuration layers, generated files, schemas, validation rules, compatibility, and incremental migration of legacy systems
• Strong debugging habits, clear written communication, and experience improving production infrastructure without destabilizing active users
Preferred:
• Exposure to semiconductor design or EDA workflows, especially RTL, synthesis, place-and-route, timing, signoff, ECO, or handoff flows
• Background with workflow engines, build systems, CI/CD platforms, job schedulers, deployment automation, data pipelines, or large-scale engineering automation
• Experience improving legacy Make, Perl, shell, Python, or Tcl systems while preserving existing behavior
• Experience creating structured logs, JSON/YAML schemas, validation frameworks, provenance tracking, dashboards, or observability tools
• Background with shared filesystems, partial writes, stale state, locking, reproducibility, generated artifacts, batch jobs, tests, migrations, documentation, or debug tooling
Company:
Explore the latest breakthroughs made possible with AI. The company is headquartered in Santa Clara, CA, US, with a team of 10001+ employees. The company is currently Late Stage.