Databricks

Staff Software Engineer - AI Research Infrastructure

Databricks

San Francisco, CA • On-site

Full-time

Posted 28 days ago


Job description

Job Summary:
Databricks is a leading data and AI company that helps organizations solve complex problems through its data and AI platform. As a Staff Software Engineer on the AI Research Infra Team, you will develop and run the research stack, design services for large-scale training and inference workloads, and collaborate with research scientists and engineers to enhance the infrastructure for AI research.
Responsibilities:
• Design and implement infrastructure that supports large‑scale experiments, data processing, and model training (e.g., HPC clusters, GPU fleets, or cloud‑based systems).
• Enable researchers to go from idea to large‑scale experiment in minutes, not days, by building powerful abstractions for job submission, scheduling, and monitoring.
• Create tooling that improves research developer productivity, such as experiment management systems, CI/testing infrastructure for research code, and workflows that reduce iteration time.
• Influence the long‑term roadmap for research computation, shaping how Databricks AI Research trains, evaluates, and ships models to customers.
• Serve as a technical mentor and force multiplier for other engineers working on compute, infra, and AI systems.
Qualifications:
Required:
• BS, MS, or PhD in Computer Science or a related field.
• 5+ years of software engineering experience, including substantial time working on large-scale distributed systems or infrastructure.
• Deep experience building and operating distributed systems, data pipelines, or large-scale backend services, ideally involving GPUs, clusters, or major cloud providers.
• Proficiency in one or more systems programming languages (e.g., C++, Rust, Go, Java, Scala) and the ability to design, implement, and debug complex services.
• Experience building or significantly contributing to cluster schedulers, resource managers, or large-scale job orchestration systems (e.g., Kubernetes, Slurm, Ray, custom internal systems).
• Understanding of modern ML training and inference workflows (e.g., distributed training, model parallelism, fine-tuning, evaluation), even if you’re not primarily a research scientist.
• Ability to move fast and stay pragmatic while caring about operational excellence, with a track record of driving complex systems from prototype to stable, well-owned services.
• Clear communication with both researchers and engineers, and enjoyment in translating between research needs and infra realities.
Company:
Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture. Founded in 2013, the company is headquartered in San Francisco, USA, and has between 5,001 and 10,000 employees. It is currently a late-stage company.