Job Summary:
Etched is building AI chips that are hard-coded for individual model architectures. They are seeking an Inference Software Engineer to contribute to the architecture and design of the Sohu host software stack and implement high-performance, modular code across the complete Etched software stack.
Responsibilities:
• Contribute to the architecture and design of the Sohu host software stack
• Implement high-performance, modular code across the complete Etched software stack, consisting of a mix of Rust, C++ and Python.
• Interface with the firmware and driver teams to deliver the highest-performance HW/SW stack.
• Work with AI model researchers and product-facing teams building out the Etched serving front-end.
• Build scheduling logic for handling continuous batching and real-time inference
• Implement inference-time acceleration techniques such as speculative decoding, tree search, KV cache sharing, etc.
• Implement distributed networking primitives for efficient multi-server inference
Qualifications:
Required:
• Experience with C++ and Python
• Familiarity with transformer model architectures and inference serving stacks (vLLM, SGLang, etc.) or experience working in distributed inference/training environments
• Experience working cross-functionally in large software and hardware organizations
Preferred:
• Experience with Rust
• Familiarity with GPU kernels, the CUDA compilation stack and related tools, or other hardware accelerators
• Understanding of distributed systems, networking, and parallel programming
Company:
Building the hardware for superintelligence. Founded in 2022, the company is headquartered in Cupertino, USA, with a team of 51-200 employees. The company is currently at the growth stage.