Head of Inference Full Time, Remote, NYC Preferred (US Based) About Montauk Capital Montauk Capital builds and backs companies at the forefront of the Electron Economy, the generational shift towards ...
60 Inference Full Time Jobs Hiring Near You
Machine Learning Engineer - Inference
San Francisco, CA · On-site
$160K - $230K/yr
If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want ... The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits.
Engineering • Full-time • San Francisco; New York. Our mission is to automate coding. The ... This team owns the full inference path: making Cursor's AI faster, more reliable, and more cost ...
LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the ... The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits.
Engineering • Full-time • New York; San Francisco. Our mission is to automate coding. The ... This team owns the full inference path: making Cursor's AI faster, more reliable, and more cost ...
LLM Inference Frameworks and Optimization Engineer
San Francisco, CA · On-site
$160K - $230K/yr
Our mission is to optimize inference frameworks, algorithms, and infrastructure, pushing the ... The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits.
Postdoctoral Research Position in Causal Inference
Cambridge, MA · On-site
$75K/yr
Chan School of Public Health Department/Area Biostatistics Position Description We invite applications for a full-time Postdoctoral Research Fellow to join the causal inference team supervised by ...
Forward Deployed Engineer (Inference & Post-Training)
San Francisco, CA · On-site +1
$270K - $300K/yr
Inference Engine Optimization: Select, configure, and optimize inference engine based on hardware ... The US base salary range for this full-time position is: $270,000 - $300,000 OTE + equity ...
Forward Deployed Engineer (Inference & Post-Training)
San Francisco, CA · On-site
$270K - $300K/yr
Inference Engine Optimization: Select, configure, and optimize inference engine based on hardware ... The US base salary range for this full-time position is: $270,000 - $300,000 OTE + equity ...
One of our ventures is building machine learning inference systems for audio at scale. The work ... FULL_TIME
Director, Epidemiology Causal Inference
Glassboro, NJ · On-site +1
In this role, you will provide leadership in causal inference methods, including target trial ... Employment Type: FULL_TIME
Audio Inference Engineer, Model Efficiency
New York, NY · On-site +1
The mission of the team is to build reliable machine learning systems and optimize audio inference ... Full-Time Employees at Cohere Enjoy These Perks: * An open and inclusive culture and work ...
Audio Inference Engineer, Model Efficiency
Manhattan, NY · On-site +1
The mission of the team is to build reliable machine learning systems and optimize audio inference ... Full-Time Employees at Cohere enjoy these Perks: An open and inclusive culture and work environment ...
Audio Inference Engineer, Model Efficiency
Montreal, QC · On-site +1
The mission of the team is to build reliable machine learning systems and optimize audio inference ... Full-Time Employees at Cohere Enjoy These Perks: * An open and inclusive culture and work ...
Audio Inference Engineer, Model Efficiency
Toronto, ON · On-site +1
The mission of the team is to build reliable machine learning systems and optimize audio inference ... Full-Time Employees at Cohere Enjoy These Perks: * An open and inclusive culture and work ...
Inference Jobs Information

Full-time
Posted 5 days ago
Job description
Full Time, Remote, NYC Preferred (US Based)
About Montauk Capital
Montauk Capital builds and backs companies at the forefront of the Electron Economy, the generational shift towards electrified, intelligent technologies reshaping industries and driving unprecedented demand for energy. Our team combines deep investing acumen with decades of operating experience to give founders the strategic clarity and hands-on support that accelerates the building of enduring companies of consequence.
About Stealth Edge AI Co
Co-founded by Montauk Capital, Stealth Edge AI Co is a pre-seed venture specialized in modular, metro-edge AI capabilities. By leveraging existing infrastructure for inference deployment, Edge AI provides low-latency, SLA-guaranteed performance across diverse GPU SKUs and colocation environments. Our technology intelligently routes traffic based on demand proximity and real-world network limitations, bypassing the heavy power and infrastructure requirements of traditional hyperscalers. Currently initiating operations with pilot nodes in NYC, we are executing a city-by-city expansion strategy with plans for a broader multi-metro rollout.
About the Role
We are seeking a visionary and execution-oriented Head of Inference. You'll define the inference architecture, make foundational decisions, build the first POC, and own this domain end to end alongside the CEO. You will be a senior, hands-on technical leader: the technical authority on inference in the room and the company's internal and external expert on it. You will own the core inference capability driving the platform and customer experience, and have a strong voice in the technical foundation of the company. You'll evolve the vision into a viable proof of concept, building the practical system first and then designing and implementing distributed inference systems. Alongside the CEO, you'll represent the company with top-tier partners, early customers, and investors. In addition to the CEO, you will have the support of a team of strong advisors and the initial founding team.
What You'll Do
- Create the inference strategy and define the inference architecture for Edge AI
- Own the inference serving layer end-to-end: vLLM, TensorRT-LLM, Triton, or equivalent
- Build a credible POC fast, one that proves to NVIDIA, cloud providers, and customers that the platform works
- Drive cost-per-token optimization
- Optimize GPU utilization, KV-cache management, and batching for production workloads
- Own observability and reliability SLAs
- Build distributed inference pipelines across multi-GPU, multi-node edge deployments
- Set performance baselines and SLAs for inference latency, throughput, and observability
- Define quantization strategy
- Translate complex inference requirements into infrastructure designs
- Define the software access layer architecture and oversee integration efforts
- Engage credibly with investors, partners, and technical stakeholders, and represent the company externally
What You'll Bring
You have a passion for inference and a background as a hands-on technical builder who has directly implemented inference systems before, ideally in production or near-production environments. You have deep knowledge of, and are excited about, model serving and the practical engineering required to make an inference system work on real hardware. You can take a vision and initial concept and translate it into a viable POC quickly, and you are comfortable making foundational technical decisions quickly, in ambiguity, while building something first of its kind.
If inference is your craft and you've built systems in production, we want to talk.
- Production inference serving - vLLM, TensorRT-LLM, Triton Inference Server, or equivalent distributed at scale
- Quantization, SGLang, containerization, cost-per-token
- Observability tooling: distributed tracing, latency profiling, alerting. Instrument and debug complex distributed systems with a focus on building world-class observability and debuggability tools
- C++/CUDA/Rust
- GPU utilization and CUDA kernel optimization - has pushed hardware to its limits
- Batching, KV-cache, speculative decoding expertise
- Scale systems using Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference
- Has built a serving system that NVIDIA and cloud providers respect
- Model deployment and serving
- Systems engineering
- Technical leadership experience, either over teams or outcomes
- Startup / 0→1 DNA: You ship fast and communicate clearly
Why Join Us
- Category-Defining Opportunity: Solve the AI inference bottleneck without the burden of power and infrastructure constraints. Own metro edge inference across heterogeneous, disparate compute nodes
- Massive Market Opportunity: AI spending projected to exceed hundreds of billions annually, 54 GW of AI inference demand expected by 2030
- Studio Support: Leverage Montauk Capital's resources, network, and operational expertise during critical early stages
- Competitive compensation + equity: True ownership over what you build